# Pandoc

Notes related to Pandoc.

## Generating Pandoc ids without Pandoc {#pandoc-ids}

It can be useful to generate a [Pandoc identifier](https://pandoc.org/MANUAL.html#extension-auto_identifiers) outside Pandoc. For example, I do this in the static site generator for this site. I use it to link each [tag](#bottom) to the list of pages that have the tag on the [tag page](/tags).

While the algorithm is documented in prose, I have not found official pseudocode for it. I looked up the [original Haskell code](https://github.com/jgm/pandoc/blob/e0e60871458329679c81cf8c589199d4b52922f8/src/Text/Pandoc/Shared.hs#L447) to make sure everything was correct.

### Python

The algorithm in Python. This version supports Unicode.

```python
import re


def pandoc_id(s: str) -> str:
    s = s.lower()
    # Remove everything except whitespace, word characters, periods, and hyphens.
    # The character class `\w` includes underscores.
    s = re.sub(r"[^\s\w.-]", "", s)
    # Replace each run of whitespace with a single hyphen.
    s = re.sub(r"\s+", "-", s)
    # Remove everything up to the first letter.
    s = re.sub(r"^[\d\W_]+", "", s)
    # Fall back to "section" if nothing is left.
    return s or "section"
```

Note that this is the default identifier algorithm in Pandoc. GitHub Flavored Markdown, which Pandoc also supports, uses a different algorithm.

If a generated identifier is the same as one that already exists in the document, append `-` and *n* to the new identifier, where *n* is an integer starting with 1.

### POSIX shell and BRE {#posix-shell}

This is a POSIX shell and [POSIX Basic Regular Expressions](https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions) implementation. It works correctly only on ASCII text.

```shell
tr '[:upper:]' '[:lower:]' \
| sed '
    s/[^[:space:]a-z0-9._-]//g;
    s/[[:space:]][[:space:]]*/-/g;
    s/^[^a-z]*//;
    s/^$/section/
'
```

The regular expressions do not use `+`, because `+` isn't part of BRE.

## Page metadata

Published 2023-08-26, updated 2025-01-14.

Tags:

- algorithm
- POSIX shell
- programming
- Python
- shell
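
## Deduplicating identifiers

The rule for duplicate identifiers mentioned in the Python section (append `-` and *n*, starting with 1) can be sketched as follows. This is a minimal illustration, not Pandoc's own code. The function name `unique_id` and the `used` set are my assumptions. I also assume that *n* keeps increasing until the candidate is free, which the prose rule implies but does not spell out.

```python
from __future__ import annotations


def unique_id(base: str, used: set[str]) -> str:
    """Return `base` if it is unused, else `base-n` for the first free n >= 1.

    `used` is a hypothetical set of identifiers already present in the
    document. The function adds its result to the set.
    """
    candidate = base
    n = 1
    # Assumption: keep incrementing n until the identifier is not taken.
    while candidate in used:
        candidate = f"{base}-{n}"
        n += 1
    used.add(candidate)
    return candidate


if __name__ == "__main__":
    seen: set[str] = set()
    for ident in ["intro", "intro", "intro"]:
        print(unique_id(ident, seen))
```

Pass the same set for every heading in a document so that each identifier is checked against all the earlier ones.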