Pandoc

Notes related to Pandoc.

It can be useful to generate a Pandoc identifier outside Pandoc. For example, I do this in the static site generator for this site. On the tag page, I use the identifier to link each tag to its list of pages.

While the algorithm is documented in prose, I have not found official pseudocode for it. I looked up the original Haskell code to make sure everything was correct.

Here is the algorithm in Python. This version supports Unicode.

import re


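def pandoc_id(s: str) -> str:
    # Convert to lowercase.
    s = s.lower()
    # Remove everything except whitespace, alphanumerics, underscores,
    # periods, and hyphens. The character class `\w` includes underscores.
    s = re.sub(r"[^\s\w.-]", "", s)
    # Replace each run of whitespace with a single hyphen.
    s = re.sub(r"\s+", "-", s)
    # Remove leading digits, punctuation, and underscores
    # so that the identifier starts with a letter.
    s = re.sub(r"^[\d\W_]+", "", s)
    # Fall back to "section" if nothing is left.
    return s or "section"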
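
As a quick check, the function can be run against the example headings from the Pandoc manual's section on heading identifiers. This is only a sketch. The function is repeated so that the snippet runs on its own, the name EXAMPLES is mine, and the headings are written as plain text because Pandoc removes formatting and links before it generates the identifier.

```python
import re


def pandoc_id(s: str) -> str:
    # Same steps as the function above, repeated to keep this runnable.
    s = s.lower()
    s = re.sub(r"[^\s\w.-]", "", s)
    s = re.sub(r"\s+", "-", s)
    s = re.sub(r"^[\d\W_]+", "", s)
    return s or "section"


# Heading text (formatting and links already removed) and the
# identifier the Pandoc manual lists for it.
EXAMPLES = [
    ("Heading identifiers in HTML", "heading-identifiers-in-html"),
    ("Maître d'hôtel", "maître-dhôtel"),
    ("Dogs?--in my house?", "dogs--in-my-house"),
    ("HTML, S5, or RTF?", "html-s5-or-rtf"),
    ("3. Applications", "applications"),
    ("33", "section"),
]

for heading, expected in EXAMPLES:
    assert pandoc_id(heading) == expected, (heading, pandoc_id(heading))
```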

Note that this is the default identifier algorithm in Pandoc. GitHub Flavored Markdown, which Pandoc also supports, uses a different algorithm.

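If a generated identifier is the same as one that already exists in the document, append a hyphen and n to the new identifier, where n is the smallest integer, starting from 1, that makes the result unique. For example, three headings that all generate intro get the identifiers intro, intro-1, and intro-2.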
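
The deduplication rule can be sketched in Python as follows. This is a minimal illustration and not Pandoc's own code. The function name unique_id and the set used to track identifiers are my assumptions. The caller passes in an identifier that has already been generated and a set of identifiers seen so far in the document.

```python
from itertools import count


def unique_id(base: str, used: set) -> str:
    """Return `base`, or `base-1`, `base-2`, ... if `base` is taken.

    `used` holds the identifiers already present in the document
    and is updated in place. (Hypothetical helper, not Pandoc's API.)
    """
    candidate = base
    if candidate in used:
        # Try the suffixes -1, -2, -3, ... until one is free.
        for n in count(1):
            candidate = f"{base}-{n}"
            if candidate not in used:
                break
    used.add(candidate)
    return candidate


seen = set()
ids = [unique_id("intro", seen) for _ in range(3)]
assert ids == ["intro", "intro-1", "intro-2"]
```

Checking every candidate against the set, rather than keeping a counter per identifier, also avoids a clash with a heading whose own identifier already looks like intro-1.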

This implementation uses only POSIX shell utilities (tr and sed) and POSIX Basic Regular Expressions (BRE). It reads the text on standard input and works correctly only on ASCII text.

tr '[:upper:]' '[:lower:]' \
| sed '
    s/[^[:space:]a-z0-9._-]//g;
    s/[[:space:]][[:space:]]*/-/g;
    s/^[^a-z]*//;
    s/^$/section/
'

The regular expressions do not use +, because + isn't part of BRE. The pattern [[:space:]][[:space:]]* stands in for [[:space:]]+.

The code on this page is distributed under the MIT No Attribution license. It does not require crediting me. The text of the license follows.

MIT No Attribution

Copyright 2023 D. Bohdan

Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the "Software"), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify,
merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.