Pandoc Filters
Amy de Buitléir edited this page Nov 16, 2023 · 140 revisions
Pandoc provides an interface for users to write programs (known as filters) which act on the intermediate AST. For more info see the filter tutorial and the Lua filter tutorial.
This page collects together third party filters which can be used to add functionality to pandoc.
Filters can be written in any programming language. Pandoc wrappers and interfaces are available in the following programming languages to facilitate modification of the AST:
language | link | description |
---|---|---|
Python | pandocfilters | a library for writing pandoc filters in python. |
Python | panflute | a pythonic alternative to pandocfilters, with batteries included. It reconstructs the pandoc AST as an internal panflute AST, which makes interacting with the AST more seamless. (@jgm recommended this in pandoc discuss) |
Python | pantable | specializes in writing filters for tables, based on panflute; provides a lossless conversion between an internal structure and the panflute AST. |
PHP | pandocfilters-php | a port of the python pandocfilters module to PHP to make writing filters in PHP easier. |
Node.js | pandoc-filter-node | a Node.js module for writing pandoc filters in JavaScript. |
Perl | Pandoc::Elements | a CPAN module for writing pandoc filters in Perl. |
Groovy | groovy-pandoc | a library for writing Pandoc filters in Groovy. |
Ruby | paru | a Ruby gem to write pandoc filters in Ruby. |
Lua | pandoc's official documentation | Pandoc includes a Lua interpreter by default, so Lua filters are quite lightweight. |
Elixir | Panpipe | a library for writing pandoc filters in Elixir |
.NET | PandocFilters | a NuGet package for writing Pandoc filters in .NET languages |
OCaml | ocaml-pandoc | An OCaml library for writing pandoc filters. |
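The libraries above wrap one underlying mechanism: a JSON filter reads the pandoc AST as JSON, transforms it, and writes it back. The following is a minimal sketch of that mechanism using only the Python standard library rather than any of the listed libraries. The names `walk`, `shout`, and `filter_stream` are hypothetical, and the sample document (including its API version) is made up for illustration.

```python
import io
import json


def walk(node, action):
    """Rebuild a pandoc JSON AST, applying `action` to every element object."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        rebuilt = {key: walk(value, action) for key, value in node.items()}
        # AST elements are JSON objects that carry a "t" (type tag) field.
        return action(rebuilt) if "t" in rebuilt else rebuilt
    return node  # strings, numbers, booleans


def shout(element):
    """Example action: upper-case the text of every Str element."""
    if element["t"] == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return element


def filter_stream(src, dst):
    """Read a JSON AST from `src`, transform it, and write it to `dst`."""
    json.dump(walk(json.load(src), shout), dst)


# Demonstration on an in-memory document (assumed sample, not real pandoc output).
sample = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "hello"},
            {"t": "Space"},
            {"t": "Str", "c": "world"},
        ]}
    ],
}
out = io.StringIO()
filter_stream(io.StringIO(json.dumps(sample)), out)
result = json.loads(out.getvalue())
print(result["blocks"][0]["c"])
```

To use the sketch as an actual filter, one would call `filter_stream(sys.stdin, sys.stdout)` and pass the script to pandoc with `--filter`. Note that this `walk` also visits metadata values, so `Str` elements inside metadata are transformed too.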
Other tools:
- vimhl, a vim plugin that makes the vim syntax highlighting engine available in pandoc.
- pandoc-jats, a Lua custom writer for Pandoc generating JATS XML.
- 2bbcode, a Lua custom writer for BBCode.
- pandocmeta.lua, a simple Lua package that converts Pandoc metadata types into a (possibly multi-dimensional) table.
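The idea of flattening pandoc metadata types into plain nested values can be sketched in Python over the JSON form of the AST. This is not pandocmeta.lua's API; `meta_to_python` and `inlines_to_text` are hypothetical names, the sample metadata is invented, and inline formatting is deliberately simplified to text and spaces.

```python
def inlines_to_text(inlines):
    """Flatten a list of inline elements to text (only Str and spaces are kept)."""
    parts = []
    for element in inlines:
        if element["t"] == "Str":
            parts.append(element["c"])
        elif element["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
    return "".join(parts)


def meta_to_python(value):
    """Convert one pandoc MetaValue (JSON form) into plain Python data."""
    tag, content = value["t"], value.get("c")
    if tag == "MetaMap":
        return {key: meta_to_python(item) for key, item in content.items()}
    if tag == "MetaList":
        return [meta_to_python(item) for item in content]
    if tag in ("MetaBool", "MetaString"):
        return content
    if tag == "MetaInlines":
        return inlines_to_text(content)
    if tag == "MetaBlocks":
        return content  # block-level content is left untouched in this sketch
    raise ValueError(f"unexpected metadata type: {tag}")


# Assumed sample: the "meta" field of a document in pandoc's JSON format.
meta = {
    "title": {"t": "MetaInlines", "c": [
        {"t": "Str", "c": "My"}, {"t": "Space"}, {"t": "Str", "c": "Paper"},
    ]},
    "draft": {"t": "MetaBool", "c": True},
    "author": {"t": "MetaList", "c": [
        {"t": "MetaMap", "c": {
            "name": {"t": "MetaInlines", "c": [{"t": "Str", "c": "Ada"}]},
        }},
    ]},
}
plain = {key: meta_to_python(value) for key, value in meta.items()}
print(plain)
```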
See github.com/pandoc/lua-filters and https://github.com/pandoc-ext for some select filters written in Lua. Some other known 3rd party filters:
- Because DOCX and ODT files cannot use templates, we are limited in how we can transform metadata into document content. Several paru filters can help to solve this, given a metadata format involving authors with affiliation/correspondence fields and institute information: README; and individual filters: simplifyMetadata, prependInstitute, prependKeywords, prependAbstract, prependComments --- filters combined: prependAll.
- pandoc-odt-filters: filters that improve ODT output --- creates sequences in image and table captions (for automatic list-of-figures and list-of-tables), corrects links to images and tables, corrects bibliography style, custom styles to headers and spans, better list styles and real smallcaps. Some of the filters are configurable.
- commentary: a Pandoc filter and command line tool that preserves native-style comments + metadata between Markdown/docx conversions.
- pandoc-svg, a pandoc filter to convert svg files to pdf by Jerome Robert.
- diagrams-pandoc for inserting images expressed in the Haskell diagrams DSL.
- mermaid-pandoc for inserting images expressed in mermaid syntax
- r-pandoc for inserting plots expressed in the R language
- paru-screenshot.rb for automatically taking a screenshot of a web page and including it as an image in a markdown file.
- pandoc-plot to generate and embed figures based on code blocks in documents, using a variety of toolkits (e.g. Matplotlib, MATLAB, gnuplot, ggplot2, etc.). Easy integration with Haskell libraries (e.g. Hakyll)
- pandoc-figure to transform specific divs into complex figures (pandoc >= 3.0)
- Numerical reference to sections, using a specified sign (by default `#`) in internal links. Metadata can configure the special sign and whether links should be preserved or converted to plain text.
- pandoc-fignos, for numbering figures and figure references.
- pandoc-eqnos, for numbering equations and equation references.
- pandoc-tablenos, for numbering tables and table references.
- pandoc-crossref, for numbering and cross-referencing figures, equations and tables
- pandoc-numbering, for numbering and cross-referencing any kinds of things such as examples, theorems, exercises and so on
- pandoc-ling, for formatting, numbering and cross-referencing linguistic examples
- pandoc-listof, for creating lists of any kinds (deprecated)
- pandoc-amsthm: a pandoc amsthm package to define the use of amsthm through YAML front matter, targeting HTML and LaTeX output. For HTML, a CSS counter is used and defined in a template (by the YAML variables). For LaTeX, the amsthm package is used and defined in a template (by the YAML variables).
- definitionlist-filter.lua, for converting some definition lists to theorem-like (amsthm) environments and some references to cref tags in LaTeX
- mathjax-pandoc-filter rendering math to SVG using mathjax-node
- asciimathml-pandocfilter: to add read support for AsciiMathML syntax through conversion into LaTeX
- pandoc-unicode-math replaces Unicode math symbols and Greek letters like ∀, ∈, →, λ, or Ω in math environments by equivalent LaTeX commands like `\forall`, `\in`, `\rightarrow`, `\lambda`, or `\Omega`.
- SugarTeX is a more readable LaTeX language extension and transcompiler to LaTeX. Fast Unicode autocomplete in Atom editor via SugarTeX Completions for Atom.
- pandoc-logic-proof provides a way to write logic proofs in pandoc markdown and produce attractive output.
- Using markdown inside raw latex commands
- pandoc-latex-environment, for adding LaTeX environments on specific HTML `div` tags
- latexdivs.py: defines a syntax to turn any native pandoc Div into a LaTeX environment: if `latex="true"` is in the attributes of the Div, the first class is used to define the LaTeX environment.
- pandoc-latex-tip, for decorating specific `span`, `code`, `div` and `codeblock` elements with icons taken from popular icon collections.
- pandoc-latex-admonition, for decorating specific `div` and `codeblock` elements with admonitions
- pandoc-latex-barcode: insert a barcode or a QR code into a LaTeX/PDF document.
- pandoc-latex-fontsize, for specifying LaTeX font size on `span`, `code`, `div` and `codeblock` elements
- pandoc-latex-color, for specifying X11 color on `span` and `div` elements
- pandoc-latex-unlisted, for unlisting some specific headers in the table of contents (deprecated since it is now in the core of pandoc)
- pandoc-latex-newpage, for converting horizontal rules into page breaks
- pandoc-latex-french-spaces, for dealing with French spacing around some punctuation marks
- pandoc-latex-margin, for setting left and right margins on `div` and `codeblock` elements
- pandoc-beamer-block, for using the `block`, `alertblock` and `exampleblock` environments defined in beamer.
- pandoc-beamer-multigraphics, for defining animated graphics in beamer.
- pandoc-beamer-arrow, for adding arrows between elements in beamer.
- Include Files: finds all the inline code blocks with attribute include, and replaces their contents with the contents of the file given
- code-includes.lua Include code from source files. Keep your examples and documentation compiled and in-sync. Similar to the above except you don't have to install Haskell and you can select by line number.
- transclude.lua Include content from another file just like in AsciiDoc and ReST.
- include-files.lua Filter to include other files in the document.
- include.py: Panflute filter to allow file includes. See doc.
- pandoc-include-plus is another pandoc filter which supports "include" files. Key features:
  - Included files can include other files, recursively.
  - Paths to images are adjusted as needed to ensure that everything "just works".
  - Option to automatically promote or demote headings in included files.
- pandoc-dot2tex-filter - a filter that converts dot notation to PGF/TikZ graphics for latex/pdf rendering.
- HTML comment to LaTeX comment: a filter that converts HTML comments to LaTeX comments
- pandoc-csv2table for including referenced csv files in markdown as markdown rendered tables.
- pandoc-placetable, a lightweight implementation of the idea behind the above pandoc-csv2table (e.g. doesn't necessarily require pandoc as a cabal dependency)
- ickc/pantable: CSV Tables in Markdown: Pandoc Filter for CSV Tables: a Python alternative to the above 2 filters, using panflute, with some enhancements (e.g. auto-width, fractional width, etc.)
- Creating a link table at the end of your document.
- pandocsql run SQL queries on tables, generating other tables
- pandoc-linear-table Creating tables with cells that contain a lot of content can be difficult to do in standard Markdown. This Pandoc filter extends Markdown syntax to make the job easier.
- pandoc-abbreviations allows the use of arbitrary abbreviations, defined in an abbreviations file or in the source document's YAML header, which are replaced on processing. Useful for maintaining consistency of terminology etc.
- pandoc-acronyms is a filter for managing acronyms. It replaces acronyms like "FAQ" at first use with the full text "frequently asked questions (FAQ)". It is installed using pip.
- count-para.lua adds numbering to paragraphs to allow detailed citation (in scientific contexts). Proposed as a replacement for page-number referencing, which does not work with adaptive design.
- pandoc-lang automatically detects the (natural) language of text, as well as the programming language of code blocks
- pandoc-mustache replaces variables like `{{varname}}` in a pandoc document with their values, which are stored in a separate YAML file.
- pandoc-quotes.lua and the older pandoc-quotes replace non-typographic quotation marks with typographic ones for languages other than US English.
- columns provides multiple columns support in HTML and LaTeX/PDF output.
- first-line-indent provides smart first-line indents in HTML and LaTeX/PDF output.
- R-pandoc for generating R plots
- filter_pandoc_run_py for executing Python code written in code blocks and embedding the printed output and pyplot figures
- Knitty is a Pandoc filter for reproducible reports via Jupyter and Pandoc (a fork of Stitch, a Knitr-RMarkdown-like library). Insert Python code (or other Jupyter kernel code) into the Markdown document, or write in plain Python/Julia/R/any-kernel-lang with block-commented Markdown, and get the code's results in the Pandoc output document.
- pandocsql which uses an in-memory SQLite database. It creates tables from tables in the document and executes queries in code blocks, showing the results as tables.
- pannb, a pandoc filter to control the output from ipynb input. This covers the metadata block, filtering out Python code, and converting all raw blocks to native pandoc AST. The three can be mixed and matched.
- pandoc-query, a pandoc filter that
  - defines a simple language for querying a collection of Pandoc documents and formatting the output, and
  - provides a way to embed queries in a Pandoc document, so that when the document is converted to a new format, the query is replaced with the results.

  Not a standalone filter, but a filter you can use as part of an application.
- run-code-inline reads text (e.g., Markdown or Asciidoc) from stdin, echoes it to stdout, simultaneously running any commands and inserting the output immediately after the command. Useful for writing tutorials and software documentation. Not Pandoc-specific, but useful as part of a Pandoc toolchain.
- pandoc-manubot-cite allows citing persistent identifiers directly like `@doi:10/c7np` or `@pubmed:29618526`. Removes the need for a reference manager by supporting DOIs, PubMed IDs, URLs, ISBNs, Wikidata IDs, and the hundreds of other ID types registered with https://identifiers.org. Written in Python. Available on PyPI.
- pandoc-url2cite allows citing certain persistent identifiers directly (URLs, ISBNs, and DOIs). Basically a less opinionated and simpler version of pandoc-manubot-cite. Written in TypeScript. Available on npm.
- pandoc-zotxt.lua looks up sources for citations in Zotero.
- recursive-citeproc handles self-citing bibliographies.
- Adding support for indexing with the syntax `(# term, subterm)` in HTML and LaTeX
- Adding non-breaking spaces inside a URL to preserve formatting
- toc-css, a Lua filter that changes the appearance of the basic Pandoc HTML table of contents with some CSS and vanilla JavaScript.
- lablinkfix updates links to the Swedish Labour Movement Archives and Library catalogues.
- second-date changes `date` metadata to a different strftime format using Python's dateutil.
- pandoc_abnt allows specifying the source of images and tables, and automatically corrects Alineas according to the Brazilian standard for academic writing (ABNT NBR 14724:2011).
- nheengatu provides several resources for publishing multimedia content through formats such as LaTeX, HTML and EPUB.
- pandoc-select-links is a Pandoc filter that takes an input document and returns a new document that contains only the links from the input document. The implementation is just a few lines of code, and provides a simple example of how to use the `query` function.
- pandoc-select-code is a Pandoc filter that extracts just the code blocks from an input document. You might use this, for example, to extract sample code from a tutorial.
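The last two entries both select elements of one type from a document. A minimal Python sketch of that idea over pandoc's JSON AST follows. It is not the code of either filter: the generator is only loosely modeled on the `query` function mentioned above, `select_links` and `select_code` are hypothetical names, wrapping each link in its own paragraph is an assumption about the output shape, and the sample document is invented.

```python
def query(node, tag):
    """Yield every AST element whose type tag equals `tag`, in document order."""
    if isinstance(node, list):
        for item in node:
            yield from query(item, tag)
    elif isinstance(node, dict):
        if node.get("t") == tag:
            yield node
        for value in node.values():
            yield from query(value, tag)


def select_links(doc):
    """Return a copy of `doc` whose body holds only its links, one per paragraph."""
    links = query(doc["blocks"], "Link")
    return {**doc, "blocks": [{"t": "Para", "c": [link]} for link in links]}


def select_code(doc):
    """Return a copy of `doc` whose body holds only its code blocks."""
    return {**doc, "blocks": list(query(doc["blocks"], "CodeBlock"))}


# Assumed sample document in pandoc's JSON format.
sample = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "See"},
            {"t": "Space"},
            # Link content is [attributes, link text, [url, title]].
            {"t": "Link", "c": [
                ["", [], []],
                [{"t": "Str", "c": "pandoc"}],
                ["https://pandoc.org", ""],
            ]},
        ]},
        # CodeBlock content is [attributes, code text].
        {"t": "CodeBlock", "c": [["", ["python"], []], "print('hi')"]},
    ],
}

urls = [link["c"][2][0] for link in query(sample["blocks"], "Link")]
code = [block["c"][1] for block in select_code(sample)["blocks"]]
print(urls, code)
```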