- Sun 28 March 2021
- programming
- Gaige B. Paulsen
- #server admin, #programming, #pelican
Putting pre-commit to use
In a previous post I mentioned pre-commit, a tool for maintaining code consistency through simple management of pre-commit checks.
The first place I decided to give this a whirl was on my blog sites. As you may be aware, I moved my blog sites (both Gaige's Pages and The Cartographica Blog) to static sites some time back.
Pelican markdown files have a preamble that is set apart by a blank line: a set of colon-delimited key-value pairs that are rudimentarily parsed and then passed to the interpreter. It looks like this:
Title: My Blog Post
Date: 2021-03-28 07:48

# Some bloggy stuff
Content text is here... Oh, see my [previous post]({filename}previous-post.md)
In addition to the preamble, there are also some replacement items that can be used to reference generated data. For example, {filename} indicates that the path to the stored file should be substituted.
I had noticed there was a Markdown plugin for pre-commit using mdformat, and so I figured I'd give that a try. Initial results were good. It provided a lot of clean-up for free. On the downside, it also quoted all of the {filename} and similar references, such that they would no longer work as references. It also eliminated my footnotes.
My initial .pre-commit-config.yaml looked like this:
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
      - id: trailing-whitespace
        exclude: ^.*\.md$
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
      - id: check-json
  - repo: https://github.com/executablebooks/mdformat
    rev: 0.5.7
    hooks:
      - id: mdformat
        # optional
        args:
          - '--number'
        additional_dependencies:
          - mdformat-tables
Note here a couple of items:
- I have excluded ^.*\.md$ from trailing-whitespace. This was specifically to deal with the fact that I had some two-space-at-end-of-previous-line implementations for handling forced line-breaks. This is one of a few ways of doing this, but was required for use with the python-markdown module that's used by default with pelican
- I have added the mdformat plugin with a number of options and dependencies
  - --number as an argument to mdformat forces it to number ordered list items. I prefer that for readability.
  - mdformat-tables adds table handling to mdformat (by default it uses a strict version of Markdown called CommonMark, so any extensions must be enabled with intention)
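To see why the exclusion matters, here is a minimal sketch of what a trailing-whitespace fixer does to a Markdown hard line break. The helper names are made up for illustration, and the real hook does more than a plain strip:

```python
# Two trailing spaces mark a forced line break in Markdown.
source_lines = [
    "First line of the address  ",  # hard break requested here
    "Second line of the address",
]


def strip_trailing_whitespace(lines):
    """Mimic, very roughly, what a trailing-whitespace fixer does."""
    return [line.rstrip() for line in lines]


def has_hard_break(line):
    """True if the line still ends with the two-space break marker."""
    return line.endswith("  ")


stripped = strip_trailing_whitespace(source_lines)
print(has_hard_break(source_lines[0]))
print(has_hard_break(stripped[0]))
```

Once the two spaces are gone, the two lines render as a single paragraph line, which is why the Markdown files are kept away from that hook.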
mdformat plugins
With things mostly working, I looked at the mdformat documentation to see if I could make changes to the way it operated. Fortunately, there was a plug-in architecture that allowed for the modification of both parsing and output behavior.
Footnotes
Although there's support for footnotes in the underlying markdown parser that's used by mdformat (markdown-it-py, based on the JavaScript-based markdown-it), that support wasn't built in to the mdformat code. So, I decided that I'd take a look at mdformat-tables and see if I could do something similar for footnotes, since the code for both tables and footnotes is included in the underlying package as options.
The result is the mdformat_footnote plugin, which uses the existing parser (the hard part) and formats the footnotes appropriately.
This plugin can be installed using pip install mdformat_footnote or by adding mdformat_footnote to the additional_dependencies list in the .pre-commit-config.yaml file.
Pelican-specific items
In this initial case, all I needed to do was change the output so that it didn't replace the {} characters inside of links. The code was straightforward, and after some playing around, I created the mdformat_pelican plugin for use with mdformat and pelican.
You can look at the code, or install it with pip install mdformat_pelican to get the latest version from pypi.org.
Implementing the initial code was straightforward. Effectively, the code hijacks the render_token function and modifies the token.attrs just before they're rendered, correcting any erroneously-quoted URLs.
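As a rough illustration of that correction step, here is a sketch that repairs a link target held in a plain dict standing in for token.attrs. It assumes the "quoted" form is percent-encoded braces (%7B and %7D); that is a guess about what the formatter emitted, and the function names are hypothetical, not the plugin's actual code:

```python
import re

# Assumption: a quoted placeholder looks like %7Bfilename%7D.
_QUOTED_PLACEHOLDER = re.compile(r"%7B(\w+)%7D", re.IGNORECASE)


def unquote_placeholders(url):
    """Turn %7Bname%7D back into {name}, leaving the rest of the URL alone."""
    return _QUOTED_PLACEHOLDER.sub(r"{\1}", url)


def fix_link_attrs(attrs):
    """Return a copy of a link's attributes with the href repaired."""
    fixed = dict(attrs)
    if "href" in fixed:
        fixed["href"] = unquote_placeholders(fixed["href"])
    return fixed


attrs = {"href": "%7Bfilename%7Dprevious-post.md"}
print(fix_link_attrs(attrs)["href"])
```

Only the placeholder itself is rewritten, so any legitimately encoded characters elsewhere in the URL are left untouched.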
This worked great across nearly all of my files, except for a couple that had square brackets in their metadata fields. For example, a post about Queen guitarist Brian May receiving his doctorate in Astrophysics had this front matter:
Date: 2007-08-03 07:26
Alias: /node/4836,/article.php?story=20070803092627654
Tags:
Category: general news
Title: [He's] a killer... astrophysicist?
which mdformat dutifully turned into \[He's\] a killer... astrophysicist?, which pelican didn't know how to interpret, so the backslashes ended up in my content pages... not desired.
Since I already had a Pelican plugin for mdformat, I decided to make it a bit more pelican-y, by marking the front matter as off-limits. This was a little trickier, but had good results. As you can see in the plugin source, understanding the front matter required adding a parser by putting in a new block rule and then putting in the parser as well as the code to render that later in render_token.
Since the format is very rigid (basically, collect everything until you reach the first blank line), it was easy to implement.
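The "collect everything until the first blank line" rule can be sketched in a few lines of plain Python. This is not the plugin's block rule, just the same idea outside of the parser, with a stand-in formatter that escapes square brackets the way the formatter did to my title:

```python
def split_front_matter(text):
    """Split text into (front matter, rest) at the first blank line."""
    lines = text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.strip() == "":
            return "".join(lines[:index]), "".join(lines[index:])
    # No blank line at all: treat the whole file as front matter.
    return text, ""


def format_protecting_front_matter(text, format_body):
    """Apply format_body to everything except the front matter."""
    front, body = split_front_matter(text)
    return front + format_body(body)


def escape_brackets(body):
    """Stand-in for a formatter that escapes square brackets."""
    return body.replace("[", "\\[").replace("]", "\\]")


post = (
    "Date: 2007-08-03 07:26\n"
    "Title: [He's] a killer... astrophysicist?\n"
    "\n"
    "Some [bracketed] text.\n"
)
print(format_protecting_front_matter(post, escape_brackets))
```

The title keeps its brackets as written, while the body is still handed to the formatter.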
So, now my current working .pre-commit-config.yaml looks like this:
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
      - id: trailing-whitespace
        exclude: ^.*\.md$
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
      - id: check-json
  - repo: https://github.com/executablebooks/mdformat
    rev: 0.5.7
    hooks:
      - id: mdformat
        # optional
        args:
          - '--number'
        additional_dependencies:
          - mdformat-tables
          - mdformat-black
          - mdformat_footnote
          - mdformat_pelican
exclude: |
  (?x)(
      ^output/|
      ^themes/|
      ^venv/|
      ^content/NewZealand/
  )
This adds my new plugins (both mdformat_footnote and mdformat_pelican) and also adds an exclusion for some files in my pre-commit hooks. The ones that aren't actually committed (output, venv) wouldn't be included anyway, but I have a set of badly-formatted HTML files in content/NewZealand that I don't want to fix yet.
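The exclude value is a Python regular expression in verbose mode ((?x)), which is what allows it to be spread over several lines. A quick way to check which paths it catches is to try it with the re module directly; the sample paths below are made up, and the sketch assumes the pattern is applied with a search against the repository-relative path:

```python
import re

# Same pattern as in the config; (?x) makes the whitespace insignificant.
EXCLUDE = re.compile(r"""(?x)(
    ^output/|
    ^themes/|
    ^venv/|
    ^content/NewZealand/
)""")


def is_excluded(path):
    """True if the repository-relative path matches the exclude pattern."""
    return EXCLUDE.search(path) is not None


for path in [
    "output/index.html",
    "content/NewZealand/day1.html",
    "content/posts/pre-commit.md",
    "docs/output/notes.md",
]:
    print(path, is_excluded(path))
```

Because each alternative is anchored with ^, only paths that start with one of those directories are skipped.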
This turned out well, but I had a couple of items that the parser in Pelican and the parser in mdformat could not agree on. In particular, they disagreed on the indentation required for ordered-list items that span multiple paragraphs.
In the end, that would lead me to write a new plugin for Pelican to replace the Markdown parser.
Markdown parser plugin for Pelican
The plugin architecture for mdformat is pretty good, but the one for Pelican is very mature and well thought-out. I've created plugins for Pelican before, notably the Nginx alias maps plugin.
Also, there already existed plugins to replace the Markdown reader in Pelican. As such the lift was pretty light:
- Get a base plugin working
- Parse the metadata (simple : split of each line before the first blank line)
- Load the MarkdownIt package and configure with a few settings (tables, footnotes, and definition lists)
- Add hooks to rewrite the \{filename\} items back to {filename}
- Finally, add a new fence formatter, to use Pygments to format code
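The metadata step from the list above is small enough to sketch with the standard library alone. This is an illustration rather than the plugin's code: parse_metadata is a hypothetical name, and lower-casing the keys is a choice made for this sketch:

```python
def parse_metadata(text):
    """Return (metadata dict, body) for a post with a key: value preamble."""
    lines = text.splitlines()
    metadata = {}
    body_start = len(lines)
    for index, line in enumerate(lines):
        if not line.strip():
            # The first blank line ends the preamble.
            body_start = index + 1
            break
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip().lower()] = value.strip()
    return metadata, "\n".join(lines[body_start:])


post = (
    "Title: My Blog Post\n"
    "Date: 2021-03-28 07:48\n"
    "\n"
    "# Some bloggy stuff\n"
)
metadata, body = parse_metadata(post)
print(metadata)
print(body)
```

Splitting on the first colon only is what keeps a value such as the date's 07:48 in one piece.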
The code is available on GitHub in the markdown-it-reader repository and can be installed using pip install pelican-markdown-it-reader.
This plugin must be enabled on your site by adding it to the list of PLUGINS in your Pelican settings file (pelicanconf.py by default).