Blogging with LaTeX
1 Introduction
I have always said to myself that I should start blogging to keep track of things (projects, side-projects, side-projects of side-projects), but I have a terrible memory and end up forgetting that.
This time around I was thinking: - Hey, I should start blogging about something.
But you know what, I really suck at managing this kind of stuff, Jekyll helps quite a lot but I still need to write stuff with markdown.
I am not a huge fan of markdown either, patterns are weird, formatting tables is a mess, no equations, no bibtex (I also miss that when working with Word), and the list goes on.
Every engineer knows the best way to solve problems is to solve the problem for themselves by overengineering a solution (this is meant as a joke... even though it sounds too accurate to be just that...).
So, how can we make things better for ourselves and get some decent syntax for writing somewhat large texts?
Here is a tip: it is in the title of this post.
Yes, we can translate LaTeX into HTML, do some stripping, add a new Jekyll header and make things work nicely.
But how do we do that? Pandoc (Section 2), Jekyll (Section 3) and some script glue (Section 4) written in the best programming language ever: Python.
2 Pandoc
Of course each one of us could write a LaTeX parser and do the conversion for ourselves (if we had unlimited time, big brains and were determined to do it).
I do not know how about you, but I am of the kind that start writing programs right away just to realize how grand things will end up being.
As usual when doing these things, you realize "ain’t nobody got time for that" (NobodyGotTimeForThis, n.d.).
Started looking for alternatives. Found latex2html and make4ht, which are pretty cool. Tried setting them up, but did not manage to make it work as I wanted. Started looking for more alternatives.
Found a post in Dr. Joyce’s blog.
Pandoc: An universal markdown translator, which can magically translate a subset of LaTeX+BibTex into HTML+MathJax (yay /o/, equations). It is not perfect, but it also does not need to be.
While I did not quite manage to make equation numbering nor manage citations style/cross-references working correctly, pandoc-xnos could make it work. I think it has to do with the glue in Section 4. Maybe I will try again in the future, but for now it is doing what I needed.
The magic command ended up being
pandoc --number-sections --mathjax -f latex -t html -s --bibliography=file.bib -o file.html file.tex
--number-sections
for numbered sections--mathjax
to process equations and render using javascript-f latex
to process from LaTeX-t html
to output in html-s
standalone html--bibliography=file.bib
to get references from the bib file-o file.html
to indicate the path to the output filefile.tex
to indicate the input file
3 Jekyll
You probably have heard of Jekyll at this point. It is a nice static site generator used by GitHub Pages, which made a ton of people used to it by default. Jekyll does not support LaTeX documents, but it does accept HTML files as inputs.
By placing the post header in the HTMLs, it will treat the HTML contents as the post content.
---
layout: post # could be a different layout
author: "authorname"
title: "postname"
---
4 Script glue
The final piece is the code glue written in Python. It just scans for .tex files, uses pandoc to convert them, strips out unnecessary HTML header/footer and include the post header expected by Jekyll containing the post author names, title and layout.
If there is a .bib file along with the .tex, it is used as the bibliography source file.
def latex_to_html_via_pandoc(source_file, source_dir="latex_posts", target_dir="_posts"):
output_file = source_file.replace(".tex", ".html").replace(source_dir, target_dir)
# Latex to html
command = """pandoc --number-sections --mathjax -f latex -t html -s"""
command = command.split(" ")
# Load references from bib file if it exists
bibliography = source_file.replace(".tex", ".bib")
if os.path.exists(bibliography):
command.append("--bibliography=%s" % bibliography) # use external bib file or not
command.append("-o")
command.append(output_file) # output path
command.append(source_file) # source path
# Run pandoc
try:
subprocess.check_output(command)
except Exception as e:
raise Exception("Error during pandoc conversion of %s: %s" % (source_file, e))
# Open the html file and get only contents to let jekyll handle style, links, etc
with open(output_file, "r", encoding="utf-8") as f:
contents = f.read()
# extract authors and title
authors = re_match_html_author.findall(contents)
author = authors[0]
authors.pop(0)
while len(authors) > 0:
author += ", " + authors[0]
authors.pop(0)
del authors
title = re_match_html_title.findall(contents)[0]
# Remove unnecessary header and trailer
contents = contents.split("</header>")[1]
contents = contents.split("</body>")[0]
# Rewrite file with markdown header
with open(output_file, "w", encoding="utf-8") as f:
# Write markdown header
f.write("""---\nlayout: post\nauthor: "%s"\ntitle: "%s"\n---""" % (author, title))
# Write stripped down html
f.write(contents)
After that, you can manually run jekyll build command or jekyll serve (which starts the webserver).
Or run Latex2Markdown.py --serve
to run the LaTeX to HTML, Jekyll build and start service at once.
Sources for this blog are available here.
5 Conclusions
It works surprisingly well and I am pretty happy with the results.
References
NobodyGotTimeForThis. n.d. “Ain’t Nobody Got Time for That (Original + Autotune).” https://youtu.be/waEC-8GFTP4.