The Challenge of Translating LaTeX to HTML

March 28, 2014

The idea that LaTeX documents, as found in circulation during the period 1985–2014, could be translated to a formally structured SGML document type is, in a certain sense, folly, just as it has been folly to imagine that more than 5% of the HTML documents in circulation during the period 1995–2014 are formally correct. Beyond that, the problem of translating LaTeX is compounded by the fact that the principal LaTeX engine implements LaTeX, the language, as a macro package under TeX, which is a Turing-complete programming language. So far in the development of LaTeX there has been no reliably enforced boundary between LaTeX and TeX, so a document may freely mix LaTeX markup with arbitrary TeX programming. In view of all this it is remarkable, even astounding, that any translation project has had a substantial degree of success.

That said, because there is such a large legacy of documents written in LaTeX source, a number of valiant projects have been mounted since the late 1990s to translate LaTeX to HTML. Because math is an important part of LaTeX, one wants an automated translation of LaTeX to HTML to provide for math, i.e., to generate MathML. Even where MathJax will be used to facilitate web browser rendering of math, and even though MathJax will accept LaTeX-like input (not actual LaTeX but close), the providers of MathJax have clearly stated that automated translations should use MathML.

There are two translators that I have found useful when faced with the task of translating legacy LaTeX to HTML. In most cases such documents come from arXiv, and I have gathered some examples from arXiv that were handled by both of these translators.

LaTeXML
TeX4ht

No translator that I know of has a success rate sufficient for fully automated translation of arbitrary LaTeX documents to HTML.

A high degree of reliability may be had, however, by using profiled LaTeX, that is, LaTeX restricted to a defined subset of markup, and configuring the translator to accommodate the profile.
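To make the notion of a profile concrete, here is a minimal sketch in Python of a check that a source file stays inside one. Everything in it is an assumption made for illustration: the whitelist ALLOWED_COMMANDS and the function commands_outside_profile are hypothetical names, not part of LaTeXML, TeX4ht, or GELLMU, and a real profile would also constrain environments, packages, and argument structure. The sketch only collects control words (a backslash followed by letters), skips comments and control symbols such as \% and \\, and reports whatever falls outside the whitelist.

```python
import re

# Hypothetical profile: the only control words a conforming document may use.
# This whitelist is an illustration, not the profile of any real translator.
ALLOWED_COMMANDS = {
    "documentclass", "begin", "end", "title", "author", "maketitle",
    "section", "subsection", "emph", "textbf", "label", "ref", "cite",
    "frac", "sqrt", "alpha", "beta", "sum", "int",
}

# Scan left to right for one of three things:
#   1. a control word: backslash followed by letters (captured in group 1)
#   2. a control symbol: backslash followed by any other single character,
#      which also consumes "\%" and "\\" so they are not misread
#   3. a comment: an unescaped % through the end of the line
TOKEN = re.compile(r"\\([A-Za-z]+)|\\.|%.*")


def commands_outside_profile(source: str) -> list[str]:
    """Return, sorted, the control words in source that the profile does not allow."""
    used = {m.group(1) for m in TOKEN.finditer(source) if m.group(1)}
    return sorted(used - ALLOWED_COMMANDS)


if __name__ == "__main__":
    sample = r"""
\documentclass{article}
\begin{document}
\section{Intro} 50\% done % \def\hidden{x}
\def\foo#1{\expandafter\emph{#1}}
\end{document}
"""
    # Reports the TeX-level programming that steps outside the profile.
    print(commands_outside_profile(sample))
```

An empty result suggests, but does not prove, that a document is within the profile: the check ignores verbatim material, category code changes, and macros defined in loaded packages, which is precisely the kind of TeX-level freedom that makes arbitrary documents hard to translate.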

For my documents since the fall of 1998 I have been using my own project “Generalized Extensible LaTeX-Like Markup (GELLMU)”, which, however, does not attempt to provide translation for LaTeX documents but rather provides a didactic formalized LaTeX profile. For more on formally profiled LaTeX see my talk at TUG 2010 on LaTeX Profiles.