HTML Math Examples from arXiv via Tex4ht

March 28, 2014


“arXiv” is the large e-print archive founded by Paul Ginsparg, now hosted at Cornell University (originally at Los Alamos National Laboratory). “tex4ht” is software for converting well-structured LaTeX to HTML, originated by the late Eitan Gurari of Ohio State University.

For most e-prints at arXiv, TeX source is available, and it is often LaTeX source. Where an author has provided carefully structured LaTeX source, it can be converted to HTML with a translation tool that supports math, such as tex4ht.

The relatively recent arrival of HTML5, together with the web-served software provided by MathJax, makes it possible to view ordinary HTML web pages containing math in most current major web browsers.


These examples were selected on the basis of my personal mathematical interests and of the technical consideration that each LaTeX source file lent itself reasonably well to automatic translation via tex4ht. The mathematical merit of these examples has not been reviewed.

In each instance the “xhmlatex” configuration of tex4ht was used to generate an XHTML document. That document was then edited slightly by hand (1) to convert it to HTML5 (the text/html serialization) and (2) to invoke MathJax. (This manual editing should not be required with a sufficiently recent version of tex4ht.)
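Invoking MathJax by hand amounts to inserting a script element just before the end of the document head. The following is a minimal Python sketch of that edit, assuming the document contains a literal </head> tag; the helper names are hypothetical, and the MathJax URL is supplied by the caller because this sketch does not assume any particular MathJax location ("MathJax.js" in the usage line is only a placeholder).

```python
from html import escape
import re

# Finds the first closing </head> tag, allowing any letter case and trailing spaces.
HEAD_END = re.compile(r"</head\s*>", re.IGNORECASE)


def inject_before_head_end(html: str, snippet: str) -> str:
    """Insert `snippet` immediately before the first </head> tag.

    Raises ValueError if the document has no </head> tag.
    """
    match = HEAD_END.search(html)
    if match is None:
        raise ValueError("no </head> tag found")
    return html[: match.start()] + snippet + html[match.start():]


def mathjax_script_tag(src: str) -> str:
    """Build a <script> element that loads MathJax from a caller-supplied URL.

    The URL is attribute-escaped, so "&" in a query string becomes "&amp;".
    """
    return f'<script type="text/javascript" src="{escape(src, quote=True)}"></script>\n'


# Usage (placeholder URL, hypothetical):
page = "<html><head><title>t</title></head><body></body></html>"
page = inject_before_head_end(page, mathjax_script_tag("MathJax.js"))
```

The same helper can insert any other fragment that belongs at the end of the head, such as a small style block.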

Be aware that MathJax rendering can take some time. For example, the paper by Funke & Millson, which runs to about 35 pages printed on letter-sized paper and has nearly 2000 math zones, might take 2–5 minutes to load fully, depending on your platform.
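Since load time grows with the number of math zones, counting them gives a rough advance warning that a page will be slow. A Python sketch, assuming each math zone appears as an unprefixed MathML <math> element (a namespace-prefixed form such as <m:math> would be missed); `count_math_zones` is a hypothetical helper, and the count is only a proxy for rendering time.

```python
import re

# Matches the start of a MathML <math ...> element. The lookahead rejects
# longer tag names such as <mathx>; </math> never matches because of the "/".
MATH_START = re.compile(r"<math(?=[\s>/])", re.IGNORECASE)


def count_math_zones(html: str) -> int:
    """Count unprefixed MathML <math> elements, one per math zone."""
    return len(MATH_START.findall(html))
```

For example, `count_math_zones('<p><math><mi>x</mi></math> and <math display="block"><mn>2</mn></math></p>')` returns 2.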

alg-geom/9304003: D. Bayer & D. Mumford,
     “What Can Be Computed in Algebraic Geometry”.
1104.2804: T. Ohira & H. Watanabe,
     “A Conjecture on the Collatz-Kakutani Path Length for the Mersenne Primes”.
1108.5305: J. Funke & J. Millson,
     “The Geometric Theta Correspondence for Hilbert Modular Surfaces”.
1109.1881: Th. Bauer, B. Harbourne, A. L. Knutsen, A. Küronya, S. Müller-Stach, T. Szemberg,
     “Negative curves on algebraic surfaces”.
1207.5765: Joseph H. Silverman,
     “An oft cited letter from Tate to Serre on computing local heights on elliptic curves”.


The idea that LaTeX documents, as they have been found in circulation during the period 1985–2014, could be translated to a formally structured SGML document type is, in a certain sense, folly, just as it has been folly to imagine that more than 5% of the HTML documents in circulation during the period 1995–2014 are formally correct. Beyond that, the problem of translating LaTeX is compounded by the fact that the principal LaTeX engine implements LaTeX, the language, as a macro package on top of TeX, which is a Turing-complete programming language. So far in the development of LaTeX, there has been no reliably enforced boundary between LaTeX and TeX. In view of all this, it is remarkable, even astounding, that translation projects such as tex4ht have achieved any substantial degree of success.

Many documents found at arXiv cannot easily be run through tex4ht. I continue to believe that steps taken by the community to formalize suitable profiled usage of LaTeX will go a long way toward making better online versions of mathematical documents easily available. For more on this, see my talk at TUG 2010 on LaTeX Profiles.

Processing Details

These translations were made using the version of tex4ht in TeX Live 2010. With a more recent version of tex4ht, the modifications needed to obtain HTML5 with MathJax rather than XHTML may not be necessary. The modifications made here for the text/html serialization of HTML5 are these:

  1. Remove the XML declaration and any XML processing instructions at the front of the document.
  2. Replace the document type declaration with the simple declaration for HTML5: <!DOCTYPE html>
  3. At the end of the HTML head add the following MathJax invocation:
     <script type="text/javascript"
  4. (Unnecessary for new browsers?) MathML elements that are defined to be empty must be written with explicit end tags rather than as self-closing start tags. For example, replace <mspace width="1em"/> with <mspace width="1em"></mspace>. (This is not necessary for HTML void elements such as <br/>.)
  5. (Temporary?) After the MathJax invocation add:
     <style type="text/css">
       .MathJax_MathML {text-indent: 0;}
     </style>
  6. (Optional) Replace any <meta> element that declares the content type via http-equiv with <meta charset="UTF-8">.
  7. Where auxiliary XML files are generated by xhmlatex, those files should, for consistency, be written as text/html serializations of HTML5 and references to their names should be adjusted accordingly.
  8. Avoid “id” values that begin with a numeral. (HTML 4 required ids to begin with a letter, and such ids cannot be used in CSS selectors without escaping.)
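Steps 1, 2, 4, 6 and 8 are mechanical enough to script. Below is a minimal Python sketch that uses regular expressions rather than a real XML or HTML parser; it assumes tex4ht-style output with double-quoted attributes and a document type declaration without an internal subset, and the function names are hypothetical. Steps 3, 5 and 7 are left to hand editing.

```python
import re

# HTML void elements: self-closing syntax such as <br/> is harmless for these.
HTML_VOID = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}

# Step 1: XML declaration and processing instructions at the front.
LEADING_PIS = re.compile(r"\A(?:\s*<\?.*?\?>)+\s*", re.DOTALL)
# Step 2: the document type declaration (assumes no internal subset).
DOCTYPE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)
# Step 4: a self-closing tag; group 1 is the name, group 2 the attributes.
SELF_CLOSING = re.compile(r"<([A-Za-z][\w.:-]*)(\s[^<>]*?)?\s*/>")
# Step 6: a <meta http-equiv="Content-Type" ...> element.
CONTENT_TYPE_META = re.compile(
    r"""<meta\s[^>]*http-equiv\s*=\s*["']?content-type[^>]*>""", re.IGNORECASE
)
# Step 8: double-quoted id attributes (but not data-id and the like).
ID_ATTR = re.compile(r'(?<![\w-])id\s*=\s*"([^"]*)"')


def _expand_empty(match: re.Match) -> str:
    name, attrs = match.group(1), match.group(2) or ""
    if name.lower() in HTML_VOID:
        return match.group(0)  # leave <br/>, <meta .../>, <link .../> alone
    return f"<{name}{attrs}></{name}>"  # e.g. <mspace width="1em"></mspace>


def xhtml_to_html5(doc: str) -> str:
    """Apply steps 1, 2, 4 and 6 to an xhmlatex-generated XHTML string."""
    doc = LEADING_PIS.sub("", doc, count=1)
    doc = DOCTYPE.sub("<!DOCTYPE html>", doc, count=1)
    doc = SELF_CLOSING.sub(_expand_empty, doc)
    doc = CONTENT_TYPE_META.sub('<meta charset="UTF-8">', doc)
    return doc


def ids_starting_with_digit(doc: str) -> list[str]:
    """Step 8: report id values that begin with a numeral, for manual renaming."""
    return [value for value in ID_ATTR.findall(doc) if value[:1].isdigit()]
```

Expanding every non-void self-closing tag, rather than listing MathML's empty elements one by one, also catches an XHTML-style <script .../>, which an HTML5 parser would otherwise treat as an unclosed start tag.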

I have no doubt that a few other issues will arise. The basic point, however, is that correct application/xhtml+xml documents generated by xhmlatex need very little modification to fit into the text/html serialization of HTML5.

Provisional Scripting


% From Michal Hoftich <> on 2 Aug 2012
% by email with cc to
% Hoftich suggests the name mathjax.cfg
% His suggested command line:
%   htlatex 1109.1881v2.tex "mathjax, charset=utf-8,NoFonts" " -cunihtf -utf8"
% Mods by William F. Hammond
\Configure{DOCTYPE}{\HCode{<!DOCTYPE html>\Hnewline}}
\Configure{@HEAD}{\HCode{<meta charset="UTF-8" />\Hnewline}}
\Configure{@HEAD}{\HCode{<meta name="generator" content="TeX4ht
(\string~gurari/TeX4ht/)" />\Hnewline}}
\Configure{@HEAD}{\HCode{<link
         rel="stylesheet" type="text/css"
         href="\expandafter\csname aa:CssFile\endcsname" />\Hnewline}}
\Configure{@HEAD}{\HCode{<script type="text/javascript"\Hnewline
\Configure{@HEAD}{\HCode{<style type="text/css">\Hnewline
  .MathJax_MathML {text-indent: 0;}\Hnewline
</style>\Hnewline}}


mk4ht xhmlatex "$1" "ht5mathjax"
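The one-line shell script above converts one file per call. For a batch such as the five examples listed earlier, a small Python driver can loop over a directory of sources. This is a sketch under stated assumptions: mk4ht is on the PATH, the configuration passed as "ht5mathjax" (assumed here to be a file ht5mathjax.cfg) is findable from the source directory, and `mk4ht_command` and `convert_all` are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path


def mk4ht_command(tex_file: str, cfg: str = "ht5mathjax") -> list[str]:
    """Argument list mirroring the shell script: mk4ht xhmlatex FILE CFG."""
    return ["mk4ht", "xhmlatex", tex_file, cfg]


def convert_all(src_dir: str, cfg: str = "ht5mathjax") -> dict[str, int]:
    """Run mk4ht on every *.tex file in src_dir and return exit codes by file name."""
    if shutil.which("mk4ht") is None:
        raise RuntimeError("mk4ht not found on PATH")
    results = {}
    for tex in sorted(Path(src_dir).glob("*.tex")):
        # Run inside the source directory so generated files land beside the .tex file
        # (assumption: the .cfg file is also found from there).
        proc = subprocess.run(mk4ht_command(tex.name, cfg), cwd=tex.parent)
        results[tex.name] = proc.returncode
    return results
```

Passing an argument list to `subprocess.run`, rather than a shell string, sidesteps the quoting issue that affects an unquoted `$1` when a file name contains spaces.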