HTML Math Examples from arXiv via LaTeXML

March 28, 2014


“arXiv” is the large e-print archive founded by Paul Ginsparg and now hosted at Cornell University (originally at Los Alamos National Laboratory). “LaTeXML” is software for converting well-structured LaTeX to HTML, originally written by Bruce Miller of the U.S. National Institute of Standards and Technology (NIST).

For most of the e-prints at arXiv, TeX source is available, and it is often LaTeX source. Where an author has provided carefully structured LaTeX source, the document can be converted to HTML with a translation tool that supports math, such as LaTeXML.
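Getting the source is the first step of any such conversion. The helpers below are a minimal sketch, assuming arXiv's `https://arxiv.org/e-print/<id>` endpoint, which serves the submitted source (typically a gzipped tar archive or a single gzipped file). The function names `arxiv_source_url`, `source_file_name` and `fetch_arxiv_source` are hypothetical, and the download relies on `curl` being installed.

```sh
# Hypothetical helpers (names are illustrative, not part of any tool).
# Assumption: arXiv serves the submitted source of an e-print at /e-print/<id>.

# Build the source-download URL for an arXiv id (new- or old-style).
arxiv_source_url() {
  printf 'https://arxiv.org/e-print/%s\n' "$1"
}

# Local file name for the download: '/' in old-style ids becomes '_'.
source_file_name() {
  printf '%s.src\n' "$(printf '%s' "$1" | tr '/' '_')"
}

# Fetch the source with curl (-f: fail on HTTP errors, -sS: quiet but show
# errors, -L: follow redirects) and print the name of the file written.
fetch_arxiv_source() {
  out="$(source_file_name "$1")"
  curl -fsSL -o "$out" "$(arxiv_source_url "$1")" && printf '%s\n' "$out"
}
```

For example, `fetch_arxiv_source alg-geom/9304003` would write `alg-geom_9304003.src`; running `file` on the result shows whether it is a tar archive or a single compressed file before unpacking.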

The relatively recent arrival of HTML, version 5, and the web-served software provided by MathJax, make it possible for ordinary HTML web pages with math to be viewed in most current major web browsers.


These examples were selected based on my personal mathematical interests and based on the technical consideration that each LaTeX source file lent itself reasonably well to automatic translation. The mathematical merit of these examples has not been reviewed.

Be aware that MathJax rendering can be slow. For example, depending on your platform, the example by Funke & Millson, which is about 35 pages printed on letter-sized paper and has nearly 2000 math zones, might take 2–5 minutes to load fully. The alternative versions labeled “for Firefox with mathfonts” are for any web browser capable of rendering XHTML with MathML directly; they may also take a while to load but should be quicker than the MathJax versions.

alg-geom/9304003: D. Bayer & D. Mumford,
     “What Can Be Computed in Algebraic Geometry”
     (or the version for Firefox with mathfonts).
1104.2804: T. Ohira & H. Watanabe,
     “A Conjecture on the Collatz-Kakutani Path Length for the Mersenne Primes”
     (or the version for Firefox with mathfonts).
1108.5305: J. Funke & J. Millson,
     “The Geometric Theta Correspondence for Hilbert Modular Surfaces”
     (or the version for Firefox with mathfonts).
1109.1881: Th. Bauer, B. Harbourne, A. L. Knutsen, A. Küronya, S. Müller-Stach, T. Szemberg,
     “Negative curves on algebraic surfaces”
     (or the version for Firefox with mathfonts).
1207.5765: Joseph H. Silverman,
     “An oft cited letter from Tate to Serre on computing local heights on elliptic curves”
     (or the version for Firefox with mathfonts).
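Since load time grows with the number of math zones, it can be useful to count them in a generated page before publishing it. A minimal sketch, assuming the file is LaTeXML output in which each formula is a MathML `<math>` element; `count_math_zones` is a hypothetical name.

```sh
# Hypothetical helper: count MathML <math> elements in an HTML/XHTML file
# as a rough predictor of how long MathJax will take to typeset the page.
# '<math[ >]' matches opening tags only: '</math>' and names such as
# '<mathvariant' are not counted.
count_math_zones() {
  # grep -o prints each match on its own line; wc -l counts them;
  # tr strips the leading blanks some wc implementations emit.
  grep -o '<math[ >]' "$1" | wc -l | tr -d ' '
}
```

For instance, `count_math_zones 1108.5305.html` (file name hypothetical) would give a figure to compare with the nearly 2000 zones mentioned above. The count is approximate: an opening tag split across two lines is missed.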


The idea that LaTeX documents in circulation during 1985–2014 could be translated into a formally structured SGML document type is, in a certain sense, folly, just as it would be folly to imagine that more than 5% of the HTML documents in circulation during 1995–2014 are formally correct. Beyond that, the problem of translating LaTeX is compounded by the fact that the principal LaTeX engine implements LaTeX, the language, as a macro package on top of TeX, which is a Turing-complete programming language. So far in the development of LaTeX, no boundary between LaTeX and TeX has been reliably enforced. In view of all this, it is remarkable, even astounding, that translation projects such as LaTeXML have achieved any substantial degree of success.

Many documents at arXiv do not run easily through LaTeXML. The arXMLiv Project mounted a large effort to translate most of the LaTeX source documents at arXiv to HTML using LaTeXML, and I understand its success rate as of March 2014 to be around 70%. While that is not sufficient for a fully automated production system, it is nonetheless astounding. Moreover, as explained above, the problem lies largely with source documents that fail to match the standard.
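A rough success rate for one's own collection of sources can be estimated by counting clean exits from latexml. This is a minimal sketch and not the arXMLiv Project's methodology; it assumes latexml is on the PATH and signals a fatal failure with a nonzero exit status. `conversion_rate` is a hypothetical name.

```sh
# Hypothetical batch helper: run latexml on every .tex file in a directory
# and report how many runs exited with status 0.
# Assumption: latexml returns a nonzero status when a conversion fails.
conversion_rate() {
  dir="$1"
  total=0
  ok=0
  for tex in "$dir"/*.tex; do
    [ -f "$tex" ] || continue          # glob matched nothing: skip the literal
    total=$((total + 1))
    if latexml "--destination=${tex%.tex}.xml" "$tex" >/dev/null 2>&1; then
      ok=$((ok + 1))
    fi
  done
  if [ "$total" -eq 0 ]; then
    echo "conversion_rate: no .tex files in $dir" >&2
    return 1
  fi
  # Integer percentage, rounded down.
  echo "$ok of $total converted ($((ok * 100 / total))%)"
}
```

A zero exit status only means latexml did not give up; it says nothing about whether the math came out right, so a sample of the output still needs to be inspected by eye.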

I continue to believe that steps taken by the community toward formalizing suitable profiled usage of LaTeX would go a long way toward making better online versions of mathematical documents easily available. For more on this, see my talk on LaTeX Profiles at TUG 2010.

A suitable command-line invocation of LaTeXML leads to HTML, version 5, output that is configured for MathJax. I found it convenient (in Linux, OS X, or Windows with Cygwin) to use a small (Bourne) shell script, which was this:

#!/bin/sh
# Convert stem-name.tex to stem-name.html (HTML5 with presentation MathML).
pname=`basename "$0"`
if [ "$#" != "1" ] ; then
  echo "Usage:    ${pname}  stem-name" >&2
  exit 1
fi
stem="$1"
if [ ! -f "${stem}.tex" ] ; then
  echo "${pname}: Cannot find ${stem}.tex" >&2
  exit 2
fi
# Step 1: LaTeX source -> LaTeXML's intermediate XML.
latexml "--destination=${stem}.xml" "${stem}.tex"
if [ "$?" != "0" ] ; then
  echo "${pname}: latexml did not finish cleanly on ${stem}.tex" >&2
  exit 3
fi
if [ ! -f "${stem}.xml" ] ; then
  echo "${pname}: Cannot find latexml output file ${stem}.xml" >&2
  exit 4
fi
# Step 2: XML -> HTML5 with presentation MathML.
latexmlpost --format=html5 "--destination=${stem}.html" --presentationmathml "${stem}.xml"
if [ "$?" != "0" ] ; then
  echo "${pname}: latexmlpost did not finish cleanly on ${stem}.xml" >&2
  exit 5  # status 5 is an assumption continuing the numbering above
fi
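The alternative “for Firefox with mathfonts” versions call for XHTML with MathML rather than HTML5 plus MathJax. Below is a minimal sketch of a variant of the script's last step, assuming the installed LaTeXML's latexmlpost accepts `--format=xhtml`; it is not necessarily how the alternative versions listed above were produced, and `make_xhtml` is a hypothetical name.

```sh
# Hypothetical variant of the final step: write stem.xhtml (XHTML with
# presentation MathML) from the stem.xml that latexml already produced.
# Assumption: this latexmlpost supports --format=xhtml.
make_xhtml() {
  stem="$1"
  if [ ! -f "${stem}.xml" ] ; then
    echo "make_xhtml: Cannot find ${stem}.xml (run latexml first)" >&2
    return 1
  fi
  latexmlpost --format=xhtml "--destination=${stem}.xhtml" --presentationmathml "${stem}.xml"
}
```

Because it starts from the intermediate XML, both the HTML5 file and the XHTML file come from a single latexml run, which is the slow part of the pipeline.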