“arXiv” is the large e-print archive founded by Paul Ginsparg and located at Cornell University (originally at Los Alamos National Laboratory). “tex4ht” is software for converting well-structured LaTeX to HTML, originated by the late Eitan Gurari of Ohio State.
TeX source, often LaTeX source, is available for most of the e-prints at arXiv. Where an author has provided carefully structured LaTeX source, it may be possible to convert it to HTML using a translation tool with support for math, such as tex4ht.
The relatively recent arrival of HTML5 and the web-served software provided by MathJax make it possible for ordinary HTML web pages with math to be viewed in most current major web browsers.
These examples were selected based on my personal mathematical interests and based on the technical consideration that each LaTeX source file lent itself reasonably well to automatic translation via tex4ht. The mathematical merit of these examples has not been reviewed.
In each instance the “xhmlatex” configuration of tex4ht was used to generate an XHTML document. A small amount of manual editing was then performed on that document (1) to convert it to HTML5 (the text/html serialization) and (2) to invoke MathJax. (This last step should not be required with a reasonably recent version of tex4ht.)
Be aware that MathJax can take some time. For example, depending on your platform, the example by Funke & Millson, which represents about 35 pages printed on letter-sized paper and has nearly 2000 math zones, might take 2–5 minutes to load fully.
The idea that LaTeX documents, as they have been found in circulation during the period 1985–2014, could be translated to a formally structured SGML document type is, in a certain sense, folly, just as it has been folly to imagine that more than 5% of the HTML documents in circulation during the period 1995–2014 are formally correct. Beyond that, the problem of translating LaTeX is compounded by the fact that the principal LaTeX engine implements LaTeX, the language, as a macro package under TeX, which is a Turing-complete programming language. So far in the development of LaTeX, there has been no reliably enforced boundary between LaTeX and TeX. In view of all of this, it is remarkable, even astounding, that there has been any substantial degree of success with translation projects such as tex4ht.
Many examples found at arXiv are not easily made to run through tex4ht. I continue to believe that steps taken by the community to formalize suitable profiled usage of LaTeX will go a long way toward making better online versions of mathematical documents easily available. For more on this, see my talk at TUG 2010 on LaTeX Profiles.
These translations were made using the TeXLive 2010 version of tex4ht. With a more recent version of tex4ht, modifications for obtaining HTML5 with MathJax rather than XHTML may not be necessary. The modifications made here for the text/html serialization of HTML5 are these:
I have no doubt that a few other issues will arise. The basic point, however, is that correct application/xhtml+xml documents generated by xhmlatex need very little modification to fit into the text/html serialization of HTML5.
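As a rough illustration of the kind of post-processing involved, the following shell sketch drops the XML declaration, swaps the XHTML DOCTYPE for the HTML5 one, and loads MathJax from the document head. The particular adjustments, the function name `xhtml_to_html5`, and the MathJax URL are assumptions made for illustration; they are not a record of the edits actually applied to these examples, and real tex4ht output may need further changes.

```shell
#!/bin/sh
# Sketch: adapt xhmlatex-style XHTML for the text/html serialization of HTML5.
# Assumptions (illustrative only): the function name, the set of adjustments,
# and the MathJax URL below.

MATHJAX_URL='https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js'

xhtml_to_html5() {
  # Reads XHTML on stdin, writes the adjusted HTML on stdout.
  awk -v mj="$MATHJAX_URL" '
    # Drop the XML declaration, which has no place in text/html.
    /^<\?xml / { next }
    # Skip the continuation lines of a multi-line XHTML DOCTYPE.
    in_doctype { if (index($0, ">")) in_doctype = 0; next }
    # Replace the XHTML DOCTYPE with the HTML5 one.
    /<!DOCTYPE/ {
      print "<!DOCTYPE html>"
      if (!index($0, ">")) in_doctype = 1
      next
    }
    # Load MathJax just before the end of the head element.
    /<\/head>/ {
      sub(/<\/head>/, "<script async src=\"" mj "\"></script></head>")
    }
    { print }
  '
}
```

With hypothetical file names, the function might be used as `xhtml_to_html5 < paper.xhtml > paper.html`. Working line by line in awk keeps the sketch short, at the cost of assuming the DOCTYPE and `</head>` appear in the simple layouts handled above.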
#!/bin/sh
mk4ht xhmlatex $1 "ht5mathjax"