HTML Math Examples from arXiv via tex4ht

Introduction

“arXiv” is the large e-print archive founded by Paul Ginsparg and based at Cornell University (originally at Los Alamos National Laboratory). “tex4ht”, originated by the late Eitan Gurari of Ohio State University, is software for converting well-structured LaTeX to HTML.

For most e-prints at arXiv, TeX source is available, and it is often LaTeX source. Where an author has provided carefully structured LaTeX source, the document can potentially be converted to HTML by a translation tool that supports math, such as tex4ht.

The relatively recent arrival of HTML5, together with the web-served MathJax software, makes it possible to view ordinary HTML web pages containing math in most current major web browsers.

Examples

These examples were selected according to my personal mathematical interests and to the technical consideration that each LaTeX source file lent itself reasonably well to automatic translation via tex4ht. I have not reviewed their mathematical merit.

In each instance the “xhmlatex” configuration of tex4ht was used to generate an XHTML document. That document was then edited slightly by hand (1) to convert it to HTML5 (the text/html serialization) and (2) to invoke MathJax.

1104.2804: T. Ohira & H. Watanabe,
     “A Conjecture on the Collatz-Kakutani Path Length for the Mersenne Primes”.
1108.5305: J. Funke & J. Millson,
     “The Geometric Theta Correspondence for Hilbert Modular Surfaces”.
1109.1881: Th. Bauer, B. Harbourne, A. L. Knutsen, A. Küronya, S. Müller-Stach, T. Szemberg,
     “Negative curves on algebraic surfaces”.
1207.5765: Joseph H. Silverman,
     “An oft cited letter from Tate to Serre on computing local heights on elliptic curves”.

Comments

Many sources found at arXiv do not run easily through tex4ht. I continue to believe that community steps to formalize suitable profiled usage of LaTeX will go a long way toward making better online versions of mathematical documents easily available. For more on this, see my talk at TUG 2010 on LaTeX Profiles.

The modifications made in these instances for the text/html serialization of HTML5 are these:

  1. Remove the XML declaration and any XML processing instructions at the front of the document.
  2. Replace the document type declaration with the simple declaration for HTML5: <!DOCTYPE html>
  3. At the end of the HTML head add the following MathJax invocation:
     <script type="text/javascript"
       src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"
     ></script>
  4. (Unnecessary for new browsers?) Empty MathML elements must be written with explicit closing tags rather than as self-closing tags. For example, replace <mspace width="1em"/> with <mspace width="1em"></mspace>. (This is not necessary for HTML void elements such as <meta />.)
  5. (Temporary?) After the MathJax invocation add:
     <style type="text/css">
       .MathJax_MathML {text-indent: 0;}
     </style>
  6. (Optional) Replace any http-equiv <meta> element with <meta charset="UTF-8">.
  7. Where auxiliary XML files are generated by xhmlatex, those files should, for consistency, be written as text/html serializations of HTML5 and references to their names should be adjusted accordingly.
  8. Note that “id” values should not begin with a numeral.
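Most of the list above can be scripted. What follows is a minimal sketch, not the procedure actually used for these examples: a POSIX shell file defining two hypothetical helper functions, xhtml_to_html5 (a stdin-to-stdout filter covering steps 1 through 6) and numeric_ids (which lists id values that begin with a numeral, per step 8). It assumes that the XML declaration and processing instructions occupy whole lines before <html>, and that </head> and the http-equiv <meta> each sit on a single line; it also relies on sed -E and grep -o, which GNU and BSD systems provide. Step 7 is left to hand editing.

```sh
#!/bin/sh
# Hypothetical helpers (not part of tex4ht) for post-processing xhmlatex output.
# Assumptions: the XML declaration and processing instructions occupy whole
# lines before <html>; </head> and the http-equiv <meta> each fit on one line.

# xhtml_to_html5: read XHTML on stdin, write text/html HTML5 on stdout.
xhtml_to_html5() {
  # awk handles steps 1, 2, 3, 5 and 6; the trailing sed handles step 4.
  awk '
    BEGIN {
      # Steps 3 and 5: MathJax invocation and the text-indent workaround.
      extra = "<script type=\"text/javascript\"\n"
      extra = extra "  src=\"http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML\"\n"
      extra = extra "></script>\n"
      extra = extra "<style type=\"text/css\">\n"
      extra = extra "  .MathJax_MathML {text-indent: 0;}\n"
      extra = extra "</style>\n"
    }
    # Step 1: drop the XML declaration and processing instructions.
    !seen_html && /^[ \t]*<[?]/ { next }
    # Step 2: replace the (possibly multi-line) DOCTYPE with <!DOCTYPE html>.
    !seen_html && /<!DOCTYPE/ { print "<!DOCTYPE html>"; in_doctype = 1 }
    in_doctype { if (index($0, ">")) in_doctype = 0; next }
    /<html/ { seen_html = 1 }
    {
      # Step 6: swap an http-equiv <meta> for <meta charset="UTF-8">.
      gsub(/<meta http-equiv="[^"]*"[^>]*>/, "<meta charset=\"UTF-8\">")
      # Steps 3 and 5: insert the script and style at the end of the head.
      sub(/<\/head>/, extra "</head>")
      print
    }
  ' |
    sed -E 's#<(mspace|mglyph|malignmark|maligngroup|mprescripts|none)([^<>]*)/>#<\1\2></\1>#g'
}

# numeric_ids: print each id="..." attribute read on stdin whose value begins
# with a numeral (step 8), one per line. grep -o is a GNU/BSD extension.
numeric_ids() {
  grep -oE '(^|[[:space:]])id="[0-9][^"]*"' | sed 's/^[[:space:]]*//'
}
```

After sourcing the file, one might run, for example, xhtml_to_html5 < input.html > output.html and then numeric_ids < output.html; the file names are placeholders. The split between awk and sed is deliberate: the DOCTYPE that xhmlatex writes may span two lines, which needs awk's state, while step 4 needs a backreference that POSIX awk's sub() does not offer.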

I have no doubt that a few other issues will arise. The basic point, however, is that correct application/xhtml+xml documents generated by xhmlatex need very little modification to fit into the text/html serialization of HTML5.

Provisional Scripting

~/texmf/tex/generic/tex4ht/ht5mathjax.cfg

% From Michal Hoftich <michal.h21@gmail.com> on 2 Aug 2012
% by email with cc to tex4ht@tug.org
% Hoftich suggests the name mathjax.cfg
% His suggested command line:
%   htlatex 1109.1881v2.tex "mathjax, charset=utf-8,NoFonts" " -cunihtf -utf8"
% Mods by William F. Hammond
%
\Preamble{xhtml,mathml}
\Configure{VERSION}{}
\Configure{DOCTYPE}{\HCode{<!DOCTYPE html>\Hnewline}}
\Configure{HTML}{\HCode{<html>\Hnewline}}{\HCode{\Hnewline</html>}}
\Configure{@HEAD}{}
\Configure{@HEAD}{\HCode{<meta charset="UTF-8" />\Hnewline}}
\Configure{@HEAD}{\HCode{<meta name="generator" content="TeX4ht
(http://www.cse.ohio-state.edu/\string~gurari/TeX4ht/)" />\Hnewline}}
\Configure{@HEAD}{\HCode{<link
         rel="stylesheet" type="text/css"
         href="\expandafter\csname aa:CssFile\endcsname" />\Hnewline}}
\Configure{@HEAD}{\HCode{<script type="text/javascript"\Hnewline
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"\Hnewline
></script>\Hnewline}}
\Configure{@HEAD}{\HCode{<style type="text/css">\Hnewline
  .MathJax_MathML {text-indent: 0;}\Hnewline
</style>\Hnewline}}
\begin{document}
\EndPreamble

~/bin/ht5mjlatex

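#!/bin/sh
# Convert the .tex file named by $1 with xhmlatex, using ht5mathjax.cfg as the tex4ht configuration.
mk4ht xhmlatex "$1" "ht5mathjax"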