Math on the Web

William F. Hammond

Dept of Mathematics & Statistics
University at Albany
Albany, New York 12222

Monday, August 17, 1998

Michael Hamm <[email protected]> writes to [email protected]:

Do any browsers (esp. any versions of Mozilla or MSIE) read the HTML3 MATH tag and the tags that go in it? Which? Thanks.

In a single word, the answer is no.

HTML 3.0 was a 1994 W3C draft that never got beyond draft stage and was quickly superseded by HTML 3.2 and, later, HTML 4, which contain no provision for mathematics. (Well, one may use "<applet>" or, better, "<object>"; but that does not really give mathematics fully reasonable access to the web.)

Subsequent to the demise of math-in-html, W3C formed an HTML Math Working Group whose work led to the creation of MathML, which is now a W3C recommendation. Its principal rendering implementations are currently available through (1) WebEq applets under mass market browsers, (2) the W3C testbed browser (and point-and-click authoring tool) Amaya, and maybe (I am not up to date) (3) IBM's TechExplorer. I believe that the source code for Amaya is available for those who wish to amend it. (For that matter I believe that all of the relevant source code of Mozilla, the public version of Netscape, is available now, too. I believe that WebEq and TechExplorer are proprietary with temporary free trials.)

While I understand and accept the reason for the exclusion of the HTML 3.0 math tags from HTML, we have been left with a situation that still presents a serious barrier to the efficient flow of (unstyled) content-level mathematical information through the web to robots, small-screen displays, audio streams, and Braille streams.

For mathematics on the web, there is a sense in which one can say that there has been very little progress in the 5 years since it became possible to have network browsing tools, both under "http" and "gopher", quickly spawn external applications based on ``mimetype''.

It is unclear how much improvement will arise as things evolve from the dawn of MathML. My guess is that MathML will serve the needs of the mathematical, scientific, and engineering communities, while still permitting the loss of much of what we understand as ``content'' from many resources on the web when that ``content'' is mathematical in nature. Of course, provision for these considerations exists in MathML. The question is how much attention will be paid to that provision, given that it is more expensive to handle.

For example, I think that it could well be at least 10 years before mathematical content can be searched through major web indexing and cataloging sites in any remotely robust way, while a great deal more would be possible more cheaply if a few additional arrangements were made for dealing crudely but faithfully with mathematical content in basic HTML.

The arrival of the ``bazaar'' model of development in the Mozilla Project gives one hope that this will happen.

The early long term plan, as I have understood it, of the MathML group was to rely on the implementation in mass market browsers of the type of client-side processing that is associated with eXtensible Markup Language (XML), and, in particular, a type of XML that might be called ``HTML extended by MathML (presentation tags)''.

The idea of XML is to make up your own HTML. The author or publishing house makes up a set of tags. Then he, she, or they work very hard to create ``rendering information'' about these tags in a ``style sheet'' language. A web-served XML document contains a reference to the corresponding style sheet, which is also available, under a style sheet mimetype, on the web. Browsers are supposed to be able quickly to digest the style sheet information and then quickly render the XML document. (The style sheet information may already be cached.) This is the XML dream.
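
As a rough illustration of the paragraph above, here is a minimal Python sketch of the first step a client would take: finding the style sheet reference inside a served XML document. It assumes the reference is carried by an "xml-stylesheet" processing instruction, which is one mechanism for this and not necessarily the one any particular browser will adopt. The tag names "theorem" and "statement" and the example.org URL are invented for the illustration.

```python
import re
from xml.dom import minidom

# Hypothetical document: the tags and the style sheet URL are made up.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://example.org/theorem.css"?>
<theorem>
  <statement>Every bounded monotone sequence converges.</statement>
</theorem>
"""

# Matches pseudo-attributes of the form name="value" inside the instruction.
PSEUDO_ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def stylesheet_refs(xml_text):
    """Return one dict of pseudo-attributes per xml-stylesheet instruction."""
    dom = minidom.parseString(xml_text)
    refs = []
    for node in dom.childNodes:
        is_pi = node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        if is_pi and node.target == "xml-stylesheet":
            refs.append(dict(PSEUDO_ATTR.findall(node.data)))
    return refs


if __name__ == "__main__":
    for ref in stylesheet_refs(DOC):
        print(ref.get("type"), ref.get("href"))
```

A real client would next fetch (or find in its cache) the resource named by "href" and apply it to the author's tags; that rendering step is the hard part and is not attempted here.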

The first rendering efforts with MathML were applet-based and, I believe, early MathML planning envisioned the creation of a mimetype for ``HTML extended by MathML'' and the creation of an independent rendering application (whether plugin or external) with specific knowledge of this markup language. W3C's Amaya appears to have ``HTML extended by MathML'' as its default language. (I don't know the details of Amaya.)

The "<object>" tag approach to MathML probably is more sensible for the long run than ``HTML extended by MathML'' if only because MathML is so much more granular than HTML. If I think about type-setting MathML, I tend to perceive that task as not any easier than that of local direct setting of Geoffrey Tobin's DTL (printable ascii equivalent of DVI). The point here is that setting MathML is probably too much to ask of native rendering by mass market browsers though it is certainly in scale for plugins and external apps.

There is still an issue in the eyes of some, on which I am neutral, of whether there is, or will be, a widely used style sheet language that is rich enough to provide the desired level of rendering of MathML presentation tags.

We need all of the good relevant plugins and external apps that the community has the energy to provide. Still, because these make more demands on the client side (than do ordinary browsers) -- demands that are not reasonable in some places and situations that are and will continue to be important -- we need to have a way to handle math on the web in formats that are very different from paper or "windowing" terminal displays without loss of ``content''. This is possible and really not that difficult.

Even if one wishes to set aside the need for audio, Braille, indexing, and searching streams, envision, for example, going as a visitor to look up something on the web in the San Francisco public library. All of the windowing stations are tied up. But you find simple terminal (vt100) access to the network via the browser "lynx" at a station that is available. It may be that the savvy library administrator has that station there because he knows that it will give you a way to avoid waiting. (In fact, if its processor is fast, that is almost certainly true.)

In ``windowing'' situations it is not too much to ask for the ``mathematical typewriter emulation'' (MTE) standard in mass market browser native rendering as part of native HTML. MTE is just emulation of the mathematical typewriter prevalent in all mathematics departments during the period 1960-1980. One had lots of symbols (in a fixed font), one could underline, one could move the paper for crude cursor positioning, and one could make something bold by re-striking after a slight horizontal displacement. It was crude, but it preserved content. Photocopy images of MTE documents were widely circulated as informal publications.

MTE is more ``in scale'' with ordinary HTML than is MathML, which is much closer to fussy typesetting.

All that needs to be added to basic HTML is:

  1. the horde of character entities that we need (in scalable fonts with algorithmic styling for bold, emphasis, and perhaps also several forms of alternate-emphasis). Algorithmic styling is desirable for efficiency even though it is less beautiful than separate fonts; but, for that matter, rendered HTML is already less beautiful than TeX rendered by "xdvi".

  2. a simple element "<lg> ... </lg>" (logical group) with attributes for horizontal and/or vertical cursor motion, described by a numerical multiplier relative to the size of the current font, prior to the display of the contents of the element and also with attributes for horizontal or vertical stretching, again described by a numerical multiplier relative to the size of the current font. Client rendering support for stretching should be optional. Client rendering support for positioning should be mandatory in windowed displays and where that is not appropriate the protocol should be to replace the opentag "<lg>" by the ascii character "{" and the closetag "</lg>" by the ``balancing'' character "}". (An attribute of the "lg" tag could be used to change the crude rendering strings "{" and "}" to other ordinary string values including empty ones. Attributes could also be used to furnish hints to computer-algebra systems or to furnish the identity of a MathML tag from which the current "lg" was fabricated. So MathML could be reconstructed. Of course, all of this would be authored in generalized LaTeX. :-))

  3. elements "<math>" (paragraph level) and "<displaymath>" (block level) in which

My understanding is that eventually the horde of characters and cursor movement will be possible with "w3-mode" in GNU Emacs under a windowing display. (I do not know about algorithmic styling.)
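
The non-windowed fallback proposed for "<lg>" in item 2 above is simple enough to sketch. The Python below is a minimal illustration, not a specification: it replaces each opentag by "{" and each closetag by "}", and lets attributes override those strings. The attribute names "open" and "close" (and "dy" in the usage line) are hypothetical, since the proposal does not name its attributes.

```python
import re

# Matches either an <lg ...> opentag (capturing its attribute text) or </lg>.
LG_TAG = re.compile(r"<lg\b([^>]*)>|</lg>")
# Matches attributes written as name="value".
ATTR = re.compile(r'(\w+)="([^"]*)"')


def lg_to_text(html):
    """Replace <lg> elements by their crude text-mode rendering strings.

    Assumed (hypothetical) attributes: open="..." and close="..." override
    the default "{" and "}"; all other attributes are ignored here.
    """
    out = []
    closers = []  # closing strings of the groups currently open
    pos = 0
    for m in LG_TAG.finditer(html):
        out.append(html[pos:m.start()])
        pos = m.end()
        if m.group(0) == "</lg>":
            # An unmatched closetag falls back to the default "}".
            out.append(closers.pop() if closers else "}")
        else:
            attrs = dict(ATTR.findall(m.group(1)))
            out.append(attrs.get("open", "{"))
            closers.append(attrs.get("close", "}"))
    out.append(html[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(lg_to_text('x<lg dy="0.5">2</lg>'))
```

The stack of closing strings is there so that nested groups with different overrides still balance. A terminal browser would apply such a substitution in place of cursor positioning, so the grouping, and with it the content, survives even though the layout does not.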

Inasmuch as there are very few "vt100" terminals extant that are not running in displays under local platform windowing systems, it is reasonable that the scientific and text-processing communities join in an effort to promote a broader collection of characters, cursor positioning, and algorithmic styling in enhanced "vt100" terminals.


[Processed from GELLMU to HTML: Tue Aug 3 17:43:37 EDT 1999]