What is a Reasonable Authoring DTD
under SGML or XML for MathML?

William F. Hammond

Email: hammond@math.albany.edu

1.  Introduction

There has been a recent resurgence of interest in MathML, the rather granular XML language developed by the World Wide Web Consortium (W3C) HTML Math Working Group during the period 1996-2000. The renewed interest is due to the availability of MathML-capable builds of Mozilla, the open-source development version of the popular browser Netscape.

2.  A Few Examples

  1. Compound fractions: $\frac{\frac{a}{b}}{\frac{c}{d}} = \frac{ad}{bc}$.

  2. The formula for solving the quadratic equation $ax^2 + bx + c = 0$ (in a field of characteristic $\neq 2$): $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.

  3. Mixed function application and multiplication: $\sin ax \cos bx$.

  4. Newton's binomial series: $(1+t)^r = \sum_{k=0}^{\infty} \frac{r(r-1)(r-2)\cdots(r-k+1)}{k!}\, t^k$.

  5. A differential equation: $D^2 y - 3x\,(Dy)^2 = x \cos x$.

  6. Stokes's Theorem in space: $\iint_S \mathrm{curl}\,F \cdot N \, d\sigma = \oint_{\partial S} F \cdot T \, ds$.

  7. The continued fraction for the golden mean: $\frac{-1+\sqrt{5}}{2} = \cfrac{1}{1+\cfrac{1}{1+\cfrac{1}{1+\cfrac{1}{1+\cdots}}}}$.

  8. The representation $\mathrm{Gal}(\overline{\mathbf{Q}}/\mathbf{Q})$ of a centrally important object that one might choose to declare as the symbol “galQ”. In this instance, however, the expression is formed using the following three declared symbols [1]:

    Symbol   Rendering                 GELLMU expansion
    Q        $\mathbf{Q}$              \regch{\bold{Q}}
    Qbar     $\overline{\mathbf{Q}}$   \ovbar{\regch{\bold{Q}}}
    Gal      $\mathrm{Gal}$            \mbox{Gal}

    Here the example is repeated, $\mathrm{Gal}(\overline{\mathbf{Q}}/\mathbf{Q})$, with the same presented appearance but this time as the declared symbol galQ, which is defined without using other declared symbols in its definition.

3.  Generating MathML

There is a serious question of how one might migrate from traditional TeX-like mathematical markup, whose reasonably succinct form follows the long tradition of Western mathematical notation, to an authoring markup that is fully adequate for translation to MathML. For example, how can we automatically translate, with full confidence, the XML versions of the above mathematical examples into MathML? Or, if we cannot, what additional information needs to be added?

One possibility is offered by my draft on mathematical notation at the URL

http://www.albany.edu/~hammond/gellmu/notation .

It attempts to explain what additional information is needed in this document so that an automated rendering system at work on these examples, as marked up in the XML version of this document, does not need to guess. Note that no guessing is needed to render this document either in HTML, with mathematics set crudely but reasonably, or in LaTeX. (One may not fully appreciate this latter point without examining the XML version of this document.)
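As a small illustration of the kind of information that MathML expects to be made explicit, consider example 3 above. The following Python sketch is not part of GELLMU, and its helper names (mi, mo, apply_function, times) are hypothetical. It builds presentation MathML in which the invisible operators for function application (U+2061) and multiplication (U+2062) are written out. The grouping chosen here, sin(ax) times cos(bx), is exactly the decision that an automated translator would otherwise have to guess.

```python
import xml.etree.ElementTree as ET

APPLY_FUNCTION = "\u2061"   # MathML entity &ApplyFunction;
INVISIBLE_TIMES = "\u2062"  # MathML entity &InvisibleTimes;


def mi(name):
    """Identifier element, e.g. a variable or a function name."""
    elem = ET.Element("mi")
    elem.text = name
    return elem


def mo(operator):
    """Operator element."""
    elem = ET.Element("mo")
    elem.text = operator
    return elem


def times(*factors):
    """Group the factors in an mrow, joined by invisible multiplication."""
    row = ET.Element("mrow")
    for index, factor in enumerate(factors):
        if index > 0:
            row.append(mo(INVISIBLE_TIMES))
        row.append(factor)
    return row


def apply_function(name, argument):
    """Group a function name and its argument, joined by function application."""
    row = ET.Element("mrow")
    row.append(mi(name))
    row.append(mo(APPLY_FUNCTION))
    row.append(argument)
    return row


# Assumed reading of example 3: sin(ax) multiplied by cos(bx).
expression = times(
    apply_function("sin", times(mi("a"), mi("x"))),
    apply_function("cos", times(mi("b"), mi("x"))),
)

math = ET.Element("math")
math.set("xmlns", "http://www.w3.org/1998/Math/MathML")
math.append(expression)
mathml = ET.tostring(math, encoding="unicode")
```

Nothing in the bare string "sin ax cos bx" distinguishes this reading from, say, sin applied to the whole product; the nesting of the calls above is where the additional information enters.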

To assist automated rendering to MathML, and to supply semantic information for computer algebra systems, GELLMU provides a metacommand mathsym [2] for the formal declaration of mathematical symbols, with the usage:

\mathsym{symbol-name}{symbol-rendering}[symbol-meta-info] .

Here symbol-name is a case-sensitive alphanumeric string beginning with a letter. The second argument is the presentation rendering of the symbol in GELLMU markup. It is like the definition of a newcommand except that it may not involve arguments [3]. The optional third argument symbol-meta-info is an alphanumeric string that may also include a few other characters such as ‘/’, ‘-’, ‘,’, ‘.’, ‘*’, etc. Its exact structure depends on the production system. For example, it might consist of (name, value) pairs for conveying meta-information about the symbol.
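The shape of such a declaration can be sketched in Python. This is only an illustration of the usage line above, not the code of the GELLMU syntactic translator: the function names are hypothetical, symbol names are assumed to be ASCII, escaped braces are not handled, and the meta-information string "class/operator" in the usage note is invented for the example.

```python
import re

# Assumption: "alphanumeric string beginning with a letter" means ASCII only.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*\Z")


def read_group(text, pos, open_ch, close_ch):
    """Return (content, next_pos) for the balanced group opening at text[pos]."""
    if pos >= len(text) or text[pos] != open_ch:
        raise ValueError(f"expected {open_ch!r} at position {pos}")
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == open_ch:
            depth += 1
        elif text[i] == close_ch:
            depth -= 1
            if depth == 0:
                return text[pos + 1:i], i + 1
    raise ValueError("unbalanced group")


def parse_mathsym(declaration):
    """Split a mathsym declaration into (name, rendering, meta or None)."""
    prefix = "\\mathsym"
    if not declaration.startswith(prefix):
        raise ValueError("not a mathsym declaration")
    name, pos = read_group(declaration, len(prefix), "{", "}")
    if not NAME_RE.match(name):
        raise ValueError(f"invalid symbol name: {name!r}")
    rendering, pos = read_group(declaration, pos, "{", "}")
    meta = None
    if pos < len(declaration) and declaration[pos] == "[":
        # The optional third argument is kept as an opaque string.
        meta, pos = read_group(declaration, pos, "[", "]")
    return name, rendering, meta
```

For instance, parse_mathsym(r"\mathsym{Gal}{\mbox{Gal}}[class/operator]") yields the name Gal, the rendering \mbox{Gal}, and the meta string class/operator, while a declaration without the bracketed argument yields None in the third position. The meta string is deliberately left unparsed, since its structure depends on the production system.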

The syntactic translator replaces each invocation of a given mathsym with the specified rendering. For each mathsym definition it also writes a corresponding element in the SGML output. The content of that element is the declared symbol name alone if there is no meta information; otherwise it is the symbol name followed by a blank space and then whatever string of meta information is provided in the optional argument. Additionally, each invocation is wrapped in a rendering-inert Sym element whose key attribute reveals the name given to the symbol at the point of declaration (and by which the symbol is invoked). A downstream processor that has remembered the list of declared symbol names can therefore match each invocation of a declared symbol with the meta information, if any, that the author provided in the symbol declaration.
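The matching step just described can be sketched as follows. The Sym element and its key attribute are as described above; everything else is an assumption made for illustration: the declaration element is given the hypothetical name mathsym, the sample document is invented, the content of each Sym element is simplified to plain text, and the function names are not those of any GELLMU processor.

```python
import xml.etree.ElementTree as ET

# Invented sample: two declarations, then one formula that invokes both symbols.
SAMPLE = """
<doc>
  <mathsym>Q</mathsym>
  <mathsym>Gal class/operator</mathsym>
  <math><Sym key="Gal">Gal</Sym>(<Sym key="Q">Q</Sym>)</math>
</doc>
"""


def collect_declarations(root, decl_tag="mathsym"):
    """Map each declared symbol name to its meta information, or None."""
    table = {}
    for elem in root.iter(decl_tag):
        # Content is the name alone, or the name, one blank, then the meta string.
        name, _, meta = (elem.text or "").strip().partition(" ")
        table[name] = meta or None
    return table


def annotate_invocations(root, table):
    """Pair every Sym invocation, in document order, with its meta information."""
    return [(sym.get("key"), table.get(sym.get("key"))) for sym in root.iter("Sym")]


root = ET.fromstring(SAMPLE)
table = collect_declarations(root)
uses = annotate_invocations(root, table)
```

Because the Sym wrapper is inert for rendering, a processor that ignores it loses nothing, while one that consults the table gains the author's declared semantics at each point of use.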

A related feature in the didactic GELLMU document type is the mlg tag for marking mathematical logical groups. This is somewhat akin to the lgg tag for TeX-like logical groups, traditionally created in TeX markup with braces that are not attached to a command [4]. As with lgg, there is no obvious evidence of an mlg tag in a typeset rendering, but the presence of such a tag is intended as a signal to downstream mathematical parsers that the contents of the tag be given grouping priority as, say, with visible parentheses. Furthermore, the mtype and mml attributes of the mlg tag may be used to pass semantic information about the tag's contents to a processor.

The reader is invited to do one or more of the following:


  [1] The command regch is a variant of mbox that is intended to denote the normal version of a “regular” character found in a mathematical context when that character is suitable for a hypothetical algorithmic application of an accent such as ovbar. A general mbox is regarded as not suitable for hypothetical algorithmic accenting.
  [2] The name mathsym is the default value of the variable gellmu-mathsym-name in the syntactic translator.
  [3] However, a declared math symbol may be invoked in a newcommand that takes arguments.
  [4] Such unattached braces in GELLMU markup lead to an lg0 tag in the output of the syntactic translator that is translated to an lgg tag in the XML version of the didactic document type.