W. F. Hammond: The GELLMU Archive

2. Introduction

GELLMU is an acronym for “Generalized Extensible LaTeX-Like MarkUp”. It admits easy transliteration to an equivalent language under the umbrella of “Standard Generalized Markup Language (SGML), ISO 8879:1986”. As such, it is instantly amenable to many powerful SGML processors that appear not to be widely used in the community of mathematicians, scientists, and engineers as evidenced, for example, by the practice of the large preprint archive at Los Alamos National Laboratory (LANL).

I began to make early GELLMU materials available through the network on July 29, 1998. In the long term, the only tool that I am serious about providing, i.e., offering as a full sketch on a didactic basis for public consumption, is the easy transliterator (syntactic translator), since so much good work by others is already available and needs only configuration that is author or work group dependent.

The merit, if any, of the transliterator is that it provides input notation that is minimal and comfortable for those accustomed to writing LaTeX. An author who wishes to use it will need to change some past LaTeX practice. But the level of change will be much less than that required for writing with regular XML notation or even SGML notation. The level of change will be small enough that one might wonder about its eventual merging into standard LaTeX.

My provision of a LaTeX-like markup interface for SGML and XML is intended to serve as a vehicle for coaxing and helping those who write for technical academic journals and who have used TeX and LaTeX to move into the new world of high technology documents.

The use of SGML has simply not “caught on” in the world of academic mathematics, science, and engineering for two reasons: first, there is no sufficiently acceptable interface with traditional TeX- or LaTeX-based legacy markup practice; second, some subtle points involved in moving between something like LaTeX and a language under SGML have not been understood broadly enough among authors.

A GELLMU system is extensible in ways that go beyond the extension of classical LaTeX by imported or locally-grown styles and classical packages¹. Moreover, I do not see GELLMU as a replacement for LaTeX as a print formatter; indeed, LaTeX will play an important role in my personal GELLMU configuration for the foreseeable future.

Most of the reading material on GELLMU in my web site was authored under this system. So far, I find it to be a system that works as I want.

3. What is Generalized Extensible LaTeX-Like MarkUp?

The main idea is to have a language for single source authoring toward multiple presentation formats (SSATMPF) that resembles LaTeX, that comprehends mathematics, and is archive-able. Ultimately, the concept of “presentation format” is very broad and includes not only obvious targets such as classical TeX or LaTeX but also search engine databases and indexing abstracts as well as intermediate SGML dialects for staged processing.

I suspect that existing languages under SGML are languages for SSATMPF that could be fairly easily extended to meet the needs of mathematicians, scientists, engineers, and others who traditionally have used systems under the general TeX umbrella, at least when these individuals are aggregated at the work group level. I believe also that there are existing SGML tools that will, after mere configuration (at a level comparable to writing a package of functions for a computer algebra system), provide translation from suitable SGML languages to various presentation formats including, for example, classical TeX and HTML.

This is very much an object-oriented approach to authoring, regardless of whether mathematics is involved, and, at the very least, it needs to be explored by the mathematical community because it could offer the best chance for the future of having documents that are “smart” enough to allow an overall sound approach to searching the network for the current state of knowledge. (On the subject of “smart documents”, see the work of Richard Fateman, “More Versatile Scientific Documents …”.)

The issue of whether a mathematically-capable SGML is optimal for future mathematical authoring is entirely independent of the issue, essentially a personal preference, of whether generalized LaTeX-like markup is used.

It is fairly clear that neither plain TeX nor LaTeX, without some adjustment, is a language for multiple format authoring.

The main point of GELLMU is to support those who are accustomed to the convenience of LaTeX. With some effort each author or working group that is so inclined should be able to meet its markup needs using GELLMU with little change of habit and with the gain of having the ability to arrange for automatic translation to almost any desired output format, including TeX, using existing free SGML tools that are stable. These existing tools do require configuration.

For example, the TeX “backend” for James Clark's JADE, a general free SGML processor that is configured with a “style sheet” language, gives one confidence that SGML renditions of LaTeX documents may be formatted in plain TeX (for subsequent typesetting by TeX, the Program) without using TeX, the Program, for the LaTeX layer, though the fineness of such formatting will be limited by the nature of style sheet processing. But there are free SGML tools that can accept more complicated formatting instructions than those expressed in a style sheet language.

The design of GELLMU envisions a three stage system.

Stage 1 uses an E-lisp program to convert GELLMU input into an SGML document. (E-lisp is the dialect of Lisp that underlies GNU Emacs.)

Stage 2 uses the SGML parser “nsgmls” of James Clark to produce fully parsed text streams from SGML. There are three steps in stage 2.

1. A validating parse of the SGML output from stage 1.
2. A transformation of the parse output using a small author-provided collection of Perl codelets for David Megginson's sgmlspl SGML-processing program, actually an interface framework for his Perl library sgmlspm, to produce an XML representation of the document with some enhancements.
3. A validating parse of the XML representation of the document.

Stage 3 uses author-provided collections of Perl codelets for sgmlspl to make various formattings of the parsed XML.

Information about Perl may be found on the web at The Perl Home Page.

The point of this design is that each stage is highly transparent and highly configurable. There are many opportunities for user intervention. The idea, of course, is to tune the Perl-based transformations so that manual intervention is never necessary. There are no theoretical obstructions to achieving this goal.

4. The Materials

A fair amount of explanation is available at this time. More will be forthcoming. Items now available, all at early draft level, include:

Code
1. gellmu.el
  the stage 1 processor.
2. gellmu.decl
  an example SGML Declaration (required with “gellmu.dtd”).
3. gellmu.dtd
  an example SGML Document Type Definition, needed by a stage 2 (SGML parsing) processor such as “nsgmls”, greatly expanded since July 28. This changes slowly but steadily.
4. catalog
  an SGML Catalog for “nsgmls”, the stage 2 processor.
5. xmlgart.pl
  Perl codelet package for sgmlspl that performs the transformation of the parsed SGML version into XML.
6. htmlgart.pl
  a demo codelet package toward the target “HTML” for use with the stage 3 processor sgmlspl, greatly expanded since July 28 but still missing planned features. It would not be too soon for an interested party to hack from this toward the target “HTML+MathML” or toward the target “Texinfo”. This changes constantly; all table (and tabular) handling is completely experimental and incomplete.
7. ltxgart.pl
  a demo codelet package toward the target “LaTeX” for use with the stage 3 processor sgmlspl, greatly expanded since July 28 but still missing planned features. This changes constantly; all table (and tabular) handling is completely experimental and incomplete.
Examples
1. A minimalist example of a GELLMU article.
2. GELLMU markup for A Short Story by A. U. Thor, a variant of the classical TeX benchmark “story.tex”.
3. Multiple forms of this, the current, document.
  1. igl/gellmum.glm
    the GELLMU source for this document.
  2. igl/gellmum.sgml
    the syntactic translation to SGML of the GELLMU source.
  3. igl/gellmum.xml
    the XML translation of the GELLMU source. This translation involves significant, but not extensive, knowledge of the document type.
  4. igl/gellmum.html
    the HTML version of this document made from the XML version.
  5. igl/gellmum.ltx
    the LaTeX version of this document made from the XML version.
  6. igl/gellmum.dvi
    a DVI file made from the LaTeX version of this document.
4. Authoring DTD's in GELLMU
5. notation
  a very preliminary text draft of my thoughts about the legacy of more than 200 years of typeset mathematics in an effort to advance the idea that legacy notation is not ambiguous once adequate “type” information has been added (once in a document) for each symbol. This is needed for MathML and to make possible substantial enhancement of online services such as the American Mathematical Society's MathSciNet.
6. to-do
  notes on the tasks that lie ahead.

Other items and a pointer to the live demo page, with a pre-release tarball, can be found on the GELLMU veterans web page.

5. Usage under *IX

Place “gellmu.el” either in your GNU Emacs library path or else in your working directory.
Place the declaration, the DTD, and the catalog in the same directory, possibly different from your working directory.
In your working directory open the file “myfoo.glm” in GNU Emacs and then:
- M-x load-library <RET> gellmu.
- M-x gellmu-trans.
- After eliminating all reported input errors in the GELLMU, save the SGML file found in the new buffer.
Assuming that the file “myfoo.sgml” begins with a <!doctype...> tag, as it will if “myfoo.glm” began with a \documenttype command, use:

nsgmls -m pathname-of-catalog myfoo.sgml | sgmlspl codelet-package
to obtain the format that is the target of the sgmlspl codelet package on standard output.
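
The command above can also be driven from a script. What follows is only a minimal sketch in Python, not part of the GELLMU materials: the function names build_pipeline and run_pipeline are hypothetical, the file names in the usage note are placeholders, and the only options used are the ones that appear in the command line above. It assumes that “nsgmls” and “sgmlspl” are on the search path.

```python
import subprocess


def build_pipeline(catalog, sgml_file, codelet_package):
    """Return the argument lists for the two halves of the pipeline.

    Mirrors: nsgmls -m pathname-of-catalog myfoo.sgml | sgmlspl codelet-package
    (function name and parameters are illustrative, not part of GELLMU).
    """
    parser_cmd = ["nsgmls", "-m", catalog, sgml_file]
    formatter_cmd = ["sgmlspl", codelet_package]
    return parser_cmd, formatter_cmd


def run_pipeline(catalog, sgml_file, codelet_package):
    """Pipe the nsgmls parse stream into sgmlspl and return its raw output."""
    parser_cmd, formatter_cmd = build_pipeline(catalog, sgml_file, codelet_package)
    parser = subprocess.Popen(parser_cmd, stdout=subprocess.PIPE)
    formatter = subprocess.Popen(
        formatter_cmd, stdin=parser.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the pipe so nsgmls is signalled if sgmlspl exits early.
    parser.stdout.close()
    output, _ = formatter.communicate()
    parser.wait()
    if parser.returncode != 0 or formatter.returncode != 0:
        raise RuntimeError("stage 2/3 processing reported an error")
    return output  # bytes; the caller decides how to decode or store them
```

For instance, run_pipeline("catalog", "myfoo.sgml", "htmlgart.pl") would return the HTML rendering as bytes, provided the two programs and the three files are in place.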

Most of the code items in the list above will be under constant revision for at least the next six months. Except for the stage 1 transliterator that converts input GELLMU markup to SGML, they will always, necessarily by system design, be here as incomplete examples. That said, they may become more comprehensive examples as time passes. On the other hand, examples that are too comprehensive may be too complicated for reasonable digestion as illustrative materials.

The stage 1 processor is provided as a didactic example in E-lisp source code form only. I would expect a serious high volume authoring work group to seek a much faster transliterator. In its present form it allows an author or work group to tinker with the design, which might be a good idea before seeking a professional product. The stage 1 processor is not an essential ingredient of the overall design discussed here; it is merely intended for the convenience of academic authors who by habit prefer LaTeX-like markup to SGML tagging, which is similar to manual HTML tagging. The usefulness of the stage 1 processor is tied to the usefulness of its SGML output. That output is for authoring platform use only. Its usefulness will rest, for a given author or work group, on the soundness of the design of its SGML DTD (a task for a mathematician) and the soundness of the stage 3 processing, as configured by the author or work group, that is used to produce documents for presentation in various formats according to the needs of the author or work group.

6. The System Design

One of the criteria for archive-ability is that a document consists only of 75 column printable text. This makes it possible to have robustly scannable paper backup. (Your scanner may care about some of the details of actual printing.) The source documents found in the above list have been prepared with this in mind. The SGML images of those documents are somewhat more verbose; they are set with newline alignment for ease of diagnosing authoring errors. Consequently, the SGML images shown in the list would not be archivable as 75 column printable text. (Of course, manually authored SGML may be done within 75 columns of printable text.)
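
The 75 column criterion is easy to check mechanically. The following is a minimal sketch in Python, not part of the GELLMU materials; the function name archivability_problems is hypothetical, and it assumes that “printable” means the printable ASCII characters together with the space character, which the text above does not spell out.

```python
import string

MAX_COLUMNS = 75

# Assumption: "printable" is taken to mean printable ASCII plus the space.
PRINTABLE = set(string.ascii_letters + string.digits + string.punctuation + " ")


def archivability_problems(text, max_columns=MAX_COLUMNS):
    """Return a list of (line number, complaint) pairs; empty means no problems."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_columns:
            problems.append((number, "longer than %d columns" % max_columns))
        if any(ch not in PRINTABLE for ch in line):
            problems.append((number, "contains a non-printable character"))
    return problems
```

An empty result for a source file means that every line fits within 75 columns of printable text under the stated assumption; a tab character, for example, would be reported.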

A DTD's design determines the set of commands (or tags) and, moreover, may be used to incorporate input markup preferences. It should not be used to define “macro”-level names.

It would be possible to provide for "\newcommand" as a reserved name in stage 1 only if the expansion names were either in a nearby DTD or were names defined by the same procedure in the same document. However, standard macro processing prior to stage 1 is a better design. (Recall the philosophy of Kernighan and Pike.) For one thing it is less prone to overburdening. More important, while the source document is the most valuable item to its author, the SGML transliteration is a nearly equivalent document that will be much easier for extant SGML searching and indexing tools to handle. Names in that context should be restricted to names that cannot receive absolutely uniform rendering in terms of other names under every imaginable presentation format including abstract formats.

Whether or not one is familiar with macro pre-processors, there is another approach to very personal command names. This approach is to use GELLMU input markup with all of your personal macro names as a “Personal MarkUp Language (PMUL)” and then use the GELLMU system to create the SGML form (or, if you want for co-authoring, the GELLMU input form) of your work group's markup. (You might then never again think that it is OK to let English professors co-author using different word processors.)

The DTD only serves to define what is “legal” for the markup names used in a language with SGML-style syntax: the command set in the DTD, and the placement rules for those commands.

In principle, one could live without a DTD; that would be formal XML. GELLMU does require a DTD, and all commands that are used in a document must be there, but some slack is possible in setting up the “placement rules”, the rules that say where a command may or may not be used. Sometimes a little deliberate slack is made necessary by the limitations of DTD language.

In this connection it should be noted that the parsed form of an SGML document contains both an opentag and a closetag for each element, even each empty element, regardless of whether such tags may be omitted under the rules of the DTD. If one wants to be able to have new paragraphs invoked with blank lines in GELLMU source, then an essentially faithful (not quite invertible) transliteration of that source to SGML should not write closetags for such paragraphs. (Remember that stage 1 does not really “know” what the blank line means.) On the other hand, it should write both tags for "\par{ . . .}". That is why "par" and "parb" both exist in the didactic DTD as different commands. For now the DTD gives them equivalent treatment, as do the didactic stage 3 codelet packages.
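
The distinction can be illustrated with a toy. The following Python sketch is not the algorithm of “gellmu.el”; it only shows the idea that a blank line yields an opentag alone while "\par{ . . .}" yields both tags. The function name toy_transliterate is hypothetical, the assignment of the name "parb" to blank-line paragraphs is an assumption inferred from the paragraph above, and nested braces are deliberately not handled.

```python
import re

# Toy pattern: \par{...} whose argument contains no further braces.
PAR_COMMAND = re.compile(r"\\par\{([^{}]*)\}")


def toy_transliterate(source):
    """Illustrative only: map blank lines and \\par{...} to SGML-style tags."""
    # An explicit \par{...} has a visible end, so both tags are written.
    text = PAR_COMMAND.sub(r"<par>\1</par>", source)
    out = []
    previous_blank = False
    for line in text.splitlines():
        if line.strip() == "":
            # A run of blank lines starts a paragraph; where it ends is not
            # known at this stage, so only the opentag is written.
            if not previous_blank:
                out.append("<parb>")
            previous_blank = True
        else:
            out.append(line)
            previous_blank = False
    return "\n".join(out)
```

The missing closetags are then supplied by the SGML parser in stage 2, under whatever tag omission the DTD permits.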

It is easier to write a DTD with lots of slack than to write a “tight” DTD. But the DTD is a one-time effort. Easier one-time writing of stage 3 codelets, and perpetually easier diagnosis of authoring errors are the reward for a “tight” DTD.

A work group in mathematics that could imagine writing LaTeX “packages” or “styles” should do its own DTD authoring. (There will be a time of learning.)

The DTD gives the author of stage 3 command processing codelets a complete view of the range of possible situations that must be dealt with when a command appears. sgmlspl is an event-driven framework, which makes codelet writing much easier than authoring for TeX, the program. (Just look at the example codelets.)

The event-driven nature of the framework means that, with care, there will be little need to “rip out” code as new commands are added.

An author or working group that wants to have the FOO table model as the command “Table” (upper case “t”) may enable it by (1) writing in the DTD what it may contain, (2) writing in the DTD where it may occur, and (3) writing codelets for the processing of the two tags "<Table>" and "</Table>".

If FOO is the simple early HTML table model, and if one is coding for the presentation target “text/plain” (in this case a difficult target), one could, for the opentag, simply push all decisions forward to the processing of the closetag, where all element contents have been stored with all commands within those contents already rendered, thanks to the event-driven framework. For the closetag one may compute widths, insert newlines, etc., as needed to obtain the desired result.
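
The idea of deferring all decisions to the closetag can be sketched outside of Perl. The following Python fragment is an illustration only, not a codelet from the GELLMU materials: the Output class is a stand-in for an output stack, the event tuples are a simplification of a parse stream, and the element names "tr" and "td" and the use of tab and newline as internal separators are assumptions. It presumes that cell text contains no tabs or newlines and that tables are not nested.

```python
class Output:
    """A stack of output buffers; the bottom buffer is the final document."""

    def __init__(self):
        self.buffers = [[]]

    def write(self, text):
        self.buffers[-1].append(text)

    def push(self):
        self.buffers.append([])

    def pop(self):
        return "".join(self.buffers.pop())


def format_table(stored):
    """Turn the stored, already rendered contents into aligned plain text."""
    rows = [
        [cell.strip() for cell in line.split("\t")[:-1]]
        for line in stored.split("\n")
        if line
    ]
    columns = max((len(row) for row in rows), default=0)
    widths = [
        max(len(row[i]) for row in rows if i < len(row)) for i in range(columns)
    ]
    lines = []
    for row in rows:
        padded = [cell.ljust(widths[i]) for i, cell in enumerate(row)]
        lines.append("  ".join(padded).rstrip())
    return "\n".join(lines) + "\n"


def render(events):
    """Process (kind, value) events in document order, in a single pass."""
    out = Output()
    for kind, value in events:
        if kind == "text":
            out.write(value)
        elif (kind, value) == ("start", "Table"):
            out.push()          # opentag: defer every decision
        elif (kind, value) == ("end", "td"):
            out.write("\t")     # internal cell separator (assumption)
        elif (kind, value) == ("end", "tr"):
            out.write("\n")     # internal row separator (assumption)
        elif (kind, value) == ("end", "Table"):
            # closetag: contents are complete, so widths can be computed now
            out.write(format_table(out.pop()))
    return out.pop()
```

Here the handler for the opentag does nothing but push a fresh buffer, and all layout work happens when the closetag arrives, which is the division of labor described above.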

This is single pass SGML processing. If you need another pass, design your system to process first to an intermediate SGML target, and then reach the presentation target from the intermediate SGML. Of course, in principle, since a GELLMU document has a root tag, a “push” across the whole document means that it is really not a single pass. On a platform with unlimited resources that would not matter. In reality the codelet package writer needs to think “how big” a pushed element could be. Such thought would not be reasonable across the web, but this is an “inhouse” context.

The design of the above example DTD, “gellmu.dtd”, is slanted somewhat toward traditional LaTeX with a little attention toward markup about computer programming (like that in the current document). Little mathematics has been incorporated at this stage (August 24), because (1) it awaits another, non-legacy feature, relative to LaTeX, in first stage processing and (2) with the two particular didactic presentation targets (a) LaTeX (the language, Lamport, v. 2) and (b) math-less HTML there is little challenge in it. ²

7. Miscellaneous Comments on the Listed Items

The “.pl” items listed above are the didactic example SGML conversion packages for sgmlspl conversion of documents prepared under the didactic example “gellmu.dtd”.

The “t” documents listed above provide a very minimal example of (1) GELLMU input markup, (2) the SGML image document, (3) the text stream produced by “nsgmls”, (4) the target image in HTML made using sgmlspl with “htmlgart.pl”, and (5) the target image in LaTeX made using sgmlspl with “ltxgart.pl”.

The “anch” documents are similar, longer examples, without the intermediate steps, each with a small didactic message.

Almost any document format is a possible target for a document authored in GELLMU.

While I cannot “prove” it, I believe that almost any input markup that is reasonable for single-source authoring toward multiple presentation formats (SSATMPF, a noun acronym) with assurance of consistent content to the extent of possible content-compatibility among the different presentation formats — one would not expect an index engine scan to have as much content as an HTML version — may be implemented faithfully under an SGML DTD and the rendering to a given presentation format may be carried out with SGML processing, possibly after a finite number of intermediate translations to other SGML document types.

I expect that Texinfo, the language of the GNU Documentation System, is a language for SSATMPF.

While I think that Lamport, v. 2, LaTeX is not equivalent to a language under SGML, due, for example, to things like "\newtheorem", I believe that a legacy of sound markup practice, with attention to issues of “content” versus “style”, in standard LaTeX, even augmented by some packages, in a given work group, is likely to be very close to an SGML language for SSATMPF. Indeed, I think that Ulrik Vieth understood this, at least implicitly if not explicitly.

The design of GELLMU was inspired, in part, by the markup that I found in a single LaTeX document, the TeX Directory System (TDS) specification document by Ulrik Vieth of The TeX Users Group (TUG). What is very interesting in this package is that the input source is a LaTeX document which Vieth transformed into Texinfo using ad hoc E-lisp code. I found Vieth's raw markup elegant and extremely pleasant to read. The use of sound markup practice in the TDS document enabled Vieth to transform it to a language that I already believed to be, apart from its lack of real mathematics, a language for SSATMPF.

Experimental GELLMU Materials

William F. Hammond

Department of Mathematics & Statistics
The University at Albany
Albany, New York 12222

September, 2000
(this material is mostly obsolete)
Last minor update: January 10, 2010

© 1998-2000 by William F. Hammond

D R A F T

Table of Contents

1. Notice

2. Introduction

3. What is Generalized Extensible LaTeX-Like MarkUp?

4. The Materials

5. Usage under *IX

6. The System Design

7. Miscellaneous Comments on the Listed Items

8. Notes

Footnotes
