Using the GELLMU Syntactic Translator to Write HTML

William F. Hammond

Basic

One can create HTML directly using LaTeX-like markup.

In so doing one is not writing LaTeX. Instead one is consciously writing HTML.

The basic ideas may be understood by examining an example such as this document and comparing its appearance as rendered HTML with the GELLMU source and possibly also with the HTML source.

The GELLMU syntactic translator, a program written in Emacs Lisp (casually known as Elisp), is used to make HTML from GELLMU source. It is not correct at all to view this program as a collection of editing macros. It is entirely parallel to a program written in a language such as C: it may be run in batch mode on any platform where GNU Emacs is available, including Linux and Windows. Alternatively, it may be run interactively from an Emacs buffer.

It is desirable, of course, to run any HTML through a validator for error checking.

HTML Elements

There are three basic ways one may use LaTeX-like markup to make an HTML element with content.

1. The first basic way is that used with the “b”, “em”, and “h3” elements in this document. For example, the markup "\b{not}" appears previously in the source for this article.
2. The second basic way is that used with the “body” element of this document, the “ol” element for this ordered list, and the “table” element in the next part of this document. This usage has the form "\begin{body} . . . \end{body}".
3. The third basic way, which is a variant of the second basic way, is that used here for the element “ul” to make this list:
• bird
• cat
• dog
This markup has the form "\ul ... \ul:" with "\ul" signifying the HTML opentag "<ul>" and "\ul:" signifying the HTML closetag "</ul>" almost directly.
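
As a rough illustration of the three forms listed above, the following toy Python model rewrites each form into tags. It is not the GELLMU translator (which is written in Elisp), it handles only brace arguments without nesting, and the function name `toy_translate`, its regular expressions, and the use of "\li" for list items are assumptions made for this sketch.

```python
import re

def toy_translate(src: str) -> str:
    # Toy model of the three element forms; NOT the real GELLMU translator.
    # Second form: \begin{name} ... \end{name}
    out = re.sub(r"\\begin\{(\w+)\}", r"<\1>", src)
    out = re.sub(r"\\end\{(\w+)\}", r"</\1>", out)
    # First form: \name{content}, where content has no nested braces
    out = re.sub(r"\\(\w+)\{([^{}]*)\}", r"<\1>\2</\1>", out)
    # Third form: \name: is the close tag, a bare \name is the open tag
    out = re.sub(r"\\(\w+):", r"</\1>", out)
    out = re.sub(r"\\(\w+)", r"<\1>", out)
    return out

print(toy_translate(r"\b{not}"))                    # <b>not</b>
print(toy_translate(r"\begin{ol}\li one\end{ol}"))  # <ol><li> one</ol>
print(toy_translate(r"\ul \li bird \li cat \ul:"))  # <ul> <li> bird <li> cat </ul>
```

The order of the substitutions matters in this sketch: the close form "\ul:" has to be rewritten before the bare open form "\ul", or the colon would be left behind.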

Another optional way, available for some HTML elements (depending on the formal definition of the HTML language), is that used for the paragraphs (the “p” elements) of this document. This method relies on the language definition for the automatic determination of the end of the element's content. In HTML, for example, a paragraph is automatically terminated by the appearance of any of a number of elements, including a (new) paragraph, a list, a table, and a header.

Elements Defined as Empty

Some HTML elements are defined as empty elements in the formal definition of HTML. The element “hr” for a horizontal rule is an example. In the GELLMU markup interface for HTML, one writes "\hr;".

The usage above is not correct for SGML elements that are not defined as empty but that, consistently with the formal language definition, happen to be empty in a particular instance. Depending on whether OMITTAG is available in the SGML language (it never is in XML), and further on whether its use is appropriate for the particular container, one might be able to use the brief markup "\foo"; one may always use "\foo{}" or "\foo\foo:". In HTML, for example, an empty paragraph may be indicated with "\p" if it is followed without content by something that forces a paragraph to end, but an empty title must be written as "\title{}" or as "\title\title:".

A Minor Point: By default the markup "\foo;" is translated to "<foo/>", since this is both correct and required for an element that is defined empty in any XML language and is correct and “preferred” in the default GELLMU language for LaTeX emulation. It is not correct for use with classical SGML languages such as HTML (as opposed to XHTML). The Elisp variable gellmu-sgml-emptytag-close is a string variable that has the default value "/>"; it may easily be reset to ">", and must be reset to ">" for use with classical HTML. This may be done either interactively within Emacs using

       M-: (setq gellmu-sgml-emptytag-close ">")


or by placing the same setq expression in a batch file that is used to launch the GELLMU syntactic translator.
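
The effect of that setting can be pictured with a small toy model in Python. This is only a sketch of the behavior described above, not the translator's code; the function `toy_empty_tags` and its parameter `emptytag_close` are hypothetical names, the latter chosen to mirror the Elisp variable gellmu-sgml-emptytag-close.

```python
import re

def toy_empty_tags(src: str, emptytag_close: str = "/>") -> str:
    # Toy model: rewrite \name; as an empty tag that ends in emptytag_close.
    # The default "/>" mirrors the default described in the text; ">" is the
    # value the text says is needed for classical HTML.
    return re.sub(r"\\(\w+);", lambda m: "<" + m.group(1) + emptytag_close, src)

print(toy_empty_tags(r"\hr;"))       # <hr/>
print(toy_empty_tags(r"\hr;", ">"))  # <hr>
```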

Certain Characters

Name           Appearance   Markup
Backslash      \            \\
Left brace     {            \{
Right brace    }            \}
Percent        %            \%
Ampersand      &            \& (& is often OK)
Less than      <            <
Greater than   >            >
Left bracket   [            [
Right bracket  ]            ]

Note that in LaTeX itself the markup "\\" signifies a forced linebreak. In HTML, however, the empty tag "<br>" is used to make a linebreak. With this method of making HTML one uses "\br;" to signify the HTML tag "<br>" and one doubles the LaTeX-like command sequence introducer, i.e., the character "\", to “escape” the command sequence introducer.

The markup for backslash, the LaTeX-like command sequence introducer, and for ampersand, the HTML entity introducer, is illustrated by two examples: the markup "\\foo" creates for HTML a reference to the TeX command "\foo", and the markup "R\&R" creates the abbreviated phrase “R&R”, in which the ampersand is not immediately followed by a word boundary.
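
The backslash escapes in the table can be modeled in a few lines of Python. This sketch reproduces only the Appearance column, that is, the character the reader finally sees; it makes no claim about the exact HTML the translator emits for an ampersand. The function name `toy_unescape` is an assumption of the sketch.

```python
import re

def toy_unescape(src: str) -> str:
    # A backslash followed by one of \ { } % & stands for that character itself.
    return re.sub(r"\\([\\{}%&])", lambda m: m.group(1), src)

print(toy_unescape(r"\\foo"))  # \foo
print(toy_unescape(r"R\&R"))   # R&R
```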

Attributes

Any HTML element may have a list of one or more attributes. For example, the “h1” element at the beginning of this document has the attribute “align”, which is given the value “center”.

An attribute list is introduced by the character "[" immediately following the element name and terminated by the character "]". There may be any number of attributes in an attribute list in accordance with the specification for the particular element in the formal definition of HTML.
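
A toy Python model may make the bracket notation concrete. It assumes that attributes inside the brackets are spelled in the usual HTML name="value" form, as in "\h1[align="center"]{...}", and that the element content is given in braces without nesting; the function name `toy_attributes` is hypothetical and the code is not taken from the GELLMU translator.

```python
import re

def toy_attributes(src: str) -> str:
    # \name[attribute list]{content} -> <name attribute list>content</name>
    return re.sub(r"\\(\w+)\[([^\]]*)\]\{([^{}]*)\}", r"<\1 \2>\3</\1>", src)

print(toy_attributes(r'\h1[align="center"]{Title}'))
# <h1 align="center">Title</h1>
```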

HTML Anchors

Anyone familiar with HTML knows that the element “a” signifies an HTML anchor. Here is an anchor to the top of this document. And here is an (external) anchor to the

SGML Languages other than HTML

SGML languages other than HTML may be treated the same way once the author is familiar with the tag names and the rules for their deployment.

One may use standard SGML entity notation directly. For example, "&gt;" is an alternative way to write the character “>” in HTML.

XML Languages

XML languages are special instances of SGML languages.

There are several important restrictions with XML.

1. All element names are case-sensitive.
2. Only the three basic methods of writing an element, other than an element that is defined as empty in the language definition, may be used. That is, the precise opening and closing points of every container are required.
3. Every defined-empty element must be written in the form "\foo;".
4. Every attribute value must be quoted.
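
These restrictions can be checked on translator output with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module to test well-formedness only (it does not validate against a language definition); the helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    # True if the text parses as XML, False if the parser rejects it.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<P>x</p>"))                   # False: names are case-sensitive
print(is_well_formed("<ul><li>bird<li>cat</ul>"))   # False: end tags may not be omitted
print(is_well_formed("<div><hr></div>"))            # False: empty element needs <hr/>
print(is_well_formed("<h1 align=center>x</h1>"))    # False: attribute value not quoted
print(is_well_formed('<div><hr/><h1 align="center">x</h1></div>'))  # True
```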

LaTeX Emulation

LaTeX emulation in an SGML (or XML) language is very different from the use of LaTeX-like syntax to write in SGML.

The original idea in the GELLMU project of using LaTeX-like syntax for writing in an SGML language envisioned its use primarily with an SGML language that includes reasonably full markup for research-level mathematical and scientific articles. Such a language is rich enough, as HTML is not, to admit useful and fully reliable automatic creation of LaTeX source and of source for other standard formats, including HTML (necessarily with crude representation of mathematics in HTML). In that context it makes sense to ask for additional services from the syntactic translator in the way of LaTeX emulation, still, however, with the goal of obtaining a valid document in the SGML language.

By default the GELLMU syntactic translator runs in LaTeX emulation mode. This incorporates a number of LaTeX-like features that do not apply in the case of its basic mode.

It is easy, either interactively within Emacs or in a batch file, to issue the Emacs command

       M-: (setq gellmu-straight-sgml t)

for using LaTeX-like syntax to write in regular SGML (or XML) vocabularies such as HTML, TEI, or DocBook. Other variable settings permit the use of blank lines for new paragraphs under certain circumstances.

Because SGML attributes are not part of LaTeX, in LaTeX emulation (with gellmu-regular-sgml set to nil) the GELLMU syntactic translator reserves the characters "[" and "]" for LaTeX-like command options. An attribute list is then written as an option immediately after the command name, with the character ":" as the first character inside the brackets.
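
The distinction between a command option and an attribute list in LaTeX emulation can be sketched as a toy classifier in Python. The function `toy_classify` and both sample inputs are hypothetical; the only rule taken from the text is that a leading ":" inside the brackets marks an attribute list.

```python
import re

def toy_classify(src: str):
    # Look at the bracketed text that immediately follows a command name.
    m = re.match(r"\\(\w+)\[([^\]]*)\]", src)
    if m is None:
        return None
    name, inside = m.groups()
    if inside.startswith(":"):
        # Leading ":" marks an attribute list; drop the marker itself.
        return (name, "attributes", inside[1:])
    return (name, "option", inside)

print(toy_classify(r'\h1[:align="center"]'))   # ('h1', 'attributes', 'align="center"')
print(toy_classify(r"\documentclass[12pt]"))   # ('documentclass', 'option', '12pt')
```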

Last change: 10-Jan-2006