%!latex-faq
\documenttype{article}
\latexcommand{\bsl;tolerance=999}
\newcommand{\gurl}{http://www.albany.edu/\tld;hammond/gellmu}
\newcommand{\murl}{http://math.albany.edu:8000/math/pers/hammond}
% This is a way to have appendices come up automatically
% with A, B, C, ... as sectional unit identifiers. The optional
% first argument of \appendix is a label key.
\newcommand{\appendix}[2][]{%
\section[][\label[:series="appdx"]{}Appendix %
][\series[:type="A"]{\evalref{\popkey}}]{\label{#1}#2}}
\newcommand{\appref}[1]{\ref{#1}}
\newcommand{\bmref}[2][]{%
\href{\murl/#1}{#2}}
\newcommand{\gref}[2][]{%
\href{\gurl/#1}{#2}}
\newcommand{\qhref}[3]{\term{\href{#1}{\quostr{#2}}}\desc{#3}}
\newcommand{\gpl}{%
\href{http://www.gnu.org/copyleft/}{\gnu General Public License}}
\newcommand{\siref}[2]{\iref{#1}{#2} (section \ref{#1})}
\newcommand{\sapref}[2]{\iref{#1}{#2} (appendix \ref{#1})}
\newcommand{\iref}[2]{\anch[iref="#1"]{#2}}
\newcommand{\upanch}[1]{\href{../#1}{\path{#1}}}
\newcommand{\dfitem}[1]{\term{#1}\desc\ \ }
\newcommand{\argopt}{\emph{argument}/\emph{option}}
\newcommand{\css}{\abbr{CSS}}
\newcommand{\ddt}{didactic document type}
\newcommand{\dps}{didactic production system}
\newcommand{\dtd}{\abbr{DTD}}
\newcommand{\dte}[1]{\term{\genv{#1}}\desc }
\newcommand{\genv}[1]{\quostr{GELLMU\_#1}}
\newcommand{\dvi}{\abbr{DVI}}
\newcommand{\elisp}{\softw{Elisp}}
\newcommand{\Emacs}{\abbr{GNU} \softw{Emacs}}
\newcommand{\emacs}{\softw{Emacs}}
\newcommand{\gellmu}{\abbr{GELL\-MU}}
\newcommand{\gnu}{\abbr{GNU}}
\newcommand{\gst}{\gellmu syntactic translator}
\newcommand{\html}{\abbr{HTML}}
\newcommand{\ll}{\latex;-like}
\newcommand{\mathml}{\abbr{MathML}}
\newcommand{\pdf}{\abbr{PDF}}
\newcommand{\perl}{\softw{Perl}}
\newcommand{\sgml}{\abbr{SGML}}
\newcommand{\st}{syntactic translator}
\newcommand{\texinfo}{\softw{Texinfo}}
\newcommand{\w3c}{\abbr{W3C}}
\newcommand{\wg}{\abbr{WG}}
\newcommand{\xhtml}{\abbr{XHTML}}
\newcommand{\mxhtml}{\xhtml+\mathml}
\newcommand{\xml}{\abbr{XML}}
\newcommand{\urib}{http://www.albany.edu/\tld;hammond/gellmu}
\newcommand{\href}[2]{\anch[href="#1"]{#2}}
\newcommand{\lsl}[2]{%
\latexcommand{\bsl;setlength\{\bsl;#1\}\{#2\}}}
\latexcommand{\bsl;hyphenation\{gell-mu meta-com-mand new-com-mand
Meg-gin-son doc-type doc-u-ment-type key-list mul-ti-byte
alpha-num-er-ic\}}
\surtitle{The GELLMU Manual}
\title{The GELLMU Manual}
\subtitle{for the current development version}
\author{William F. Hammond}
\address{Dept. of Mathematics \& Statistics\\
University at Albany\\
Albany, New York \ 12222 \ (USA)}
\email{hammond@math.albany.edu}
\date{Revised: \today}
\copynotice{Copyright \copyright; 2001--2014 William F. Hammond}
\compacttitle
\nobanner
\begin{document}
\begin{abstract}
This is the manual for Generalized Extensible \latex;-Like
Markup (\gellmu). The central focus in the \gellmu project is to tie
\latex to the worlds of \sgml and \xml by providing \ll markup
for writing documents under \sgml and \xml vocabularies (formally
known as document types).
The Manual explains the distinction between \emph{basic} and
\emph{advanced} use, provides a description of \emph{regular}
\gellmu as an instance of advanced \gellmu, and discusses
the use of the \dps, which is the project's suite of
processors for working with regular \gellmu.
The Manual also deals with the metacommands available when
writing \gellmu markup. One of these metacommands is the project's
emulation of \latex's \emph{newcommand}, which makes it possible to
have macros taking multiple arguments while writing for an \sgml or
\xml vocabulary.
\end{abstract}
\tableofcontents
\section[][\label{intro}]{Introduction}
\gellmu is an acronym for \quophrase{Generalized Extensible
\latex;-Like MarkUp}, which is the author's concept for using \ll
markup to write consciously for \sgml document types such as \html,
\abbr{DocBook}, \abbr{TEI}, or \gellmu's own didactic \ll document
type called \emph{article}.
It evolved from earlier thought about delineation of a coherent subset
of \latex; commands with the property that if a \latex; document used
only those commands then it could be translated with full reliability
to other formats including \html so that documents could be prepared
both for print and for the web from a single source.
Problems with this early thought during the years 1996--1997 included
the apparent absence of a community of \latex users willing to focus
on a narrow vocabulary and the then-current legacy practice of mixing
\latex commands freely with non-\latex \tex commands.
The present idea was crystallized in the summer of 1998 while the author
was looking at Ulrich Vieth's \latex; markup for the \tex; Directory
System (\abbr{TDS}) specification from the \tex; User Group
(\abbr{TUG}), which now is physically realized in \abbr{TUG}'s
\tex;Live series of \tex;-related software distributions on
\abbr{CDrom}. The \html version of that specification was derived
through an intermediate ad hoc translation from \latex; to
\texinfo, the language of the \gnu Documentation System,
which is a robust hypertext system, pre-dating \html, that is driven by \tex;
the Program.
Vieth's ad hoc translation used \Emacs Lisp (\elisp), one of the most
widely available \href{http://www.gnu.org/}{free} cross-platform
programming languages for which there is a free robust engine, the
same engine that underlies the interactive editing interface of
\emacs. In thinking about generalizing that translation, the author
realized that the structure of \texinfo is very much like that of an
authoring-level \sgml document type. From that idea it was a small
step to decide that one might profitably write \elisp code to use \ll
markup for the conscious writing of a new \ll \emph{article} document
type.
The idea, which by itself saves keystrokes, gains significant power
with the emulation of \latex;'s \emph{newcommand}. Although \sgml
markup offers both subdocument inclusions and macro substitutions
using the notion of \emph{entity}, there is no \sgml-standard macro
facility that takes arguments. Moreover, \gellmu's \emph{newcommand},
which is, unlike \latex;'s, a simple macro substitution facility,
also provides a base for experimenting with document type extensions
in a way that the \sgml notion of \emph{entity} does not, since \sgml
entities are invoked with a notation distinct from that used for
markup with \sgml \emph{elements}, which are the objects that
correspond in the \gellmu scheme to \ll \emph{commands}.
Software associated with the \gellmu project falls into two parts:
\begin{enumerate}
\item \bold{The \st.} \ This is a purely syntactic layer for converting
\ll markup in configurable ways to \sgml markup. Its output may
be handled in standard \sgml or \xml systems.
\item \bold{The \dps.} \ This component is a sketch, which might usefully
serve as a base for further development, of an \sgml production
system for an authoring environment that consists of
\begin{enumerate}
\item An \sgml document type called \emph{article} that is
accompanied by a corresponding \xml document type.
\item A package of extensible translating utilities written
in the language \softw{Perl}.
\end{enumerate}
As the name suggests, these materials are didactic and
should be regarded as unfinished for production work.
\end{enumerate}
There are two overall concepts: \emph{basic} mode and \emph{advanced}
mode. The basic mode may be used to write consciously for any
standard document type such as \html, \abbr{DocBook}, or \abbr{TEI},
and for that mode the \st is the only software under this project that
might be relevant. The advanced mode incorporates a configurable
broader array of \ll markup features, mostly for brevity, in the \st.
This mode is fully developed only for use with a \ll document type
such as the didactic \emph{article} document type that is part of the
\dps.
One may use any \sgml or \xml processing framework in working
with the \emph{article} document type. The \dps includes what is
needed to produce both \html and regular \latex forms of an
\emph{article} instance. Consequently, one is able to produce both
\abbr{DVI} using the \latex; format for \tex;, the program, and \pdf
using the \latex; format for \pdf\tex;, the program. Moreover, one
may tune the \pdf in various ways using small alterations of the
\softw{Perl} code for translating the \xml version of \emph{article}
to regular \latex;.
\section[][\label{basic}]{Basic \gellmu}
Neither the basic nor the advanced mode involves in any way adoption
of the language of \latex. (But many command names under the didactic
\emph{article} document type mimic \latex; command names.) There are
two fundamental ideas:
\begin{enumerate}
\item A \ll command \qquostr{\bsl;foo} corresponds
to an \sgml element \qquostr{\ltc;foo\gtc;}.
\item The syntactic translator is almost entirely ignorant of vocabulary
and a name like \emph{foo} need not have meaning in it although
it must have meaning in the document type for which one is
consciously writing.\footnote{
The \st does, however, have some facilities for
classifying the names in a list in regard to common syntactic
behavior. See, in particular, the \elisp variables
\emph{gellmu-autoclose-list} and \emph{gellmu-parb-hold}, both
of which are not significant in basic \gellmu.}
\end{enumerate}
To use the basic mode one must be familiar with the \sgml document
type for which one is writing. Ordinary \html is an example. Very
few of the features in the advanced markup that are not also part of the basic
markup\footnote{
With several minor exceptions, one related to the direct writing of
\sgml attributes (which cannot contain markup and which do not have
many parallels in \latex;) and another related to the way of escaping
the character \quochar{\bsl;}, everything about basic mode also
applies to advanced mode.}
make any sense for use in the direct preparation of \html with \ll
input.
\gellmu uses \latex; special characters such as \quochar{\bsl;},
\quochar{\{}, and \quochar{\}} along with \ll argument/option syntax,
where braces immediately following a command name indicate command
arguments and square brackets, i.e., \quochar{\lsb;} and
\quochar{\rsb;}, indicate command options. A command corresponds to
an \sgml element, and in basic mode a command may have at most one
argument, the content of which corresponds to \sgml element content,
and at most one option, the content of which corresponds to a list of
\sgml attribute specifications. Thus for example, in basic mode for
\html one may use the markup
\display{|\a[href="http://www.w3.org/"]{The World Wide Web Consortium}|}
to form the \html anchor:
\display{|<a href="http://www.w3.org/">The World Wide Web Consortium</a>|
\eos}
(The formation of anchors with the didactic \emph{article} document
type in advanced mode is slightly more complicated because the
characters \quochar{=} and \quochar{/}, which may acquire special (and
\quophrase{overloaded}) semantic significance in mathematical
contexts, are held for delayed evaluation as empty elements and
because the \st, which does not recognize command names, regards this
usage in advanced mode as \emph{multiple}
\siref{argopt}{argument/option syntax}, which is not part of basic
mode.)
An example of the distinction between basic and advanced \gellmu is that
in advanced mode it is possible and easy to arrange to have a blank line,
as in \latex;, represent the beginning of a paragraph. In basic mode
for \html one
must\footnote{
There is a way, with the setting of several
variables for the \st in advanced mode, to have blank lines begin
new paragraphs in basic input for \html.}
use \qquostr{\bsl;p} to begin a paragraph, and for the \xml version of
\html one must also provide markup for the end of every paragraph, which
may be done in several ways.
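As a hedged illustration of the point about paragraphs, the following
sketch shows three ways one might mark paragraphs in basic input for
\html. The paragraph text is invented for the example. The first
form, which leaves the end of the paragraph to be inferred, is an
assumption suited only to the \sgml form of \html; the braced form and
the |\begin{p}| \ldots |\end{p}| form both mark the end explicitly.
\begin{verbatim}
\p A paragraph opened with a bare command; no end is marked.
\p{A paragraph given as the argument of the command.}
\begin{p}
A paragraph written in environment notation.
\end{p}
\end{verbatim}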
For some of the details on using the basic markup with \html see
\href{ghtml.html}{\emph{Using the \gellmu Syntactic Translator to Write \html}}.
It will be instructive to have the parallel \href{ghtml.glm}{source
markup} available at the same time.
\section[][\label{meta}]{Metacommands}
A metacommand is a \ll command that does not correspond to an
\sgml element. Each metacommand is handled internally by the
\st.
\subsection[][\label{doctype}]{\bsl;documenttype}
A document prepared in \gellmu source usually begins with a
\emph{documenttype} command. For example,
\display{|\documenttype{html}|}
is used to begin a document prepared for the most common form
of classical \html.
The syntactic translator has two public variables
\emph{gellmu-doctype-keylist} and \emph{gellmu-doctype-info},
which are \elisp associative lists, that enable one to match \xml
or \sgml \qquostr{\ltc;!DOCTYPE \ldots\gtc;} declarations with
\latex;-like\brs;
\display{|\documenttype[|\emph{my-optional-key}|]{|\emph{my-doctype}|}|}
commands, where \emph{my-optional-key} is available to override a
default key for \emph{my-doctype}. Thus, for example,\brs;
``|\documenttype{html}|'' points to the default key for
\qquostr{html}, which is \qquostr{html-4.01} and which points to
the \softw{W3C HTML 4.01 Transitional} document type, while
\display{|\documenttype[xhtml-1.0s]{html}|}
indicates \softw{W3C XHTML 1.0 Strict}.
A user may configure these variables without modifying the source
code for the \gst;, but minimal knowledge of \elisp will
be required. A future release might provide a configuration file
for this purpose.
A second option for the \emph{documenttype} metacommand, which must
follow the single required argument, is provided for writing an
internal declaration subset. The contents of an internal declaration
subset constructed this way may be any internal declaration subset
material. However, some care is required for entering characters
that are special. To ease the handling of special characters four
metacommands have been provided for use inside the internal declaration
subset:
\begin{menu}
\item \quostr{\bsl;attlist\{...\}}
\item \quostr{\bsl;element\{...\}}
\item \quostr{\bsl;entity\{...\}}
\item \quostr{\bsl;notation\{...\}}
\end{menu}
For example, a user who wishes to be able in source to use
``|&quo;|'' to reference the \abbr{ASCII} quotation mark
when writing consciously in basic mode for \abbr{TEI.2} would begin
the source file with:
\begin{verbatim}
\documenttype{TEI.2}[
\entity{quo "&#34;"}
]
\end{verbatim}
\subsection[][\label{newcommand}]{\bsl;newcommand}
From a user's viewpoint this provides emulation of \latex;'s
\emph{newcommand}. But it is a simple facility providing macro
substitution with arguments forward in a source file from the
point of its occurrence. It differs from the \emph{newcommand}
facility in \latex; in that it does not add the name
of a newcommand to any namespace. Instead, as the \st encounters
each newcommand definition, it performs the corresponding expansions
and then forgets the name.\footnote{
This means that it is a relatively slow form of processing, but
the author believes that it is a good match for the intuitive
expectations of most \latex; users.
}
The general construction of a newcommand definition is
\display{
|\newcommand{|\emph{name}|}[|\emph{nargs}|][|\emph{first}|]{|\emph{value}|}|}
where \emph{name} is the name of the newcommand,
\emph{nargs} optionally specifies the number of its arguments,
\emph{first} is an optionally-provided default value for the first
argument, and \emph{value} denotes the value string.
In the value string the character \quochar{\#} is used to reference an
argument by the numeric value of its position. Thus, \qquostr{\#1}
refers to the first argument that is provided at an invocation of
the newcommand, \qquostr{\#2} the second, etc., and there is no limit
on the number of arguments. It is not required that \emph{nargs} be
supplied in order to define and use a newcommand taking
arguments. However, for the sake of automatic error checking the
use of \emph{nargs} is strongly recommended when the definition of
a newcommand taking arguments is entered.
\bold{Example.}\ In writing \html one might use
\begin{verbatim}
\newcommand{\href}[2][http://www.w3.org/]{\a[href="#1"]{#2}}
\end{verbatim}
for brevity in writing many \html anchors. With this definition the
invocation
\display{|\href{http://www.myweb.mydomain/me.html}{my web page}|} gives
rise to the \html markup
\display{|<a href="http://www.myweb.mydomain/me.html">my web page</a>|}
while the invocation
``|\href{Web Central}|'' gives rise to
\display{|<a href="http://www.w3.org/">Web Central</a>|\ \eos}
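Two further definitions, taken from the preamble of the source for
this manual, illustrate a newcommand without arguments and one with
two required arguments. The expansion described in the closing
annotation is a sketch that follows from simple substitution of the
arguments; it is not output copied from the \st.
\begin{verbatim}
\newcommand{\css}{\abbr{CSS}}
\newcommand{\siref}[2]{\iref{#1}{#2} (section \ref{#1})}

% \siref{argopt}{argument/option syntax} is replaced, at this stage, by
% \iref{argopt}{argument/option syntax} (section \ref{argopt})
\end{verbatim}
Here \qquostr{\#1} appears twice in the value string, so that the
label key need be written only once at each invocation.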
\bold{Rules.}
\begin{enumerate}
\item The name of a newcommand may not be referenced in its
value string.\footnote{
In a future release an alternative metacommand
called \emph{frontcommand} may be provided which could be used, for
example, if one wishes to have a macro name of some kind match the
name of an actual \sgml element.
}
\item A newcommand may not be invoked before it is defined unless
the invocation occurs in the value string of another newcommand
definition in which case the definition of the latter may be first and
\bold{must} be first if the invocation of the former in the definition
of the latter involves argument substitutions.
\end{enumerate}
Failure to observe these rules may cause the \st to enter an infinite
recursive loop. If a user suspects this may have happened, then the
invocation of the \st should be \siref{interrupt}{interrupted}.
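The second rule may be illustrated, again only as a sketch, with a
pair of definitions from the preamble of the source for this manual.
The value string of \emph{siref} invokes \emph{iref} with argument
substitutions, and so the definition of \emph{siref} is placed first.
The last definition is a hypothetical violation of the first rule and
is shown commented out.
\begin{verbatim}
\newcommand{\siref}[2]{\iref{#1}{#2} (section \ref{#1})}
\newcommand{\iref}[2]{\anch[iref="#1"]{#2}}

% Not allowed: the name \p is referenced in its own value string.
% \newcommand{\p}[1]{\p[class="custom"]{#1}}
\end{verbatim}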
\subsection[][\label{beginend}]{\bsl;begin\{\} \ldots \bsl;end\{\}}
These provide emulation of \latex; environment notation without
actually providing anything that is not otherwise available. Markup
which resembles that for a \latex; environment simply resolves to an
\sgml element. This usage may be convenient for \sgml elements of
large scope such as, for example, the \emph{body} of an \html
document.
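A minimal sketch for basic mode with \html follows; the content is
invented for the example. Since the environment notation simply
resolves to a \emph{body} element, the two fragments are intended to
be nearly equivalent.
\begin{verbatim}
\begin{body}
\p{Hello, world.}
\end{body}

\body{
\p{Hello, world.}
}
\end{verbatim}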
With advanced mode the special form
\display{|\begin{document} . . . \end{document}|}
may be used to emulate the corresponding feature of \latex; for a
document type, such as the didactic \emph{article} document type,
that in the large consists of a preamble and a body.
\subsection[][\label{macmac}]{\bsl;macro and \bsl;Macro}
Use of these is discouraged in the absence of a need. One situation
that presents a need is name \quophrase{fronting}: see the
discussion below in \anch[iref="fronting"]{section \ref{fronting}}.
Please note that in most situations one may use \emph{newcommand}
without an argument for simple macro substitutions.
\emph{macro} and \emph{Macro} do very much the same thing except
for the order of expansions. Every \emph{macro} is expanded forward
as encountered before any newcommand definition is expanded. Every
\emph{Macro} is expanded forward after every newcommand
has been expanded.
There are four primary differences between \emph{macro} and \emph{Macro},
on the one hand, and \emph{newcommand}, on the other hand.
\begin{enumerate}
\item Neither \emph{macro} nor \emph{Macro} can be used to define a
macro that takes arguments.
\item The name of a newcommand must consist of word characters,
but there is no restriction on the characters, apart from unbalanced
brace characters (\quochar{\{} and \quochar{\}}), that may appear in
the name field of a \emph{macro} or \emph{Macro} metacommand.
\item If the name of a \emph{macro} or \emph{Macro} does not begin
with the command sequence introducer, i.e., the character \quochar{\bsl;},
then an invocation of that metacommand is given by every forward match
of its name. The use of such names is strongly discouraged because
document segments can then become opaque much too easily.
\item A newcommand invocation, absent the use of a semi-colon
for termination, is only effective at the whole word level --- with
\emph{word} here denoting a maximal string of successive word
characters --- whereas \emph{macro} and \emph{Macro} invocations are
effective regardless of word boundaries.
\end{enumerate}
Elaboration on the last point: If \quostr{x} is the name of a
newcommand, then invocations are
only considered by the \st on the string ``|\x|'' when it is followed
by a non-word character. For example, if the locale is \abbr{us-ascii},
then the word characters are the 52 upper and lower case letters
and the ten numerals. Therefore with \emph{newcommand}, as with its
namesake in regular \latex;, the use of \quophrase{x} as a name will
not intercept the occasions of \qquostr{\bsl;x} as an initial
substring of \qquostr{\bsl;xy}. With either \emph{macro} or
\emph{Macro} such interception does take place. One may block it at
the point of an invocation with the markup \qquostr{\bsl;x;}, and when
this is done, the terminating semi-colon is removed.
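The following sketch restates these cases for a hypothetical
newcommand named \emph{x}; the name, its value, and the annotations
at the right are invented for illustration.
\begin{verbatim}
\newcommand{\x}{ex}

\x y     % invocation: the space after \x is a non-word character
\xy      % no invocation: \xy is a different name
\x;y     % invocation: the semi-colon ends the name and is removed
\end{verbatim}
With |\macro{\x}{ex}| in place of the newcommand definition the second
line would also be intercepted.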
Human authors using either \emph{macro} or \emph{Macro} may find
unanticipated interactions between the three forms of macro
substitution.
Unbalanced brace characters, i.e., the characters \quochar{\{} and
\quochar{\}}, may not be used in the name field or the value field of
any form of macro metacommand.
\subsection[][\label{fronting}]{Macro-Level Fronting of \sgml Element Names}
The word \emph{fronting} as used here describes the practice of
modifying the meaning of an \sgml element name by using one or
more of the macro facilities to generate usage of the same name as
an element combined with other markup using the syntax that would
otherwise correspond to basic use of the element.
Suppose, for example, one wants all paragraphs in \html (marked
with |\p| in \gellmu source) to be placed in a (style) class called
\emph{custom}.
\bold{Recommended procedure:} Create a new unique name and then use
\emph{macro} to front it.
\begin{verbatim}
\newcommand{\cp}[1][]{\p[class="custom"]{#1}}
\macro{\p}{\cp}
\end{verbatim}
Each invocation of ``|\p{...}|'' will first be replaced by ``|\cp{...}|''
because all \emph{macro} definitions are expanded before any
newcommand definition is expanded. Subsequent expansion
of the newcommand will yield
\display{|\p[class="custom"]{...}|\ \eos}
This will not intercept the alternate, otherwise nearly equivalent,
markup given with |\begin{p}| \ldots |\end{p}| since \emph{newcommand}
is based on simple macro substitution and does not operate at the
level of namespaces.
\section[][\label{advanced}]{Advanced \gellmu}
The idea with advanced \gellmu is that for \sgml document
types sharing structural characteristics with \latex; one might
wish to have the \st provide \latex;-like markup syntax beyond the level
used with basic \gellmu and that these additional layers of syntax
should be configurable. The only substantial realization of this
program so far is the case of the \gellmu \ddt called \emph{article}.
The specifics of that realization are discussed in the following
section.
\subsection[][\label{illustrations}]{Illustrations}
One might want to be able to use blank lines, as in
\latex;, for introducing new paragraphs in a document type that
provides paragraphs.
In some article-level document types each sectional unit has a unit
header providing markup for various, often optional, unit descriptors.
It is convenient to be able to use \latex;-like multiple
\siref{argopt}{argument/option syntax} for these.
If the document type provides authoring-level mathematical markup
beyond inclusions of the World Wide Web Consortium's
\href{http://www.w3.org/Math/}{Mathematical Markup Language} (\mathml)
under its \href{http://www.w3.org/TR/REC-xml-names}{\xml namespace}
regime, then one might want to be able to use the \quochar{\$} character
to toggle in and out of inline math, and if the document provides
for math displays, then one might want to use, as in \latex;, the
strings \qquostr{\bsl;[} and \qquostr{\bsl;]} as delimiters for
unstructured mathematical displays, and markup such as
\display{|\begin{equation} . . . \end{equation}|}
for a single equation, and markup such as
\display{|\begin{eqnarray} . . . \end{eqnarray}|}
for a list of equations.
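By way of illustration only, source markup for a document type
configured with these features might read as follows; the sentence is
invented for the example.
\begin{verbatim}
Mass and energy are related by $e = mc^2$, or in display form:
\[ e = mc^2 \]
\end{verbatim}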
It is important to emphasize, however, that by overall system design
the \st operates without substantial knowledge of vocabulary. It is
true that if \quochar{\$} is to provide a toggle for an inline
element containing math, say, \emph{tmath}, then the \st needs to have
that association made, but the association is provided as the
value of a configuration variable in the \st that can be changed
between documents so that the \st may be used with many document
types.
One way to make such configuration convenient is to use an array of
\elisp functions that are fronts with various variable configuration
packages for the basic function \emph{gellmu-trans} in the \st.
The general outline for advanced \gellmu with arbitrary document types
is not fully developed in the present release. Instead the project
has concentrated on the realization of these ideas for the project's
didactic \emph{article} document type, which is the subject of the
next section.
As the \st stands now, basic mode is characterized in the \st by the
true setting for its Boolean variable \emph{gellmu-straight-sgml},
while the configuration used by default for the \ddt (which
\emph{could} be handled in basic mode with more verbose source markup)
has that variable set false and also the variable
\emph{gellmu-regular-sgml} set false.
The term \label{regular}\emph{regular} \gellmu refers to use of the
\st with the default configuration for the \ddt. It involves nearly
maximal emulation of \ll markup; it implies both \emph{advanced} mode
and the \siref{ddt}{\ddt} \emph{article}.
\subsection[][\label{argopt}]{Multiple Argument/Option Syntax}
An essential point in the present design is that the whole system
is built from components, each of which has its own
function\footnote{In the prototype production system based on
the didactic \emph{article} document type the output from each
stage is available for examination and, where necessary, intervention.
However, such use of intervention is intended only for temporary
expedient use while a \gellmu system is being designed or enhanced.
As with \latex;, enhancement is an ongoing process.
}. % end of footnote
Consistent with this design the \st operates with knowledge of syntax
but little or no knowledge of language.
Multiple argument/option syntax has been built into advanced mode as
part of the overall idea of providing, where sensible, \ll
features in a precise user markup interface for writing in document
types under \sgml and \xml.
What are the rules for converting the multiple argument/option syntax
in source markup into \sgml? Direct conversion by the \st of this
type of usage into \xml is not available because such conversion
requires some language knowledge and the program does not operate with
knowledge of language at that level\footnote{
In handling \gellmu source markup one could provide
a more elaborate processor that can be configured to know for each
such command the list of names for its positional arguments and
options. It was decided that this goes somewhat beyond syntactic
handling but that the question of whether a list of arguments and options
corresponds to sole content might be regarded as a syntactic matter.
}.
One obtains an \xml version of a
document in the \dps by using a translator with
minimal knowledge of the command vocabulary to create the \xml version
from an \sgml version that is the immediate output of the \st.
In multiple argument/option syntax, which is much like that of
\latex;, arguments and options follow command names. Arguments are
delimited by braces, i.e., \quochar{\{} and \quochar{\}}, and options
by square brackets, i.e., \quochar{\lsb;} and \quochar{\rsb;}. There
must be no white space between the arguments and options nor between
the command name and the first member of an argument/option sequence.
Each command with a multiple argument/option sequence is translated to
an open tag whose name is the name of the command. Each argument is
translated to an \emph{ag0} element and each option to an \emph{op0}
element. (Both \emph{ag0} and \emph{op0} lie in \gellmu's reserved
name space.) There are two exceptional cases.
\begin{enumerate}
\item The first argument or option is an option inside which the very
first character is a colon, i.e., \quochar{:}. This is the method
provided in advanced mode for the direct entry of an \sgml attribute
sequence.\footnote{
Its use is optionally permitted in basic mode as well.
}
The entire contents of the option string, apart from the
leading \quochar{:}, which is discarded, are understood to be a
sequence of \sgml attributes for the \sgml element whose name is the
name of the command. There is no syntax check of the attribute
contents by the \st. Such an \emph{attribute option} is not treated
as an \emph{op0} element. In particular, an attribute option is
correctly followed immediately by a semi-colon, i.e., the character
\quochar{;}, if and only if the corresponding \sgml element is a
defined-empty element under the \sgml document type. Since \sgml
attributes correspond to very little of classical \latex;\footnote{
Indeed, \latex; usage allows markup in options, but (element level)
markup is not permitted in \sgml attributes. Note, however, that
in the \ddt the \sgml content model for an option is more restrictive
than that for an argument. Also in the \ddt some options, such as
that for \emph{anch}, which is described later in this section, are
practically required.
},
attribute options are seldom used\footnote{
To say \emph{seldom} is not to say \emph{not}. Two important
instances in the \dps are the \emph{series} attribute for the
\emph{label} command, which stands in, to the extent possible, for the
notion of \emph{counter} in \latex;, and the \emph{type} attribute for
the \emph{series} command, which provides emulation of counter
conversion from, say, number values to letter values.
}
in the \dps. One such use is for the \gellmu equivalent of \latex;'s
\emph{equation*} and \emph{eqnarray*} environments, which is marked up
this way:
\begin{verbatim}
\begin{equation}[:nonum="true"]
e = mc^2
\end{equation}
\end{verbatim}
to produce:
\begin{equation}[:nonum="true"]
e = mc^2
\end{equation}
\item The first argument is the only argument and there are no options
apart from a possible attribute option. This case, which is extremely
common, is exceptional relative to argument/option handling
since the sole argument simply becomes element content without an
\emph{ag0} wrapper.
\end{enumerate}
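The rules above may be summarized schematically for a hypothetical
command \emph{foo}. The command name, the attribute, and the content
are invented. The right-hand column is a sketch of the corresponding
\sgml rather than output copied from the \st, and the closing of the
\emph{foo} element in the first case is deliberately left open.
\begin{verbatim}
\foo[bar]{baz}{qux}   <foo><op0>bar</op0><ag0>baz</ag0><ag0>qux</ag0>
\foo[:id="k"]{baz}    <foo id="k">baz</foo>
\foo[:id="k"];        <foo id="k">    (foo a defined-empty element)
\end{verbatim}
Whether anything else may follow inside \emph{foo} in the first case
is the question taken up next.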
When a command has a multiple argument/option sequence, the question
arises whether the \emph{ag0} and \emph{op0} elements that arise from
the arguments and options are the only content of the element
corresponding to the command. The syntax does not provide a way to
determine this. On the other hand, the \sgml document type definition
does provide information that indicates whether other content is
possible. It is beyond the scope of the design of the \st for the
\st to read a document type definition. The \st does, however,
have a configurable list variable \emph{gellmu-autoclose-list}
that contains the names of elements
for which no content beyond the elements arising from arguments and
options is possible. While it is not necessary that every such
command be entered in this list, when such a command not in the list
is not explicitly followed by an element closing command, it is
possible in some instances for an \sgml parser to infer incorrectly
the location of the end of the element. For example, the \dps provides a
command \emph{anch} for making anchors. The document type definition
provides for one option, a reference, and one argument, the anchored
text.\footnote{
In the \xml version of \emph{article} the option becomes the element
\emph{anchref} and the anchored text becomes the element \emph{anchv}.
One may use these names directly in \gellmu source, but option/argument
notation is more familiar and more succinct.
}
Because the \st does not consider the document type definition,
if one enters the markup
\display{|\anch[href="http://www.w3.org/"]{W3C} HQ| \ ,}
unless the name \emph{anch} is in the list\footnote{There was no list
of this type in early pre-release versions of the \st.
}
\emph{gellmu-autoclose-list},
an \sgml parser will have no reason to close the \emph{anch} when it
reaches the space following the anchored text \quophrase{W3C}, and so
that space will be treated as insignificant white space, with the result
that there will be no space between the anchored text and the following
\quophrase{HQ}.
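One way to protect the space without relying on
\emph{gellmu-autoclose-list} is to close the element explicitly. The
following sketch assumes that the element closing form of a command,
its name followed by \quochar{:}, may be placed directly after the
argument:
\begin{verbatim}
\anch[href="http://www.w3.org/"]{W3C}\anch: HQ
\end{verbatim}
With the close made explicit, the space before \quophrase{HQ} lies
outside the \emph{anch} element and should survive parsing.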
\subsection[][\label{xmllimit}]{Limitation in Regard to \xml}
A final general comment about advanced mode is that the features it
can provide beyond basic mode when one is writing consciously for an
\xml document type are somewhat limited. For example, blank lines
cannot easily be made adequate for paragraph markup with the \xml form
of the didactic \emph{article} document type. Although it is not a
specific limitation for future editions of the \st, the vision is that
use of advanced mode will be specifically for a somewhat rich \sgml
version of a document type.
\section[][\label{ddt}]{The Didactic Document Type}
The \ddt is the document type underlying what is called regular
\gellmu. It is the heart of the idea of \gellmu as a bridge for authors
from \latex; to the world of \xml. More specifically, the bridge is
from the world of a \latex; \emph{article} to a document type in the
world of \xml, also called \emph{article}, that has a structure and a
vocabulary similar to those of the \latex; document class.\footnote{
The \latex; concept of document class has only a loose correspondence
with the \sgml concept of document type.
}
The techniques used in the \dps are extensible and can be carried over
to other types of documents than articles. It is important to note
that there are many features in regular \latex; which have no analogue
so far in the development of this project. One might hope to get an
idea of the extent of coverage by reviewing the examples in the
\sapref{archive}{project archive}.
When an author prepares a document as a \latex; \emph{article}, the
document is being marked up as data for a specific typesetting
program: Donald Knuth's program \tex; running with the main \latex;
facilities loaded.
When an author prepares a document as \gellmu source, the \st provides
a \latex;-like markup interface, but its output is not data for a
specific typesetting program. Rather it is data for a broad class of
processors. This means that multiple output formats can be obtained
from a single source without the need for human intervention because
\xml provides a framework that makes it relatively easy to create
reliable programs for translation from an \xml document type to other
formats. It offers, moreover, the possibility of translation to
future formats free of any need for human intervention once
translators from the original document type to such formats are
written. The small price one pays for this advantage in moving from
\latex; markup to \gellmu markup is that the author must learn a few
new things.\footnote{
It is not inconceivable that at some future point conscious writing
for some \xml document types using \latex;-like markup might be
subsumed in the \latex; project, but in saying this, the author is
neither predicting it nor assessing the merits of the idea. He has no
affiliation with the \latex; project other than as a user.
}
There are two formal constructions of the \ddt. The name of the
document type is \emph{article}. The first construction is an \sgml
version of \emph{article} that provides features convenient for
authors that are not available under \xml. The second is an \xml
version. For most non-technical purposes the two constructions should
be regarded as equivalent.
The \sgml construction of an \emph{article} is derived from \gellmu
source markup for a document by using the \st. The didactic document
type is accompanied by a translator implemented under the Perl
language framework \softw{sgmlspl} by David Megginson (see
the release notes in \anch[iref="release"]{appendix \appref{release}}
for more information) for converting the \sgml version of an
\emph{article} to the \xml version.
The description in this section of the manual deals primarily with
source level markup for the \ddt and with how it is handled\footnote{
Strict discussion of an \sgml document type would not allow use of the
word \emph{handled}. In this instance a coordinated pair of document
types is being described, one \sgml and the other an \xml translation.
For most purposes the \sgml document type is the richer of the two.
However, because of its use of a handful of generic elements (in its
reserved namespace consisting of names that contain \quochar{0} (zero)
as first numeric character) for modeling certain convenience features
of \latex;, it is possible for a correct translation of a valid \sgml
article to yield an \xml version that fails validation because the
content models of the generic elements are necessarily loose.
}
by the time the \xml version of an article is generated. Secondarily
there is comment on how it is rendered in the chief output formats
of the \dps, which are regular \html\footnote{\html, version 5, which,
as of March 2011, is supported by the ``big four'' web browsers.}
with math rendering facilitated by \softw{MathJax}\footnote{\softw{MathJax},
which was jointly developed by the AMS, SIAM, and Design Science, with
support from other organizations, imposes no requirement on the user
other than that of having a current web browser on a screen of sufficient
size.},
\pdf, \mxhtml, and terminal window \html (for limited screens).
A quick glance at the \siref{flow}{flowchart} shows that the first
\xml stage --- author-level \xml --- may be viewed as a second entry
point to \dps processing. Some day this could become a reasonable
formatting route for translations from things like \softw{Texinfo},
\softw{DocBook}, and perhaps even classical \latex; itself via a
processor such as \softw{tex4ht}.
\subsection[][\label{caveat}]{Suggestions and Caveats}
Although this is the manual for a software release, it is not a book.
A document of book size would be required for a full description of
the \dps.
Much of the markup vocabulary is copied from \latex;. There are some
instances where there is some deviation from \latex; usage, and many
of those instances are mentioned here.
Definitive information about the \ddt may be derived by consulting the
document type definition. Because the \dps is conceived as a base for
future development there are sketches in the document type definition
that are not covered by the didactic processors. For example,
the translation from \sgml to \xml contains sketched code for the
analogues of \latex;'s \emph{paragraph} and \emph{subparagraph}
commands, which are sectional in nature, but there is
no sketched code for these elements in the two formatters.
Another way to obtain information about the \dps is by studying
examples including this manual and the examples in the
\sapref{archive}{project archive}.
\subsection[][\label{mufund}]{Markup Fundamentals}
There are several kinds of commands:
\subsubsection[][\label{explicitnames}]{Explicitly named commands.}
Apart from macro level
metacommands an explicitly named command begins with \bold{a maximal
string introduced by the character} \quochar{\bsl;} \bold{followed by
word characters, including the numerals} \quochar{0}, \quochar{1},
\ldots, \quochar{9}. The notion of \emph{word character} depends on
one's locale, a concept that is formalized in \Emacs. In the
\abbr{ASCII} character set the word characters are the 52 upper and
lower case letters and the 10 numerals. \bold{The first numeral, if
any, must not be \quochar{0}} since such names are reserved for use by
the \st. Command names are case sensitive.
An explicitly named command is a \emph{container}, corresponding to an
\sgml \emph{element}, if its name is immediately followed, without
intervening white space, by the character \quochar{\{}. In that case
the delimited zone of containment normally ends with the subsequent
balancing character \quochar{\}}. (\latex;\hyp;like
\siref{argopt}{multiple argument/option} chains deserve more
discussion; for now it will suffice to point out that the use of the
|\anch| command in this document for making \quophrase{anchors} is an
example, and, of course, \latex;'s |\frac| command is another example.
For the present discussion these commands are considered to be
containers.)
An explicitly named command corresponds to an \sgml
\emph{defined-empty element} if its name is immediately followed,
without intervening white space, by the character \quochar{;}.
An explicitly named command corresponds to an \sgml element closing
tag if its name is immediately followed, without intervening
white space, by the character \quochar{:}.
The name of an explicitly named command is terminated by a non-word
character. There is a small, possibly acceptable, level of syntactic
ambiguity unless the name terminator is one of \quochar{\{},
\quochar{;}, \quochar{:}, or \quochar{[}.
In basic mode if the name terminator is \quochar{[}, then that
character introduces a list of \sgml attribute specifications, each of
the form \emph{name}\eqc\quo\emph{value}\quo;, and the list must be
terminated by the character \quochar{]}. Then if the following
character is \quochar{\{}, the named command is a container that ends
with the balancing \quochar{\}}. Otherwise the following character
may be \quochar{;} if the named command is a defined-empty element and
must be so in that case for direct editing of an \xml document type.
In advanced mode if the name terminator is \quochar{[}, then that
character introduces a \latex;-like command option --- part of the
emulation of \latex;'s multiple \siref{argopt}{argument/option syntax}
--- unless it is immediately followed, without intervening white space,
by the character \quochar{:}, in which case the bracketed content is
a list of \sgml attribute specifications. (The initial \quochar{:},
which may be used optionally in basic mode, is discarded.)
In any other case there is some syntactical ambiguity. The \st will
produce a corresponding \sgml open tag unless the logical variable
\emph{gellmu-xml-strict} has been set.\footnote{ The variable
\emph{gellmu-xml-strict} is by default unset in advanced mode. } If
the usage is consistent with the structure of the document type, an
\sgml parser will in many cases be able to handle the result
correctly. The result of this type of syntactic ambiguity in source
markup is not tolerated if one is editing directly for an \xml document
type. The terminator can be a blank space, but, if so, the
blank space is likely to become invisible after \sgml parsing much
in the way that in \latex; the markup
\display{|\LaTeX document|}
will be collapsed into the single word form \quophrase{\latex;document}
when typeset.\footnote{
The correct \latex; markup is ``|\LaTeX{} document|''. In the \dps the
name of \latex; is \quophrase{latex}, which is a defined-empty
element, that for the \sgml version of \emph{article} may be marked up
safely either as ``|\latex;|'' or as ``|\latex{}|''.
}
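As an illustrative summary of the name terminators described above, the
following advanced mode fragments use only commands that appear
elsewhere in this manual. The trailing comments are annotations, not
required markup.
\begin{verbatim}
\emph{word}                             % "{"  : container
\latex;                                 % ";"  : defined-empty element
\anch:                                  % ":"  : element closing tag
\label[:series="appdx"]{}               % "[:" : SGML attribute list
\anch[href="http://www.w3.org/"]{W3C}   % "["  : LaTeX-like option
\end{verbatim}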
\subsubsection[][\label{single}]{Certain single characters.}
The characters
\quochar{\bsl;}, \quochar{\{}, \quochar{\}}, \quochar{\crt;},
\quochar{\und;}, \quochar{\$}, \quochar{\%}, and \quochar{\tld;}
have command meanings that are similar to their meanings in \latex;.
The characters \quochar{;} and \quochar{:} are ordinary characters that
have special meaning at the end of a command name. The character
\quochar{\#} is also a special character used, as with \latex;, in
\emph{newcommand} templates. In source for the didactic
\emph{article} document type any non-alphanumeric \abbr{ASCII}
character may be escaped (referenced for itself) with a named command,
e.g. \quochar{\tld;} may be referenced for itself as |\tld;|. This
is \bold{necessary} in order to provide delayed evaluation for
eventual translation to any of many possible output formats.
The following language meanings apply to both basic and advanced
markup:
\begin{enumerate}
\item ``|\|'': \bold{Command introducer.}\\
Escape in basic mode: |\\|\ . This escape is
incorrect in advanced mode since this notation has a different
meaning --- forced linebreak --- in \latex; itself. For the
didactic \emph{article}
document type the escape is |\bsl;|\ . For other document
types one may resort to an entity reference
if averse to providing a corresponding
defined-empty element or if one lacks control of the document type.
\item ``|{|'': \bold{Command argument opener.}\\
Escape: |\{| or |\lbr;|\ .
\item ``|}|'': \bold{Command argument closer.}\\
Escape: |\}| or |\rbr;|\ .
\item ``|[|'': \bold{Command option opener.}\\
Escape: |\lsb;| (usually not necessary\footnote{
In math \quochar{[} sometimes needs to be escaped to prevent confusion
between its markup use and its ordinary use in an instance such as
the markup for $\mathbb{Z}\lsb;t\rsb;$\aos The \st would need to know
vocabulary --- at least the argument/option pattern for
\emph{mathbb} --- in order to elude the syntactic ambiguity.
}).
\item ``|]|'': \bold{Command option closer.}\\
Escape: |\rsb;|\ .
\item ``|;|'': \bold{Command terminator for defined-empty tags.}\\
Escape: |\scl;|\ .
Usually an ordinary character.
Its use as a command terminator is invisible and may be
omitted optionally in some contexts. This syntax is analogous to
the use of \quochar{;} as an entity reference terminator in \sgml.
\item ``|:|'': \bold{Command terminator for close tags.}\\
Escape: |\cln;|\ .
Otherwise an ordinary character.
Its use as a command terminator is invisible.
\item ``|%|'': \bold{Comment introducer, in force to end of line.}\\
Escape: |\%| or |\pct;|\ .
\item ``|#|'': \bold{Argument marker in \emph{newcommand} definitions.}\\
Escape: |\#| or |\hsh;|\ .
In the definition of a newcommand ``|#1|'' indicates the first
invocation argument, ``|#2|'' the second invocation argument, etc.
(There is no limit on the number of arguments.)
\end{enumerate}
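For instance, under the didactic \emph{article} document type a source
line meant to display markup characters literally might be written in
either of the following ways. The sentence itself is only an
illustration.
\begin{verbatim}
The markup \bsl;emph\{word\} saves 50\% of the typing in item \#1.
The markup \bsl;emph\lbr;word\rbr; saves 50\pct; of the typing in item \hsh;1.
\end{verbatim}
The second line uses only named commands, which is the form that
carries over to direct editing when the shorter escapes are not
available.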
The following language meanings apply to advanced markup with
allusion to the didactic \emph{article} document type.
\begin{enumerate}
\item ``|~|'': \bold{Non-breaking interword space.}\\
Escape: |\tld;|.\footnote{
``|\~|'' is an example of a markup string that is defined in \latex; (for
an accent) that is not defined in the \ddt. A \latex; user may recover
a prior markup habit of this type using \emph{newcommand} possibly in
combination with \emph{macro}. For more information see the
\siref{accents}{discussion of accents}.
}\\
Equivalent: |\nbs;|, cf. |&nbsp;| in \html.
\item ``|^|'': \bold{Superscript command.}\\
Escape: |\crt;|\ .\\
Equivalent: |\sup| or, in math, |\msup|\ .
\item ``|_|'': \bold{Subscript command.}\\
Escape: |\_| or |\und;|\ .\\
Equivalent: |\sub| or, in math, |\msub|\ .
\item ``|&|'': \bold{Dual use: tabular cells and entity introducer.}\\
Escape: |\&| or |\amp;|\ .\\
\quochar{\amp;} introduces an entity reference if it is followed
by anything other than white space. It is used, as in \latex;, as
a \emph{tabular} cell delineator when it is followed by white space.
\item ``|$|'': \bold{Toggle inline math mode.}\\
Escape: |\$| or |\dol;|\ .\\
Equivalent: ``|\tmath{ . . . }|''. \\
Nearly equivalent: ``|\( . . . \)|''
or ``|\math{ . . . }|''.\footnote{
It is possible to merge the inline \emph{math} and \emph{tmath} zones
at any level of processing beyond the \st. These are indeed the same
in \latex;, but the syntactic translator resists the temptation here to
go beyond syntax and merge them. With the didactic \emph{article}
document type the formatting to \latex; inserts the \latex; markup
\qquostr{\bsl;,} for a small horizontal space before and after
\emph{math}, but not before and after \emph{tmath}.
}
\end{enumerate}
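To illustrate the equivalences listed above, the lines within each
group below are intended to mark up the same content under the didactic
\emph{article} document type, apart from the spacing difference noted
for \emph{math}. The sample text is arbitrary.
\begin{verbatim}
Theorem~2 costs \$5 to print.
Theorem\nbs;2 costs \dol;5 to print.

The sum $a + b$ is symmetric.
The sum \tmath{a + b} is symmetric.
The sum \(a + b\) is symmetric.
\end{verbatim}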
\subsubsection[][\label{certainstrings}]{Certain strings}
These are strings of plain text with markup significance in the \ddt that
are part of markup in \latex;.
\begin{enumerate}
\item ``|--|'' \\ \bold{Short dash} as used with a range
of numbers, e.g., 1--2.\\
Equivalent: |\rdash;|\ .
\item ``|---|'' \\ \bold{Long dash} as used for
punctuation, e.g., a dash --- like this.\\
Equivalent: |\pdash;|\ .
\item ``|\ |'' \\ \bold{Blank interword space.}\\
Equivalent: |\spc;|\ .
\item ``|\,|'' \\ \bold{Small horizontal space.}\\
Equivalent: |\hsp;|\ .
\item ``|\\|'' \\ \bold{Forced line break.}\\
\qquostr{\bsl;\bsl;} may be used at the end of a line of
input for a forced line break. In a \emph{tabular} environment
(with the didactic \emph{article} document type, as in
\latex;) it begins a new \emph{tabular} row. Any other use is
deprecated, and will result in translation to the defined-empty
element \emph{bsl} corresponding to the \abbr{ASCII} character
\quochar{\bsl;} with a warning from the \st.\\
Equivalents: |\brk;| for a
line break outside of a \emph{tabular} environment.\footnote{
The \st simply outputs the \sgml defined-empty element
\emph{brk0}, which belongs to its reserved name space. The dual use of
\emph{brk0} involves some \sgml chicanery that is resolved during translation
to the \xml version of the \emph{article} document type,
where \emph{tabular} is converted to \emph{table} and non-tabular use of
\emph{brk0} is converted to \emph{brk}. See also the handling of
\emph{array}, which is different even though the source markup, as in
\latex;, is similar.
}
\item Blank line.\\
\bold{Begin new paragraph command.}\\
Equivalent: |\parb|\ . \ \ Nearly equivalent: |\par|\ .
\item ``|``|'' \\
\bold{Left (double) quotation mark.}\\
Equivalent: |\ldq;|\ .
\item ``|''|'' \\
\bold{Right (double) quotation mark.}\\
Equivalent: |\rdq;|\ .
\item ``|\(|'' \\
\bold{Begin \emph{math mode} command.}\\
Equivalent: |\begin{math}|\ .
\item ``|\)|'' \\
\bold{End \emph{math mode} command.}\\
Equivalent: |\end{math}|\ .
\item ``|\[|'' \\
\bold{Begin \emph{displaymath mode} command.}\\
Equivalent: |\begin{displaymath}|\ .
\item ``|\]|'' \\
\bold{End \emph{displaymath mode} command.}\\
Equivalent: |\end{displaymath}|\ .
\item ``|. |'', ``|? |'', ``|! |'' \\
\bold{End-of-sentence marks.}\\
Equivalent: |\eos;|, |\eoq;|, |\eoe;|\ .\\
A period followed either by two blank spaces or by a newline
is recorded as an end of sentence. There is similar provision
for the question mark and the exclamation point. These tagged
forms may be used explicitly for the corresponding punctuation
inside math displays to distinguish punctuation from
mathematical use of the punctuation symbols. Explicit markup
for a comma is ``|\cma;|''.\footnote{
Alternate forms ``|\aos;|'', ``|\aoq;|'', ``|\aoe;|'' of sentence ending
punctuation are provided that may be used following inline mathematical
markup at the end of a sentence. Similarly, ``|\aoc;|'' is an alternate
form for a comma.
}
\end{enumerate}
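As a sketch of how the string forms and their named equivalents
correspond, the first two lines of source below are intended to be
equivalent, and the last line shows tagged end-of-sentence punctuation
inside a math display. The sample sentences are illustrative only.
\begin{verbatim}
See pages 10--12 of the ``short'' proof.
See pages 10\rdash;12 of the \ldq;short\rdq; proof\eos;

\[ a + b = b + a \eos; \]
\end{verbatim}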
\subsection[][\label{large}]{Large Structure}
Overall an \emph{article} consists of a \emph{preamble} followed by
a \emph{body}. As \siref{beginend}{noted previously}, advanced mode
provides a special form of usage
\display{|\begin{document} . . . \end{document}|}
with the paired metacommands \emph{begin} and \emph{end} that with an
\emph{article} delimit its \emph{body}. This is enabled with the
\emph{gellmu-trans} call for the \st, and, consistent with regular
\latex; usage, the \emph{preamble} is present without explicit tagging.
The only required markup in a \emph{preamble} is a \emph{title}, and it
is formally correct for its content to be empty. The \sgml content
model for a \emph{body} is somewhat loose, but is usually understood
to consist of \emph{section}s and may contain ordinary paragraphs
(\emph{par}, which must be marked up explicitly, or \emph{parb},
which is begun with a blank line). Although a \emph{body} may be entirely
empty (likely not useful) or may consist only of \emph{par} and
\emph{parb} elements, all inline text must be within one of these
basic paragraph containers.
For example, this formally correct textless document is handled without
noise by the \dps:
\begin{verbatim}
\documenttype{article}
\title{}
\begin{document}\end{document}
\end{verbatim}
while the following document with text outside of a paragraph fails
initial validation in the \dps:
\begin{verbatim}
\documenttype{article}
\title{}
\begin{document}
x
\end{document}
\end{verbatim}
The error may be corrected by placing a blank line before the
character \quochar{x}.
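With that correction the source becomes the following, in which the
blank line begins a \emph{parb} paragraph that contains the text:
\begin{verbatim}
\documenttype{article}
\title{}
\begin{document}

x
\end{document}
\end{verbatim}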
The content model for \emph{preamble} is tighter than that for
\emph{body}. This makes it possible to have a greater level of
error-checking than would otherwise be possible. For example, a
preamble must have exactly one \emph{title} although it is not
specified where in the preamble a title might be. There may be at
most one of each of the elements \emph{surtitle}, \emph{subtitle}, and
\emph{date}. Although \emph{newcommand}, which is a metacommand, does
not affect the document type definition and may be used at different
locations in the preamble, the small number of actual elements that
may be multiply used in the preamble, such as \emph{mathsym} (which is
partly a metacommand) must be located together.
\subsection[][\label{sectioning}]{Sectioning}
In classical \latex; one writes simply
\display{|\section{Some title for a section}|}
to begin a new section. While this markup specifically delineates
the section title, \latex; understands that a section has begun.
The \emph{section} command has an option whose content is an alternate
title for the table of contents, and the starred form of the command
suppresses an otherwise automatic section number.
With \sgml the classical approach is something along the following
lines:
\begin{verbatim}
<section><sectiontitle>Some title for a section</sectiontitle>
<para>A paragraph.
<para>Another paragraph.
 . . .
</section>
\end{verbatim}
\emph{Basic} \gellmu markup corresponding to this would
be:\footnote{Or one
could use:
\begin{verbatim}
\Section{\sectiontitle{Some...}
\para A para... . . .}
\end{verbatim}
instead of using the environment-like
\emph{begin}/\emph{end} construction.
}
\begin{verbatim}
\begin{section}\sectiontitle{Some title for a section}
\para A paragraph.
\para Another paragraph.
. . .
\end{section}
\end{verbatim}
If, moreover, the markup is to be well-formed \xml markup, then each
\emph{para} tag would need explicit closure with |</para>|.
The main point here is that the open and close tags for a section
in a classical \sgml document type encompass the whole contents of
a section, and separate markup is required for the section
title.\footnote{In fact, classical \sgml document types are often
even more elaborate than this.}
The didactic \sgml document type under \emph{advanced} \gellmu seeks to
model classical \latex; as closely as possible in order to provide a
bridge for authors from \latex; to \sgml (and, indeed, \xml). For
this reason a command \emph{section} similar to the classical \latex;
command that delineates a section title is provided in the didactic
\sgml document type. At the same time this document type has a whole
section container \emph{Section} (upper case \emph{S}) that in the
simplest case consists of a \emph{shead} container for its title (or
header) followed by any number of paragraphs. The \xml form of the
\ddt supports only this latter tag, which means that
the translation script that converts \sgml to \xml has an unambiguous
way of performing the conversion from \emph{section} to \emph{Section}
provided that the author's source leads to an \sgml instance which is
valid\footnote{Caveats:
\begin{menu}
\item No document type definition under \sgml is actually a complete
language definition. A document type definition describes a document
markup structurally; in particular, it does not provide definition
for legal \quophrase{field} values.
\item While the \gellmu syntactic translator is now considered
\quophrase{alpha} software, the document type and the accompanying
translators, which constitute the didactic production system, are
developmental. These materials have some support for obsolete
practice and also contain sketch sections that are not fully robust.
\end{menu}
}
under the \ddt.
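By way of comparison with the basic markup shown earlier, a sketch of
the same section in \emph{advanced} mode under the \ddt follows, with
each blank line beginning a \emph{parb} paragraph:
\begin{verbatim}
\section{Some title for a section}

A paragraph.

Another paragraph.
 . . .
\end{verbatim}
In the \xml version this is expected to appear as a \emph{Section}
whose \emph{shead} holds the title.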
\subsection[][\label{labelref}]{Labels, References, and Anchors}
A \emph{label} command may be placed anywhere that text may be placed
in order to mark a location and associate a symbolic \quophrase{key}
with that location. A \emph{ref} command may be placed anywhere that
text may be placed to generate a visible allusion called its
\emph{reference value}, for example a section number, to the location
associated with a label key. An \emph{anch} command may be placed
anywhere that text may be placed to provide a hypertext reference
either by key to an internal location or by uniform resource
identifier (\abbr{URI}) to an external resource.
At this point the discussion will become descriptive of the entire
\dps rather than simply of its document type.
\subsubsection[][\label{seq}]{Labels and Sequencing}
The basic usage for \emph{label} is
\display{|\label[:series="|%
\emph{series-name}%
|" serseq="|%
\emph{number}%
|" refkey="|%
\emph{ref-key}%
|"]{|%
\emph{key}%
|}|}
where use of the attributes is optional and, moreover, the required
\emph{key} may be empty, i.e., the markup |\label{}| is permitted.
The \emph{key} must be a case-sensitive string that is
\bold{monocase} unique.\footnote{
It is recommended that the characters in label key strings be
restricted to lower case \abbr{ASCII} letters, the digits 0--9, and
possibly the character \quochar{-} or the character \quochar{.} for
maximal inter-operability with current and future formattings.
For example, the \quochar{\_} is problematical in this context.
}
The \dps will provide a unique automatically generated string value
for \emph{key} if none is provided by the author; this can be useful
if a \emph{series} name is specified as an attribute for the label.
An author should never reference a label key not provided by the
author, but the empty element \emph{popkey} is intended for processing
evaluation as the last label \emph{key}, automatic or not, preceding
its location.
Sequencing\footnote{
Although one \emph{could} provide \sgml modeling for \latex;\apos;s
counters, it would not be very much along the lines of main track
\sgml or \xml document types.
}
may be handled under the \gellmu didactic \emph{article}
document type using labels and references. Toward this end one
makes use of three \sgml attributes that are provided with the
\emph{label} tag:
\begin{description}
\item[series] The value is the name of a sequenced family of labels.
A label may belong to at most one family, but there may be multiple
labels at the same location. There is no default value of series.
\item[serseq] The value is the sequence number of the label in its series
if a series is defined. It is meaningless if no series is specified
for the label.
\item[refkey] The name of a label key from which to spawn an automatically
generated value for the attribute \emph{serseq} of the current label.
\end{description}
Every label has a reference value that is normally accessed with
the \emph{ref} command. This results in the creation, when
the \xml version is generated, of an \xml entity reference with name
based on the reference's \emph{key} argument, that matches a \abbr{CDATA}
entity definition at the top of the \xml document. The use of indirection
provided by this entity technique means that it is immaterial whether the
reference is forward or backward.
The \emph{evalref} command gives access to the literal value of a
reference without indirection, and places that value as a literal in
the \xml version of an article. Thus, \emph{evalref} is the name of a
tag only in the \sgml document type and in the author-level \xml document
type but not in the elaborated \xml document type.
An \emph{evalref} invocation will be successful only with a backward
reference. This is extremely useful for managing non-default
numbering of sectional units. For ordinary label references its use
is undesirable even though it is possible for backward references.
The reference value of a label is determined as follows:
\begin{enumerate}
\item Basic reference values are positive integers. Upper and lower
case alphabetic values and upper and lower case roman numeral values
may be obtained by applying the \emph{series} command (not to be
confused with the \emph{series} attribute for the \emph{label} command)
to a basic reference value with \emph{type} attribute of \emph{series}
set to one of \quochar{A}, \quochar{a}, \quochar{I}, or \quochar{i}.
\item If the label is assigned to a label series and is given a
\emph{refkey} attribute, then the reference value of the label is the
reference value of the label referenced by the key that is the value
of the \emph{refkey}. (This mechanism is used to re-start the
sequencing of the sectional unit id's at the end of this document.)
\item Else if the label is assigned to a label series and the author
supplies an explicit \bold{literal} numeric value for the
\emph{serseq} attribute, then the value of \emph{serseq} is its
reference value. (Markup --- in particular, \emph{evalref} ---
cannot be used in defining an attribute value.)
\item Else if the label is assigned to a label series, the reference
value of the label is the next (positive integer) value for a label
in that series. (This mechanism is used to control the sectional
id of the last section of this document. It may also be used to
run parallel interleaved sequences of sectional units, such as, for
example, questions and answers, within a document.)
\item Else the reference value of the label is the sectional unit
identifier, i.e., its \emph{sunit} value rather than the logical
value in its attribute \emph{sid}, of the smallest sectional
unit containing the label. These values need not be numeric. For
example the string \quophrase{A.3.2} might be a sectional unit
identifier. The \emph{series} command should not be used to express
type conversion of a reference value that might resolve as a sectional
unit identifier.
\end{enumerate}
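The rules above can be sketched with a small example. The series name
\quostr{q} and the label keys are hypothetical, and the values given in
the comments are what the rules as stated appear to imply rather than
verified output.
\begin{verbatim}
\label[:series="q" serseq="10"]{q.first}  % literal serseq: value 10
\label[:series="q"]{q.second}             % next value in series q,
                                          %   presumably 11
\label[:series="q" refkey="q.first"]{q.again}  % value of q.first
\label{plain}      % no series: identifier of enclosing sectional unit

Question \ref{q.first} is followed by question \ref{q.second}.
Item (\series[:type="a"]{\evalref{q.second}}) is shown as a letter.
\end{verbatim}
Because \emph{series} needs an actual number, the last line uses
\emph{evalref}, which is possible here only because the reference
follows the label.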
\subsubsection{Anchors}
The basic usage for \emph{anch} is
\display{|\anch[|%
\emph{reference specs}%
|]{|%
\emph{anchored text}%
|}|}
where the option, which is not an attribute option but rather
becomes the element \emph{anchref} in \xml (while the anchor's argument
--- its \quophrase{presented content} --- becomes \emph{anchv}), is
expected to contain white space separated strings of the form
\quostr{name="value"}\footnote{
The value strings may contain simple markup such as, for example,
\quophrase{|\tld;|} to provide robust multiple output processing of
\quochar{\tld;} whereas an attribute option may not contain markup.
}
with name restricted to one of the following:
\begin{defnlist}
\term{fref}\desc A footnote reference. Value is a string
that becomes the content of an automatically created
footnote to the text in \emph{anchv}. This usage is deprecated;
use \bsl;footnote instead.
\term{href}\desc A web reference as with \emph{href} in \html.
That is, value is a \abbr{URI}. It \emph{could be} of the form
\qquostr{\#labelkey} where \qquostr{labelkey} is the name of a label
key that is preferably and more easily used with \emph{iref}.
(The \quochar{\#} needs to be escaped, i.e., marked up as
\qquostr{\bsl;\hsh;}.) In a non hypertext formatting the \abbr{URI}
may be presented as a note or footnote associated with the text in
\emph{anchv}.
\term{Href}\desc Same as \emph{href} except that the author wishes
to suppress any note or footnote presentation of the \abbr{URI}. This
might be the case if, for example, the \abbr{URI} might be obviously
deducible from the \emph{anchv} content.
\term{iref}\desc An internal reference; value must be a label key arising
from \emph{label} or \emph{klabel}\footnote{A \emph{klabel} is a
\quophrase{visible key} label.}.
\end{defnlist}
Note also that there is a command \emph{urlanch} (probably should have
been \emph{urianch}), taking a single argument, used for \abbr{URI}s,
which is intended to have the same effect as a newcommand with
one argument for creating a web anchor with the \abbr{URI} as presented
content.
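The following sketch illustrates the reference names listed above
together with a newcommand of the kind to which \emph{urlanch} is
compared. The command name \emph{myurianch} is hypothetical, and the
invocation of \emph{urlanch} in the last line is assumed from the
description above rather than taken from the document type definition.
\begin{verbatim}
\anch[iref="seq"]{labels and sequencing}      % internal, by label key
\anch[href="http://www.w3.org/"]{the W3C}     % URI may appear as a note
\anch[Href="http://www.w3.org/"]{www.w3.org}  % URI note suppressed

\newcommand{\myurianch}[1]{\anch[href="#1"]{#1}}
\myurianch{http://www.w3.org/}
\urlanch{http://www.w3.org/}
\end{verbatim}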
\subsubsection[][\label{counters}]{Example Emulating a \latex; Counter}
Suppose that one wants to fashion an inline enumerated list at some
point in a document. An \emph{enumerate} environment is a list
structure that, while it may occur in a paragraph, is not inline. The
idea is to give each item a label and fashion the inline list item
number by referencing that label. One writes a newcommand to ease
this.
The name of a \latex counter is emulated with the name of the
\emph{series} attribute for one or more labels. Unless one wishes to
be able to reference the items apart from the immediate reference, one
need not provide a label key. The \dps will provide a default labelkey
and the command ``|\popkey|'', which grabs the key of the last preceding
label, may be used to furnish the key for the immediate reference.
If one wants small Roman numerals, one wraps the ``|\ref|'' command
in a ``|\series|'' command. Since \emph{series} requires an actual
number, one must use \emph{evalref} instead of \emph{ref}, which is
permitted so long as the reference follows the label.
Here's the markup:
\begin{verbatim}
\label[:series="foo" serseq="0"]{} % zero the counter named foo
\newcommand{\ti}{%
\label[:series="foo"]{}(\series[:type="i"]{\evalref{\popkey}})}
Hilbert's three most important contributions to algebraic geometry
are \ti~the basis theorem, \ti~the nullstellensatz, and \ti~the syzygy
theorem.
\end{verbatim}
The rendering is this:
\label[:series="foo" serseq="0"]{} % zero the counter named foo
\newcommand{\ti}{%
\label[:series="foo"]{}(\series[:type="i"]{\evalref{\popkey}})}
Hilbert's three most important contributions to algebraic geometry
are \ti~the basis theorem, \ti~the nullstellensatz, and \ti~the syzygy
theorem.
With the current \dps if one had used ``|\ref|'' instead of
``|\evalref|'', the translator from author-level \xml to elaborated
\xml, which resolves references, would have issued a warning in the
scroll, but the build would have run to otherwise successful completion
by ignoring the presence of the \emph{series} command.
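If an item is to be referenced again later, a label key may be supplied
explicitly instead of relying on ``|\popkey|''. The following is a
sketch only: the command name |\tk| and the key |hbt| are hypothetical,
and it assumes that \emph{evalref} accepts an explicit key in the same
way that it accepts the key furnished by ``|\popkey|''.
\begin{verbatim}
\newcommand{\tk}[1]{%
\label[:series="foo"]{#1}(\series[:type="i"]{\evalref{#1}})}
Hilbert proved \tk{hbt}~the basis theorem and \ti~the nullstellensatz.
Item (\series[:type="i"]{\evalref{hbt}}) is used throughout.
\end{verbatim}
The later reference is placed after the label so that \emph{evalref}
remains permitted.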
\subsection[][\label{secusage}]{\label{last}General Usage for Sectional Units}
This describes the content model in the \gellmu \ddt
for the \quophrase{whole} sectional units \emph{Section},
\emph{Subsection}, \ldots as fully tagged.
\subsubsection[][\label{seccontent}]{The Content Model}
The content model for sectional units is:
\display{\quostr{((sopt)?,(sprefix)?,(sunit)?,(shead),(\%UnitContent)*)}}
where:
\begin{defnlist}
\term{\%UnitContent}\desc
refers to the subsections, loose paragraphs, and other general content
that is allowed inside a section.
\term{shead}\desc
is the required\footnote{
It may be left empty, but it must be present.
}
section header or title.
\term{sopt}\desc is an optional title for use in the table of
contents\footnote{
The presence of \emph{sopt}
does not cause a table of contents to be produced automatically.
For that one uses \qquostr{\bsl;tab\-le\-of\-con\-tents}. Moreover,
the presence of \emph{sopt} should have no effect upon a manually
constructed \qquostr{\bsl;Tab\-le\-Of\-Con\-tents}.}.
\term{sprefix}\desc
is optional markup for text that is to precede the sectional unit
sequence notation. For example, if the sectional unit sequence is
\quophrase{B} and \emph{sprefix} is the markup string
\qquostr{Appendix } (ending with a blank space), then the visible
indicator for the sectional unit, when consistent with the setting of
\emph{secnumdepth}, is \quophrase{Appendix B} both at the beginning
of the section and in the table of contents. Or if the sequence is
\quophrase{3} and the \emph{sprefix} is \quophrase{A}, then the
visible indicator is \quophrase{A3}.
\term{sunit}\desc
is an optional setting for the sectional unit sequence. The \gellmu
\ddt has an attribute \quophrase{sid} for the
sectional units \emph{Section}, \emph{Subsection}, \ldots that is
optional (and rare for author usage) in the \sgml version but
required in the \xml version. The translator from \sgml to \xml
computes it in the standard way. For example subsubsection 1 of
subsection 3 in section 2 acquires the \emph{sid} \quophrase{2.3.1}.
It is expected that formatters will use this value for the visible
sectional unit indicator, preceded, as previously described, by
any \emph{sprefix}, unless the user provides \emph{sunit}. While
\emph{sunit} is intended to override the visible indicator, it is
not provided to override the \emph{sid} attribute itself which a
formatter should see as describing logical structure.
\end{defnlist}
\subsubsection[][\label{ltxsecusage}]{The \latex;-Like Form of General Usage}
Corresponding \latex;-like \argopt syntax can be used, as previously
indicated, in \gellmu source with \emph{section}, \emph{subsection},
\ldots . If, however, \argopt syntax is used, one must be mindful of
the ordering of the options, and use empty option brackets, as
necessary, to indicate the position in the sequence of an option with
content. For example, a sole option is understood as \emph{sopt}, the
version of the sectional unit title to be used in the table of
contents. To provide only \emph{sprefix} with \argopt syntax one
precedes the bracket sequence for \emph{sprefix} with an empty pair
\qquostr{[]} of option brackets\footnote{
The \dps offers a way to furnish a formally empty string, which is an
empty element called \emph{empty} in the document type for use in
places such as the table of contents option of a sectional unit,
where it is not otherwise possible to distinguish after parsing whether
deliberately empty content was specified by the author.
That is, the markup \qquostr{\bsl;sopt\{\}} furnishes an empty
string which, in turn, signals \quophrase{no \emph{sopt}}, while
\qquostr{\bsl;sopt\{\bsl;empty;\}} indicates that empty content
was specified for \emph{sopt}.
}
for \emph{sopt}.
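The positional rules can be sketched with three section headings whose
titles are illustrative only. The first supplies only \emph{sopt}; the
second supplies only \emph{sprefix}, with the empty bracket pair holding
the place of \emph{sopt}; the third also supplies \emph{sunit}, so that
the visible indicator reads \quophrase{Appendix B}.
\begin{verbatim}
\section[Usage]{General Usage for Sectional Units}
\section[][Appendix ]{Release Notes}
\section[][Appendix ][B]{Release Notes}
\end{verbatim}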
\subsection[][\label{verb}]{\emph{verbatim}, \emph{verblist}, and
\emph{manmac}}
The ordinary notion of \quophrase{verbatim} is complicated in the
context of a document type that is intended for processing toward
multiple output formats. There is no special provision for verbatim
markup under \emph{basic} \gellmu\footnote{
When editing for \html one may, of course, use \html's \emph{pre},
which stands for \quophrase{pre-formatted text}.
},
but there are two layers, one simplistic and one sophisticated, in the
\dps with each layer having provision for both inline and block-level
verbatim markup.
The simplistic approach involves the provision of an inline element
\emph{verb} and a block-level element \emph{verbatim} in the \ddt.
With these the author is responsible for entering special characters
in ways that are safe for input source notation, safe for \st
output, and safe for each output format that is envisioned.\footnote{
With a sufficiently long list of output format candidates each of the
33 non-alphanumeric printable \abbr{ASCII} characters is unsafe.
However, one might use an external character-to-string conversion
program to prepare a large amount of verbatim material for inclusion
inside the simplistic \emph{verbatim} command in \gellmu source.
}
In the simplistic layer the formatting program in the \dps for \html
output renders \emph{verb} as an \html \emph{kbd} element and
\emph{verbatim} as an \html \emph{pre} element. In formatting for
regular \latex; output each of these \sgml elements is rendered
as its \latex; namesake. Neither the \emph{pre} element in \html
nor the (basic) \emph{verbatim} command in \latex; is context-sensitive,
and both are typically formatted with left margin justification.
The sophisticated layer in the \st for \emph{verbatim} is enabled by
setting the variable \emph{gellmu-verbatim-clean} true. If this is
the case, then a user should input verbatim material literally between
|\begin{verbatim}| and |\end{verbatim}| markers occupying whole lines
and the \st will convert each of the 33 non-alphanumeric printable
\abbr{ASCII} characters therein to the corresponding empty element in
the \ddt, and render each line of the converted material as a list
item bearing the item name \emph{nln} in a list element
\emph{verblist}.\footnote{
To export this procedure for general \emph{advanced} mode usage, all
of the names used need to be made user-configurable.
}
Similar arrangements pertain to the sophisticated inline analogue
of \emph{verb}, which is enabled in the \st by setting the variable
\emph{gellmu-manmac-bar-name} to a non-empty string value that is
to become the name of the element, such as \emph{quostr}, to contain
the content, which should be delimited in source by successive
\quochar{\vbr;} characters. By default the user must enter a
verbatim string in \quophrase{safe} form, but may enter it,
apart from any occurrence of the character \quochar{\vbr;}, literally
if the variable \emph{gellmu-manmac-literal} is true. The term
\quophrase{manmac} is derived from an old plain \tex; package for
writing documentation by that name in which the character \quochar{|}
is used to delimit inline literal material for verbatim presentation.
Moreover, consistent with usage observed in \latex; documentation
markup of the form
\display{
\quostr{\bsl;}\emph{name}\quostr{\vbr}\emph{literal-text}\quostr{\vbr}
}
is translated by the \st to an element named as with simple
\qquostr{\vbr;...\vbr;} having a setting for its attribute
with name the value of the string variable
\emph{gellmu-manmac-attribute}, which for \emph{quostr} in the \ddt
would appropriately be its attribute \emph{inv}.
Variable settings to provide for the use of literal input both for
\emph{verbatim} as a metacommand front for the list element
\emph{verblist} and \emph{manmac} configuration for the element
\emph{quostr}, as described, are default when the \st is begun with
the non-default function \emph{gellmu-latex-faq}. This function also
sets the variable \emph{gellmu-squophrase-name}, which otherwise
defaults to the empty string, to the value \qquostr{squophrase}.
This causes the \st to attempt to interpret balancing
character pairs consisting of the character \quochar{\lsq;} and the
character \quochar{\rsq;} as delimiters for the element
\emph{squophrase}, an alternate form along with \emph{quophrase},
for \quophrase{quoted phrase}, with odd instances of the character
\quochar{\rsq;} set as the empty element \emph{apos}, which represents
an apostrophe.
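As a sketch, assuming that the \st has been started with
\emph{gellmu-latex-faq} so that the settings just described are in
force, source text of the following kind may be entered literally.
The sample sentences are illustrative only.
\begin{verbatim}
The file name |gellmu.dtd| and the string |a_b & c^d| need no escaping.
A phrase in `single quotes' becomes a squophrase element, while the
apostrophe in Hilbert's becomes an apos element.
\end{verbatim}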
\subsection[][\label{accents}]{Accents}
Traditional \latex; markup employs accent commands for modifying
\abbr{ASCII} characters in order to produce non-\abbr{ASCII}
characters. On the other hand characters from 16
\quophrase{byte-planes} of \abbr{Unicode} are now available in \html
and \xml as ordinary characters, with treatment as the atoms of
ordinary strings of text not requiring any special attention or notice
from the viewpoint of markup. It may happen that this will affect
development in the \latex; project, and it is, therefore, unclear what
the long term role might be of \latex;'s accent commands.
Nonetheless, the \ddt provides element names corresponding to the
14 classical \latex; accents although it does not provide classical
markup notations such as \qquostr{\bsl;\apos;},
\qquostr{\bsl;\quo;}, \qquostr{\bsl;\tld;}, \ldots inasmuch
as names are required in \st output.\footnote{Familiar short forms
may be introduced by a user using the \emph{macro} facility.}
The names for the accents may be seen by locating the word
\quophrase{accent} in the \dps file \path{gellmu.dtd} and reading
the following 14 lines.
An author, particularly an author residing in an English language
locale, may introduce, for example, the character \acute{e} in
several ways:
\begin{enumerate}
\item by using an algorithmic accent command --- in this case
|\acute{e}|: \acute{e}.
\item by direct Unicode-based entity reference -- in this case
|&#xE9;|: &#xE9;.
\item by simple direct entry of the Unicode point in a file
having the UTF-8 text encoding.
\end{enumerate}
Note that Unicode values are numbers represented in hexadecimal
notation (base 16); the digits are 0--9 and A--F. Thus, E9 is
233, and one could also use the entity reference |&#233;|.
While for the long term, the largely equivalent\footnote{For the
third method one's \gellmu driver script must provide appropriately
for the text encoding of the source file.}
second or third methods are superior, there are caveats for their use:
\begin{itemize}
\item One's \latex; back end must have a suitably robust way
of handling Unicode, perhaps with the \emph{inputenc}
package or possibly by using a unicode-capable \tex; engine
such as \softw{xetex} or \softw{luatex}.
\item Even when fonts are available, a few web browsers
still seem only to handle Unicode via entity references.
\end{itemize}
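For comparison, the accent command and the two entity references might
appear in source as follows. The name is illustrative; the second and
third lines are the hexadecimal and decimal entity references for the
same Unicode point, since E9 in base 16 equals 233.
\begin{verbatim}
Poincar\acute{e}
Poincar&#xE9;
Poincar&#233;
\end{verbatim}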
\subsection[][\label{tabular}]{Tabular Environment Emulation}
The \ddt is able to accommodate emulation
of \latex's \emph{tabular} environment with \qquostr{l}, \qquostr{c},
\qquostr{r}, and \qquostr{p} column cell specifiers with, where
robust \abbr{CSS} support is available for \html, \qquostr{|} indicators
for vertical cell rules and \qquostr{\bsl;hline} commands for horizontal
rules between rows. While \emph{multicol} and \emph{multirow} are not
presently modeled, cells may contain other \emph{tabular}s recursively.
A \qquostr{p} column specifier optionally takes a decimal argument
that represents the fraction of available width to be made available
for the cells in that column. When there is no fractional width
specifier, the default fractional width is $\frac{1}{n+1}$ where $n$
is the total number of columns.
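As a worked instance of the default, a \emph{tabular} with $n = 3$
columns gives a \qquostr{p} column without an argument the fractional
width $\frac{1}{3+1} = 0.25$. The sketch below assumes, by analogy with
\latex;, that the decimal argument is written in braces after the
specifier; that form is an assumption and is not confirmed in this
section. The cell contents are illustrative.
\begin{verbatim}
% assumed syntax: the third column is given half of the available width
\begin{tabular}{lcp{0.5}}
\bold{Format} & \bold{Kind} & \bold{Remarks} \\
\hline
\abbr{PNG} & raster & Suited to \html output
\end{tabular}
\end{verbatim}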
What may be viewed as a shortcoming of the \emph{tabular} emulation
is that in each \emph{tabular} row the first cell must contain
some markup in order for the document to be structurally correct.
This arises from a rule associated with \sgml document types
that might be overcome if \gellmu source were processed by a
monolithic program. Instead, however, it is an
assembly of \sapref{flow}{separate components} in which the first stage
has knowledge of syntax but not of the markup vocabulary.
One thing to keep in mind with \emph{tabular} is that in the current,
though perhaps not future, design of \html, tables are not allowed
in paragraph content. The \dps works hard to deal with this. In most
web browsers there will be a line break before an \html table and
again after an \html table. (One way to finesse that is to place a
table in a cell in a sur-table, which is abuse of markup.) This
author generally places a \emph{tabular} inside a \emph{display},
which is the \gellmu object corresponding to \latex's \emph{center}.
The document type does not offer ``floating'' objects, and, therefore,
the name \emph{table} is not used as in \latex;.
The name \emph{table} is used for emulation in regular \gellmu
source of \html tables. A \emph{table}, like a \emph{tabular},
takes a required argument consisting of column specifiers.
In the \dps at the point where an article is spun to the elaborated
\xml version of \emph{article}, there is no longer a distinction
between \emph{tabular} and \emph{table}.
There is also \emph{array}, which, as in \latex;, is the in-math
version of \emph{tabular}, except that in \gellmu its emulation of
\latex;'s array survives translation to the \xml version of
\emph{article}. An \emph{array} has only \qquostr{l}, \qquostr{r},
and \qquostr{c} cells.
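A minimal sketch of \emph{array} in display math, using only the three
permitted column specifiers; the entries are illustrative.
\begin{verbatim}
\[ \begin{array}{lcr}
a & b & c \\
x + y & 0 & -1
\end{array} \]
\end{verbatim}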
\bold{Example:} The markup for the \iref{dtdfiles}{table of \dtd's}
in Appendix \sref{dtdfiles} is this:
\begin{verbatim}
\begin{display}
\begin{tabular}{l|cc}
~ & Latin-1 & \abbr{UTF-8} \\
\hline
First stage \abbr{SGML} & \quostr{gellmu.dtd} & \quostr{ugellmu.dtd} \\
Author Level \abbr{XML} & ~ & \quostr{axgellmu.dtd} \\
Elaborated \abbr{XML} & \quostr{xgellmu.dtd} & \quostr{uxgellmu.dtd}
\end{tabular}
\end{display}
\end{verbatim}
Because the one horizontal divider and the one vertical divider rely
on \css support in \html, these dividers might not be seen in a web
browser with weak \css support.
\subsection[][\label{graphics}]{Graphic Inclusions}
There is basic support for graphics inclusions suitable for the \html
backend and the \latex backend with both \abbr{DVI} and \pdf outputs.
The arrangements are not unlike those required for the use of
the \emph{includegraphics} command provided by \latex's
\emph{graphicx} package. This means that, apart from the \dps's chain
of processors, one needs to provide several different versions of a
given graphic object in order to accommodate the needs of the three
different output formats (\html, \pdf, and \dvi). A reliable graphics
format converter such as that provided by \softw{ImageMagick}
(\urlanch{http://www.imagemagick.org/})
or the \softw{netpbm} utilities
(\urlanch{http://netpbm.sourceforge.net/})
and a broad \tex; support environment similar to that provided by
\abbr{TUG}'s \href{http://www.tug.org/texlive/}{\softw{TeXLive}}
are essential for work with included graphics.
For example, if one begins with an encapsulated PostScript image
\quostr{glmy.eps} made, say, with \abbr{Metapost}, then one might run
the commands
\begin{verbatim}
epstopdf glmy.eps
convert glmy.pdf glmy.png
\end{verbatim}
to generate \pdf and \abbr{PNG} versions of the image. This should
be done before processing the document through the \dps;.
In this case the \pdf version of the graphic should be
regarded as best for use with \softw{pdflatex}, and so one wants to
have the \abbr{PNG} version out of view from \softw{pdflatex} when
building the \gellmu source \quostr{glman.glm} for the document.
With the new \softw{mmkg} drivers one may prepare a tiny file
\quostr{glman.prx} that is a list of names of image files, one file
per line, that should be out of view at the point in the pipeline when
\softw{pdflatex} is active. Thus, in the case of the given example, a
line in the file \quostr{glman.prx} should contain the name
\quostr{glmy.png}.
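For the example at hand, the whole of \quostr{glman.prx} could be the
single line shown below. A second image name, were there one, would go
on a line of its own.
\begin{verbatim}
glmy.png
\end{verbatim}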
In \urlanch{glman.glm} one has the
code:
\begin{verbatim}
\begin{quotation}
Sometimes a picture is worth a thousand words.
\display[:frame="1.1"]{\includegraphics[:scale="0.2"]{glmy}}
\end{quotation}
\end{verbatim}
which yields:
\begin{quotation}
Sometimes a picture is worth a thousand words.
\display[:frame="1.1"]{\includegraphics[:scale="0.2"]{glmy}}
\end{quotation}
In this markup one sees, first of all, two uses of \abbr{SGML}
attributes --- \emph{frame} and \emph{scale}. Attribute
options are opened with \qquostr{\lsb;:} rather than simply with
\qquostr{\lsb;}. Whereas options generally may contain markup,
attribute options may not.
The meaning in this example of \emph{scale} is that the \bold{width}
of the graphic in \abbr{DVI} and \abbr{PDF} outputs will be the result
of multiplying the \emph{scale} by the width of text on the page
(\latex;'s value of \emph{\bsl;textwidth}).\footnote{This meaning of
\emph{scale} differs from the meaning of \emph{scale} with
\emph{includegraphics} under \latex's \emph{graphicx} package. New
controls of this type may be introduced in a future version.}
By default the \html formatter will link to the \abbr{PNG} without a width
specification. Trying to interpret the scale attribute in the context
of \html, where pages can be re-sized and re-flowed, is not sensible.
Of course, it is always preferred that \html images contain width and
height specifications, and these need not be the actual image sizes but
generally should be proportional to the actual sizes. To specify \html
sizes for an image with stem name \quostr{glmy} prepare a one line text
file \quostr{glmy.htsz} containing the width and height in pixels separated by a space.
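For instance, if the \abbr{PNG} is to be presented at 320 pixels wide
and 240 pixels high (hypothetical values chosen only for illustration),
the file \quostr{glmy.htsz} would consist of this line:
\begin{verbatim}
320 240
\end{verbatim}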
The meaning of \emph{frame} is that for print
outputs the graphic image will be surrounded by a \latex;
\emph{framebox} with diameter that is $1.1$ times the diameter of the
\emph{display}'s content. In the current \html formatter no use is
made of the numeric value of \emph{frame} although a default frame is
constructed via an \html attribute giving values for the \abbr{CSS}
properties \emph{border} and \emph{padding}. When image framing is
desired, another approach is to incorporate framing in the graphic
itself.
\section[][\label{math}]{Mathematics in \emph{article}}
\subsection[][\label{genmath}]{General}
There was previous mention of the basic mathematical container element
\emph{tmath} in the discussion of \siref{single}{single string} markup
and the \emph{math} and \emph{displaymath} elements in the discussion
of \siref{certainstrings}{certain strings} of special markup
significance.
The markup used inside these containers is very similar to that
which is used in \latex; although far from all of \latex;'s math
markup, as extended by the \softw{amsmath} package, has any
analogue in the \dps.
There are a few things that deserve short mention:
\begin{defnlist}
\term{\bsl;sum, \bsl;int, and \bsl;prod} \desc are all similar to
their \latex; namesakes except that each requires explicit closure.
For example,
\display{|\[ \sum_{0}^{\infty} \frac{x^k}{k!} \sum: \]|}
produces
\[ \sum_{0}^{\infty} \frac{x^k}{k!} \sum: \ \eos\]
\term{\bsl;regch, \bsl;mbox, and \bsl;text}
\desc The \ddt provides \emph{regch},
for \quophrase{regular character}, which is a one-character version
of \emph{mbox} provided for separating from general
\emph{mbox} matter the content to which a non-math
\siref{accents}{algorithmic accent} may
be applied. For example,
\begin{verbatim}
\newcommand{\Q}{\regch{\bold{Q}}} % intended for use in math
\newcommand{\galQ}{\mbox{Gal}(\ovbar{\Q}/\Q)} % only in math
$\galQ$\aos;
\end{verbatim}
produces
\newcommand{\Q}{\regch{\bold{Q}}} % intended for use in math
\newcommand{\galQ}{\mbox{Gal}(\ovbar{\Q}/\Q)} % only in math
$\galQ$\aos;
Both \emph{regch} and \emph{mbox} should be used only for symbols.
Note that ``Gal'' in the foregoing is a symbol. The command
\bsl;text is provided for the use of text phrases -- usually conjunctive
in nature -- inside math zones. For example, the markup
\begin{verbatim}
\[ \absval{x} = \lbalbr{\begin{array}{rl}
x & \text{\ if\ } x \geq 0 \\
-x & \text{\ if\ } x < 0
\end{array}} \]
\end{verbatim}
produces
\[ \absval{x} = \lbalbr{\begin{array}{rl}
x & \text{\ if\ } x \geq 0 \\
-x & \text{\ if\ } x < 0
\end{array}} \]
In this example note that the command name \emph{lbalbr} stands for
``\bold{l}eft \bold{bal}anced \bold{br}ace''. It corresponds to
the use of |\left\{ ... \right.| in \latex;\footnote{
Note that the use of \emph{lbalbr} in this instance is insufficiently
semantic for translation to content \mathml while it is meaningful
for translation to presentation \mathml. Adequate enhancement might
be had by providing |mml="cases"| (using a name from \emph{amsmath})
as an attribute for \emph{lbalbr} with this example.
}.
\term{\bsl;aos;, \bsl;aoc;, \bsl;aoq;, and \bsl;aoe;} \desc are the
named forms of the \latex; space adjustment commands \qquostr{\bsl;@.},
\qquostr{\bsl;@,}, \qquostr{\bsl;@?}, and \qquostr{\bsl;@!}, which
provide correct placement of inline punctuation immediately following
inline mathematical markup. There is now also default provision in the
\dps for the more \latex;-like \qquostr{\bsl;@} usage. Note that the
use of \quostr{\bsl;@} is seldom necessary.
\end{defnlist}
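By analogy with the \emph{sum} example above, the closing forms for
\emph{prod} and \emph{int} are presumably |\prod:| and |\int:|. That
spelling is inferred from the |\sum:| pattern rather than stated in
this section, so the following is only a sketch:
\begin{verbatim}
\[ \prod_{k=1}^{n} k \prod: \]
\[ \int_{0}^{1} x^2 dx \int: \]
\end{verbatim}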
\subsection[Assertions][\label{assert}]{Assertions: Near Emulation of
\latex;'s Theorem-Like Environments}
\subsubsection{Examples}
The container element \emph{\bsl;assertion} is used in the \ddt for the
creation of its analogue under \sgml of theorem-like environments. As
a first example of the near emulation of a \latex; theorem-like
environment, here is markup to give \latex;-like meaning to
``|\begin{theorem}|'' and ``|\end{theorem}|''.
\begin{verbatim}
\newcommand{\begin{thm}}[1][]{%
\begin{assertion}[#1][theorem]{Theorem}[\evalref{\popkey}]}
\newcommand{\end{thm}}{\end{assertion}}
\end{verbatim}
\newcommand{\begin{thm}}[1][]{%
\begin{assertion}[#1][theorem]{Theorem}[\evalref{\popkey}]}
\newcommand{\end{thm}}{\end{assertion}}
For a first instance it is invoked without supplying the optional first
argument, which is for a label key.
\begin{thm} The continued fraction expansion of a real number is finite
if and only if the real number is a rational number.
\end{thm}
This is just text to follow the statement of a theorem.
If we do the same thing again, the theorem number should go up by
$1$ since in the \dps this usage is equivalent to using the default
counter associated with a label series name \emph{theorem}.\footnote{
There is also a default counter that is used when no label series name
is present. That counter simply is the position of the underlying
\emph{assertion} in the list of all assertions.
}
\begin{thm} The continued fraction expansion of a real number is
eventually periodic if and only if the real number is an irrational
number that is the root of some quadratic polynomial with rational
coefficients.
\end{thm}
\subsubsection[][\label{assertion}]{Usage for \emph{assertion}}
The general usage of \emph{assertion} is the following:
\begin{menu}
\item |\begin{assertion}[|\emph{key}|][|\emph{series}%
|]{|\emph{name}|}[|\emph{id}|]|
\item | . . .|
\item |\end{assertion}|
\end{menu}
The options \emph{key} and \emph{series} represent the same things as
the corresponding \siref{seq}{\emph{label}} options. The \emph{id}
option is for explicit customization of the visible identifier, e.g.,
theorem number, as illustrated with one of the examples later in this
section. One may use the \emph{id} option, which must follow
the name argument, without using either of the other options. But in
order to use the \emph{series} option, a \emph{key} option, which may
simply be empty, must be present.
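The interplay of the options can be sketched as follows. The key
|cfconv|, the series |lemma|, and the identifier |A| are hypothetical
names used only for illustration. The first form supplies both a key
and a series; the second leaves the key empty in order to reach the
series option; the third uses only the \emph{id} option.
\begin{verbatim}
\begin{assertion}[cfconv][lemma]{Lemma}
 . . .
\end{assertion}
\begin{assertion}[][lemma]{Lemma}
 . . .
\end{assertion}
\begin{assertion}{Theorem}[A]
 . . .
\end{assertion}
\end{verbatim}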
There is also a fully named way to proceed:
\begin{menu}
\item |\begin{assertion}|
\item |\asstkey{. . .}\asstser{. . .}\asstname{. . .}\asstid{. . .}|
\item | . . .|
\item |\end{assertion}|
\end{menu}
Here order is important, but any of the options may simply be omitted.
For example, we may write
\begin{verbatim}
\newcommand{\begin{Thm}}[1]{%
\begin{assertion}\asstser{#1}\asstname{Theorem}%
\asstid{\sref;.\evalref{\popkey}}%
}
\newcommand{\end{Thm}}{\end{assertion}}
\begin{Thm}{XXseries}
If $[ n_1, n_2, \ldots ]$ is an infinite continued fraction, then
its sequence of convergents always has a limit.
\end{Thm}
\end{verbatim}
to obtain:
\newcommand{\begin{Thm}}[1]{%
\begin{assertion}\asstser{#1}\asstname{Theorem}%
\asstid{\sref;.\evalref{\popkey}}%
}
\newcommand{\end{Thm}}{\end{assertion}}
\begin{Thm}{XXseries}
If $[ n_1, n_2, \ldots ]$ is an infinite continued fraction, then
its sequence of convergents always has a limit.
\end{Thm}
Notice that the \emph{asstid} argument is merging the reference value
for (the new) series \emph{XXseries} with the visible id of the
current sectional unit.
\subsection[][\label{eqn}]{Equations and Equation Arrays}
The general usage for \emph{equation} is the following:
\begin{menu}
\item |\begin{equation}[|\emph{key}|][|\emph{series}|]|
\item | . . . |
\item |\end{equation}|
\end{menu}
The options \emph{key} and \emph{series} represent the same things as
the corresponding \siref{seq}{\emph{label}} options. In order to use
the \emph{series} option, a \emph{key} option, which may simply be
empty, must be present. The use of ``|equation*|'' as a name for an
equation display that is not numbered is
not permitted, but one may instead use the \emph{nonum} attribute
as follows:
\display{|\begin{equation}[:nonum="true"] . . . \end{equation}|\ .}
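A short sketch of the two option positions, with a hypothetical key
|binom| and a hypothetical series |identities|; the second equation
leaves the key empty so that the series option can still be given.
\begin{verbatim}
\begin{equation}[binom][identities]
(x + y)^2 = x^2 + 2 x y + y^2
\end{equation}
\begin{equation}[][identities]
x^2 - y^2 = (x + y)(x - y)
\end{equation}
\end{verbatim}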
General usage for an equation array (name \emph{eqnarray}) is:
\begin{menu}
\item |\begin{eqnarray}[|\emph{key}|][|\emph{series}|]|
\item | . . . |
\item |\end{eqnarray}|
\end{menu}
where the content is an \emph{eqnabody} consisting of \emph{eqnrow}s.
Each row may be terminated in \ll markup, as in \latex;, with the
string ``|\\|'' and consists of three parts, corresponding to elements
\emph{eqnleft}, \emph{eqncenter}, and \emph{eqnright}, that may
be separated in \ll markup with the character \quochar{\&}.
Support for numbering in eqnarrays is only minimally developed
although there is suggestive sketching in the \ddt that is not
supported in the formatters. By default the equations in equation
arrays are numbered consecutively throughout an article. This
behavior can be altered by using the \emph{series} attribute of an
\emph{eqnarray}. If that is done, then, as things have been sketched,
numbering applies to whole arrays rather than to the equations within
arrays. Numbering in an equation array may be suppressed by setting
its attribute \emph{nonum} to the string \quophrase{|true|}.
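A minimal sketch of an unnumbered equation array in \ll markup: each
row has its three parts separated by |&|, and every row but the last
ends with ``|\\|''. The equations are illustrative, and the attribute
option form is assumed to match that shown for \emph{equation}.
\begin{verbatim}
\begin{eqnarray}[:nonum="true"]
(x + 1)^2 & = & x^2 + 2 x + 1 \\
(x + 1)^3 & = & x^3 + 3 x^2 + 3 x + 1
\end{eqnarray}
\end{verbatim}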
\subsection[][\label{mathsym}]{The \bsl;mathsym Metacommand}
\emph{mathsym} is a macro substitution metacommand that is
available in the \dps for enabling an author to declare that a macro
name represents a mathematical symbol. It is a formal way of
recording statements commonly made by authors in introducing notation.
Unlike regular metacommands, which may appear at any point in
\gellmu source, \emph{mathsym} may only appear in the \emph{preamble}
of an \emph{article}, or, equivalently with defaults in the
\st, \emph{mathsym} may only appear before the \ll
\qquostr{\bsl;begin\{document\}}.
Its usage is:
\display{
|\mathsym{|%
\emph{symbol-name}%
|}{|%
\emph{symbol-rendering}%
|}[|\emph{symbol-meta-info}%
|]|\ \eos;}
Here \emph{symbol-name} is an alphanumeric string (case-sensitive)
beginning with a letter. The second argument is the presentation
rendering of the symbol in \gellmu markup. It is like the definition
of a \emph{newcommand} except that it may not involve
arguments.\footnote{
However, a declared math symbol may be invoked in
a \emph{newcommand} that takes arguments.
}
The optional third argument \emph{symbol-meta-info} is an
alphanumeric string that might also include a few other
characters such as \quochar{/}, \quochar{-}, \quochar{,},
\quochar{.}, \quochar{*}, etc. Its exact structure depends on the
typing system. (No typing system is part of the \dps.) For example,
it might consist of (name, value) pairs for conveying meta-information
about the symbol.
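As a sketch, the following preamble line declares a symbol for the
field of rational numbers. The meta-information string
|field/rationals| is purely hypothetical, since no typing system is
part of the \dps; the rendering reuses \emph{regch} markup of the kind
used for symbols elsewhere in this manual.
\begin{verbatim}
\mathsym{Q}{\regch{\bold{Q}}}[field/rationals]
\end{verbatim}
In the body of the article the symbol would then presumably be invoked
inside math as |$\Q$|, in the same way as a macro without arguments.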
The syntactic translator replaces each invocation of a given
\emph{mathsym} with the specified rendering and writes for each
\emph{mathsym} definition a corresponding element in the \sgml output
whose content consists solely of the declared symbol name if there is
no meta information but otherwise consists of the symbol name followed
by a blank space and then whatever string of meta information is
provided in the optional argument. Additionally, each invocation is
wrapped in a rendering-inert \emph{Sym} element whose \emph{key}
attribute reveals the name given to the symbol at the point of
declaration (and by which the symbol is invoked). This makes it
possible for a downstream authoring platform processor that has
remembered the list of declared symbol names to match each invocation
of a declared symbol with its associated meta information, if any,
provided by the author in the symbol declaration.
A related feature in the didactic \gellmu document type is the
\emph{mlg} tag for marking mathematical logical groups. This is
somewhat akin to the \emph{lgg} tag for \tex;-like logical groups,
traditionally created in \tex; markup with braces that are not
attached to a command.\footnote{
Such unattached braces in \gellmu markup lead to an \emph{lg0} tag in
the output of the syntactic translator that is translated to an
\emph{lgg} tag in the \xml version of the didactic document type.
}
As with \emph{lgg} there is no obvious evidence of an \emph{mlg} tag
in a typeset rendering, but the presence of such a tag is intended as
a signal to downstream mathematical parsers that the contents of the
tag be given grouping priority as, say, with visible parentheses.
Furthermore, the \emph{mtype} and \emph{mml} attributes of the
\emph{mlg} tag may be used to pass semantic information about the
tag's contents to a processor.
\section[][\label{dps}]{The Didactic Production System}
The \dps is the suite of processors and technical support files
underlying what is called regular \gellmu.
\subsection{Permission}
The items of the \dps are copyrighted free software released
under the \gpl.
\subsection[][\label{materials}]{Materials}
The release contains everything originating with the author that is
currently used in \quophrase{building} \gellmu documents.
It also contains a slightly \href{../perllib/SGMLS.pm}{modified version}
of David Megginson's \softw{Perl} module \qquostr{SGMLS.pm} based on
another slightly modified version that was furnished to me by Dave
Holden in a very quick early 1999 response to my posted request for a
modification that handles the labels provided (optionally) by
\softw{nsgmls} for \sgml elements that are defined empty. A similar
slight modification was also supplied a few days later by Vassilii
Kachaturov and had been available at his web site.
Although the materials offered in this package aside from the \st
pertain only to the \ddt and the \dps, it should be understood that
the larger design for \gellmu envisions other parties, on the one
hand, building in various ways to extend the functionality of the
didactic system, and, on the other hand, applying the methods of the
didactic system to other document types and other formatting programs
for those document types.
The basic items originating with the author, aside from the document
type definition \siref{dtdfiles}{files} are:
\begin{defnlist}
\qhref{../gellmu.el}{gellmu.el}{the \gst, which makes \sgml}
\qhref{../xplaingart.pl}{xplaingart.pl}{converts \sgml to
author-level \xml}
\qhref{../xmlgart.pl}{xmlgart.pl}{converts author-level \xml
to elaborated \xml}
\qhref{../ltxgart.pl}{ltxgart.pl}{translates elaborated \xml
to \latex}
\qhref{../htmlgart.pl}{htmlgart.pl}{translates elaborated \xml
to classical \html and translates specially prepared \xml to \mxhtml}
\qhref{../mathcdata.pl}{mathcdata.pl}{first of two special
preparations for translation toward \mxhtml}
\qhref{../mathprep.pl}{mathprep.pl}{second of two special
preparations for translation toward \mxhtml}
\qhref{../mval.pl}{mval.pl}{check for certain types of \mathml errors}
\end{defnlist}
Since some users will only be interested in the \st, additional
description of these materials is found below in
\siref{usingdps}{``Using the \dps''} and in the
\sapref{release}{Release Notes}.
\subsection[][\label{requiredsoftware}]{Other Required Software}
To make use of the \gst a user must have or separately acquire
\href{http://www.gnu.org/software/emacs/}{\emacs}.\footnote{
Version 20 or later should be adequate. Although the author began
this project using version 19, he is no longer able to run tests
with that version.}
(\quophrase{Windows} users should look at the special
\href{http://www.gnu.org/software/emacs/windows}{FAQ}.) \emacs is
commonly found on \gnu/Linux systems and on *ix systems. It may be
found on other systems when provided by system managers.\footnote{
It is an embarrassment of the business world in the years since 1985
that many business computing installations do not provide general
purpose cross-platform programming systems despite the widespread
availability of excellent free robust systems such as \emacs (for
Lisp), \softw{gcc} (for C), and \softw{Perl}. This new phenomenon
apparently arises not so much from lack of organizational interest but
from the fact that the responsibility for maintenance cannot be passed
beyond the local system manager to a vendor.
}
To make use of the \dps beyond the \st a user must have or acquire
the following items of free cross-platform software:
\begin{itemize}
\item an \abbr{ESIS}
generating \sgml parser such as found in the cross-platform package
\href{http://www.jclark.com/sp/}{\softw{SP}} by James Clark,
which has stood the test of years, or the newer variant
\href{http://openjade.sourceforge.net/}{\softw{OpenSP}}, which
is internationalized, from the \softw{OpenJade} Team.
\item \href{http://www.cpan.org/}{\softw{Perl} at CPAN}.
\item a complete \tex system, for which one may look to
\href{http://www.tug.org}{The \tex Users Group (\abbr{TUG})} or
The Comprehensive \tex Archive Network
(\href{http://www.ctan.org}{\abbr{CTAN}}).
\item \softw{xmlwf} --- a basic utility that is part of the release of
James Clark's \href{http://expat.sourceforge.net/}{\softw{expat}}.
\end{itemize}
\subsection{Using the \st}
This explains how to use the \st, which is the Emacs Lisp program
contained in the file \upanch{gellmu.el}.
\subsubsection{Operation in Batch Mode}
For linux and other *ix systems, a script like \upanch{bin/linux/g2s} will be
adequate if your working directory is the distribution directory\footnote{
None of the enclosed scripts either for linux or for win32 should be used
without prior examination and verification for suitability.
}.
\display{Usage:\ \ \quostr{g2s\ \ }\emph{stem-name}
\quostr{\ \ [ }\emph{function-call}\quostr{ ]} }
For example, if \qquostr{foo.glm} is the name of the source file, then
the first argument should be \qquostr{foo}. The optional second
argument \emph{function-call} is the name of the function in the Emacs
Lisp package \qquostr{gellmu.el} that is to be used. The function
call defaults to \qquostr{gellmu-trans}, which is the correct name for
\latex;-like usage under the \siref{ddt}{\ddt}.
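As a minimal sketch of the usage line above, the following shell fragment derives the stem name from a source file name and prints the corresponding \quostr{g2s} command lines rather than running them. The file name \qquostr{foo.glm} is the example from the text, and the working directory is assumed to be the distribution directory.

```shell
#!/bin/sh
# Sketch only: print the g2s command lines instead of running them.
src="foo.glm"            # example source file name from the text
stem="${src%.glm}"       # strip the .glm suffix to get the stem name

cmd_default="bin/linux/g2s $stem"                 # function call defaults to gellmu-trans
cmd_explicit="bin/linux/g2s $stem gellmu-trans"   # same call with the default spelled out

echo "$cmd_default"
echo "$cmd_explicit"
```

Both printed lines describe the same translation, since the second argument merely names the default function.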
There are also parallel scripts \qquostr{g2h} and \qquostr{g2x}.
\qquostr{g2s} will byte compile \qquostr{gellmu.el} if
\qquostr{gellmu.elc} is not present.
\qquostr{g2h} runs the function \emph{gellmu-html} for the case
where the \gellmu file has been written directly for \abbr{HTML}.
The file \urlanch{ghtml.glm} and the derived file \urlanch{ghtml.html}
are examples.
\qquostr{g2x} runs the function \emph{gellmu-xml} for the case
where the \gellmu file has been written directly for \abbr{XML}.
The directory \qquostr{bin/win32} contains parallel, though more
complicated, batch files for use in a \quophrase{DOS} command line
under \quophrase{Windows}.
\subsubsection{Interactive Operation}
Open \Emacs interactively on the \gellmu source file. When
finished editing the source, save it but keep \softw{Emacs} open. Then do
\display{\quostr{M-x load-file gellmu.el}}
and
\display{\quostr{M-x gellmu-trans} \ \eos;}
(It is better to have byte-compiled \qquostr{gellmu.el} and if the
byte-compiled version \qquostr{gellmu.elc} is in your \softw{Emacs}
\emph{load-path}, then
\display{\quostr{M-x load-library gellmu}}
is faster.)
The \sgml output should come up in a second buffer. Save that buffer
to save the output.
Make any corrections or changes in the \gellmu source buffer and re-run
\display{\quostr{M-x gellmu-trans} \ \eos;}
As with batch operation the functions \emph{gellmu-html} and
\emph{gellmu-xml} may be handled in parallel with \emph{gellmu-trans}.
There are a number of other functions besides these three for
obtaining syntactic translation from \gellmu source to \sgml. Each of
these is, in fact, a front for calling \emph{gellmu-trans} with
various combinations of variable settings. There are a great many
user configurable variables in the syntactic translator. Notable
among these for \siref{regular}{\emph{regular} \gellmu} are \iseq{(1)}
\emph{gellmu-parb-nogo} and \iseq{(2)}\emph{gellmu-autoclose-list}.
See the variable documentation text, available interactively when the
\gellmu library is loaded in \softw{Emacs} with the key combination
\qquostr{C-h C-h v}, for more information. For a list of the names of
all user configurable variables see the variable documentation for
\emph{gellmu-public-vars}.
For example, setting \emph{gellmu-verblist} true causes a sequence of lines
beginning with the line \qquostr{\bsl;begin\{verbatim\}} and ending
with the line \qquostr{\bsl;end\{verbatim\}} to be considered
\emph{verbatim} as in \latex;, i.e., without requiring escaped forms
of special characters, and then to be set as a simple \emph{verblist},
which is in most circumstances superior to \gellmu's version of
pre-historic \emph{verbatim}.\footnote{
The main reasons that this version is not the default with a call to
\emph{gellmu-trans} are:
\begin{enumerate}
\item It breaks the paradigm under which a \gellmu command name is the
name of an \sgml element.
\item It breaks backward compatibility with earlier versions of the
syntactic translator, i.e., breaks older documents.
\item It is felt that the user invoking \emph{verblist} this way
should be aware of what is being done.
\end{enumerate}
Note that direct invocation of \emph{verblist} requires escaping
special characters. Thus, using the function call
\emph{gellmu-verblist} converts the name \emph{verbatim} from a
command name to a meta-command name.}
\subsubsection[][\label{interrupt}]{Interrupting the Syntactic
Translator}
Interruption of the \gst will be necessary in the event that the
combined use of \emph{newcommand}, \emph{macro}, \emph{Macro},
and \emph{mathsym} (advanced mode only) leads to infinite recursive
loops. Users should avoid the use of \emph{macro} and \emph{Macro}
unless such use is absolutely necessary since these metacommands
present greater looping risks.
Inasmuch as there are two ways to invoke the \st, there are two
different procedures for interrupting it should that be necessary.
\begin{description}
\item[Batch mode invocation] This is the case when \Emacs is launched
in batch processing mode to run the \st. To interrupt the \st in this
case one must interrupt the \emacs process. The author does not
know of any case when \emacs does not respond to standard interrupts.
For example, on \softw{linux} systems pressing \quophrase{Control-C}
when the process was launched from a shell provides a standard interrupt.
If the process was launched in some other way, a normal \qquostr{kill}
addressed to the process should have the same effect.
\item[Interactive invocation] This is the case when the \st is launched
from within the \Emacs editing interface. Use the standard \emacs
function \quophrase{quit}, accessed with the key \qquostr{C-g} (Control-G)
to interrupt the \st.
\end{description}
\subsection[][\label{usingdps}]{Using the \dps}
\subsubsection{Staged Design}
The items in the \dps are components for use with staged processing.
The document type may be used with any \sgml system. Of course, one
may not use a parser that is limited to \xml with the \sgml version of
the document type. Moreover, if one makes use of features in the
\sgml version such as the positional argument and option elements,
then one might want to provide translation to the \xml version of the
document type.
No particular processing system is required for the \xml version of
the document type. For example, one might profitably write an
\href{http://www.w3.org/TR/xslt}{\abbr{XSLT}} sheet for translation to some
other format and then submit the document and the \abbr{XSLT} sheet
to an \abbr{XSLT} engine such as \softw{xt}, \softw{xsltproc}, or
\softw{saxon}.
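The following shell fragment is a minimal sketch of that idea using \softw{xsltproc}. The two files it writes are stand-ins invented for illustration, not files from the \gellmu release: a tiny \xml document whose root element is named \emph{article}, and an \abbr{XSLT} 1.0 sheet that merely reports the name of the root element. A real sheet would instead contain templates for the elements of the document type.

```shell
#!/bin/sh
# Sketch only: a stand-in document and a trivial XSLT sheet, not GELLMU files.
work=$(mktemp -d)

# Tiny stand-in for an XML document whose root element is "article".
cat > "$work/demo.xml" <<'EOF'
<?xml version="1.0"?>
<article><para>Hello</para></article>
EOF

# XSLT 1.0 sheet that writes the name of the root element as plain text.
cat > "$work/rootname.xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="name(/*)"/>
  </xsl:template>
</xsl:stylesheet>
EOF

# Submit the sheet and the document to the engine, if it is installed.
if command -v xsltproc >/dev/null 2>&1; then
  xsltproc "$work/rootname.xsl" "$work/demo.xml"
else
  echo "xsltproc not found; skipping"
fi
```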
\subsubsection{Default Staging}
The release includes \upanch{bin/linux/lmkg} and
\upanch{bin/linux/mmkg} as example driver scripts for running the
following sequence. (The \upanch{bin/win32} directory contains
old driver scripts for the MS Windows command line that might someday
be worth updating.) The behavior of these driver scripts depends
on the way they are called, though the specifics are somewhat
different for \quostr{lmkg} than for \quostr{mmkg}. The older
\quostr{lmkg} scripts do not generate \mxhtml; at this point they are
provided primarily for backward compatibility.
The \quostr{mmkg} scripts by default make \mxhtml but not if called by
a name, e.g., via a symbolic link, without the substring \qquostr{mm}.
If an \quostr{mmkg} script is called with a name ending in the string
\qquostr{froms} or \qquostr{fromx}, then it will take as starting
point, respectively, an \sgml, i.e., \qquostr{.sgml}, or author-level
\xml, i.e., \qquostr{.xml}, version of the document. Thus, for
example, \quostr{mmkgfromx} might be used to operate on a document
that is an author-level \xml translation from a non-\gellmu source.
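The naming rules just described can be illustrated with a short shell sketch. This is not the code of the distributed \quostr{mmkg} script; the function names are invented for illustration, and the default starting point of \gellmu source is an assumption.

```shell
#!/bin/sh
# Illustration of name-based dispatch; NOT the code of the distributed mmkg.

# Report the starting point implied by the name a driver was called under.
start_point() {
  case "$1" in
    *froms) echo "sgml" ;;   # name ends in "froms": start from the .sgml file
    *fromx) echo "xml"  ;;   # name ends in "fromx": start from author-level .xml
    *)      echo "glm"  ;;   # assumed default: start from GELLMU source
  esac
}

# Report whether XHTML+MathML output would be made under that name.
makes_mxhtml() {
  case "$1" in
    *mm*) echo "yes" ;;      # name contains the substring "mm"
    *)    echo "no"  ;;
  esac
}

for name in mmkg mmkgfroms mmkgfromx; do
  echo "$name: start=$(start_point "$name") mxhtml=$(makes_mxhtml "$name")"
done
```

In a real driver the name would come from the script's own invocation name, so that a symbolic link created with \quostr{ln -s mmkg mmkgfromx} selects the author-level \xml starting point.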
\bold{Caution.} These scripts, like all shell scripts, should be
examined for file system locations, system environmental variables,
and other platform-specific and location-specific issues. The user
who introduces a script on a platform should understand the script.
A user who does not understand a script should not attempt to
introduce it on a local platform.
\label{flow} Flow in the \dps is portrayed in the following diagram:
\display{\includegraphics[:scale="0.6"]{gcompst}}
These are the stages in the \dps pipeline:
\begin{enumerate}
\item Prepare \gellmu source using a text editor.
\item Process the source with the \st to obtain an \sgml document under
the \ddt.
\item Use \softw{nsgmls} to validate the \sgml document and obtain an
\abbr{ESIS} for it as output.
\item Submit the \sgml \abbr{ESIS} as input\footnote{
Specifically, this mention of \quophrase{input} refers to
what is called \quophrase{standard input} in a command line
situation. There may be a challenge here on platforms that do
not provide a command line.
}
to the \softw{Perl}
program \softw{sgmlspl} with the script \upanch{xplaingart.pl}
as file argument, obtaining an author-level \xml document.
\item Use \softw{nsgmls} to validate the author-level \xml document
      and then submit its \abbr{ESIS} as input to \softw{sgmlspl}
with the script \upanch{xmlgart.pl}, obtaining an elaborated
\xml document. This document, which is accompanied by several
auxiliary files\footnote{
Formally, two of these auxiliary files are considered part of the
elaborated \xml document.
},
has things such as sectional unit numbers and cross references
fully resolved so that there will be consistency in these across
the various output formats.
\item Use \softw{nsgmls} to validate the elaborated \xml document and
submit its \abbr{ESIS} as input multiply to \softw{sgmlspl}:
\begin{enumerate}
\item with the script \upanch{htmlgart.pl} to obtain a
classical \html document that then will be validated if an
\html validation program is identified in the driver script.
\item with the script \upanch{ltxgart.pl}
as file argument, obtaining a \latex; document. The \latex
document is then built with \softw{latex} to make a \dvi
file and with \softw{pdflatex} to make a \pdf file.
\item for a pipeline using successive runs of \softw{sgmlspl} with
3 scripts, \upanch{mathcdata.pl}, \upanch{mathprep.pl},
and \upanch{htmlgart.pl} (called in a special way) to make
            an \mxhtml file that is then checked for \xml well-formedness
using \softw{xmlwf}, checked for certain kinds of \mathml
errors using \softw{sgmlspl} with \upanch{mval.pl}, and
validated if a suitable validation program is
identified in the driver script.
\end{enumerate}
\end{enumerate}
\subsubsection{Parsing with \softw{nsgmls}}
The program \softw{nsgmls} is part of the
\href{http://www.jclark.com/sp/}{\abbr{SP}} package, which includes
extensive documentation. Those familiar with it will want to ignore
these hints.
Since for both the \sgml and the \xml versions of the \ddt \abbr{SP}
requires non-default \sgml declarations, it is recommended that the user
employ \sgml catalogs, one for \sgml and another for \xml.
The file system location of a catalog is conveyed to \softw{nsgmls}
as the value of its command line argument immediately following the
argument \quophrase{-c}.
Each catalog should contain an \abbr{SGMLDECL} directive that is the
file system location of an \sgml declaration. Aside from that a catalog
may contain a number of three-string lines, each in one of the following
forms
\begin{menu}
\item PUBLIC~~\emph{formal-public-identifier}~~\emph{quoted-pathname}
\item SYSTEM~~\emph{quoted-system-identifier}~~\emph{quoted-pathname}
\end{menu}
where the quoted pathname, which may be relative to the location of the
catalog, should for this context in each case be that of a
\abbr{DTD} file.
It is recommended in each case that \softw{nsgmls} be run with
arguments \quophrase{-l} (for propagating line number references) and
\quophrase{-oempty} (for flagging defined-empty elements). For
processing the \xml version of the \ddt one should additionally use
the argument \quophrase{-wxml}.
Additionally, a user may wish to make locally-specific arrangements
for the handling of character sets.
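As a minimal sketch of these hints, the following shell fragment writes a small catalog for the \xml version and assembles a matching \softw{nsgmls} command line. The catalog file name, the declaration file name \qquostr{xml.dcl}, the formal public identifier, and the names \qquostr{foo.xml} and \qquostr{foo.esis} are hypothetical and serve only as placeholders. The parser is run only if it is installed and the input file exists.

```shell
#!/bin/sh
# Sketch only: file names and the public identifier below are hypothetical.
work=$(mktemp -d)
catalog="$work/catalog.xml-version"

# One SGMLDECL directive, then one PUBLIC and one SYSTEM line, each mapping
# an identifier to the pathname of a DTD file (relative to the catalog).
cat > "$catalog" <<'EOF'
SGMLDECL "xml.dcl"
PUBLIC "-//EXAMPLE//DTD Hypothetical Article//EN" "axgellmu.dtd"
SYSTEM "axgellmu.dtd" "axgellmu.dtd"
EOF

# Arguments recommended in the text: -c catalog, -l, -oempty, and,
# for the XML version of the document type, -wxml.
set -- -c "$catalog" -l -oempty -wxml foo.xml
echo "nsgmls $*"

# Run the parser only if it is installed and the input exists.
if command -v nsgmls >/dev/null 2>&1 && [ -f foo.xml ]; then
  nsgmls "$@" > foo.esis
fi
```

For the \sgml version one would point \quophrase{-c} at the other catalog and omit \quophrase{-wxml}.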
\subsubsection{Processing with \softw{sgmlspl}}
The program \softw{sgmlspl} is part of David Megginson's
\abbr{SGMLSPM} package. Megginson's extensive documentation for it
may be found in the (December 1995) release found at
\href{http://www.cpan.org/}{CPAN}.\footnote{The code for \qquostr{SGMLS.pm}
included in this \gellmu release contains a very small modification
of Megginson's 1995 release to add a method \emph{\$element-\gtc;defempty},
offered by Dave Holden, for indicating whether an element is defined-empty.
}
One uses \softw{sgmlspl} by calling the Perl program \softw{sgmlspl}
with an \abbr{ESIS} as input and a script as argument. Additional
arguments become arguments for the script.
Although some operating systems provide a way for dealing with a Perl
program, which is stored in a text file, as an executable object, in
other cases one must explicitly call the Perl engine as a program with
an \abbr{ESIS} as input, the system name of \softw{sgmlspl} as first
argument, and (the system name of) a script as second argument. In
both cases one will want to arrange, perhaps with an environmental
variable or perhaps with the \quophrase{-I} argument to the Perl engine,
for the directory containing \qquostr{SGMLS.pm} and its supporting
module \qquostr{SGMLS/Output.pm} to be in its path array
\quostr{@INC}.
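The two calling styles can be sketched as follows. The install location, the position of \qquostr{SGMLS.pm} in a \qquostr{perllib} subdirectory, and the location of the \softw{sgmlspl} program file are assumptions made for illustration. The fragment only prints the command lines; in actual use each would read an \abbr{ESIS} on standard input, for example from \softw{nsgmls} through a pipe.

```shell
#!/bin/sh
# Sketch only: the directory layout and file locations are assumptions.
GELLMU_Dir=${GELLMU_Dir:-/usr/local/gellmu}   # hypothetical install location
perllib="$GELLMU_Dir/perllib"                 # assumed directory holding SGMLS.pm
sgmlspl_file="$perllib/sgmlspl"               # assumed location of the sgmlspl program
script="$GELLMU_Dir/xplaingart.pl"            # script given as argument to sgmlspl

# Style 1: sgmlspl is directly executable; the PERL5LIB environment
# variable adds the module directory to @INC.
cmd_direct="PERL5LIB=$perllib sgmlspl $script"

# Style 2: call the Perl engine explicitly; -I adds the module directory
# to @INC, sgmlspl is the first argument and the script the second.
cmd_explicit="perl -I $perllib $sgmlspl_file $script"

echo "$cmd_direct"
echo "$cmd_explicit"
```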
\subsubsection[][\label{environmentals}]{Environmental Variables}
There are a number of environmental variables that affect processing
in the \dps. The names all begin with the string \qquostr{GELLMU\_}.
Of course, the names are case-sensitive.
Many of these variables are set in the distributed driver scripts.
When that is the case, the distributed driver scripts commonly check
for a previous setting (which may, therefore, be easily placed in a
fronting script that makes a setting and then just calls the
distributed driver).
Setting environmentals can be difficult in a non-Unix-like
operating system environment. This is one reason why the author
generally recommends that Windows users install \gellmu under
\href{http://www.cygwin.com/}{\softw{Cygwin}}.
\begin{defnlist}
\dte{Dir} The top of the directory tree where \gellmu is installed.
\dte{StyleDir} URI or directory location of \css and \abbr{XSLT} style
sheets that is used by the \dps in writing links in \xml,
\html and \mxhtml files.
The value usually has a different meaning under the eye of a
web server than in a local file system. A relative URI or
path is usually best. A value like \qquostr{../webstyle}
can often be made to work both ways.
\dte{CSSName} Name of the \css stylesheet that the \dps writes into
        \html and \mxhtml files. A name given in relative syntax is
        taken relative to the value of \genv{StyleDir}.
\dte{XhtmlSuffix} Suffix given to \xhtml or \mxhtml files written
by \quostr{htmlgart.pl}.
\dte{MathJaxURL} URL for the version of \qquostr{MathJax.js} used
for the HTML5 + MathJax output. This defaults to
the latest version on the \softw{MathJax} CDN server.
\dte{NoUMSS} Value $0$ or $1$: if $1$, signals to \quostr{htmlgart.pl}
that it should not link to \abbr{W3C}'s
\href{http://www.w3.org/Math/XSL/}{UMSS} \abbr{XSLT}
stylesheets.
\dte{UTF8} Value $0$ or $1$: signals to \softw{sgmlspl} scripts that
Perl's handling of the UTF-8 text encoding should be invoked.
The meaning is subtly different between Perl versions 5.6
and 5.8.
\dte{Encoding} String value for text encoding that is set by
        \quostr{xplaingart.pl} in writing author-level \xml and by
\quostr{xmlgart.pl} in writing elaborated \xml. (\html,
\xhtml, and \mxhtml are always written with the UTF-8
encoding.)
\dte{LaTeXUTF8} Value $0$ or $1$: signals to \softw{latex} and
\softw{pdflatex} to expect the UTF-8 encoded text in
their input.
\dte{LaTeXStyle} Pathname for \latex stylesheets that \softw{latex}
and \softw{pdflatex} should use when such stylesheets are
not properly positioned for \tex system \abbr{KPSE}-based
location. (It's better to use a local or personal
\abbr{TDS} tree.)
\dte{PAPER} String value for the paper used in printing; becomes an
option for the \emph{documentclass} command in the output
\latex; file.
\dte{Memoir} Value $0$ or $1$: if $1$, use the \emph{memoir}, rather
than \emph{article}, documentclass in the output \latex file.
\dte{DefaultEmptyEqncenter} \emph{Experimental.} String value
consisting of a small bit of \latex; wrapped as a Perl
string to use in tweaking the \latex-rendered appearance of
a \gellmu \emph{eqnarray} (which is rendered in \latex using
either \emph{align} or \emph{aligned}, depending on
numbering arrangements) in the case of an empty middle cell
(\emph{eqncenter}). The current default value used in
\quostr{ltxgart.pl} is the string \qquostr{ \bsl;qquad }. Be
mindful of how such a string can be entered as a literal
string in Perl or as part of an on-the-fly environmental
setting from a command line shell.
\dte{XLink} Value $0$ or $1$. How to handle links in \mathml output
when writing \mxhtml. Such links, which are currently
illegal inside \mxhtml math zones, are confined to
|\text{...}| areas in \gellmu. If the value is $1$, use
XLink; otherwise, switch into the \html namespace and
write an \html anchor. (\softw{Firefox} handles both,
while more of the other browsers seem to choke on the
namespace switch than choke on the necessarily cumbersome
use of XLink.)
\dte{OriginLabel} Name for an automatic label key, chiefly of occasional
value for \html and \mxhtml outputs, that, when this
variable is present in the environment, places a link target,
with id the value of this variable, at the top of the document.
\end{defnlist}
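The interplay between a fronting script and a driver's check for a previous setting can be sketched in shell as follows. The values chosen, the default values in the second part, and the location of the driver are illustrative assumptions. The distributed drivers should be consulted for their actual defaults.

```shell
#!/bin/sh
# Sketch only: values, defaults, and the driver location are assumptions.

# Fronting-script part: make the desired settings and export them.
GELLMU_UTF8=1                    # ask the sgmlspl scripts for UTF-8 handling
GELLMU_StyleDir="../webstyle"    # relative value suggested in the text
export GELLMU_UTF8 GELLMU_StyleDir

# Driver-style part: apply a default only where no previous setting exists.
: "${GELLMU_UTF8:=0}"            # stays 1, because it was set above
: "${GELLMU_Memoir:=0}"          # not set above, so the default 0 is used
export GELLMU_Memoir

echo "GELLMU_UTF8=$GELLMU_UTF8"
echo "GELLMU_StyleDir=$GELLMU_StyleDir"
echo "GELLMU_Memoir=$GELLMU_Memoir"

# A real fronting script would end by calling the distributed driver,
# for example (path assumed, left commented out in this sketch):
# exec "$GELLMU_Dir/bin/linux/mmkg" "$@"
```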
\appendix{archive}{The \gellmu Archive}
The \gellmu Archive is the web site
\urlanch{\gurl/}.
It is the source for late breaking information about \gellmu.
Among other things, it houses a largely uncommented
\href{\gurl/examples}{archive of examples}. This is provided
in the belief that the study of examples is one of the quickest
ways to learn a markup language.
Of course, this document, which is furnished with the release,
is also an example.
Another item, also an example, that is housed in the archive
is \href{\gurl/examples/gfaq.html}{The \gellmu \abbr{FAQ}}.
\appendix{release}{Release Notes}
This version of the manual represents current development code to the
extent that it is documented. The \siref{materials}{\gellmu materials} consist of:
\begin{enumerate}
\item The manual, which is this document.
\item The \gst, an \emacs Lisp program, which is the only
item of software required for those who simply wish to use \gellmu
markup for the conscious preparation of \html documents or documents
under some other classical \sgml or \xml document type for which the
user is otherwise equipped.
\item The \gellmu \dps, which consists of
an \sgml document type called \emph{article}, an \xml document type
also called \emph{article}, and three separate collections of \perl
functions for the well-known \perl \sgml processing framework
\softw{sgmlspl} by \href{http://www.megginson.com/}{David Megginson}.
A very slightly modified version of Megginson's \perl library
\softw{SGMLSpm} that provides a method for detecting defined-empty
\sgml elements, as flagged in an \sgml parse stream in \abbr{ESIS}
format, is included as part of the \dps.
Since it is by size 60\% of the software content of the
Megginson package on \href{http://www.cpan.org/}{CPAN}, the rest of the
package, licensed under the \gnu
\href{http://www.gnu.org/copyleft/gpl.html}{General Public License}
is distributed with the \dps as well, though without its
documentation. The distribution includes 7 scripts for use with
\softw{sgmlspl} in the \dps pipeline. For more
on this see \siref{usingdps}{``Using the \dps''}.
\end{enumerate}
\subsection{Comments on the Syntactic Translator}
The \gst is more mature than the other components. The following
comments pertain to it.
\begin{defnlist}
\dfitem{internationalization}
Internationalization has a considerable and evolving
level of support in \emacs. The concept is that an author resides in
a locale. When the author enters a character from a locale, it gives
rise in \emacs to a somewhat complicated multibyte entity that can have
\quophrase{properties}. Particularly relevant variables in \emacs
are: \quostr{coding-system-for-write} and
\quostr{buffer-file-coding-system}. \gellmu provides the user
variable \quostr{gellmu-sgml-default-coding}, which should be properly
coordinated via driver script settings with one's \sgml parser.
\dfitem{inclusions}
It is not actually a limitation that a \gellmu source file cannot
be included in another. The primary reason is that one should
make use of the inclusion mechanism of \sgml. For that one needs
to define the included pieces as entities in the direct internal
subset \footnote{
The direct internal subset is the content of the optional argument of
the \emph{documenttype} metacommand that follows its required argument.
It should be noted in the \dps that the direct internal
subset cannot be propagated to the \xml form of \emph{article}
because it is digested by any standard \sgml parser and, hence,
by any translation based on a standard parsing. Thus,
any pieces are merged in the \xml form of an article although
the translator \quostr{xmlgart} might be modified to
construct an internal declaration subset there and provide
partitioning of the \xml version among filesystem pieces based
on document structure.
}
of the source file and then reference each as an entity, e.g.,
\qquostr{\&sect2;} at the appropriate location in the source
file where it is to be included. Because the inclusion happens
at the \sgml level there are two points to observe:
\begin{enumerate}
\item Macro information is local to each source file.
\item The situation is optimal for the location of validation
errors provided that one's parser reports such errors by
filename and line number since the syntactic translator
provides line number alignment between source and generated
\sgml.
\end{enumerate}
A second reason is that source inclusion would disturb
line number alignment between source and \sgml output. This
is important for the interpretation of \sgml validation error
messages. Such validation is considered routine, and plays an
important role in detecting an author's mistakes. Some author errors
are diagnosed in the syntactic translator.
A third reason, which at the same time might be considered also
a disadvantage, is that all of \gellmu's macro facilities are
local to each source file. This adds both robustness and flexibility
at the price of the inconvenience of physical inclusion of common
macro definitions.
\dfitem{variable management}
This refers to the management of user variables in the syntactic
translator. These are \elisp variables. One who is familiar with
\elisp should be able to provide values in batch mode without
making changes in the \elisp source.\footnote{
Please observe the rules of the \latex; project regarding filenames
as well as the license rules of the \gnu General Public License
if you wish to distribute a modified version of the \elisp
source. Alternatively, the author is always interested in learning
of suggestions for change.}
Setting values interactively in the Emacs editing interface can be
done easily using \qquostr{M-x set-variable}.
A future release will likely also provide a user resource file for
custom variable settings, so that writing \elisp code is not needed.
\dfitem{Bugs}
No serious existing problem is known in the \gst at the time of this
release. Of course, as stated in source code comments, there is no
warranty of any kind. Please report bugs to the author:
\,\anch[Href="mailto:hammond@math.albany.edu"
]{\path{hammond@math.albany.edu}}\,.
\begin{itemize}
\item \bold{Reserved element names.}
The \gst reserves for its own internal use all \sgml or \xml element
names in which the first numeric character in the range
\quophrase{0--9} is the character \quophrase{0}.
\item \bold{Limitation on braces in macros.}
Unbalanced braces are not permitted in either the name or the value
field of any form of macro metacommand.
\item \bold{First cell limitation.}
In the \latex-like emulation of an \emph{array} or \emph{tabular}
environment the first cell in each row must have something other than
white space. Of course, sometimes no content is wanted, and then
|\empty| (for nil markup, not to be confused with the mathematical
|\emptyset|) is one way to handle it, but this author usually uses
something that is mostly inconsequential like ``|~|'' or
``|\,|''. Another way to handle it is to invoke the names
of the \sgml elements, i.e., |\firstcell{}| for \emph{tabular} or
|\firstacell{}| for \emph{array}.
\item \bold{Concept of advanced \gellmu immature.}
Inasmuch as didactic article is the only document type for
which the idea of advanced \gellmu has been implemented, the general
concept of advanced \gellmu is not fully developed in the \gst.
Basic \gellmu is characterized in the \gst by the evaluation of
the Boolean variable \emph{gellmu-straight-sgml} to \quophrase{true}.
This automatically makes the Boolean variable
\emph{gellmu-regular-sgml} true.
Full \latex;-like support for the \dps is realized only by both
of these Booleans being false. Thus, advanced \gellmu will
need to evolve in the space in between, probably after the
introduction of further such Boolean variables, some public and
some private. This technique will make it possible for the
code to continue performing as now when the variable
\emph{gellmu-straight-sgml}, the flag for basic \gellmu, is true
and also when both of these flags are false.
\item \bold{Reserved strings.}
The strings \quophrase{|<<|} and \quophrase{|>>|} have been reserved
as future notation for mathematical objects. Although it might seem
at first glance that this type of short hand has no place apart from
the fully \ll environment of the \gst in the context of the \dps, in
which they have not yet been used, it is actually not so clear that
one could not make sensible use of such notation in the context of
\quophrase{\xhtml plus \mathml} under advanced \gellmu along with
other features such as blank lines for new paragraphs and many other
mathematical shortcuts. It awaits the further development of advanced
\gellmu, and reserving this notation is necessary to ensure backward
compatibility.
Consequently, for example, entering \quophrase{|<>|} is
problematical, because only the first \quophrase{|<|} or \quophrase{|>|}
will be converted to something appropriate. In basic mode
\quophrase{|&lt;|} and \quophrase{|&gt;|} are one-step ways of
circumventing this when these entities are available, which is
guaranteed for any form of \html as well as for any form of \xml. In
the \dps one should use \quophrase{|\ltc;|} and
\quophrase{|\gtc;|}. For other cases the one-step circumvention is
to use entity references to the numeric character codes, e.g., in
\abbr{ASCII} \quophrase{|&#x3C;|} and \quophrase{|&#x3E;|}, and for
convenience these may be brought up as macros, perhaps
\quophrase{|\lt|} and \quophrase{|\gt|}.
\end{itemize}
\end{defnlist}
\subsection{Comments on the Didactic Production System}
The \dps is to be understood as a potential base for development.
As such it is not intended ever to offer everything that might
be imagined. The following comments pertain to it.
\subsubsection{Internationalization}
Internationalization has been a concern of the project. It is
possible, for example, to use the ISO-Latin-1
character \squophrase{\acute{e}} in the name of an element. The
didactic \emph{article} document type offers, for example, an element
\emph{\acute{e}tale}, which is a style, parallel to \emph{bold}.
Use of the character \squophrase{\acute{e}}
as raw character data with the \dps is less robust than the more careful
\qquostr{\bsl;acute\{e\}}\footnote{
The corresponding usage in \latex; would be \qquostr{\bsl;\rsq;e};
this could be brought into \gellmu source using \emph{\bsl;macro},
but it must be resolved to a name in the output of the \gst where
everything that is markup needs a name. Rather than using a
general container \emph{acute}, the document type could have provided
a name for the specific character.
}
construction, which is desirable for translation of \emph{article}
to formats that do not support Latin-1. For that matter, the
exact extent of \latex;'s support for Latin-1 is a bit tricky, and
the whole matter of internationalization is currently undergoing
change in the \latex; project.\footnote{Alternatively, \latex; source
can be submitted to another \tex; engine, such as \softw{xetex},
\softw{luatex}, or \softw{omega}, that is an extending
modification of Donald Knuth's program \tex;.}
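As a small illustration of the two ways of entering an accented
character, consider the following fragment of source. The word chosen
is only an example, and the comment lines are explanatory rather than
part of any required form.
\begin{verbatim}
% More careful: the accent is markup, so each translator can
% render it in whatever way its target format supports.
caf\acute{e}

% Less robust: the raw Latin-1 character is passed along as
% character data and depends on the encoding of the target.
café

% The corresponding LaTeX usage mentioned in the footnote.
caf\'e
\end{verbatim}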
\subsubsection[][\label{doctypes}]{Document Type Definitions}
Currently the project has one \abbr{SGML} document type definition
and two \abbr{XML} document type definitions. Files under the
various document types are suffixed as follows:
\display{\begin{tabular}{ll}
First stage \abbr{SGML} & \quostr{.sgml} \\
Author Level \abbr{XML} & \quostr{.xml} \\
Elaborated \abbr{XML} & \quostr{.exml}
\end{tabular}}
Additionally, in the three steps of processing to generate an
\mxhtml file from an elaborated \xml file there are two intermediate
\xml files generated, the first with suffix \qquostr{.yml}, which
lives under the document type definition for an elaborated \xml
document, and the second with suffix \qquostr{.zml}, an \xml file
for which there is no extant formal document type
definition.\footnote{There is no formal document type definition
for a \qquostr{.zml} file because such a file is endowed via XML
attributes with information about tree structure for
mathematical zones.}
The author-level \abbr{XML} document type is formalized by the
\abbr{DTD} \quostr{axgellmu.dtd}, while the elaborated \abbr{XML} document
type is formalized by a modification that is found in the \abbr{DTD}
\quostr{uxgellmu.dtd}. (Prior to October 2006 the document type
represented by \quostr{uxgellmu.dtd} was the only \abbr{XML} document
type used with the regular \abbr{GELLMU} production stream; it is now
called the elaborated \abbr{XML} document type.)
The author-level \abbr{XML} document type is suitable as a translation
target from other markups. The elaborated \abbr{XML} document type
should not be used as a translation target other than from the
\abbr{GELLMU} author-level \abbr{XML} document type.
All document type definitions are available under the \abbr{UTF-8}
text encoding. The two older document type definitions will continue
to exist for a while under the Latin-1 (\abbr{ISO-8859-1}) text
encoding. The text encoding of a so-called \abbr{DTD} file (the main
part of a document type definition in this system) is significant
in regard to the names of \abbr{SGML}/\abbr{XML} entities and
elements rather than in regard to document instances which might
be processed. The names of the \abbr{DTD} files are:
\label{dtdfiles}
\begin{display}
\begin{tabular}{l|cc}
~ & Latin-1 & \abbr{UTF-8} \\
\hline
First stage \abbr{SGML} & \quostr{gellmu.dtd} & \quostr{ugellmu.dtd} \\
Author Level \abbr{XML} & ~ & \quostr{axgellmu.dtd} \\
Elaborated \abbr{XML} & \quostr{xgellmu.dtd} & \quostr{uxgellmu.dtd}
\end{tabular}
\end{display}
\subsubsection{Translation to \xml}
Presently the author-level \xml files link to a \css stylesheet that
provides primitive rendering. One could go further in this direction,
but the rendering of mathematics will remain limited without further
development of \css in that area.
\subsubsection{Translation to \html}
\begin{description}
\item[Math in classic \html]
The classic \html output does not use graphic images for mathematical
zones in the manner of programs like \softw{latex2html}. Instead it
uses pseudo-\tex; notation for math, for a number of reasons:
\begin{enumerate}
\item Well typeset mathematics is available in the modern form of
\html that is more precisely called \mxhtml.
\item Graphical images completely block accessibility in the
sense of the World Wide Web Consortium's
\href{http://www.w3.org/WAI/}{Web Accessibility Initiative}.
\item The present classic \html output files may be deciphered
in terminal window browsers.
\item The present classic \html output files may be ``dumped'' to
plain text using a program such as \softw{lynx} or \softw{w3m},
which is sometimes useful.
\end{enumerate}
\item[Style.]
\html and \mxhtml made with the \dps now rely on \css and, for
some things, on level 2 \css.
\end{description}
\subsubsection{Translation to \latex;}
This translator writes \latex;_{2E}. A number of packages are used, including
\emph{graphicx}, \emph{amsmath}, \emph{amssymb}, \emph{amsfonts},
\emph{bm}, \emph{url} (not \emph{hyperref} for the standard track
where the focus is on printed output), and \emph{inputenc} for \abbr{UTF-8}
(which may be needed even if the \gellmu source or, otherwise, the
author-level \xml source is not \abbr{UTF-8} encoded).
Were it not for current font availability issues, the author would have
preferred to invoke the T1 font encoding.
Even though \gellmu source uses the names \emph{equation} and
\emph{eqnarray}, in the \latex; formatting \emph{amsmath} constructions
are used.
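The preamble written for the standard track might look roughly like
the following sketch. The package list comes from the description
above; the order of the packages, the \emph{utf8} option, and the use
of \emph{align} as the \emph{amsmath} counterpart of \emph{eqnarray}
are assumptions made for illustration and are not taken from the
translator itself.
\begin{verbatim}
\documentclass{article}
\usepackage[utf8]{inputenc}  % UTF-8 input, even if the source is not UTF-8
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{bm}
\usepackage{url}             % not hyperref on the standard (print) track
% \usepackage[T1]{fontenc}   % preferred by the author, but not invoked
\begin{document}
% Assumed amsmath stand-in for a source-level eqnarray.
\begin{align}
  a &= b + c \\
  d &= e
\end{align}
\end{document}
\end{verbatim}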
A small modification of this translator can be used to write
Adobe's Portable Document Format (\pdf) with pages sized for screens
rather than for paper.
\subsubsection{Future Plans}
This is a very limited list.
\begin{defnlist}
\dfitem{A literate document type definition.}
This would be capable of spawning not only the five \dtd files but also
type definitions under other mechanisms such as, for example,
\softw{RelaxNG}.
\dfitem{Mathematical semantics.}
Provision for optional semantic tightening sufficient for authors
wishing to export mathematical markup into computer algebra
systems.
\end{defnlist}
\end{document}