University at Albany XML/SGML Archive

Extensible Markup Language (XML)
Standard Generalized Markup Language (SGML)

William F. Hammond
Last revision: June 2, 2006

1 Basic
2 Classical HTML is not an XML Language
3 The Nature of SGML
4 Styling and Translating XML documents
5 Example Languages under XML and SGML
6 References
7 Software Available Locally
8 Miscellaneous
  8.1 XML and Electronic Data Interchange (EDI)
  8.2 Library Metadata
  8.3 How This Document Was Prepared

1 Basic

Standard Generalized Markup Language (SGML) is a language for defining markup languages. SGML is defined by the International Organization for Standardization (ISO) in document ISO 8879 [1986]. The ISO document is not freely available. A copy of it is found in the book:
Hypertext Markup Language (HTML), the basic language of the World Wide Web, is a markup language under SGML.

Extensible Markup Language (XML) is a limited form of SGML that is currently under heavy promotion by the World Wide Web Consortium (W3C). It is sometimes perceived as "extended HTML". XML has been designed to be usable on browsing platforms, while full-fledged SGML is usually more suitable for authoring platforms. In fact, XML has for most purposes become the only form of SGML that is suitable for public sharing. Many SGML languages* that are realistically suitable for authors admit rapid automatic translation to nearly equivalent XML languages. (Note that an XML language need not contain the HTML tag set nor have any relation to HTML, and HTML is not an XML language, although it may be automatically converted to a language under XML.)

* The phrase "SGML language" used here, as well as the parallel phrase "XML language", is formally not correct usage. What is called here an SGML (respectively, XML) language is formally known as an SGML (respectively, XML) application. Every XML application may also be regarded as an SGML application. There is an identifying correspondence between applications in this sense and document types.
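
As a small illustration of why HTML is not itself an XML language, the following Python sketch (added for this page; it is not one of the local tools listed later) hands a few hypothetical HTML-style fragments to the XML parser in Python's standard library, xml.etree.ElementTree, which is assumed here to stand in for any conforming XML processor. Each pair exercises one of the three habits discussed in Section 2: an omitted close-tag, a tag name whose case differs between open-tag and close-tag, and an unquoted attribute value.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments. In each pair the first string uses a habit that
# SGML-based HTML permits; the second is the well-formed XML equivalent.
SAMPLES = {
    "omitted close-tag": (
        "<body><P>First paragraph<P>Second paragraph</body>",
        "<body><p>First paragraph</p><p>Second paragraph</p></body>",
    ),
    "tag-name case": (
        "<P>Open-tag and close-tag differ in case</p>",
        "<p>Open-tag and close-tag match exactly</p>",
    ),
    "unquoted attribute": (
        "<img src=logo.gif />",
        '<img src="logo.gif" />',
    ),
}


def is_well_formed(text: str) -> bool:
    """Return True if the XML parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for habit, (html_style, xml_style) in SAMPLES.items():
        print(f"{habit}: HTML-style well-formed? {is_well_formed(html_style)}, "
              f"XML-style well-formed? {is_well_formed(xml_style)}")
```

Run as a script, it should report each HTML-style fragment as not well-formed and its XML-style counterpart as well-formed. Lower-casing tag names, closing every element and quoting every attribute value is roughly the mechanical part of converting HTML to an XML language such as XHTML; a real converter must also work out where omitted close-tags belong, which requires knowledge of the HTML document type.
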
2 Classical HTML is not an XML Language

Classical HTML refers to the markup language behind World Wide Web locations from the beginning of the Web at CERN, Geneva, until very recently. The versions of HTML numbered from 2.0 through 4.01 are all languages under SGML that do not fall within XML.

Three simple reasons why HTML is not an XML language are:
- In HTML most paragraphs are marked up using an open-tag <P> at the beginning of the paragraph without needing a close-tag </P> at the end, while in XML there must be a close-tag for every open-tag.
- In HTML tag names are not case-sensitive, while in XML tag names are case-sensitive. (A new standard way of converting HTML into an XML language will specify that tag names all be lower case.)
- In HTML some attribute values need not be placed inside quotation marks, while in XML all attribute values must be quoted.

Early in the year 2000 a new evolute of HTML referred to as XHTML (http://www.w3.org/TR/2000/rec-xhtml1-20000126), but bearing the formal document type name html (lower-case characters only), acquired the status of W3C Recommendation. XHTML, version 1.0, is an XML language that has the same tag set as HTML 4.01 (http://www.w3.org/TR/1999/REC-html401-19991224). Apart from technical details, XHTML 1.0 is almost the same language as HTML 4.01. Because of the technical differences, however, a computer does not need the full weight of an SGML processor to interpret XHTML. This advantage is offset by the fact that it is slightly more difficult for authors to create XHTML than to create classical HTML.

3 The Nature of SGML

While SGML may be described as a language for creating markup languages with a shared syntax, more realistically and more abstractly, an SGML language (formally, an SGML application) is a template for processing. For this reason, when an SGML document (formally, an SGML instance) is written, the author is, in fact, setting its text as organized data.

The abstract character of languages under the SGML umbrella makes it possible to use the family to describe computer programs.
The Extensible Stylesheet Language (XSL) described below is an example of such an SGML application that is, in fact, an XML application.

4 Styling and Translating XML documents

In principle, an author may create a personal XML language. To do so the author must be prepared to provide, in addition, (1) companion style sheets or (2) companion translators.

If one uses a language under XML or SGML, one must understand what companion style sheets or translators will be used with that language.

A style sheet is a document that is created to provide directions for a processing program, perhaps a printing formatter or a web browser, on the formatting or rendering of a document that is prepared in a markup language.

While a translator may be any program, typically a translator is a package of small programs (sometimes called functions) for processing a document under an XML language into some other language, which might be HTML, another XML language, or something else, under a general framework for processing XML or SGML. There are free frameworks for writing such programs in various programming languages. Most of these frameworks require preprocessing parsers, and free parsers are also available.

Near-term plans for the development of the World Wide Web anticipate major web browsing programs having the capability to provide finely tuned rendering of XML documents that are accompanied by a style sheet. Style sheet support for HTML documents is currently available.

Limited rendering of XML documents on the World Wide Web is based on Cascading Style Sheets (CSS, http://www.w3.org/Style/CSS/), which has been in use for customized rendering guidance with HTML browsing programs.

A future standard style language for XML documents in World Wide Web browsing programs is called Extensible Stylesheet Language (XSL, http://www.w3.org/Style/XSL/). XSL is a restricted form of Document Style Semantics and Specification Language (DSSSL, http://www.jclark.com/dsssl/) that is written with XML syntax.
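
Because a style sheet in the XSL family is itself written in XML, an ordinary XML parser can read one without knowing anything about styling. The Python sketch below makes this concrete with a hypothetical minimal stylesheet in XSLT, the transformation language of the XSL family. It parses the stylesheet with the standard library's xml.etree.ElementTree, which does not execute XSLT, and simply lists the patterns that the stylesheet's templates match. The namespace URI is the one defined for XSLT 1.0; the element name para is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI that XSLT 1.0 assigns to its instruction elements.
XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# A hypothetical minimal stylesheet that would map a made-up <para> element
# to an HTML <p> element. Here it is only parsed, never executed.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>"""


def template_patterns(stylesheet_text: str) -> list:
    """Parse a stylesheet as ordinary XML and return its template match patterns."""
    root = ET.fromstring(stylesheet_text)
    # ElementTree writes namespaced names as "{namespace-uri}local-name".
    return [t.get("match") for t in root.findall(f"{{{XSL_NS}}}template")]


if __name__ == "__main__":
    print(template_patterns(STYLESHEET))
```

Here template_patterns(STYLESHEET) returns ['/', 'para']. Actually applying such a stylesheet to a document requires an XSLT processor, such as the xt engine listed in Section 7.
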
The specification for XSL was still under draft at W3C on March 1, 2000, while a variant called XSL Transformations (XSLT, http://www.w3.org/TR/xslt), which may be used for translating XML languages to other languages (whether XML or not), became a W3C Recommendation in late 1999.

While XSL-directed formatting offers more precision than is available with CSS-guided formatting, in the overall world of XML processing one should expect formatting based on either CSS or XSL style sheets to be a limited type of formatting. One should expect to obtain the finest typesetting results by going beyond the narrow class of XML translation programs that admit expression in a style sheet language.

A relatively new, simple example of SGML processing may be found in the system manual under SunOS, version 5.7.
Observant users of University at Albany SunStations may have noticed that as of the summer of 1999 most of the system manual in the central /usr/man area now exists in source form under an SGML document type for the manual rather than, as formerly, in the nroff typesetting language. (This is temporarily hampering the operation of the classical X11 program xman for the affected portions of the system manual; text rendering is not affected.) See the manual page for solbook and browse /usr/lib/sgml and /usr/share/lib/sgml for more information.

A document created carefully today under a well-designed XML or SGML language should admit automatic conversion to future formats once an SGML or XML translator for such conversion has been created.

5 Example Languages under XML and SGML

CALS is a language under SGML that is widely used in the U.S. Department of Defense.

DocBook (http://www.oasis-open.org/docbook/) is a public language under SGML that may be used by authors. A fall 1999 book by Norman Walsh, DocBook: The Definitive Guide (http://www.docbook.org/tdg/html/), is available online and in bookstores. Walsh maintains a web site, http://nwalsh.com/, with a great deal of information about related topics, including an excellent tutorial on XSL.
(Campus UNIX Network only: A copy of the DocBook DTD, file:///usr/share/local/xml/docbook/dtd/, is available on the local network.)
The TEI Consortium (http://www.tei-c.org/) has emerged from the Text Encoding Initiative (http://www.uic.edu/orgs/tei/) at the University of Illinois at Chicago as custodian of the TEI language definition. TEI is another public language that may be used by authors. Its modular design has led to the creation of the TEI Pizza Chef web site at Oxford (http://www.hcu.ox.ac.uk/TEI/newpizza.html).

A copy of the current TEI Guidelines in HTML (http://www.tei-c.org/P4X/), which includes "A Gentle Introduction to XML" (http://www.tei-c.org/P4X/SG.html), is available for local browsing on the Sun network from the file system location /usr/share/local/xml/tei/P4X/index.html.

HTML is a language under SGML.

XHTML (formerly HTML "Voyager") is a language under XML, recommended by the World Wide Web Consortium (W3C), that is designed to be equivalent to HTML. It is intended to be the base for extending HTML to a language under XML. See: http://www.w3.org/TR/xhtml1/.

MathML, Mathematical Markup Language, is a client-platform language under XML that is intended to add mathematical functionality to the World Wide Web. See: http://www.w3.org/Math/.
The W3C Recommendation for MathML, version 2, points to a document type definition at W3C for the implementation of a namespace-based extension of XHTML that includes MathML (on namespaces, see http://www.w3.org/TR/REC-xml-names/).

The W3C working draft on Scalable Vector Graphics (SVG) format proposes an XML language for online graphics. This draft may be found along with other related information at http://www.w3.org/Graphics/SVG/.

XGMML (http://www.cs.rpi.edu/puninj/XGMML/), eXtensible Graph Markup and Modeling Language, developed recently in New York's Capital District at RPI (http://www.rpi.org/), is an XML application based on GML, which is used for graph description. See also http://xml.coverpages.org/xgmml.html.

Any programming assembly language in which each line consists of an operation code followed by parameters is equivalent to an XML language.

The device-independent typesetting file format (DVI) associated with the typesetting language TeX (and with the program groff) is equivalent to an XML language.

6 References

The World Wide Web Consortium is the driving force behind XML. See: http://www.w3.org/XML/.

A 1998 book on XML is:
A very comprehensive catalogue of information about SGML and XML may be found on the web at http://xml.coverpages.org/.

An interesting and useful web site with ties to Sun Microsystems, one of the principal sponsors of XML, is http://metalab.unc.edu/xml/.

An early survey, "Comparison of SGML and XML" (http://www.w3.org/TR/NOTE-sgml-xml-971215.html), is available from W3C.

Monitoring the Usenet newsgroups news:comp.text.sgml and news:comp.text.xml is an excellent way to have a window on current discussion.

One may also seek answers to questions in the newsgroups when the answers cannot be obtained locally through the HelpDesk at mailto:helpdesk@csc.albany.edu. However, one should first make sure that the question is appropriate to the specific topic of the newsgroup. For example, most questions about creating web pages do not belong in these two newsgroups.

Information about the topic of mathematics and SGML may be found at the (local) URL http://math.albany.edu:8800/hm/sgml/about.html.

7 Software Available Locally

The University at Albany UNIX Network has several basic, general-purpose, freely available tools for working with SGML and XML, including:
- The open source evolute, called onsgmls, of James Clark's SGML parser nsgmls, which is an application under the OpenSP C++ library. The public location for OpenSP is the OpenJade Project at SourceForge (http://openjade.sourceforge.net). The public location for information about SP is: http://www.jclark.com/.
  Note: onsgmls, when properly called, may be used to check the structural correctness of an HTML document. At the University at Albany the command validhtml is an interface to onsgmls for this method of HTML validation.
- Script interfaces to various Java-based tools of James Clark for handling XML, including:
  - dtdinst: a utility to generate an XML instance that models an XML document type definition given in DTD form
  - jcxt: the engine called xt for transformations specified in the XSLT language
  - jing: a utility to validate an XML instance against a document type definition specified in the form of either a RelaxNG schema (http://www.relaxng.org/) or a W3C schema (http://www.w3.org/XML/Schema)
  - trang: a utility for translations between various types of XML document type definitions
- David Megginson's general-purpose SGML-to-anything processor, sgmlspl, which is an application under his Perl 5 library SGMLS.pm. Local documentation on SGMLS.pm/sgmlspl may be found at: file:///usr/share/local/xml/html/sgmlspm/index.html. The public location for information about SGMLS.pm/sgmlspl for many years was http://home.sprynet.com/sprynet/dmeggins.
  That appears to have been superseded by http://www.megginson.com/Software/, and SGMLS.pm/sgmlspl is also available at CPAN (http://www.cpan.org/modules/by-authors/DavidMegginson/).

8 Miscellaneous

8.1 XML and Electronic Data Interchange (EDI)

XML offers a standard framework for the general interchange of many kinds of data. The usefulness of XML/EDI lies in the inherent adaptability to this end of the many new tools for handling XML. There is a substantial amount of material on this topic in the book by Goldfarb and Prescod cited above. See the web site: http://www.geocities.com/WallStreet/Floor/5815/.

The World Wide Web Consortium (W3C) has basic information about how one might proceed to model a database in XML at the site: http://www.w3.org/XML/.

8.2 Library Metadata

The Open Archives Initiative (http://www.openarchives.org/) has developed a protocol for interoperable handling of library metadata across the network based on records prepared under special-purpose XML document types that are defined using the new notion of XML schema (http://www.w3.org/XML/Schema).

8.3 How This Document Was Prepared

This document was prepared in Generalized Extensible LaTeX-Like Markup (GELLMU), which is the author's user markup interface for SGML languages. Presently the system, still under development, may be used to create both standard LaTeX (general.ltx) and HTML (general.html) versions from a single LaTeX-like source (general.glm), a text file.
The program latex may be used to prepare a high-quality typeset version (general.dvi) in DVI format (Donald Knuth's Device Independent format) suitable for printing on this system using the program dvips, and a variant of latex known as pdflatex may be used to prepare a different typeset version (general.pdf) in PDF format. An alternate form of processing to HTML will produce XHTML extended by MathML (general.xhtml). For more information on this system see http://www.albany.edu/hammond/gellmu.
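
The single-source idea just described, together with the description in Section 4 of a translator as a package of small functions working under a general framework for processing XML, can be sketched in a few lines of Python. This is not GELLMU, and it does not reproduce how any tool named on this page works: the element names doc, title, para and em belong to a hypothetical toy XML language, and the standard library's xml.etree.ElementTree stands in for the parsing framework. A table holding one small function per element type renders the same source tree either as HTML or as LaTeX.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical source document in a tiny, made-up XML language.
SOURCE = "<doc><title>Sample</title><para>Text with <em>emphasis</em> &amp; more.</para></doc>"

# One rule table per target language: a small function per element type.
# Each function receives the already-translated content of the element.
HTML_RULES = {
    "doc": lambda body: f"<html><body>{body}</body></html>",
    "title": lambda body: f"<h1>{body}</h1>",
    "para": lambda body: f"<p>{body}</p>",
    "em": lambda body: f"<em>{body}</em>",
}

LATEX_RULES = {
    "doc": lambda body: f"\\begin{{document}}{body}\\end{{document}}",
    "title": lambda body: f"\\section*{{{body}}}",
    "para": lambda body: f"{body}\\par ",
    "em": lambda body: f"\\emph{{{body}}}",
}


def html_text(s):
    # Re-escape &, < and > for HTML output.
    return escape(s)


def latex_text(s):
    # Escape a few characters that LaTeX treats specially (not a complete list).
    for ch in "&%$#_":
        s = s.replace(ch, "\\" + ch)
    return s


def translate(elem, rules, text_fn):
    """Recursively translate elem: its text, each child, and each child's tail."""
    parts = [text_fn(elem.text or "")]
    for child in elem:
        parts.append(translate(child, rules, text_fn))
        parts.append(text_fn(child.tail or ""))
    # An element with no rule in the table raises KeyError.
    return rules[elem.tag]("".join(parts))


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    print(translate(root, HTML_RULES, html_text))
    print(translate(root, LATEX_RULES, latex_text))
```

Adding another output format amounts to writing one more rule table. Raising KeyError for an element without a rule is a deliberately strict choice for a sketch; a fuller translator would more likely report the element's name and position in the source.
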