% LaTeX \documentclass[leqno]{article} \usepackage{url} \usepackage{graphicx} \usepackage{amsmath} \usepackage{amssymb} \usepackage{amsfonts} \usepackage{bm} \usepackage{gellmu} \usepackage[margin=100bp,nohead]{geometry} \setlength{\parskip}{6bp} \setlength{\parindent}{0bp} \thispagestyle{empty} \title{Extensible Markup Language (XML)\\[0.25\baselineskip] Standard Generalized Markup Language (SGML)} \date{Last revision: \ June 2, 2006} \newlength{\centerskip} \setlength{\centerskip}{\topsep} \newcommand{\hsf}{\hspace*{\fill}} \newcommand{\tdbc}[1]{\hsf\textbf{#1}\hsf} \newenvironment{menulist}{ \begin{list}{}{ \setlength{\topsep}{0bp} \setlength{\labelwidth}{0.03\linewidth} \setlength{\leftmargin}{0.06\linewidth} \setlength{\itemindent}{0bp} \setlength{\itemsep}{-6bp} \setlength{\parsep}{6bp}} }{\end{list}} \newenvironment{Menulist}{ \begin{list}{}{ \setlength{\topsep}{0bp} \setlength{\labelwidth}{0.03\linewidth} \setlength{\leftmargin}{0.06\linewidth} \setlength{\itemindent}{0bp} \setlength{\itemsep}{3bp} \setlength{\parsep}{6bp}} }{\end{list}} \newenvironment{toclist}{\normalsize \begin{list}{}{ }}{\end{list}} \newenvironment{Toclist}{\large \begin{list}{}{ }}{\end{list}} \newenvironment{citations}{ \begin{list}{}{ \setlength{\topsep}{0bp} \setlength{\labelwidth}{0bp} \setlength{\leftmargin}{0.04\linewidth} \setlength{\labelsep}{0bp} \setlength{\itemindent}{-0.2\leftmargin} \setlength{\itemsep}{3bp} \setlength{\parsep}{0bp}} }{\end{list}} \author{William F. Hammond} \begin{document} \begin{center}\LARGE\bfseries{} Extensible Markup Language (XML)\\[0.25\baselineskip] Standard Generalized Markup Language (SGML) \end{center} \begin{center}\Large\bfseries{} \textsl{William F. 
Hammond} \end{center} \begin{center} \large\bfseries{} Last revision: \ June 2, 2006 \end{center} \medskip \section*{Table of Contents} \begin{Toclist} \item[]{1\ \ Basic\dotfill{}~\pageref{SU-1}} \item[]{2\ \ Classical \textsc{HTML} is not an \textsc{XML} Language\dotfill{}~\pageref{SU-2}} \item[]{3\ \ The Nature of \textsc{SGML}\dotfill{}~\pageref{SU-3}} \item[]{4\ \ Styling and Translating \textsc{XML} documents\dotfill{}~\pageref{SU-4}} \item[]{5\ \ Example Languages under \textsc{XML} and \textsc{SGML}\dotfill{}~\pageref{SU-5}} \item[]{6\ \ References\dotfill{}~\pageref{SU-6}} \item[]{7\ \ Software Available Locally\dotfill{}~\pageref{SU-7}} \item[]{8\ \ Miscellaneous\dotfill{}~\pageref{SU-8}} \begin{toclist}\normalsize \item[]{8.1\ \ XML and Electronic Data Interchange (EDI)\dotfill{}~\pageref{SU-8.1}} \item[]{8.2\ \ Library Metadata\dotfill{}~\pageref{SU-8.2}} \item[]{8.3\ \ How This Document Was Prepared\dotfill{}~\pageref{SU-8.3}} \end{toclist} \end{Toclist} \section*{1\ \ \label{SU-1}Basic} \par{``Standard Generalized Markup Language'' (\textsc{SGML}) is a language for defining markup languages. \ \textsc{SGML} is defined by the International Organization for Standardization in the standard \textsc{ISO} 8879 [1986]. \ } \par{The \textsc{ISO} document is not freely available. \ A copy of it is reproduced in the book: \begin{menulist} \item Charles F. Goldfarb, \emph{The \textsc{SGML} Handbook}, \\{}Clarendon Press, Oxford, 1990. \end{menulist} } \par{``Hypertext Markup Language'' (\textsc{HTML}), the basic language of the World Wide Web, is a markup language under \textsc{SGML}. \ } \par{``Extensible Markup Language'' (\textsc{XML}) is a limited form of \textsc{SGML} that is currently under heavy promotion by the World Wide Web Consortium (\textsc{W3C}). \ It is sometimes perceived as ``extended \textsc{HTML}''. \ \textsc{XML} has been designed to be usable on browsing platforms, while full-fledged \textsc{SGML} is usually more suitable for authoring platforms. 
\ In fact, \textsc{XML} has for most purposes become the only form of \textsc{SGML} that is suitable for public sharing. \ Many \textsc{SGML} languages\footnote{The phrase ``\textsc{SGML} language'' used here, as well as the parallel phrase ``\textsc{XML} language'', is formally not correct usage. \ What is called here an \textsc{SGML} (respectively, \textsc{XML}) language is formally known as an \textsc{SGML} (respectively, \textsc{XML}) \emph{application}. \ Every \textsc{XML} application may also be regarded as an \textsc{SGML} application. \ There is an identifying correspondence between \emph{applications} in this sense and \emph{document types}. \ } that are realistically suitable for authors admit rapid automatic translation to nearly equivalent \textsc{XML} languages. \ (Note that an \textsc{XML} language need not contain the \textsc{HTML} tag set nor have any relation to \textsc{HTML}, and \textsc{HTML} is not an \textsc{XML} language although it may be automatically converted to a language under \textsc{XML}.) } \section*{2\ \ \label{SU-2}Classical HTML is not an XML Language} \par{Classical \textsc{HTML} refers to the markup language behind World Wide Web locations from the beginning of the Web at CERN, Geneva, until very recently. \ The versions of \textsc{W3C} \textsc{HTML} numbered from 2.0 through 4.01 are all languages under \textsc{SGML} that do not fall within \textsc{XML}. \ } \par{Three simple reasons why \textsc{HTML} is not an \textsc{XML} language are: \begin{enumerate} \item In \textsc{HTML} most paragraphs are marked up using an opentag \texttt{"\string<p\string>"} at the beginning of the paragraph without needing a closetag \texttt{"\string</p\string>"} at the end, while there must be a closetag for every opentag in \textsc{XML}. \ \item In \textsc{HTML} tag names are not case-sensitive, while in \textsc{XML} tag names are case-sensitive. 
\ (A new standard way of converting \textsc{HTML} into an \textsc{XML} language will specify that tag names all be lower case.) \item In \textsc{HTML} some attribute values need not be placed inside quotation marks, while in \textsc{XML} all attribute values must be quoted. \ \end{enumerate} } \par{Early in the year 2000 a new evolute of \textsc{HTML} referred to as \textsc{XHTML}\footnote{URI: http://www.w3.org/TR/2000/rec-xhtml1-20000126} --- but bearing the formal document type name ``html'' (lower case characters only) --- acquired the status of \textsc{W3C} Recommendation. \ \textsc{XHTML}, version 1.0, is an \textsc{XML} language that has the same tag set as \textsc{HTML} 4.01\footnote{URI: http://www.w3.org/TR/1999/REC-html401-19991224}. \ Apart from technical details \textsc{XHTML} 1.0 is almost the same language as \textsc{HTML} 4.01. \ Because of the technical differences, however, a computer does not need the full weight of an \textsc{SGML} processor to interpret \textsc{XHTML}. \ This advantage is offset by the fact that it is slightly more difficult for authors to create \textsc{XHTML} than to create classical \textsc{HTML}. \ } \section*{3\ \ \label{SU-3}The Nature of SGML} \par{While \textsc{SGML} may be described as a language for creating markup languages with a shared syntax, more realistically and more abstractly, an \textsc{SGML} language (formally, an \textsc{SGML} \emph{application}) is a template for processing. \ For this reason when an \textsc{SGML} document (formally, an \textsc{SGML} \emph{instance}) is written, the author is, in fact, setting its text as organized data. \ } \par{The abstract character of languages under the \textsc{SGML} umbrella makes it possible to use the family to describe computer programs. \ The Extensible Style Language (\textsc{XSL}) described below is an example of such an \textsc{SGML} application that is, in fact, an \textsc{XML} application. 
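\par As a minimal sketch of a program written as an \textsc{XML} instance, the following fragment is a one-rule stylesheet in \textsc{XSLT}, the transformation part of \textsc{XSL}. \ The input element name \texttt{para} is hypothetical and chosen only for illustration; the namespace \textsc{URI} is the one fixed by the \textsc{XSLT} 1.0 Recommendation.
\begin{verbatim}
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Rule: each (hypothetical) "para" element of the input
       is rewritten as an HTML "p" element. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
\end{verbatim}
Read as data, this is simply a well-formed \textsc{XML} document; read by an \textsc{XSLT} processor, the same text is a set of instructions.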
\ } \section*{4\ \ \label{SU-4}Styling and Translating XML documents} \par{In principle, an author may create a personal \textsc{XML} language. \ To do so the author must be prepared to provide, in addition, (1) companion ``style sheets'' or (2) companion translators. \ } \par{If one uses a language under \textsc{XML} or \textsc{SGML}, one must understand what companion style sheets or translators will be used with that language. \ } \par{A style sheet is a document that is created to provide directions for a processing program, perhaps a printing formatter or a web browser, on the formatting or rendering of a document that is prepared in a markup language. \ } \par{While a translator may be any program, typically a translator is a package of small programs (sometimes called functions) for processing a document under an \textsc{XML} language to some other language, which might be \TeX{}, \textsc{HTML}, another \textsc{XML}, ... \ under a general framework for processing \textsc{XML} or \textsc{SGML}. \ There are free frameworks for writing such programs in various languages. \ Most of these frameworks require pre-processing parsers, and free parsers are also available. \ } \par{Near-term plans for the development of the World Wide Web anticipate major web browsing programs having the capability to provide finely-tuned rendering of \textsc{XML} documents that are accompanied by a style sheet. \ Style sheet support for \textsc{HTML} documents is currently available. \ } \par{Limited rendering of \textsc{XML} documents on the World Wide Web is based on ``Cascading Style Sheets'' (\textsc{CSS})\footnote{URI: http://www.w3.org/Style/CSS/}, which has been in use for customized rendering guidance with \textsc{HTML} browsing programs. \ } \par{A future standard style language for \textsc{XML} documents in World Wide Web browsing programs is called ``Extensible Style Language'' (\textsc{XSL})\footnote{URI: http://www.w3.org/Style/XSL/}. 
\ \textsc{XSL} is a restricted form of ``Document Style Semantics and Specification Language'' (\textsc{DSSSL})\footnote{URI: http://www.jclark.com/dsssl/} that is written with \textsc{XML} syntax. \ The specification for \textsc{XSL} was still under draft at \textsc{W3C} on March 1, 2000, while a variant called ``\textsc{XSL} Transformation Language'' (\textsc{XSLT})\footnote{URI: http://www.w3.org/TR/xslt}, which may be used for \emph{translating} \textsc{XML} languages to other languages (whether \textsc{XML} or not), became a \textsc{W3C} recommendation in late 1999. \ } \par{While \textsc{XSL}-directed formatting offers more precision than is available with \textsc{CSS}-guided formatting, in the overall world of \textsc{XML} processing one should expect formatting based on either \textsc{CSS} or \textsc{XSL} style sheets to be a limited type of formatting. \ One should expect to obtain the finest typesetting results by going beyond the narrow class of \textsc{XML} translation programs that admit expression in a style sheet language. \ } \par{A relatively new simple example of \textsc{SGML} processing may be found in the system manual under \textsc{SunOS}, version 5.7. \ Observant users of University at Albany SunStations may have noticed that as of the summer of 1999 most of the system manual in the central \texttt{"/usr/man"} area now exists in source form under an \textsc{SGML} document type for the manual rather than, as formerly, in the \emph{nroff} typesetting language. \ (This is temporarily hampering the operation of the classical \textsc{X11} program \emph{xman} for the affected portions of the system manual; text rendering is not affected.) See the manual page for ``solbook'' and browse \texttt{"/usr/lib/sgml"} and \texttt{"/usr/share/lib/sgml"} for more information. 
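\par To make the earlier remarks on \textsc{CSS}-guided rendering of \textsc{XML} concrete, here is a small sketch. \ It assumes a hypothetical document type with elements \texttt{note}, \texttt{title}, and \texttt{para}, and a hypothetical style sheet file named \texttt{note.css}; only the \texttt{xml-stylesheet} processing instruction is a standard \textsc{W3C} mechanism.
\begin{verbatim}
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="note.css"?>
<note>
  <title>Reminder</title>
  <para>Validate the document before publishing it.</para>
</note>
\end{verbatim}
The companion style sheet might then contain rules such as:
\begin{verbatim}
/* note.css: hypothetical rules for the elements above */
note  { display: block; margin: 1em; }
title { display: block; font-weight: bold; }
para  { display: block; margin-top: 0.5em; }
\end{verbatim}
Because a browser has no built-in knowledge of these element names, every element that should start a new block needs an explicit \texttt{display} rule.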
\ } \par{A document created carefully today under a well designed \textsc{XML} or \textsc{SGML} language should admit automatic conversion to future formats once an \textsc{SGML} or \textsc{XML} translator for such conversion has been created. \ } \section*{5\ \ \label{SU-5}Example Languages under XML and SGML} \begin{enumerate} \item \textsc{CALS} is a language under \textsc{SGML} that is widely used in the U.S. Department of Defense. \ \item ``DocBook''\footnote{URI: http://www.oasis-open.org/docbook/} is a public language under \textsc{SGML} that may be used by authors. \ A fall 1999 book, Norman Walsh, DocBook: The Definitive Guide\footnote{URI: http://www.docbook.org/tdg/html/} is available online and in bookstores. \ Walsh maintains a web site \url{http://nwalsh.com/} with a great deal of information about related topics, including an excellent tutorial on \textsc{XSL}. \ (\textbf{Campus UNIX Network only}: A copy of the DocBook \textsc{DTD}\footnote{URI: file:///usr/share/local/xml/docbook/dtd/} is available on the local network.) \item The TEI Consortium\footnote{URI: http://www.tei-c.org/} has emerged from the Text Encoding Initiative\footnote{URI: http://www.uic.edu/orgs/tei/} at The University of Illinois at Chicago as custodian of the \textsc{TEI} language definition. \ \textsc{TEI} is another public language that may be used by authors. \ Its modular design has led to the creation of the TEI Pizza Chef\footnote{URI: http://www.hcu.ox.ac.uk/TEI/newpizza.html} web site at Oxford. \ \par{A copy of the current TEI Guidelines\footnote{URI: http://www.tei-c.org/P4X/} in HTML, which includes \emph{A Gentle Introduction to \textsc{XML}}\footnote{URI: http://www.tei-c.org/P4X/SG.html} is available for \textbf{local browsing} on the Sun network from the file system location \url{/usr/share/local/xml/tei/P4X/index.html}\footnote{URI: file:///usr/share/local/xml/tei/P4X/index.html}. \ } \item \textsc{HTML} is a language under \textsc{SGML}. 
\ \item \textsc{XHTML} (formerly \textsc{HTML}-Voyager) is a language under \textsc{XML}, recommended by the World Wide Web Consortium (\textsc{W3C}), that is designed to be equivalent to \textsc{HTML}. \ It is intended to be the base for extending \textsc{HTML} to a language under \textsc{XML}. \ See: \begin{center} \url{http://www.w3.org/TR/xhtml1/}\,. \end{center} \item \textsc{MathML}, \emph{Mathematical Markup Language}, is a client platform language under \textsc{XML} that is intended to add mathematical functionality to the World Wide Web. \ See: \begin{center} \url{http://www.w3.org/Math/}\,. \end{center} The \textsc{W3C} Recommendation for \textsc{MathML}, version 2, points to a document type definition at \textsc{W3C} for the implementation of a namespace\footnote{URI: http://www.w3.org/TR/REC-xml-names/}-based extension of \textsc{XHTML} that includes \textsc{MathML}. \ \item The \textsc{W3C} working draft on \emph{Scalable Vector Graphics} (\textsc{SVG}) format proposes an \textsc{XML} language for online graphics. \ This draft may be found along with other related information at \begin{center} \url{http://www.w3.org/Graphics/SVG/}\,. \end{center} \item \textsc{XGMML}\footnote{URI: http://www.cs.rpi.edu/\textasciitilde{}puninj/XGMML/}, \emph{eXtensible Graph Markup and Modeling Language}, developed recently in New York's Capital District at RPI\footnote{URI: http://www.rpi.org/}, is an \textsc{XML} application, based on \textsc{GML}, that is used for graph description. \ See also \begin{center} \url{http://xml.coverpages.org/xgmml.html}\,. \end{center} \item Any programming \emph{assembly language} in which each line consists of an operation code followed by parameters is equivalent to an \textsc{XML} language. \ \item The \emph{device independent} typesetting file format (\textsc{DVI}) associated with the typesetting language \TeX{} (and with the program \texttt{groff}) is equivalent to an \textsc{XML} language. 
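\par The equivalence claimed above for assembly languages can be sketched as follows. \ The three-line assembly fragment, the operation codes, and the attribute names are all hypothetical and serve only as an illustration; no particular machine or document type is intended.
\begin{verbatim}
; hypothetical assembly source
load  r1, 100
add   r1, r2
store r1, 200
\end{verbatim}
Each line may be recast as an empty element whose name is the operation code and whose attributes carry the parameters:
\begin{verbatim}
<program>
  <load  reg="r1" addr="100"/>
  <add   dst="r1" src="r2"/>
  <store reg="r1" addr="200"/>
</program>
\end{verbatim}
The translation is mechanical in both directions, which is the sense of ``equivalent'' suggested here.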
\ \end{enumerate} \section*{6\ \ \label{SU-6}References} \par{The World Wide Web Consortium is the driving force behind \textsc{XML}. \ See: \begin{center} \url{http://www.w3.org/XML/}. \ \end{center} } \par{A 1998 book on \textsc{XML} is: \begin{menulist} \item Charles F. Goldfarb and Paul Prescod, \\{} \emph{The \textsc{XML} Handbook}, Prentice Hall. \ A second edition has now appeared. \ \end{menulist} } \par{A very comprehensive catalogue of information about \textsc{SGML} and \textsc{XML} may be found on the web at \begin{center} \url{http://xml.coverpages.org/}. \ \end{center} } \par{An interesting and useful web site with ties to Sun Microsystems, one of the principal sponsors of \textsc{XML}, is \begin{center} \url{http://metalab.unc.edu/xml/}. \ \end{center} } \par{An early survey, \emph{Comparison of \textsc{SGML} and \textsc{XML}}\footnote{URI: http://www.w3.org/TR/NOTE-sgml-xml-971215.html}, is available from \textsc{W3C}. \ } \par{Monitoring the Usenet newsgroups \url{news:comp.text.sgml} and \url{news:comp.text.xml} is an excellent way to follow current discussion. \ } \par{One may also seek answers to questions in the newsgroups when the answers cannot be obtained locally through the HelpDesk at \url{mailto:helpdesk@csc.albany.edu}. \ However, one should first make sure that the question is appropriate to the specific topic of the newsgroup. \ For example, most questions about creating web pages do not belong in these two newsgroups. \ } \par{Information about the topic of ``mathematics and \textsc{SGML}'' may be found at the (local) \textsc{URL} \begin{center} \url{http://math.albany.edu:8800/hm/sgml/about.html}. 
\ \end{center} } \section*{7\ \ \label{SU-7}Software Available Locally} \par{The University at Albany \textsc{UNIX} Network has several basic, general purpose, freely available tools for working with \textsc{SGML} and \textsc{XML}, including: \begin{enumerate} \item The ``open source'' evolute, called \texttt{"onsgmls"}, of James Clark's \textsc{SGML} parser \texttt{"nsgmls"}, which is an application under the \texttt{OpenSP} C++ library. \ \par{ The public location for \texttt{OpenSP} is the \textsl{OpenJade} Project\footnote{URI: http://openjade.sourceforge.net} at \emph{SourceForge}. \ The public location for information about \texttt{SP} is: \begin{center} \url{http://www.jclark.com/}\,. \end{center} } \par{ Note: \texttt{"onsgmls"}, when properly called, may be used to check the structural correctness of an \textsc{HTML} document. \ At the University at Albany the command \texttt{"validhtml"} is an interface to \texttt{"onsgmls"} for this method of \textsc{HTML} validation. \ } \item Script interfaces to various Java-based tools of James Clark for handling \textsc{XML}, including: \begin{description} \item[{\texttt{dtdinst}}] a utility to generate an \textsc{XML} instance that models an \textsc{XML} document type definition given in \textsc{DTD} form. \ \item[{\texttt{jcxt}}] the engine called ``xt'' for transformations specified in the \textsc{XSLT} language. \ \item[{\texttt{jing}}] a utility to validate an \textsc{XML} instance against a document type definition specified in the form of either a Relax-NG schema\footnote{URI: http://www.relaxng.org/} or a \textsc{W3C} schema\footnote{URI: http://www.w3.org/XML/Schema}. \ \item[{\texttt{trang}}] a utility for translations between various types of \textsc{XML} document type definitions. \ \end{description} \item David Megginson's general purpose \textsc{SGML}-to-anything processor, \texttt{"sgmlspl"}, which is an application under his Perl-5 library \texttt{"SGMLSPM"}. 
\ \par{ Local documentation on \texttt{"SGMLSPM/sgmlspl"} may be found at: \begin{center} \url{file:///usr/share/local/xml/html/sgmlspm/index.html}\, \end{center} } \par{ The public location for information about \texttt{"SGMLSPM/sgmlspl"} for many years \emph{was} \begin{center} \texttt{http://home.sprynet.com/sprynet/dmeggins/}\,. \end{center} That appears to have been superseded by \begin{center} \url{http://www.megginson.com/Software/}\,; \end{center} and \texttt{SGMLSPM/sgmlspl} is also available at CPAN\footnote{URI: http://www.cpan.org/modules/by-authors/David\_Megginson/}. \ } \end{enumerate} } \section*{8\ \ \label{SU-8}Miscellaneous} \subsection*{8.1\ \ \label{SU-8.1}XML and Electronic Data Interchange (EDI)} \par{\textsc{XML} offers a standard framework for the general interchange of many kinds of data. \ The usefulness of \textsc{XML-EDI} lies in the inherent adaptability to this end of the many new tools for handling \textsc{XML}. \ There is a substantial amount of material on this topic in the book by Goldfarb and Prescod cited above. \ See the web site: \begin{center} \url{http://www.geocities.com/WallStreet/Floor/5815/}\,. \end{center} } \par{The World Wide Web Consortium (W3C) has basic information about how one might proceed to model a database in XML at the site: \begin{center} \url{http://www.w3.org/XML/}\,. \end{center} } \subsection*{8.2\ \ \label{SU-8.2}Library Metadata} \par{The Open Archives Initiative (\url{http://www.openarchives.org/}) has developed a protocol for interoperable handling of library metadata across the network based on records prepared under special purpose \textsc{XML} document types that are defined using the new notion of XML schema\footnote{URI: http://www.w3.org/XML/Schema}. \ } \subsection*{8.3\ \ \label{SU-8.3}How This Document Was Prepared} \par{This document was prepared in Generalized Extensible \LaTeX{}-like Markup (\textsc{GELLMU}), which is the author's user markup interface for \textsc{SGML} languages. 
\ Presently the system, still under development, may be used to create both standard \LaTeX{}\footnote{URI: general.ltx} and \textsc{HTML}\footnote{URI: general.html} versions from a single \LaTeX{}-like source\footnote{URI: general.glm}, a text file. \ The program \textsl{latex} may be used to prepare a high quality typeset version\footnote{URI: general.dvi} in \textsc{DVI} format\footnote{Donald Knuth's Device Independent Format (\textsc{DVI})} suitable for printing on this system using the program \textsl{dvips}. \ A variant of \textsl{latex} known as \textsl{pdflatex} may be used to prepare a different typeset version\footnote{URI: general.pdf} in \textsc{PDF} format, and an alternate form of processing to \textsc{HTML} will produce \textsc{XHTML}\footnote{URI: general.xhtml} extended by \textsc{MathML}. \ For more information on this system see \begin{center} \url{http://www.albany.edu/~hammond/gellmu}\,. \end{center} } \end{document}