University at Albany XML/SGML Archive

Extensible Markup Language (XML)
Standard Generalized Markup Language (SGML)

William F. Hammond
Last revision: June 2, 2006

1 Basic
2 Classical HTML is not an XML Language
3 The Nature of SGML
4 Styling and Translating XML documents
5 Example Languages under XML and SGML
6 References
7 Software Available Locally
8 Miscellaneous
8.1 XML and Electronic Data Interchange (EDI)
8.2 Library Metadata
8.3 How This Document Was Prepared
1 Basic

Standard Generalized Markup Language (SGML) is a language for defining markup languages. SGML is defined by the International Organization for Standardization (ISO) in document ISO 8879 [1986]. The ISO document is not freely available. A copy of it is found in the book: Charles F. Goldfarb, The SGML Handbook, Clarendon Press, Oxford, 1990.

Hypertext Markup Language (HTML), the basic language of the World Wide Web, is a markup language under SGML.

Extensible Markup Language (XML) is a limited form of SGML that is currently under heavy promotion by the World Wide Web Consortium (W3C). It is sometimes perceived as extended HTML. XML has been designed to be usable on browsing platforms, while full-fledged SGML is usually more suitable for authoring platforms. In fact, XML has for most purposes become the only form of SGML that is suitable for public sharing. Many SGML languages that are realistically suitable for authors admit rapid automatic translation to nearly equivalent XML languages. (Note that an XML language need not contain the HTML tag set nor have any relation to HTML, and HTML is not an XML language although it may be automatically converted to a language under XML.)

Note on terminology: The phrase "SGML language" used here, as well as the parallel phrase "XML language", is formally not correct usage. What is called here an SGML (respectively, XML) language is formally known as an SGML (respectively, XML) application. Every XML application may also be regarded as an SGML application. There is an identifying correspondence between applications in this sense and document types.
2 Classical HTML is not an XML Language

Classical HTML refers to the markup language behind World Wide Web locations from the beginning of the Web at CERN, Geneva, until very recently. The versions of W3C HTML numbered from 2.0 through 4.01 are all languages under SGML that do not fall within XML.

Three simple reasons why HTML is not an XML language are:

(1) In HTML most paragraphs are marked up using an open-tag P at the beginning of the paragraph without needing a close-tag P at the end, while there must be a close-tag for every open-tag in XML.

(2) In HTML tag names are not case-sensitive, while in XML tag names are case-sensitive. (A new standard way of converting HTML into an XML language will specify that tag names all be lower case.)

(3) In HTML some attribute values need not be placed inside quotation marks, while in XML all attribute values must be quoted.

Early in the year 2000 a new evolute of HTML referred to as XHTML (http://www.w3.org/TR/2000/rec-xhtml1-20000126), but bearing the formal document type name html (lower case characters only), acquired the status of W3C Recommendation. XHTML, version 1.0, is an XML language that has the same tag set as HTML 4.01 (http://www.w3.org/TR/1999/REC-html401-19991224). Apart from technical details, XHTML 1.0 is almost the same language as HTML 4.01. Because of the technical differences, however, a computer does not need the full weight of an SGML processor to interpret XHTML. This advantage is offset by the fact that it is slightly more difficult for authors to create XHTML than to create classical HTML.
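The three differences between classical HTML and XML can be observed with any XML parser. The following is a minimal sketch using the XML parser in the Python standard library; the two sample fragments and the function name is_well_formed_xml are made up for illustration and do not come from any W3C document.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical classical HTML fragment: omitted close-tags for P,
# upper case tag names, and an unquoted attribute value.
CLASSICAL_HTML = "<BODY><P>First paragraph<P align=center>Second paragraph</BODY>"

# The same content written in the XHTML manner: every open-tag closed,
# lower case tag names, quoted attribute values.
XHTML_STYLE = (
    '<body><p>First paragraph</p>'
    '<p align="center">Second paragraph</p></body>'
)

# Tag names are case-sensitive in XML, so <P> is not closed by </p>.
MIXED_CASE = "<P>Some text</p>"

if __name__ == "__main__":
    for label, sample in [
        ("classical HTML", CLASSICAL_HTML),
        ("XHTML style", XHTML_STYLE),
        ("mixed case", MIXED_CASE),
    ]:
        print(label, "->", is_well_formed_xml(sample))
```

An SGML parser equipped with the HTML document type definition would accept the first fragment, since the definition tells it where the omitted close-tags belong; an XML parser has no such information and rejects it.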
3 The Nature of SGML

While SGML may be described as a language for creating markup languages with a shared syntax, more realistically and more abstractly, an SGML language (formally, an SGML application) is a template for processing. For this reason when an SGML document (formally, an SGML instance) is written, the author is, in fact, setting its text as organized data.

The abstract character of languages under the SGML umbrella makes it possible to use the family to describe computer programs. The Extensible Style Language (XSL) described below is an example of such an SGML application that is, in fact, an XML application.
4 Styling and Translating XML documents

In principle, an author may create a personal XML language. To do so the author must be prepared to provide, in addition, (1) companion style sheets or (2) companion translators.

If one uses a language under XML or SGML, one must understand what companion style sheets or translators will be used with that language.

A style sheet is a document that is created to provide directions for a processing program, perhaps a printing formatter or a web browser, on the formatting or rendering of a document that is prepared in a markup language.

While a translator may be any program, typically a translator is a package of small programs (sometimes called functions) for processing a document under an XML language to some other language, which might be LaTeX, HTML, another XML language, and so on, under a general framework for processing XML or SGML. There are free frameworks for writing such programs in various languages. Most of these frameworks require preprocessing parsers, and free parsers are also available.

Near-term plans for the development of the World Wide Web anticipate major web browsing programs having the capability to provide finely tuned rendering of XML documents that are accompanied by a style sheet. Style sheet support for HTML documents is currently available.

Limited rendering of XML documents on the World Wide Web is based on Cascading Style Sheets (CSS; see http://www.w3.org/Style/CSS/), which has been in use for customized rendering guidance with HTML browsing programs.

A future standard style language for XML documents in World Wide Web browsing programs is called Extensible Style Language (XSL; see http://www.w3.org/Style/XSL/). XSL is a restricted form of Document Style Semantics and Specification Language (DSSSL; see http://www.jclark.com/dsssl/) that is written with XML syntax. The specification for XSL was still under draft at W3C on March 1, 2000, while a variant called XSL Transformation Language (XSLT; see http://www.w3.org/TR/xslt), which may be used for translating XML languages to other languages (whether XML or not), became a W3C recommendation in late 1999.

While XSL-directed formatting offers more precision than is available with CSS-guided formatting, in the overall world of XML processing one should expect formatting based on either CSS or XSL style sheets to be a limited type of formatting. One should expect to obtain the finest typesetting results by going beyond the narrow class of XML translation programs that admit expression in a style sheet language.

A relatively new simple example of SGML processing may be found in the system manual under SunOS, version 5.7. Observant users of University at Albany SunStations may have noticed that as of the summer of 1999 most of the system manual in the central /usr/man area now exists in source form under an SGML document type for the manual rather than, as formerly, in the nroff typesetting language. (This is temporarily hampering the operation of the classical X11 program xman for the affected portions of the system manual; text rendering is not affected.) See the manual page for solbook and browse /usr/lib/sgml and /usr/share/lib/sgml for more information.

A document created carefully today under a well designed XML or SGML language should admit automatic conversion to future formats once an SGML or XML translator for such conversion has been created.
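The idea of a translator as a package of small functions, one per element of the source language, can be sketched as follows. This is only an illustration using the XML parser in the Python standard library rather than any of the frameworks mentioned in this archive; the personal language (elements note, para, emph), the handler names, and the HTML chosen as output are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_children(elem):
    """Translate the text and child elements contained in elem."""
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(translate(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


# One small function for each element of the hypothetical source language.
def handle_note(elem):
    return '<div class="note">' + render_children(elem) + "</div>"


def handle_para(elem):
    return "<p>" + render_children(elem) + "</p>"


def handle_emph(elem):
    return "<em>" + render_children(elem) + "</em>"


HANDLERS = {"note": handle_note, "para": handle_para, "emph": handle_emph}


def translate(elem):
    """Dispatch on the element name; unknown elements pass their content through."""
    handler = HANDLERS.get(elem.tag, render_children)
    return handler(elem)


SOURCE = "<note><para>XML is a <emph>limited</emph> form of SGML.</para></note>"

if __name__ == "__main__":
    print(translate(ET.fromstring(SOURCE)))
```

Replacing the three handlers with functions that emit a different target language changes the output format without touching the source document, which is the sense in which a single carefully prepared source admits several translations.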
5 Example Languages under XML and SGML

CALS is a language under SGML that is widely used in the U.S. Department of Defense.

DocBook (http://www.oasis-open.org/docbook/) is a public language under SGML that may be used by authors. A fall 1999 book, Norman Walsh, DocBook: The Definitive Guide (http://www.docbook.org/tdg/html/), is available online and in bookstores. Walsh maintains a web site, http://nwalsh.com/, with a great deal of information about related topics, including an excellent tutorial on XSL. (Campus UNIX Network only: A copy of the DocBook DTD is available on the local network at file:///usr/share/local/xml/docbook/dtd/.)

The TEI Consortium (http://www.tei-c.org/) has emerged from the Text Encoding Initiative (http://www.uic.edu/orgs/tei/) at The University of Illinois at Chicago as custodian of the TEI language definition. TEI is another public language that may be used by authors. Its modular design has led to the creation of the TEI Pizza Chef web site (http://www.hcu.ox.ac.uk/TEI/newpizza.html) at Oxford. A copy of the current TEI Guidelines in HTML (http://www.tei-c.org/P4X/), which includes A Gentle Introduction to XML (http://www.tei-c.org/P4X/SG.html), is available for local browsing on the Sun network from the file system location /usr/share/local/xml/tei/P4X/index.html.

HTML is a language under SGML.

XHTML (formerly HTMLVoyager) is a language under XML, recommended by the World Wide Web Consortium (W3C), that is designed to be equivalent to HTML. It is intended to be the base for extending HTML to a language under XML. See: http://www.w3.org/TR/xhtml1/.

MathML, Mathematical Markup Language, is a client platform language under XML that is intended to add mathematical functionality to the World Wide Web. See: http://www.w3.org/Math/. The W3C Recommendation for MathML, version 2, points to a document type definition at W3C for the implementation of a namespace-based (http://www.w3.org/TR/REC-xml-names/) extension of XHTML that includes MathML.

The W3C working draft on Scalable Vector Graphics (SVG) format proposes an XML language for online graphics. This draft may be found along with other related information at http://www.w3.org/Graphics/SVG/.

XGMML (http://www.cs.rpi.edu/puninj/XGMML/), eXtensible Graph Markup and Modeling Language, developed recently in New York's Capital District at RPI (http://www.rpi.org/), is an XML application based on GML, which is used for graph description. See also http://xml.coverpages.org/xgmml.html.

Any programming assembly language in which each line consists of an operation code followed by parameters is equivalent to an XML language.

The device independent typesetting file format (DVI) associated with the typesetting language TeX (and with the program groff) is equivalent to an XML language.
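The remark about assembly languages can be made concrete with a small round trip. The sketch below assumes a toy assembly syntax (an operation code followed by comma-separated parameters) and an invented XML vocabulary (program, instr, arg); neither is taken from a real assembler or a published document type.

```python
import xml.etree.ElementTree as ET


def assembly_to_xml(source):
    """Represent each assembly line as an instr element with arg children."""
    program = ET.Element("program")
    for line in source.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        instr = ET.SubElement(program, "instr", op=fields[0])
        for value in fields[1:]:
            ET.SubElement(instr, "arg").text = value.rstrip(",")
    return program


def xml_to_assembly(program):
    """Recover the assembly text from the XML representation."""
    lines = []
    for instr in program.findall("instr"):
        args = [arg.text for arg in instr.findall("arg")]
        lines.append((instr.get("op") + " " + ", ".join(args)).strip())
    return "\n".join(lines)


# Hypothetical three-line program in the toy assembly syntax.
SOURCE = "LOAD r1, 100\nADD r1, r2\nHALT"

if __name__ == "__main__":
    tree = assembly_to_xml(SOURCE)
    print(ET.tostring(tree, encoding="unicode"))
    print(xml_to_assembly(tree))
```

Because the translation can be undone without loss, the two forms carry the same information, which is what "equivalent" means here.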
6 References

The World Wide Web Consortium is the driving force behind XML. See: http://www.w3.org/XML/

A 1998 book on XML is: Charles F. Goldfarb and Paul Prescod, The XML Handbook, Prentice Hall. A second edition has now appeared.

A very comprehensive catalogue of information about SGML and XML may be found on the web at http://xml.coverpages.org/

An interesting and useful web site with ties to Sun Microsystems, one of the principal sponsors of XML, is http://metalab.unc.edu/xml/

An early survey, Comparison of SGML and XML (http://www.w3.org/TR/NOTE-sgml-xml-971215.html), is available from W3C.

Monitoring the UseNet newsgroups news:comp.text.sgml and news:comp.text.xml is an excellent way to have a window on current discussion.

One may also seek answers to questions in the newsgroups when the answers cannot be obtained locally through the HelpDesk at mailto:helpdesk@csc.albany.edu. However, one should first make sure that the question is appropriate to the specific topic of the newsgroup. For example, most questions about creating web pages do not belong in these two newsgroups.

Information about the topic of mathematics and SGML may be found at the (local) URL http://math.albany.edu:8800/hm/sgml/about.html
7 Software Available Locally

The University at Albany UNIX Network has several basic, general purpose, freely available tools for working with SGML and XML including:

The open source evolute, called onsgmls, of James Clark's SGML parser nsgmls, which is an application under the OpenSP C++ library. The public location for OpenSP is the OpenJade Project at SourceForge (http://openjade.sourceforge.net). The public location for information about SP is: http://www.jclark.com/. Note: onsgmls, when properly called, may be used to check the structural correctness of an HTML document. At the University at Albany the command validhtml is an interface to onsgmls for this method of HTML validation.

Script interfaces to various Java-based tools of James Clark for handling XML including:

    dtdinst: a utility to generate an XML instance that models an XML document type definition given in DTD form

    jcxt: the engine called xt for transformations specified in the XSLT language

    jing: a utility to validate an XML instance against a document type definition specified in the form of either a RelaxNG schema (http://www.relaxng.org/) or a W3C schema (http://www.w3.org/XML/Schema)

    trang: a utility for translations between various types of XML document type definitions

David Megginson's general purpose SGML-to-anything processor, sgmlspl, which is an application under his Perl5 library SGMLS.PM. Local documentation on SGMLS.PM/sgmlspl may be found at: file:///usr/share/local/xml/html/sgmlspm/index.html. The public location for information about SGMLS.PM/sgmlspl for many years was http://home.sprynet.com/sprynet/dmeggins/. That appears to have been superseded by http://www.megginson.com/Software/; and SGMLS.PM/sgmlspl is also available at CPAN (http://www.cpan.org/modules/by-authors/DavidMegginson/).
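As a rough sketch of what calling onsgmls for validation might look like from a script, the following Python wrapper runs the parser with the -s option (suppress normal output, report errors only) and, optionally, the -c option naming a catalog file. It is not the local validhtml command, whose interface is not described here; the function names and the sample file names page.html and html.cat are hypothetical, and a working setup still needs a catalog that locates the HTML document type definition and its SGML declaration.

```python
import shutil
import subprocess


def build_validation_command(path, catalog=None):
    """Assemble an onsgmls command line that reports errors only."""
    command = ["onsgmls", "-s"]
    if catalog is not None:
        command += ["-c", catalog]  # site-specific SGML catalog (assumed)
    command.append(path)
    return command


def validate(path, catalog=None):
    """Return (ok, messages), or None when onsgmls is not installed."""
    if shutil.which("onsgmls") is None:
        return None
    result = subprocess.run(
        build_validation_command(path, catalog),
        capture_output=True,
        text=True,
    )
    # onsgmls writes its error messages to standard error and exits
    # with a nonzero status when the document has errors.
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    # Hypothetical file names, shown only to display the command line.
    print(" ".join(build_validation_command("page.html", "html.cat")))
```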
8 Miscellaneous

8.1 XML and Electronic Data Interchange (EDI)

XML offers a standard framework for the general interchange of many kinds of data. The usefulness of XML/EDI lies in the inherent adaptability to this end of the many new tools for handling XML. There is a substantial amount of material on this topic in the book by Goldfarb and Prescod cited above. See the web site: http://www.geocities.com/WallStreet/Floor/5815/.

The World Wide Web Consortium (W3C) has basic information about how one might proceed to model a database in XML at the site: http://www.w3.org/XML/.

8.2 Library Metadata

The Open Archives Initiative (http://www.openarchives.org/) has developed a protocol for interoperable handling of library metadata across the network based on records prepared under special purpose XML document types that are defined using the new notion of XML schema (http://www.w3.org/XML/Schema).

8.3 How This Document Was Prepared

This document was prepared in Generalized Extensible LaTeX-Like Markup (GELLMU), which is the author's user markup interface for SGML languages. Presently the system, still under development, may be used to create both standard LaTeX (general.ltx) and HTML (general.html) versions from a single LaTeX-like source (general.glm), a text file. The program latex may be used to prepare a high quality typeset version in DVI format (general.dvi), that is, Donald Knuth's Device Independent Format, suitable for printing on this system using the program dvips. A variant of latex known as pdflatex may be used to prepare a different typeset version in PDF format (general.pdf), and an alternate form of processing to HTML will produce XHTML extended by MathML (general.xhtml). For more information on this system see http://www.albany.edu/hammond/gellmu.