<!DOCTYPE article SYSTEM "gellmu.dtd"><article stem="eos">
<title>Why <emph>eos</emph><eoq/></title>
<author>William F. Hammond</author>
<body>
<parb>
The <abbr>GELLMU</abbr> <emph>article</emph> document type has an end<hyp/>of<hyp/>sentence
mark <emph>eos</emph>, which is a defined<hyp/>empty <abbr>XML</abbr> element,
corresponding to the concept of sentence in languages such as English
and French<eos/>  But there is no provision for regarding a Western
sentence itself as an <abbr>XML</abbr> element<eos/>  Why<eoq/>
<parb>
There are two reasons<eos/>
<parb>
First, one sometimes wants to begin a <emph>display</emph> in the middle of a
sentence<eos/>  Then it can happen that the display is the last part of
the sentence<eos/>  It is a formal rule of <abbr>XML</abbr> that if an element
begins inside another element, then the inner element must be closed
before the outer element is closed<eos/>  Following this rule, when a
display is the last part of a sentence, the display must be ended
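before the sentence is ended<eos/>  As a consequence, an <abbr>XML</abbr> processor
must usually work very hard to place the sentence<hyp/>ending punctuation
mark correctly, since the mark normally belongs at the end of the
display but can be supplied only after the display has been closed<eos/>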
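<parb>
The nesting rule can be seen in a short Python sketch<eos/>  It is only an
illustration: the element name <emph>sentence</emph> stands for a hypothetical
model in which a sentence is an element, and the helper name
<emph>describe</emph> is invented for this example<eos/>  A standard <abbr>XML</abbr>
parser accepts the sentence only when the display is closed first, so
the period arrives as text following the display rather than as part
of it<eos/>
<![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a "sentence" element whose last part is a "display".
# The display is closed first, so the period sits after it.
NESTED = "<sentence>We conclude that <display>x = 1</display>.</sentence>"

# The same text, but with the sentence closed right after its final period
# and before the display is closed: the two elements overlap.
OVERLAPPING = "<sentence>We conclude that <display>x = 1.</sentence></display>"


def describe(markup):
    """Return (display text, text after the display), or None if not well-formed."""
    try:
        sentence = ET.fromstring(markup)
    except ET.ParseError:
        # Overlapping elements are rejected by the parser.
        return None
    display = sentence.find("display")
    return display.text, display.tail


print(describe(NESTED))
print(describe(OVERLAPPING))
```
]]>
<parb>
In the well<hyp/>formed case the period is reported as the text that follows
the display, which is the position a formatter would then have to move
it from<eos/>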
<parb>
Is this just a technical <abbr>XML</abbr> issue<eoq/>  Not really<eos/>
<parb>
The second reason for modeling an end<hyp/>of<hyp/>sentence mark but not a
sentence is that some literary uses of a language such as English
do not actually resolve into clean sentence units, even though
end<hyp/>of<hyp/>sentence punctuation is used<eos/>
<parb>
One could argue that when a sentence is used, it could be marked up
with a <emph>sentence</emph> element<footnote>The model would then likely
permit each of <emph>sentence</emph> and <emph>display</emph> to contain the
other</footnote><eos/>  In that event it is unlikely that authors would want to be
required to insert end<hyp/>of<hyp/>sentence marks explicitly<eos/>  Moreover, there
would be something of a dilemma for the <abbr>XML</abbr> processor if it
happens to notice, at the end of a sentence, an item of character data
that appears to be an end<hyp/>of<hyp/>sentence mark<eos/>  There would still be
the vexation caused by a display that ends a sentence<eos/>  And would
authors use the <emph>sentence</emph> element<eoq/>
<parb>
Will authors want to use the explicit <emph>eos</emph> rather than the
simple punctuation mark <quochar>.</quochar> entered as character data<eoq/>  If so, how is the
sequence <qquostr>.<ltc/>eos<sol/><gtc/></qquostr> to be handled by a processor<eoq/>
<parb>
Authors are the end users, and authors need convenience<eos/>  Reasonable
convenience lies in the convention that began with the dawn of the
mechanical typewriter:
<quote>A sentence is ended with a period followed either by a
newline or by two or more blank spaces<eos/></quote>
<parb>
Handling this convention is not a task that an <abbr>XML</abbr> processor can
perform with reasonable efficiency<eos/>  But it works very well with a <latex/><hyp/>like
markup interface for <abbr>XML</abbr>, i.e., when there is pre<hyp/>processing
from <latex/><hyp/>like markup to <abbr>XML</abbr> markup<eos/>
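<parb>
The typewriter convention is easy to apply in a pre<hyp/>processing step, as
the following Python sketch suggests<eos/>  It is not the <abbr>GELLMU</abbr>
implementation: the function name <emph>mark_sentence_ends</emph> is invented
for this example, and the sketch assumes plain text input, ignoring
markup, mathematics, and verbatim material that a real pre<hyp/>processor
would have to skip<eos/>
<![CDATA[
```python
import re

# A period counts as a sentence end only when it is followed by a newline
# or by two or more blank spaces (the typewriter convention).
SENTENCE_END = re.compile(r"\.(?=\n| {2,})")


def mark_sentence_ends(text):
    """Replace each sentence-ending period with an explicit <eos/> element."""
    return SENTENCE_END.sub("<eos/>", text)


sample = "Dr. Smith agrees.  The proof, i.e., the argument, is short.\n"
print(mark_sentence_ends(sample))
```
]]>
<parb>
Periods followed by a single blank space, as in an abbreviation, are
left alone, so the author decides what counts as a sentence end simply
by the spacing typed after the period<eos/>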

</body>
</article><!-- GELLMU version 0.7.4.2 (05-Jan-2006) -->
