Re: [xml-dev] Text Markup Part II


From: John Cowan <cowan@mercury.ccil.org>
To: David Lee <dlee@calldei.com>
Date: Fri, 20 Jan 2012 10:56:52 -0500

David Lee scripsit:

> * Was there a legacy/historic (SGML?) reason why documents needed to be
> tagged from the very beginning in order to parse tags at all ? (A kind of
> inline file header/type/magic number ?)

In SGML there is a partial disconnect between element structure (which
is the same as in XML) and tagging.  SGML parsers can leverage the DTD,
which must be present, in order to deduce where missing end-tags and even
start-tags must be.  For example, if C elements are found only within
B elements, then a <C> tag not inside a B element will cause the parser
to infer a <B> tag.  Similarly, if I elements cannot contain P elements,
a <P> tag within an I element will cause the </I> tag to be inferred.
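
To make the inference concrete, here is a minimal sketch in Python.
It is not how a real SGML parser works: the two tables are hypothetical,
hand-written stand-ins for what a DTD would declare, the function name
infer_tags is made up, and the C-inside-B rule is simplified to mean
"directly inside".  Each returned pair carries a flag saying whether the
tag was inferred rather than present in the input.

```python
# Hypothetical stand-ins for what an SGML DTD would declare.
REQUIRED_PARENT = {"C": "B"}    # C elements occur only inside B elements
CANNOT_CONTAIN = {"I": {"P"}}   # I elements may not contain P elements


def infer_tags(tokens):
    """Return (token, inferred) pairs with the missing tags filled in."""
    open_elements = []  # stack of currently open element names
    result = []

    def close_top():
        # Infer the end-tag of the innermost open element.
        result.append((f"</{open_elements.pop()}>", True))

    for token in tokens:
        if token.startswith("</"):
            name = token[2:-1]
            if name not in open_elements:
                raise ValueError(f"end-tag {token} matches no open element")
            # Anything still open inside this element gets an inferred end-tag.
            while open_elements[-1] != name:
                close_top()
            open_elements.pop()
            result.append((token, False))
        elif token.startswith("<"):
            name = token[1:-1]
            # Close open elements that are not allowed to contain this one.
            while open_elements and name in CANNOT_CONTAIN.get(open_elements[-1], ()):
                close_top()
            # Open the required parent if we are not directly inside it.
            parent = REQUIRED_PARENT.get(name)
            if parent is not None and (not open_elements or open_elements[-1] != parent):
                open_elements.append(parent)
                result.append((f"<{parent}>", True))
            open_elements.append(name)
            result.append((token, False))
        else:
            result.append((token, False))  # character data passes through
    return result


print(infer_tags(["<A>", "<C>", "x", "</C>", "</A>"]))
print(infer_tags(["<I>", "<P>", "</P>"]))
```

In the first call the <C> tag is not inside a B element, so a <B> tag is
inferred before it and a </B> tag before </A>.  In the second, the <P> tag
forces an inferred </I> tag.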

This is done in a hard-wired way in HTML parsers, but in SGML parsers
the DTD explicitly says which tags of which elements can be omitted.
The only thing an SGML parser can't always infer is what the root element
should be, which is why DOCTYPE declarations specify the root element.
In XML they still do, for the sake of backward compatibility, even though
XML requires all element structure to be tagged explicitly.
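
As a small illustration of the XML side, the sketch below reads the root
element name declared in the DOCTYPE and compares it with the actual root
element, using Python's standard xml.dom.minidom.  The helper name
declared_and_actual_root is made up for this example.  minidom is a
non-validating parser, so as far as I know it reports both names without
complaining when they differ.

```python
from xml.dom.minidom import parseString


def declared_and_actual_root(xml_text):
    """Return (name declared in DOCTYPE or None, actual root element name)."""
    doc = parseString(xml_text)
    declared = doc.doctype.name if doc.doctype is not None else None
    return declared, doc.documentElement.tagName


matching = "<!DOCTYPE doc [<!ELEMENT doc (#PCDATA)>]><doc>hello</doc>"
print(declared_and_actual_root(matching))
```

A validating parser would reject a document whose root element does not
match the DOCTYPE name; a merely well-formed document is not required to
match.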

-- 
"Repeat this until 'update-mounts -v' shows no updates.         John Cowan
You may well have to log in to particular machines, hunt down   cowan@ccil.org
people who still have processes running, and kill them."
