- From: Joe English <jenglish@flightlab.com>
- To: xml-dev@lists.xml.org
- Date: Tue, 01 Aug 2000 07:48:58 -0700
"Simon St.Laurent" <simonstl@simonstl.com> wrote:
> At 08:56 PM 7/31/00 -0400, Jonathan Borden wrote:
> >You correctly complain that the Infoset is only a partial abstraction of
> >XML.
> >
> >Suppose this: we define a 100% complete abstract model of an XML 1.0, and
> >XML Namespace compliant document, and also define a mechanism for defining
> >subsets of such an abstract model.
> >[...] In
> >this scenario, the current XML Information Set would be derived from the
> >full fidelity Base XML Information Set.
>
> I'd support that as well. While I'm not certain that the abstraction layer
> provided by the Infoset is necessary, I'd much rather that the abstraction
> layer be as complete as possible if there is to be one.
>
> Subsets would be completely acceptable once 'full-fidelity' was attained.
It seems to me that we already have a 100% complete abstract
model for XML: the formal grammar in the XML 1.0 Recommendation.
This assigns a role to every character in the input sequence
via grammar productions. A parse tree derived from this grammar
is also the *minimal* complete representation -- any model that
doesn't account for every character is by necessity incomplete.
(The Infoset does augment it some -- the XML Rec doesn't account
for the "base URI" property -- but everything else in the Infoset
is straightforwardly derivable from the concrete syntax.)
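The "every character has a role" point can be made concrete with a minimal sketch. It assumes a drastically simplified, hypothetical grammar covering only a single start tag. The role names follow the XML 1.0 productions S, Eq, AttValue and Name, while `lt`/`gt` are made-up labels. It lexes rather than fully parses, so it only stands in for the leaves of a parse tree. Concatenating the tokens reproduces the input exactly, and an Infoset-like view (element name plus attribute values) is then derived by discarding whitespace and quote characters.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class Token(NamedTuple):
    role: str  # which (toy) production consumed these characters
    text: str  # the exact characters consumed, nothing dropped


# Hypothetical toy grammar, NOT the full XML 1.0 grammar: one start tag,
# '<' Name (S Name Eq AttValue)* S? '>', with no entity or char references.
# Eq is tried before S so that whitespace around '=' belongs to Eq.
_TOKEN_RE = re.compile(
    r"(?P<lt><)"
    r"|(?P<gt>>)"
    r"|(?P<Eq>[ \t\r\n]*=[ \t\r\n]*)"
    r"|(?P<S>[ \t\r\n]+)"
    r"|(?P<AttValue>\"[^\"<&]*\"|'[^'<&]*')"
    r"|(?P<Name>[A-Za-z_:][-A-Za-z0-9._:]*)"
)


def tokenize(src: str) -> list[Token]:
    """Assign every input character to exactly one token."""
    tokens, pos = [], 0
    while pos < len(src):
        m = _TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"no production matches at offset {pos}")
        tokens.append(Token(m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def infoset_view(tokens: list[Token]) -> tuple[str, dict[str, str]]:
    """Derive a subset: element name and attribute values only.

    Whitespace, the '=' spacing and the choice of quote character are
    treated as accidental and dropped.
    """
    significant = [t for t in tokens if t.role in ("Name", "AttValue")]
    name = significant[0].text
    attrs = {n.text: v.text[1:-1]  # strip the delimiting quotes
             for n, v in zip(significant[1::2], significant[2::2])}
    return name, attrs


src = "<a x = '1'  y=\"2\">"
tokens = tokenize(src)
assert "".join(t.text for t in tokens) == src  # every character accounted for
print(infoset_view(tokens))                    # ('a', {'x': '1', 'y': '2'})
```

The full-fidelity token list can always produce the subset, but not the other way round: once quotes and spacing are dropped, the original bytes cannot be recovered.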
Having a canonical "subsetted" model like the Infoset is very
important to tool-builders, spec writers, and schema designers
though. Without it, it's all too easy to design an application
that relies on properties of the input document that most tools
consider accidental syntactic properties; then documents built
in conformance with that application can't be processed with
those tools. This has happened to me a couple of times when
dealing with SGML.
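A small illustration of the tool-builder concern, sketched with Python's standard `xml.etree.ElementTree` as a representative tool. The two documents, and the idea of an application keyed on them, are hypothetical. The documents differ only in properties the grammar records but the Infoset treats as accidental: quote characters, whitespace inside the tag, attribute order, and empty-element syntax.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical documents: same Infoset, different concrete syntax.
DOC_A = "<item id='7' lang=\"en\"/>"
DOC_B = '<item  lang = "en" id="7"></item>'


def infoset_summary(xml_text: str) -> tuple[str, dict[str, str], str | None]:
    """Return what ElementTree exposes: tag, attributes, and text content."""
    root = ET.fromstring(xml_text)
    return root.tag, dict(root.attrib), root.text


# ElementTree reports the two documents identically (dict equality ignores
# key order, and an empty element has text None in both forms).
assert infoset_summary(DOC_A) == infoset_summary(DOC_B)
print(infoset_summary(DOC_A))
```

An application that assigned meaning to, say, the single quotes in `DOC_A` could not be implemented on top of such a tool, because that distinction never reaches it.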
--Joe English
jenglish@flightlab.com