RE: typing

Sean McGrath wrote:

> What if schema stuff including DTDs is *always* outside
> the instance?
> Does this simplify the infoset issues? yes
> Does this allow a variety of schema approaches to be used on
> a mix and match basis during pipeline processing? - yes
> Does it allow the same instance to be viewed through
> the eyes of both local and global semantics via different
> schemata? yes
> Does it appeal to simpletons? yes

In retrospect, one could argue that DTDs should have been specified
differently, for example by splitting parse-time entity substitution from
validation. But beyond serving as a guide for splitting different functions
into different layers or specifications, I'm not sure what this would do for
the problem at hand, namely 'typing', 'PSVI', 'XML Schema' etc.

Suppose we deprecate DOCTYPE today -- how would that solve these specific

> The optionality of DTD validation, coupled with its explicit binding
> to document instances, coupled with its explosive effect
> on the complexity of infosets, is the nub of the problem
> in my opinion.

The surprising number of perhaps subtle but real errors I've found in common
XML parsers related to validation leads me to strongly suspect that not much
validation is going on in everyday practice. If so, I'm not sure how
validation can be a real source of complexity.

> HTML parsers don't do any of that stuff. There is no
> point in putting them in your HTML even though
> SGML says you can. As a consequence of simply
> ignoring all these "optional" features, HTML parsers
> yield a simple infoset. Yes, I know that the absence
> of start and end-tags makes the DAG variable from
> one parser to another but the core infoset is
> *simple*.

I think the SAX interface is simple. The DOM is slightly more complex (on
the other hand using SAX may be slightly more complex than using the DOM due
to the frequent need to keep track of state etc. but that's not the point).
The point is that both of these interfaces as well as XPath 1.0 reflect
pretty much the XML Infoset.
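
To make the state-tracking remark concrete, here is a minimal sketch using
Python's standard xml.sax and xml.dom.minidom modules. The sample document
and the names NameCollector, names_via_sax and names_via_dom are invented
for illustration only. Both functions pull the same item names out of the
same instance; the SAX version has to carry its own element stack, while
the DOM version walks a tree that already exists.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample instance, made up for this illustration.
DOC = (b"<order><item><name>widget</name></item>"
       b"<item><name>gadget</name></item></order>")


class NameCollector(xml.sax.ContentHandler):
    """SAX side: the handler must remember where it is in the tree."""

    def __init__(self):
        super().__init__()
        self.path = []    # stack of currently open element names
        self.names = []   # collected text of each item/name element
        self._buf = []    # character chunks for the current name

    def _in_item_name(self):
        return self.path[-2:] == ["item", "name"]

    def startElement(self, name, attrs):
        self.path.append(name)
        if self._in_item_name():
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_item_name():
            self._buf.append(content)

    def endElement(self, name):
        if self._in_item_name():
            self.names.append("".join(self._buf))
        self.path.pop()


def names_via_sax(data):
    handler = NameCollector()
    xml.sax.parseString(data, handler)
    return handler.names


def names_via_dom(data):
    """DOM side: no bookkeeping, the tree carries the context."""
    dom = minidom.parseString(data)
    result = []
    for item in dom.getElementsByTagName("item"):
        for child in item.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == "name":
                result.append("".join(
                    t.data for t in child.childNodes
                    if t.nodeType == t.TEXT_NODE))
    return result
```

Either way the information delivered is the same: elements, their nesting,
and character data, which is pretty much the Infoset.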

Don't get me wrong: the goal of simplicity is a noble one. My argument is
solely with your assertion that the "current" Infoset, as reflected in SAX,
DOM and XPath, is a cause of complexity in XML. Rather, one could look to
these interfaces as examples of XML's success.

> What would it take (I am addressing this question to
> those with an intimate knowledge of the XML 1.0 spec.)
> to allow validating XML 1.0 parsers to be handed
> two URIs. One for the DTD and one for the instance.

That is simply an implementation issue (IMHO). Simon St. Laurent has already
produced a SAX parser/filter that supplies or replaces a DOCTYPE definition.
Dave Brownell's Aelfred2 has sliced out validation into a separate layer
above parsing. There you have it. Simon, how many people have downloaded
this open source filter? If the number isn't terribly high, it's not because
the idea or the implementation is anything less than terrific, but because
the demand hasn't been there. In any case, you now know where to get it.
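
As a rough illustration of what "validation as a layer above parsing" can
look like, here is a sketch built on Python's standard
xml.sax.saxutils.XMLFilterBase. It is not Simon's filter and not Aelfred2's
API. The content model is a toy dictionary rather than a real DTD, and the
names ChildCheckFilter, validate, MODEL, GOOD and BAD are all hypothetical.
The point is only that the model and the instance arrive as two separate
inputs, and the instance carries no DOCTYPE at all.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase

# Toy content model (an assumption, not a DTD): element -> allowed children.
MODEL = {"order": {"item"}, "item": {"name"}, "name": set()}

GOOD = b"<order><item><name>widget</name></item></order>"
BAD = b"<order><item><price/></item></order>"


class ChildCheckFilter(XMLFilterBase):
    """Sits between a non-validating parser and the application."""

    def __init__(self, parent, model):
        super().__init__(parent)
        self.model = model
        self.stack = []    # names of currently open elements
        self.errors = []   # human-readable violations, in document order

    def startElement(self, name, attrs):
        if self.stack:
            parent_name = self.stack[-1]
            if name not in self.model.get(parent_name, set()):
                self.errors.append(
                    f"<{name}> not allowed inside <{parent_name}>")
        elif name not in self.model:
            self.errors.append(f"<{name}> not allowed as root")
        self.stack.append(name)
        # Pass the event on unchanged to whatever handler is downstream.
        super().startElement(name, attrs)

    def endElement(self, name):
        self.stack.pop()
        super().endElement(name)


def validate(instance_bytes, model):
    """Take the instance and the model as two independent arguments."""
    parser = xml.sax.make_parser()   # plain non-validating parser
    checker = ChildCheckFilter(parser, model)
    checker.parse(io.BytesIO(instance_bytes))
    return checker.errors
```

Calling validate(GOOD, MODEL) reports no violations, while
validate(BAD, MODEL) reports the misplaced price element. Swapping in a
different model for the same instance needs no change to the instance,
which is the "two URIs" idea in miniature.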

> This I believe, would be a great first step towards
> separating the expression of data and model.
> It would also make DTD level validation a peer
> of other validation/mapping/transclusion
> technologies rather than an eminence.

Again, and to summarize: if we were designing XML 1.0 today, with the number
of schema languages now available rather than those available then, your
argument would be completely (sic) valid. As it is, DOCTYPE is optional. I
would very much like to look to XML 1.0 as a continued foundation upon which
we move forward. We have new battles to fight.