Natural Motions (was Re: intertwined specs)

From: Simon St.Laurent <simonstl@simonstl.com>

>XSLT 2.0 processors will need an understanding of XML Schema datatypes,
>while XPath 2.0 processors will need to implement the regular expression
>language specified in XML Schemas.  XQuery builds on all of these, using
>the strategy pioneered by Quilt.  The current draft of XML Schemas requires
>schema processors to understand XPath as well.

>Anyone else find this unnerving?

I completely agree with Simon's key sentiment, though perhaps not with his
actual arguments: basing the new generation of XML technology on the
Post-Schema-Validation Infoset (PSVI) is a betrayal of XML's
small-is-beautiful premise. We should be *very* concerned about it as an
issue that affects us all, and we should seriously consider what to do.

I don't find it unnerving, because this path was started in 1999 (before my
time) when the Schema WG chose, and then kept to, the PSVI route, as can be
seen in the drafts.  Murata-san left to do RELAX after that time, but perhaps
he knows more about the ins and outs of the rationale.

However, just as SGML was too complex for WWW applications so we created a
subset, and XPath is too complex for some (streaming) applications so we
will make a down-reference-only subset (you didn't hear that from me), so
XML Schemas will be too complex for some things and in time there will be a
subset of it (in fact, we already have that with ISO RELAX and OASIS TREX).
This natural rhythm should be unsurprising, though it may make us sea-sick.

Most of the XPath 2 requirements look excellent.  Even the introduction of
data-type awareness does not require a PSVI: it could be done by explicit
casting of some kind, and the requirements include a casting operator. Some
parts of the PSVI (e.g. the attribute defaulting and normalized data values)
could be layered on top of the existing DOM transparently.
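
To make the point concrete, here is a rough sketch (in Python, using only the
standard library) of explicit casting plus attribute defaulting layered over
an ordinary parsed tree, with no schema validator involved.  The sample
document, the DEFAULTS table and the function names are all made up for
illustration; they stand in for whatever a real spec would define.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical sample document; the second item omits its qty attribute.
DOC = """<order><item price="19.50" qty="2"/><item price="5.25"/></order>"""

# Hypothetical defaults table, standing in for schema-declared attribute defaults.
DEFAULTS = {"item": {"qty": "1"}}


def apply_defaults(root, defaults):
    """Fill in missing attributes on the plain tree; no PSVI is built."""
    for elem in root.iter():
        for name, value in defaults.get(elem.tag, {}).items():
            elem.attrib.setdefault(name, value)
    return root


def total(root):
    """Sum price * qty, with the types stated explicitly by the query author."""
    # The casts (Decimal, int) are written out here rather than being
    # inferred from schema type annotations on the nodes.
    return sum(Decimal(i.get("price")) * int(i.get("qty"))
               for i in root.iter("item"))


root = apply_defaults(ET.fromstring(DOC), DEFAULTS)
order_total = total(root)
```

The tree handed to total() is still just elements and string attributes; the
typing lives in the expression, which is all that an explicit casting
operator would need.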

But the killer is #5 "Should Add Support for XML Schema: Structures." That
is the point where it requires a PSVI.

What to do?  I suggest people lobby very hard that XPath 1.1 should first be
improved with all the XPath 2 features (including casting of datatypes and
regular expressions) that do not require the PSVI *before* XPath 2 is
released, and that XSLT 1.2 be upgraded to support these too.

I do not believe it is possible to stop PSVI-based specs at W3C, nor do I
believe that would be remotely desirable: there are popular and useful fat
applications (of course I mean database management systems) that will be
well-served by the PSVI, and it will open the door for lots of useful
innovation and press releases. Lots of people are interested in using XML as
a framework not for document exchange but for databases, and why not?

But I hope the W3C will not forget those of us who are more interested in
improving XML as a document exchange notation, including document exchange
to lightweight processors.  All W3C needs to do is make sure that current
non-PSVI specifications track their PSVI twins as much as possible, or
provide some mechanism to handle the absence of a PSVI, or include an
attribute to tell whether they require a PSVI or not and allow two
conformance levels for implementations: infoset or PSVI infoset.
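
A minimal sketch of how the attribute-plus-conformance-level idea might look
to an implementation follows.  The attribute name "requires-psvi" and the
function names are invented for this example; no W3C spec defines them.

```python
from enum import Enum


class Level(Enum):
    """The two conformance levels suggested above."""
    INFOSET = 1
    PSVI = 2


def required_level(attrs):
    """Read the (hypothetical) requires-psvi attribute of a stylesheet or query."""
    return Level.PSVI if attrs.get("requires-psvi") == "yes" else Level.INFOSET


def can_process(processor_level, attrs):
    """A PSVI-level processor handles both kinds; an infoset one only the first."""
    return processor_level.value >= required_level(attrs).value
```

A lightweight processor could then refuse a PSVI-dependent stylesheet up
front, e.g. can_process(Level.INFOSET, {"requires-psvi": "yes"}) is False,
instead of failing somewhere in the middle of a transformation.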

Being able to process elements based on their xsi:type is simply not
something that there is any demand for that I have seen.  I have never seen
a single posting on any forum asking "why cannot we do this?"  However, if
it were available I would use it and probably like it.

How do you send a PSVI document?  There are no conventions from W3C for
serializing a PSVI, except to send the schema with the document and
reparse at the other end.  (We could use the ISO LTDR, AFDR etc. See the
Buck/Goldfarb/Prescod note at the W3C Technical Report site.  But then the
document needs a new schema!)   And there is at least one implementation of
XML Schemas that has an excellent PSVI dump format, but it is hardly
practical for data interchange.  With DTDs, we can normalize and
canonicalize the data into XML and largely do away with declarations (except
NOTATION and binary ENTITIES and ID/IDREF) but this is not possible with the

At the point of the PSVI, we are no longer dealing with XML!

This utterly changes XML: it is the infoset road already embarked on by
XPointers (allowing ranges, a thing that cannot be serialized, though their
new solution of corrupting the data by providing spurious containers inline
is a good one.)  I also agree with Simon's naming point:  a specification
that works on the infoset and produces an unserializable result, or that
works on something that is unserializable should not be called "XML blah

I would prefer the Post-Schema Validation Infoset to have some completely
different name, such as the W3C Typed Tree Information Set; then XQuery can
be called the Typed Tree Query Language, and XPath 2 can be the Typed Tree
Path language, etc.   At a certain point, it stops being XML, and should be
given a separate name.  (And does anyone else find it strange that W3C is
creating database-supporting specs while OASIS is doing lightweight schemas
for the WWW?)

Rick Jelliffe
(not speaking on behalf of W3C Schema WG, though I expect some other members
would not disagree)