OASIS Mailing List Archives

Re: PSVI



From: James Robertson <jamesr@steptwo.com.au>

>Can anyone give me a pointer to a _brief_ discussion of PSVI?

The Post Schema Validation Infoset is the current name given to the XML
Infoset after the document has been schema-processed: it is the original
infoset, augmented.

There are four or so basic choices (non-exclusive) for what a Schema
language can do:

-- transform the document, leaving the original in place: e.g. report
Boolean valid, generate a set of links, pretty-print the document, generate
code for interfaces

-- provide type information for programs accessing the infoset, and allow
optimized infosets to be built: e.g. so that a query on a number stores the
query value in an int rather than a string, and uses number-based
comparisons not text-based

 -- augment the infoset using the existing categories: e.g. the defaults in
a DTD add special attribute values, but the fact of whether the attribute
values were specified or defaulted or are fixed makes no difference to the
normal API. (This augmentation could be implemented by lookup rather than
inline if there is a 1-1 correspondence between node name and type. Even if
there is not a 1-1 correspondence, the defaulting etc would usually not be
explicit but implemented by pointing to a singleton representing the
appropriate markup declarations.)

 -- augment the infoset with new kinds of information which do not
correspond to any XML markup: e.g. allow nodes to have properties or types
or facets, allow individual nodes to have validity status (or other
outcomes) added per node, allow extra nodes corresponding to a data value
after it has been normalized and put into some optimized form such as an
array of int. (This augmentation could be implemented by lookup rather than
inline if there is a 1-1 correspondence between node name and type. Even if
there is not a 1-1 correspondence, the defaulting etc would usually not be
explicit but implemented by pointing to a singleton representing the
appropriate markup declarations.)
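
As a rough illustration of the "lookup rather than inline" idea in the last two items, the sketch below leaves the parsed tree untouched and resolves attribute defaults and types through one shared declaration object per element name. The ElementDecl class, the DECLS table, and the element and attribute names are all invented for this example; it assumes a 1-1 correspondence between element name and type and is not a description of how any real schema processor works.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class ElementDecl:
    """Hypothetical shared declaration: one instance per element name."""
    type_name: str
    attr_defaults: dict = field(default_factory=dict)


# Assumed 1-1 correspondence between element name and declaration.
DECLS = {
    "price": ElementDecl("decimal", {"currency": "USD"}),
    "qty": ElementDecl("int"),
}

# Conversion functions for the (hypothetical) type names used above.
CASTS = {"decimal": Decimal, "int": int}


def get_attr(elem, name):
    """Return a specified attribute, else the default from the declaration."""
    if name in elem.attrib:
        return elem.attrib[name]
    decl = DECLS.get(elem.tag)
    return decl.attr_defaults.get(name) if decl else None


def typed_value(elem):
    """Return the element's text converted according to its declared type."""
    decl = DECLS.get(elem.tag)
    cast = CASTS.get(decl.type_name, str) if decl else str
    return cast(elem.text)


doc = ET.fromstring("<order><price>9.50</price><qty>10</qty></order>")
price, qty = doc.find("price"), doc.find("qty")
```

Here get_attr(price, "currency") yields the default without the tree ever storing it, and typed_value(qty) compares as a number, whereas the raw strings "10" and "9" would compare in text order.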

It is this last augmentation that is the "Post Schema Validation Infoset"
approach.
XML Schemas takes this approach.  There is no way to re-serialize the PSVI
without altering the structure of the document drastically; however, some
architectural forms etc could be constructed.  There are some attempts to
make a standard dump format for PSVI (Richard Tobin has one and I think
Jonathan Borden has an idea in the works too for RDF) but structure is not
preserved in these.  So every time the document is transmitted, it must be
re-augmented; there is no way for a document to declare "don't augment me",
though perhaps the SOAP/XML Protocols technology might be able to do
something (I doubt they would--a bit fiddly).

So I think it is important to disconnect the idea of schema augmentation and
strong typing from the need for a PSVI proper: as I said above, there are
augmentations possible which do not add any new information types (e.g.
attribute defaulting) and schema queries can have strong typing built in.
And casting _could_ be used in query languages to get strong typing even
without a schema at all: e.g.
  <xsl:template  match="person[@(date)birthday='1972/12/24']" >..
so we also should not assume that a schema is needed to get strong typing per
se...just for automated selection of type.
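
A minimal sketch of the same idea outside XSLT: the query itself casts the attribute value to a date before comparing, so no schema is consulted. The function names, the sample document, and the assumption that dates are written as YYYY/MM/DD are all illustrative only.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime


def as_date(text):
    """Cast a 'YYYY/MM/DD' string to a date (format assumed from the example)."""
    return datetime.strptime(text.strip(), "%Y/%m/%d").date()


def select_people_born_on(root, wanted):
    """Return person elements whose birthday, cast to a date, equals wanted."""
    return [
        p for p in root.iter("person")
        if "birthday" in p.attrib and as_date(p.attrib["birthday"]) == wanted
    ]


doc = ET.fromstring(
    '<people>'
    '<person name="a" birthday="1972/12/24"/>'
    '<person name="b" birthday="1972/12/4"/>'
    '<person name="c" birthday=" 1972/12/24 "/>'
    '</people>'
)
matches = select_people_born_on(doc, date(1972, 12, 24))
names = [p.get("name") for p in matches]
```

Person "c" is selected because the comparison is made on date values; a plain string comparison against '1972/12/24' would have missed it.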

The proponents of PSVI say it is harmless, that we already do similar things
in real DOMs (i.e. if we already have a pointer to the element/attlist
declaration it can be substituted for a singleton holding the XML Schema PSV
information), that there are implementation techniques that make it
efficient, that it opens up the door for more sophisticated processing, and
that it is required in XML Schemas because otherwise we cannot process
substitution groups generically and because the presence of xsi:type means
that a query writer cannot always rely on the schema to know the type
because type may be explicitly specified (which would also cause the
strong-typed query to fail).  An augmented infoset can allow all sorts of
nice error messages and information.
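
The xsi:type point can be sketched as follows: a query that only consults the schema's declared type for an element name gets the wrong answer when the instance overrides it. The DECLARED table and element names are hypothetical, and the sketch simplifies by keeping only the local part of the xsi:type QName rather than resolving its namespace prefix.

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the xsi:type attribute.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical schema-declared types, keyed by element name.
DECLARED = {"amount": "decimal"}


def effective_type(elem):
    """Prefer an explicit xsi:type on the instance over the declared type."""
    explicit = elem.get(XSI_TYPE)
    if explicit is not None:
        # Simplification: keep only the local part of the QName and
        # do not resolve its namespace prefix.
        return explicit.split(":")[-1]
    return DECLARED.get(elem.tag, "string")


doc = ET.fromstring(
    '<r xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<amount>1.5</amount>'
    '<amount xsi:type="xs:int">2</amount>'
    '</r>'
)
types = [effective_type(e) for e in doc.findall("amount")]
```

The two amount elements share a name but end up with different effective types, which is the per-node information a PSVI would record.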

The opponents of PSVI say it is harmful, that it forces an across-the-board
upgrade of technology with all the disruption and interoperability problems
that will involve, that it does not provide much additional functionality
(remember we are disconnecting PSVI from the ability of schemas to
autogenerate optimized interfaces or queries, and from simple augmentation
of the XML infoset), that it may mean that existing non-PSVI specs  are not
maintained, that on the WWW we need to reduce the amount of information sent
so PSVI systems work, that on lightweight devices it is too slow or big to
be useful (and so we will end up with subsets of XPath, XSLT, XML Schemas
and all PSVI specs anyway), that implied markup is bad practice in any case
(i.e. it is OK to use substitution groups to allow similar elements in a
location, but they must be processed individually), and that xsi:type is a
kludge that is only required because XML Schemas does not provide selection
of type parameterized by attribute values (a.k.a. generalized markup).  The
PSVI is not geared to a world of small-lightweight (or heavy-load)
communicating devices but to the old world of big fat centralized systems
and clients.  Furthermore, making it that an XSLT 2 script may use the PSVI
means that a programmer (e.g. a maintenance programmer, or a
beg/borrow/stealer) needs to understand XML Schemas--this adds significantly
to the background knowledge required; furthermore, it is a betrayal of XML's
basic premise that it (and by expectation its derived technologies) will be
straightforward to use on the WWW and easy to implement.  Furthermore, at
least some people think that the lack of expressiveness in some major areas
of XML Schemas means that it cannot claim to be a "universal" schema
language, and so the PSVI does not provide enough bang/buck: I know a
Schematron fan who thinks Schematron has eroded the areas where XML Schemas
is the preferable schema language, and James Clark (as reported in
xmlhack.com) has commented on some other features he thinks are important to
model.  Some people also may feel that the PSVI/XML Schemas is so
complicated that it centralizes web technology into the hands of the
privileged few (large companies and those funded by them, or Western
countries in general) and so is fundamentally not a "people's
technology"--we have left the DPH a long time ago: they may also feel that
this complexity and over-completeness plays into the hands of the large
commercial interests by making the technology too difficult for start-ups
and, being too verbose for reading, almost guarantees that fancy GUIs must
be used to present the schemas (creating a market for the tools-makers.)

I think it is possible to hold a middle view: that the PSVI is certainly
useful and appropriate for many applications (editors, fat systems) but that
it is not appropriate (or not appropriate _now_) to abandon non-PSVI
versions of specs for PSVI versions.  That it would be more appropriate for
other specs to make use of the other non-PSVI features made available by XML
Schemas (as above: transformation, query typing, XML infoset augmentation.)

I met several people at the W3C meeting in Boston who were very satisfied
with XML Schemas; I don't recall that any of them actually required access
to any of the new PSVI information (as distinct from information that could
be expressed in the current DOM or XPath)  however.  So I don't think
Simon's comment that many people don't like XML Schemas is so relevant to
evaluating the current desirability of the PSVI (and Henry's comment that
there are many people who like XML Schemas is similarly not to-the-point.)
This is not so much an issue of XML Schemas but of how other specs make use
of XML Schemas IYKWIM.  By forcing existing W3C technologies to be based on
XML Schema PSVI, we don't have a world where co-existence is possible.

Hope this is useful and correct.

Cheers
Rick Jelliffe