Recent discussions here about XQuery, XPath 2.0, and their knotted
relationships with W3C XML Schema have made me think more deeply about
the relationship between XML and W3C XML Schema, particularly the
Post-Schema Validation Infoset (PSVI).
There were a bunch of presentations last year about how XML + XSD -> XML
2.0, something I found merely annoying then but which makes more sense
now. The community that craves these features is poorly served in many
ways by XML 1.0, with its text orientation, structures that can be loose
to the edge of complete unpredictability, and a human-readability
requirement that is incredibly verbose but useful in many cases only for
XML 1.0 is now more and more buried under layers of other processing,
and the common foundation for W3C work moving forward appears to be the
PSVI - or at least an enormous amount of effort is going into
integrating the PSVI with a large number of projects, and it seems that
most of the vendor and programmer excitement these days is focused on
the PSVI, not the brutish markup that lurks underneath.
The PSVI seems to be what programmers and database folks want. It
offers strongly typed and highly structured information, already
guaranteed to conform to their expectations. It has the same flexible
named hierarchies that XML offers, with none of the messy concerns about
character encodings, CDATA sections, or the limitations of text for
storing binary information.
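
To make the contrast concrete, here is a minimal sketch of the same small document seen as plain text and as typed values. The `DOC` sample and the `TYPES` table are invented for illustration; the table is only a stand-in for what a schema would declare, and this is not a real W3C XML Schema validator or a real PSVI.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal

# Hypothetical sample document; the element names are assumptions.
DOC = "<order><id>42</id><total>19.90</total><shipped>2002-11-20</shipped></order>"

# Hypothetical stand-in for schema type declarations: element name -> converter.
TYPES = {"id": int, "total": Decimal, "shipped": date.fromisoformat}


def untyped_view(xml_text):
    """What a plain XML 1.0 parser reports: every value is text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def typed_view(xml_text, types):
    """Roughly what a typed consumer wants: values already converted.

    Raises KeyError for an undeclared element and ValueError (or a
    decimal error) for text that does not fit its declared type.
    """
    root = ET.fromstring(xml_text)
    return {child.tag: types[child.tag](child.text) for child in root}


if __name__ == "__main__":
    print(untyped_view(DOC))
    print(typed_view(DOC, TYPES))
```

The typed view is the one that fails early when content does not match expectations, which is the guarantee described above; the untyped view leaves that work to every consumer.
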
At the same time, the PSVI is pretty difficult to express in XML.
Layers of type information can make it complex to pin down how best to
describe a particular piece of information. Object-oriented development
manages that every day, but doesn't have to express the whole hierarchy
for every piece of information in a flat representation. Given recent
discussions of synthetic PSVIs, it's not always clear that
From all of that, I'm concluding that XML is not a good foundation for
the kinds of information developers want from the PSVI, and that
retrofitting XML to carry that information is perhaps the root cause of
the complexity explosion we're seeing in W3C XML Schema and
specifications which build on it. It seems to me that it might be wiser
to use the PSVI directly for more abstract information modeling rather
than expecting XML representations to carry the load.
So where does this take us? Developers who want to work with the PSVI
should work with the PSVI, and not worry about XML. The kind of
interoperability the PSVI is designed to provide is very different from
the kind of interoperability that XML provides - a perfectly reasonable
outcome given the different circumstances in which their respective
specifications were created.
Beyond that, it seems like some easily-exchanged representation of the
PSVI is in order. XML works, sort of, but it seems pretty obvious that
there are better approaches to representing information if you have all
the information the PSVI provides rather than a simple "all is text"
approach. This could easily be a binary format, though text might also
be an option.
XML has done a wonderful job of convincing the world that it is possible
to agree on base formats for some kinds of information, and that generic
tools (parsers, editors, etc.) can be useful for a wide variety of
specific problems. It seems reasonable to suggest that the lesson of
XML is not "everyone must use angle brackets and text" but rather that
"shared information formats are really useful when supported by a
reasonable set of tools".
Given the immense bias in current XML work at the W3C toward support for
the PSVI, it seems like it might well be time to find an appropriate
means of expression for the PSVI. Conversions from strongly typed PSVI
to loosely typed XML should be trivial, while XML to PSVI should only
require a W3C XML Schema (or other PSVI generator) to provide the
PSVI processors could use or extend existing XML infrastructures,
replacing only the bottom layer - the parser - and possibly developing
their own structures for the layers above. I suspect that taking the PSVI
to its fullest potential is going to involve a lot more work than taking
untyped markup to its fullest potential. It's simply a larger set of
A binary PSVI format could sure make XML-RPC (PSVI-RPC?) messages a lot
smaller. All it takes is a spec, some free parsers, and some tools.
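
As a rough sense of scale, the sketch below compares a standard XML-RPC call, built with Python's `xmlrpc.client.dumps`, against a packed encoding of the same method name and integer arguments. The binary layout in `pack_call` is invented for this comparison only; no binary PSVI format is defined anywhere, and a real one would also have to carry type and structure information.

```python
import struct
import xmlrpc.client


def xml_call(method, args):
    """Encode a call as ordinary XML-RPC text (UTF-8 bytes)."""
    return xmlrpc.client.dumps(tuple(args), methodname=method).encode("utf-8")


def pack_call(method, args):
    """Hypothetical binary layout, assumed for illustration only:
    1-byte name length, name bytes, 2-byte argument count,
    then each argument as a signed 32-bit big-endian integer.
    """
    name = method.encode("utf-8")
    header = struct.pack(">B", len(name)) + name
    body = struct.pack(f">H{len(args)}i", len(args), *args)
    return header + body


def unpack_call(data):
    """Inverse of pack_call, to show the layout is lossless for ints."""
    name_len = data[0]
    method = data[1:1 + name_len].decode("utf-8")
    (count,) = struct.unpack_from(">H", data, 1 + name_len)
    args = struct.unpack_from(f">{count}i", data, 3 + name_len)
    return method, list(args)


if __name__ == "__main__":
    text = xml_call("add", [1, 2, 3])
    binary = pack_call("add", [1, 2, 3])
    print(len(text), "bytes as XML-RPC;", len(binary), "bytes packed")
```

The packed form only works because both ends already agree on the types, which is exactly the information a PSVI carries and plain markup does not.
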
Maybe someday programmers will look back on XML as the bootstrap phase
of the PSVI, while the occasional markup geek still pokes around CDATA
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!