xml-dev - RE: [xml-dev] PSVI formalization

To: "'Simon St.Laurent'" <simonstl@simonstl.com>, xml-dev@lists.xml.org
Subject: RE: [xml-dev] PSVI formalization
From: Matthew Gertner <matthew.gertner@schemantix.com>
Date: Thu, 9 May 2002 18:35:52 +0200
Simon,

Cool, I was about to respond to your other post, but this is a much clearer
formulation of the issue. Your initial premise is exactly right: what gets
developers excited about XML is the prospect of schema-enabling it.
Certainly that's true for me (your humble PSVI poster boy), and you're also
right that I don't like CDATA, NOTATIONs and the like.

I think where you are missing the boat is the assertion that somehow there
could be some alternative representation of the PSVI that wouldn't be XML
but would satisfy all the gearheads out there. Frankly this is a crazy idea.
It took decades for something like XML to appear, and it's a huge boon. We
are developing software right now that uses XML along with schema (having
created our own version of the PSVI two years ago), and we also use XML
parsers, XML editors, XPath, XSLT and a whole slew of other XML technologies
and tools. Why on earth would we reinvent the wheel when XML works for us!?
Just to preserve the "purity" of the language for the benefit of some markup
Luddites?

Someone on XML-Dev recently hit the nail on the head when they talked about
the N^N complexity of XML integration. The only solution is to have commonly
agreed-upon semantics for the documents, as someone else pointed out. The
most basic semantics relating to structural and datatyping constraints are
contained in a schema, and already make a lot of generic processing
possible. Without this, you really can't do anything useful with an XML
document without writing specific code, and the whole XML-on-the-web vision
falls apart.
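
A minimal sketch of the kind of generic processing meant here, in Python. The TYPES map is a hypothetical stand-in for the datatype information a real schema would supply, and the element names and the typed_children helper are invented for illustration. It is not an XML Schema processor, only an illustration that one generic routine can hand back typed values once element types are agreed on:

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical stand-in for a schema: element name -> datatype converter.
TYPES = {"quantity": int, "price": Decimal, "sku": str}

def typed_children(xml_text, types=None):
    """Return {child tag: value}; values stay strings unless a type map is given."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        text = (child.text or "").strip()
        convert = (types or {}).get(child.tag, str)
        result[child.tag] = convert(text)
    return result

DOC = "<item><sku>A-17</sku><quantity>3</quantity><price>9.50</price></item>"

untyped = typed_children(DOC)        # every value is plain text
typed = typed_children(DOC, TYPES)   # values carry their declared types
total = typed["quantity"] * typed["price"]
```

Without the type map, computing the total takes document-specific conversion code. With it, the same routine serves any document that shares the agreed declarations.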

I simply can't get away from the suspicion that your objection lies more in
the specific instantiation of XML Schema, PSVI, XPath, XQuery, etc. than in
the underlying concepts. If you're such a schema skeptic, why did you
waste all of that time with DDML? The idea of well-formed vs. valid
documents has been around since the genesis of XML, and I don't remember
anyone getting upset about it until we started being faced with an array of
200-page specs that no one can understand. I would submit that:

1) You are upset to see a 35-page spec turn into a 160-page spec with
assorted dependencies in other huge specs (XPath), and this is
understandable. The notion of strong typing in XPath doesn't seem so
horrible to me, but the bloat of the spec does. In other words, the W3C is
increasingly unable to produce simple specs.

2) You are upset because it is harder and harder to divorce well-formed
documents from valid documents. In other words, the W3C is increasingly
unable to produce layered specs.

I can't imagine an objection to the notion that XML can be associated with a
schema, and that when it is, it becomes a valid document and the behavior of
certain associated specs is extended accordingly. For example, XPath can do
design-time type checking. If there is no schema, the document is merely
well-formed and this "baggage" is ignored.
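
As a toy sketch of that layering, in Python: the DECLARED map, its path names and the check_comparison function are all hypothetical, standing in for schema-supplied types and a schema-aware expression checker. It is not how any real XPath processor is implemented. The check looks only at the expression and the declarations, never at a document, and does nothing when no schema is supplied:

```python
# Hypothetical declared types, standing in for what a schema would supply.
DECLARED = {
    "item/quantity": "integer",
    "item/price": "decimal",
    "item/sku": "string",
}
NUMERIC = {"integer", "decimal"}

def check_comparison(path, literal, declared=None):
    """Return type errors for comparing `path` with `literal`, before any document is read."""
    if declared is None:
        return []  # well-formed only: no type information, so nothing is checked
    kind = declared.get(path)
    if kind is None:
        return [f"{path}: not declared in schema"]
    is_number = isinstance(literal, (int, float)) and not isinstance(literal, bool)
    if kind in NUMERIC and not is_number:
        return [f"{path}: declared {kind}, compared with non-numeric {literal!r}"]
    if kind == "string" and not isinstance(literal, str):
        return [f"{path}: declared string, compared with {literal!r}"]
    return []

with_schema = check_comparison("item/price", "cheap", DECLARED)
without_schema = check_comparison("item/price", "cheap")
```

With the declarations present, the mismatched comparison is reported up front. Without them, the same call returns no errors and the expression is left to run against whatever text the document contains.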

Maybe you are being purposely provocative (I can relate, lord knows), but
the idea that XML+schema is somehow no longer in the spirit of XML is
absurd.

Matt

> -----Original Message-----
> From: Simon St.Laurent [mailto:simonstl@simonstl.com]
> Sent: Thursday, May 09, 2002 5:15 PM
> To: xml-dev@lists.xml.org
> Subject: [xml-dev] PSVI formalization
> 
> 
> Recent discussions here about XQuery, XPath 2.0, and their knotted
> relationships with W3C XML Schema have made me think a fair amount about
> the relationship between XML and W3C XML Schema, particularly the
> Post-Schema Validation Infoset (PSVI), more deeply.
> 
> There were a bunch of presentations last year about how XML + XSD -> XML
> 2.0, something I found merely annoying then but which makes more sense
> now.  The community that craves these features is poorly served in many
> ways by XML 1.0, with its text orientation, structures that can be loose
> to the edge of complete unpredictability, and a human-readability
> requirement that is incredibly verbose but useful in many cases only for
> debugging stages.
> 
> XML 1.0 is now more and more buried under layers of other processing,
> and the common foundation for W3C work moving forward appears to be the
> PSVI - or at least an enormous amount of effort is going into
> integrating the PSVI with a large number of projects, and it seems that
> most of the vendor and programmer excitement these days is focused on
> the PSVI, not the brutish markup that lurks underneath.
> 
> The PSVI seems to be what programmers and database folks want.  It
> offers strongly typed and highly structured information, already
> guaranteed to conform to their expectations.  It has the same flexible
> named hierarchies that XML offers, with none of the messy concerns about
> character encodings, CDATA sections, or the limitations of text for
> storing binary information.
> 
> At the same time, the PSVI is pretty difficult to express in XML.
> Layers of type information can make it complex to pin down how best to
> describe a particular piece of information.  Object-oriented development
> manages that every day, but doesn't have to express the whole hierarchy
> for every piece of information in a flat representation.  Given recent
> discussions of synthetic PSVIs, it's not always clear that
> XML+schema->PSVI.
> 
> I'm concluding from all of that that XML is not a good foundation for
> the kinds of information developers want from the PSVI, and that
> retrofitting XML to carry that information is perhaps the root cause of
> the complexity explosion we're seeing in W3C XML Schema and
> specifications which build on it.  It seems to me that it might be wiser
> to use the PSVI directly for more abstract information modeling rather
> than expecting XML representations to carry the load.
> 
> So where does this take us?  Developers who want to work with the PSVI
> should work with the PSVI, and not worry about XML.  The kind of
> interoperability the PSVI is designed to provide is very different from
> the kind of interoperability that XML provides - a perfectly reasonable
> conclusion given the different situations leading to the creation of
> their respective specifications.
> 
> Beyond that, it seems like some easily-exchanged representation of the
> PSVI is in order.  XML works, sort of, but it seems pretty obvious that
> there are better approaches to representing information if you have all
> the information the PSVI provides rather than a simple "all is text"
> approach.  This could easily be a binary format, though text might also
> be an option.
> 
> XML has done a wonderful job of convincing the world that it is possible
> to agree on base formats for some kinds of information, and that generic
> tools (parsers, editors, etc.) can be useful for a wide variety of
> specific problems.  It seems reasonable to suggest that the lesson of
> XML is not "everyone must use angle brackets and text" but rather that
> "shared information formats are really useful when supported by a
> reasonable set of tools".
> 
> Given the immense bias in current XML work at the W3C toward support for
> the PSVI, it seems like it might well be time to find an appropriate
> means of expression for the PSVI.  Conversions from strongly typed PSVI
> to loosely typed XML should be trivial, while XML to PSVI should only
> require a W3C XML Schema (or other PSVI generator) to provide the
> necessary information.
> 
> PSVI processors could use or extend existing XML infrastructures,
> replacing only the bottom layer - the parser - and possibly developing
> its own structures for the layers above.  I suspect that taking the PSVI
> to its fullest potential is going to involve a lot more work than taking
> untyped markup to its fullest potential.  It's simply a larger set of
> problems.
> 
> A binary PSVI format could sure make XML-RPC (PSVI-RPC?) messages a lot
> smaller.  All it takes is a spec, some free parsers, and some tools.
> Maybe someday programmers will look back on XML as the bootstrap phase
> of the PSVI, while the occasional markup geek still pokes around CDATA
> sections.
> 
> -- 
> Simon St.Laurent
> Ring around the content, a pocket full of brackets
> Errors, errors, all fall down!
> http://simonstl.com
Follow-Ups:
- Re: [xml-dev] PSVI formalization
  - From: Tim Bray <tbray@textuality.com>
- RE: [xml-dev] PSVI formalization
  - From: "Simon St.Laurent" <simonstl@simonstl.com>