Jonathan Robie wrote:
> At 12:45 PM 12/4/2002 -0500, W. E. Perry wrote:
> >AOL to this, Uche.
> >I would actually go beyond your point:
> >"Certainly, if you want your data to outlast your code, and to be more
> >portable to unforeseen, future uses, you would do well to lower your own
> >level of class consciousness. Strong data typing in XML tends to pigeonhole
> >data to specific tools, environments and situations. This often raises the
> >total cost of managing that data."
> >It is not just over time, but right now, between utterly dissimilar systems
> >whose only nexus is the internetwork, that communication is possible only by
> >instantiating a common syntax into locally idiosyncratic semantics at each
> >end of the conversation.
> I still don't understand this point. Could someone please illustrate with
> an example that uses several kinds of software processing data that uses
> datatypes, and showing how the presence of datatypes in that data prevents
> it from being used except in "specific tools, environments, and situations"?
The way I see it, the problem is not XML annotated with data
types _per se_; rather it's the assumption by schema designers
that those data types will be available to all processes.
If the designer isn't careful, this assumption can easily
lead to document types that can only be processed by WXS-aware
processors.
Jonathan has asked (repeatedly) for concrete examples of how
typed XML causes interop problems. I don't have any such
examples since I haven't made that _particular_ mistake;
but in the general theme of overreliance on schema information
I've goofed many times.
For instance: I used to be a big fan of keying processing
off of #FIXED attributes in the DTD. This worked really
well in the SGML world, but with XML it limits you to using
DTD-aware processors (and making sure they can find the DTD,
even when disconnected from the Web, et cetera.) This
led to so many headaches that I now use different techniques
to do architectural forms. Lesson learned.
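To make the DTD pitfall concrete, here is a small Python sketch (the
element and attribute names are invented for illustration). Python's
expat-based parser reads the internal DTD subset and applies declared
attribute defaults, but it does not fetch an external DTD, so the same
#FIXED attribute either appears or silently vanishes depending on where
the declarations live:

```python
import xml.etree.ElementTree as ET  # expat-based; reads the internal DTD subset

# Declarations in the internal subset: expat applies the #FIXED
# default, so the attribute shows up on every <para>.
inline = (
    '<!DOCTYPE doc [\n'
    '  <!ELEMENT doc (para)>\n'
    '  <!ELEMENT para EMPTY>\n'
    '  <!ATTLIST para role CDATA #FIXED "section">\n'
    ']>\n'
    '<doc><para/></doc>'
)
print(ET.fromstring(inline).find('para').get('role'))    # "section"

# Move the same declarations into an external DTD: a parser that does
# not (or cannot) fetch doc.dtd never sees the attribute at all.
external = '<!DOCTYPE doc SYSTEM "doc.dtd"><doc><para/></doc>'
print(ET.fromstring(external).find('para').get('role'))  # None
```

Any processing keyed off that attribute works in the first case and
silently does nothing in the second -- the same document, two different
answers, depending on DTD awareness.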
It seems to me that the "International Purchase Order" schema
in section 4 of the W3C XML Schema Primer comes close to the
edge of that slippery slope. While _most_ of it can be processed
by WXS-oblivious tools, there are some tasks that can't be done
(or can't be done easily) without a type-annotated PSVI and full
schema information. For instance: write a program that extracts
all of the comments from a purchase order (see the schema fragment
in section 4.6). Now you could do this with an XSL transform that
extracted all the 'ipo:shipComment' and 'ipo:customerComment'
elements (since those are the only two elements defined to
be of that type), but that's fragile; if the schema is extended
to include other comment types, the transform will silently
break. Or similarly, find all the Vehicles in a document conforming
to the schema in section 4.7. Or just about any of the tasks
in section 1.9 of the W3C XML Query Use Cases document --
because of the way the schema is designed, XQuery is probably
the *only* tool that can perform these tasks.
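The fragile name-based workaround for the comment task looks something
like this, sketched in Python rather than XSLT; the namespace URI and
the sample document are assumptions modeled on the primer's example,
not copied from it:

```python
import xml.etree.ElementTree as ET

IPO = "http://www.example.com/IPO"  # assumed namespace for the ipo: prefix

doc = ET.fromstring(f"""
<ipo:purchaseOrder xmlns:ipo="{IPO}">
  <ipo:shipComment>Leave at the door.</ipo:shipComment>
  <ipo:customerComment>Gift wrap, please.</ipo:customerComment>
</ipo:purchaseOrder>
""")

# Name-based extraction: hard-codes the two element names that happen
# to be declared with the comment type *today*.
COMMENT_NAMES = {f"{{{IPO}}}shipComment", f"{{{IPO}}}customerComment"}
comments = [e.text for e in doc.iter() if e.tag in COMMENT_NAMES]

# If the schema is later extended with, say, a warehouseComment of the
# same type, this list silently omits it -- the breakage described
# above. A type-aware query ("all elements of type ipo:comment") would
# keep working, but only in a processor with full schema information.
```

This is exactly the trade-off in question: the name-based version runs
anywhere, but encodes a closed-world assumption the schema doesn't
actually make.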
But the key issue is: if you want your data to outlast your code,
don't encode it in a way that's too tightly bound to any particular
process. Peeking at data types in the PSVI to make processing
easier is not necessarily a problem; schema designs that *require*
processes to do so are. This is just one instance of the general theme.
 <URL: http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/primer.html#IPO >
 <URL: http://www.w3.org/TR/xmlquery-use-cases/#strong >