   Subtyping in XML


Mike Champion wrote:
> 9/2/2002 8:02:30 PM, Paul Prescod <paul@prescod.net> wrote:
> >> ...
> >> "Inheritance is a complex type's only advantage, but you really
> >> don't want to use it."
> >
> >Yep! After a year of off-and-on research I concluded that trying to
> >import OO-style inheritance to schemas was a bad idea.
> Paul, did you write up the results of that research anywhere?
> I looked for a link on www.prescod.net and didn't find any
> rants about schemas and type inheritance.

No; what is feasible and what is not never became that clear in my own
mind. Let me be more precise in my conclusions. Basically, after working
through various approaches to importing the OO model (specifically
inheritance), it became clear to me that unifying the OO and tree
grammar (i.e. DTD) models is quite difficult and very easy to get wrong
by accident.

I've seen bits of the argument expressed by various people in messages
over the years but never a single rant. 

I know you didn't ask for justification, but having made the claim I
feel I should back it up if anyone is interested (which it seems you
are). Still, I'll be scattered, not organized.

Nevertheless, let's see what I can find.

I first started thinking about this stuff around 1997, I guess:

 * http://www.geocrawler.com/archives/3/318/1997/10/100/1765820/

The section on "Subclassing" suggests the gist of the problem. The core
sentences are:

"This is a little bit of an inversion from OOP, because in OOP a 
subclass must accept any 'input' that a parent class can. We think of 
content models as "accepting input".  

"Generally speaking, 
attributes seem more intrinsically amenable to concepts of subclassing 
than content, because they are "random access" in some sense, as are 
methods in OOP. Perhaps in adding OO features to SGML we will also  
choose to make attributes more powerful (for instance by allowing them 
to have content models and explicit substructure like elements)."

But we didn't make attributes more powerful. Instead XML Schema just
added inheritance and ignored the potential problems. Whatever middling
enthusiasm I have for the semantic web technologies derives from the
fact that they provide a much cleaner basis for inheritance,
extensibility and property-based data access (which is strongly
associated with OO).

Let me try to summarize the problem with XML Schema inheritance this
way: The defining characteristic of subtyping in OO languages is that if
the subtype is properly designed it *will not break code* written for
the supertype, whether the supertype predicted your extensions or not.
This can be achieved with XML Schema inheritance only if the people
writing code for the supertype practiced a high level of discipline. In
other words, subtyping "just works" for clients in OO languages. It
takes (IMO) unacceptable levels of discipline in XML Schema.
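
A minimal Python sketch of the OO half of that claim. The class and
function names (Section, ReviewedSection, render) are hypothetical and
chosen only for illustration; the point is that client code written for
the supertype never sees the property the subtype adds:

```python
class Section:
    """Hypothetical supertype: a title plus a list of paragraphs."""

    def __init__(self, title, paras):
        self.title = title
        self.paras = list(paras)


class ReviewedSection(Section):
    """Hypothetical subtype that adds one new named property."""

    def __init__(self, title, paras, reviewer):
        super().__init__(title, paras)
        self.reviewer = reviewer


def render(section):
    # Client code written against Section only. It reads the named
    # properties it knows about and ignores everything else.
    out = [("chapter_title", section.title)]
    out += [("paragraph", p) for p in section.paras]
    return out


plain = render(Section("Intro", ["one", "two"]))
extended = render(ReviewedSection("Intro", ["one", "two"], reviewer="pp"))
```

Both calls produce the same output, because the extension lives in a new
named slot rather than in the sequence the client walks.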

XML Schema is better than DCD was when it comes to inheritance, but it
is still somewhat susceptible to the issue I discussed here:

 * http://lists.xml.org/archives/xml-dev/199901/msg00517.html

No, you can't break client applications by extending a union in XML
Schema, but let me give a trivial example of where you could break a
naively created client application:

section = title, para+

Standard, fairly naive, XSLT says:

title -> chapter_title
para -> paragraph

Output schema says:

chapter_title, paragraph

Some yahoo (perhaps trying to crash your system) extends the section to:

section = title, para+, title

The schema says, "Yeah, that's a valid extension" (despite the fact that
it violates Liskov). The XSLT faithfully does its thing and returns:

chapter_title, paragraph, chapter_title

Now you've tricked the app into generating bad data without violating
the input schema. It is VERY DIFFICULT to trick an object oriented
program in this way because the extension mechanism is based on named
properties that *cannot* interfere with each other.
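
The same scenario as a rough Python sketch. Everything here is an
assumption made for illustration: content models are approximated as
regular expressions over child-element names, the XSLT is reduced to a
rename table, and the output model is read as chapter_title followed by
one or more paragraphs:

```python
import re

# Hypothetical content models, written as regexes over element names.
INPUT_BASE = re.compile(r"title( para)+")            # section = title, para+
INPUT_EXTENDED = re.compile(r"title( para)+ title")  # section = title, para+, title
OUTPUT_MODEL = re.compile(r"chapter_title( paragraph)+")

# The naive transform: a one-to-one rename of each child element.
RENAME = {"title": "chapter_title", "para": "paragraph"}


def valid(model, children):
    """True if the sequence of child names matches the content model."""
    return model.fullmatch(" ".join(children)) is not None


def naive_transform(children):
    """Rename each child in document order, as the template rules do."""
    return [RENAME[name] for name in children]


base = ["title", "para", "para"]
extended = ["title", "para", "para", "title"]

base_out = naive_transform(base)
extended_out = naive_transform(extended)

print(valid(INPUT_EXTENDED, extended))    # input passes its schema
print(valid(OUTPUT_MODEL, base_out))      # output from the base type is fine
print(valid(OUTPUT_MODEL, extended_out))  # output from the extension is not
```

The input is valid in both cases; only the output of the extended
instance falls outside the output model, and the transform has no way
of noticing.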

More recently, Don Box has been edging towards the same ideas:


"One of the features that really hooked my on XML Schema was derivation
by extension and xsi:type. This mechanism worked very similar to the
object marshaling and serialization world I had cut my teeth on, and for
several years, I viewed the XML type system through these glasses.
Obviously, as the years have passed, I've become slight more catholic in
my views thanks to the influence of people like Allen Brown, Matthew
Fuchs, Simon St. Laurent, and Martin Gudgin. 

Today, the top of my head blew off (yet again) while listening to Martin
Gudgin giving a talk on XML Schema to my team. Specifically, while he
was explaining some of the more esoteric aspects of derivation by
restriction, I saw the light."

Dare says:

 * http://lists.xml.org/archives/xml-dev/200206/msg00220.html

"With XQuery and XSLT one can attempt to process elements based on their
XSD types but with xsi:type one can both restrict and extend these types
in the instance document unbeknownst to the author of the processing
code. At first glance it seems like both these mechanisms do not
radically alter the content model in such a manner that carefully
written type aware processors will be rendered ineffective. 

However until applications start getting built there probably is no sure
way to tell if my fears are unfounded or not."

I think his fears are founded!

Henry Thompson says:

"Just as no-one would allow a mission-critical system involving
validation to do so against a client-supplied DTD (despite the fact
that, as you point out in your companion message, XML 1.0 _requires_
it to do so to be a conformant validating parser), but would instead
use their own, just so anyone writing a mission-critical application
involving schema-validity assessment will do so against their own
schema and either write it to 'block' the use xsi:type wrt extension,
or ignore any other schema hints, so any attempt to use foreign types
will fail (both of these strategies _are_ allowed by W3C XML Schema)."

 * http://lists.xml.org/archives/xml-dev/200206/msg00265.html

So basically he's saying that it isn't safe to allow the data provider
to nominate a schema that uses inheritance to extend your schema. If you
can't do that, then XML Schema inheritance is not really a mechanism for
improving the extensibility of XML. And XML's dirty little secret is
that it isn't really that great at extensibility after all. "Extending"
a document type can break applications, which is more or less what
happens in binary formats too!

I'll repeat that this is the central issue that has got me looking past
XML to the semantic web technologies. They aren't trying to patch
extensibility into the model later; it's core from the start. I'm pretty
confident that an application built around an RDF class can work with
any subclass without any fear of violating Liskov. By definition,
subtypes inherit constraints. Extensions can only be in dimensions that
do not violate constraints. But XML Schema does not make this promise.

To give another example, I would expect in RDF that if a property had a
maxOccurs of 10, it could not be expanded by the child. But in XML
Schema, you can use derivation by extension to add elements at the end
which will be collected by almost all naive processors. The only way a
smart processor can guard against this is to make sure to only process
the first 10 elements. But a central goal of schemas should be to
*relieve* applications of this kind of constraint-checking burden.
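
A small Python sketch of that burden, taking the scenario above at face
value. The element name "item", the limit constant, and both functions
are hypothetical; the instance is modeled as a flat list of child names:

```python
MAX_ITEMS = 10  # the base type's maxOccurs, assumed for illustration


def naive_collect(children):
    # Gathers every "item" child, trusting that validation against the
    # base type already enforced the limit.
    return [c for c in children if c == "item"]


def defensive_collect(children):
    # Re-checks the constraint in application code by keeping only the
    # first MAX_ITEMS matches.
    return naive_collect(children)[:MAX_ITEMS]


base_instance = ["item"] * 10
# Extra elements appended at the end by a derived type.
extended_instance = base_instance + ["item"] * 3

print(len(naive_collect(extended_instance)))      # more than the base allows
print(len(defensive_collect(extended_instance)))  # capped by the application
```

The defensive version works, but it duplicates in code a constraint the
schema was supposed to guarantee.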

Basically, XML Schema inheritance is not in general a third-party safe
extensibility mechanism for XML, and as I recall it, that's what it was
supposed to be. If it isn't that, then its costs outweigh its benefits,
in my opinion.
 Paul Prescod



Copyright 2001 XML.org. This site is hosted by OASIS