RE: Why Are Schemas Hard? (WAS RE: "Uh, what do I need this for" (was RE: XML.COM: How I Learned to Love daBomb))

From: Nicolas LEHUEN <nicolas.lehuen@ubicco.com>
To: "'Bullard, Claude L (Len) '" <clbullar@ingr.com>,'Michael Brennan ' <Michael_Brennan@allegis.com>
Date: Thu, 23 Aug 2001 00:42:03 +0200

1. Schemas are not inherently hard. Most people can grasp the idea of an
XML schema. The trouble is with the schema languages. Important technologies
are being built on top of the W3C XSDL, whereas what we need most often is a
namespace-aware DTD. XSDL is far too complicated. There are many ways of
expressing the same thing. Namespace handling is not trivial; just have a
look at the thread about local elements. I strongly doubt that we will ever
see tools that fully support the spec (most current tools advertise support
for a "subset" of XSDL). The crucial question is: what will be the
intersection of the XSDL subsets supported by all those tools?

In fact, I often think that namespace-aware, XML-formatted DTDs are just
what we need. I was fascinated to see questions on this list (or elsewhere)
showing a two-line DTD construct that could not easily be rendered in XML
Schema in fewer than 20 lines, with two or three possible variations.

The problem is not the verbosity; it's just that, to my mind, XSDL is
overkill. Its editors did a great job of covering 100% of their
requirements, but my feeling is that they sacrificed simplicity of
implementation for 80% of schemas just to be able to cover the last 20%. The
irony, moreover, is that I've seen some schema definitions that could be
written in RELAX but not in XML Schema.

If you list the achievements of XML Schema, you'll see that the spec tries
to solve many problems in a single document:

- complex types: a way to define a content model, much like in DTDs, but in
an XML format.
- complex type inheritance: allows the building of polymorphic content
models, providing a more elegant alternative (though not the only one) to
DTD content modeling with entities.
- namespace support: element definitions can be assigned a namespace,
and there is a way to specify how a document containing elements from
different namespaces can be built and still validated. Naming conventions
and namespace scopes are defined.
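
To make the namespace point concrete, here is a minimal Python sketch using
only the standard library's xml.etree.ElementTree. The document, the two
namespace URIs and the element names are invented for illustration. The
sketch performs no validation; it only shows how a namespace-aware parser
reports elements from different namespaces, which is the information a
namespace-aware schema language has to describe.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing elements from two made-up namespaces.
DOC = """\
<order xmlns="urn:example:order" xmlns:addr="urn:example:address">
  <item>book</item>
  <addr:city>Paris</addr:city>
</order>
"""

def qualified_names(xml_text):
    """Return the (namespace URI, local name) pair of every element, in document order."""
    pairs = []
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree writes namespaced tags as "{uri}local".
        if elem.tag.startswith("{"):
            uri, local = elem.tag[1:].split("}", 1)
        else:
            uri, local = "", elem.tag
        pairs.append((uri, local))
    return pairs

names = qualified_names(DOC)
```

Here `names` holds one pair per element, so "item" is reported in the
default (order) namespace and "city" in the address namespace.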

This is what I call a "namespace-aware, XML-formatted DTD". Stopping there
and refining the spec until it was stable would have enabled us to
accomplish a great many tasks, but the spec does not focus on this part and
goes further:

- built-in simple types: data types for attributes or element contents.
- simple type extension: by inheritance and the use of facets, it is
possible to refine simple types.

That's perfect, but current tools only know how to handle xsd:string. Some OO
binding tools can go further, but not everybody wants to use such tools. If I
look at my preferred DOM API (W3C's or JDOM or dom4j or other non-Java
APIs), I have no support for other simple types (i.e. no automatic
parsing/serialisation of xsd:int or other types).
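
As an illustration of that gap, here is a small sketch using Python's
standard xml.dom.minidom as a stand-in for "other non-Java APIs". The
document and the read_int helper are hypothetical. The point is only that
the DOM hands back a string and the conversion has to be written by hand,
whatever type a schema might declare.

```python
from xml.dom import minidom

# Hypothetical instance; a schema could declare <quantity> as xsd:int,
# but the DOM below knows nothing about that declaration.
DOC = "<order><quantity> 42 </quantity></order>"

def read_int(document, tag_name):
    """Read the first <tag_name> element as an integer, converting by hand."""
    node = document.getElementsByTagName(tag_name)[0]
    raw = node.firstChild.data      # always a string, whatever the schema says
    return int(raw.strip())         # the caller must know the intended type

dom = minidom.parseString(DOC)
raw_value = dom.getElementsByTagName("quantity")[0].firstChild.data
quantity = read_int(dom, "quantity")
```

`raw_value` is the untouched text " 42 " and `quantity` is the integer 42;
nothing in the API links the two.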

Then we've got document-wide constraints (AKA identity constraint
definitions):

- the ability to declare a uniqueness constraint
- key and keyref: the ability to declare key paths within the document, and
the keyrefs that refer to them.
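
To show what these constraints amount to, here is a hand-written check in
Python (standard library only). It is a simplified sketch, not the XML
Schema mechanism itself: the document, the element names and the choice of
"id" and "customer" as key and keyref fields are all made up, and the
selector/field paths of a real schema are hard-coded here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each <customer id="..."> acts as a key,
# each <order customer="..."> acts as a keyref to a customer.
DOC = """\
<shop>
  <customer id="c1"/>
  <customer id="c2"/>
  <order customer="c1"/>
  <order customer="c3"/>
</shop>
"""

def check_identity_constraints(xml_text):
    """Return a list of human-readable violations (empty if none)."""
    root = ET.fromstring(xml_text)
    errors = []
    seen = set()
    for customer in root.iter("customer"):
        key = customer.get("id")
        if key is None:
            errors.append("customer without id")    # a key must be present
            continue
        if key in seen:
            errors.append(f"duplicate key: {key}")  # and unique
        seen.add(key)
    for order in root.iter("order"):
        ref = order.get("customer")
        # Simplification: an order without the attribute is not checked.
        if ref is not None and ref not in seen:
            errors.append(f"dangling keyref: {ref}")
    return errors

errors = check_identity_constraints(DOC)
```

With the sample document, the only violation reported is the order that
points at the non-existent customer "c3".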

Then we have a contribution to the infoset:

- Post-Schema-Validation Infoset: additions made to the infoset after
schema validation is performed.

Last, we have a topping of additional features:

- attribute groups: save some typing.
- mixed content models: apart from the xsd:string content type, there is
no way of defining a textual content model. Instead, it is possible to
declare a content model as "mixed", in which case text can be
interleaved between the child elements of the content model.
- model groups: likewise save some typing, and also act as a workaround for
some syntax definitions. The problem is that, without careful thought, one
can easily mix model groups and inheritance and end up with some pretty
incomprehensible schemas. A model group cannot be marked as "mixed"; only
the content model that uses it can. This does not ease the definition of
document-oriented schemas like XHTML.
- non-DTD content models, like the "all" content model, which accepts a set
of child content models in any order provided that all children are
present*.
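
For the "mixed" content model mentioned in the list above, a short Python
sketch (standard library only) shows what "text interleaved between child
elements" looks like to a parser. The fragment is invented for
illustration, and nothing here is validated against a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document-oriented fragment with mixed content.
DOC = "<p>Schemas are <em>not</em> inherently <b>hard</b>.</p>"

def interleaved(xml_text):
    """Return the text runs and child element names in document order."""
    root = ET.fromstring(xml_text)
    parts = []
    if root.text:                            # text before the first child
        parts.append(("text", root.text))
    for child in root:
        parts.append(("element", child.tag))
        if child.tail:                       # text that follows the child
            parts.append(("text", child.tail))
    return parts

parts = interleaved(DOC)
```

The result alternates text runs and elements: "Schemas are ", em,
" inherently ", b, ".".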

That's all, folks! Well, actually I'm sure I've forgotten a lot of important
things...

Personally, I think that the scope of this spec is too wide, and I'm
particularly concerned by the fact that relatively simple questions,
concerning for example the definition of the content model of a local
element, cannot be answered easily by reading the spec (which cannot be
described as crystal clear in places). Letting the first three points
mature and be widely implemented BEFORE introducing the rest would have been
wise. Now we've got a spec which is hard to use as a schema writer, and hard
to implement. That's why I am looking for simpler alternatives.

Regards,
Nicolas

* I hope I got it right. The spec states:
"If the {compositor} is all, then there must be a ·partition· of the
sequence into n sub-sequences where n is the length of {particles} such that
there is a one-to-one mapping between the sub-sequences and the {particles}
where each sub-sequence is ·valid· with respect to the corresponding
particle as defined in Element Sequence Locally Valid (Particle) (§3.9.4)."
Pretty clear, isn't it?
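
Read in the simplest case, the quoted rule can be sketched in a few lines
of Python. This is a simplified reading, not the spec's algorithm: it
assumes every particle is a single named element that may occur at most
once, and it works on a plain list of child element names. The function
name and the required/optional parameters are hypothetical.

```python
from collections import Counter

def matches_all(children, required, optional=()):
    """Check a list of child element names against a simplified "all" group.

    Every name in `required` must occur exactly once, every name in
    `optional` at most once, in any order, and nothing else may appear.
    """
    counts = Counter(children)
    allowed = set(required) | set(optional)
    if any(name not in allowed for name in counts):
        return False                      # unexpected child
    if any(counts[name] != 1 for name in required):
        return False                      # missing or repeated required child
    return all(counts[name] <= 1 for name in optional)

any_order = matches_all(["b", "a", "c"], required=["a", "b", "c"])    # True
missing = matches_all(["a", "b"], required=["a", "b", "c"])           # False
repeated = matches_all(["a", "a", "b", "c"], required=["a", "b", "c"])  # False
```

Order is deliberately ignored: the check only counts occurrences, which is
the "any order provided that all children are present" behaviour described
above.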

-----Original Message-----
From: Bullard, Claude L (Len)
To: Michael Brennan; Nicolas LEHUEN
Cc: 'xml-dev'
Date: 22/08/01 22:44
Subject: Why Are Schemas Hard? (WAS RE: "Uh, what do I need this for" (was RE:
XML.COM: How I Learned to Love daBomb))

1.  What about Schemas is hard?

2.  Are some of the issues about things Schemas cannot 
    represent (eg, the co-occurrence constraints)?

I'm missing something here.  With a product like 
XML Spy for sanity checking, I don't seem to find 
the Schemas hard to develop.  That doesn't say 
that implementing a system around a schema isn't 
hard, but I am curious what others are struggling with.
Yes, the spec is tough (they all are), but the primer isn't, there 
are dozens of web articles on learning schemas, and 
a week or so with a beta of an IDE seems to cover 
the sanity checks.

Len Bullard
Intergraph Public Safety
clbullar@ingr.com
http://www.mp3.com/LenBullard

Ekam sat.h, Vipraah bahudhaa vadanti.
Daamyata. Datta. Dayadhvam.h


-----Original Message-----
From: Michael Brennan [mailto:Michael_Brennan@allegis.com]

> From: Nicolas LEHUEN [mailto:nicolas.lehuen@ubicco.com]


> 5) Our greatest current problem is about schemas. As I've written
> previously on this list, our approach raises a dire need for a simple
> schema language, simple enough so that developers can write and use
> schemas without having to read thousands of pages of specifications. We
> are currently investigating languages such as RELAX NG, Schematron,
> Examplotron or a custom language we named RESCALE to solve this problem.

We are in agreement there.

-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this elist use the subscription
manager: <http://lists.xml.org/ob/adm.pl>
