Re: [xml-dev] Victory has been declared in the schema wars ...

From: Rick Jelliffe <rjelliffe@allette.com.au>
To: xml-dev@lists.xml.org
Date: Fri, 01 Dec 2006 15:31:31 +1100
Costello, Roger L. wrote:
> Hey Rick,
>
> In the Schematron 1.5 specification you write:
>
> "Which is not, of course, to say that grammars are not quite useful
> when appropriate."
>
> When should a path-based language (e.g., Schematron) be used?  What
> should be the role of a path-based language in expressing constraints
> and validating XML instance documents?
>
> When should a grammar-based language (e.g., DTD, XSD, RNG) be used?
> What should be the role of a grammar-based language in expressing
> constraints and validating XML instance documents?
>   
It depends on whether you mean "should" from the POV of passive 
consumers of technology
and standards and infrastructure or "should" from the POV of active 
creators and pioneers of
technology and standards and infrastructure.

In the first case, "should" is limited by products and availability and 
skillsets and capital cycles;
no blue-sky-ing. You use whatever it takes to get the job done reliably 
with what is available to
you. By force of habit, marketing and expectations, people will 
typically try a grammar first, then
Schematron to wallpaper the cracks. That is reasonable. 

But I am more interested in the second
"should": there is no need to say "XSD is too fat" or "RELAX NG is too 
lean" or even
"Schematron is just right" because the motivation is not territorial or 
the achievement of
perfection but how to move on to a more effective basis than XSD and 
grammars currently
offer: people don't have enough energy after working through the XSD 
issues to build the
more complex WS-* systems that the vendors have invested heavily in. The 
equation is
simple: the fewer brain cells occupied by figuring out the schema and 
WSDL, the more
available for integration tasks.

But there is little awareness that regular grammars can only represent 
certain kinds of
structures; people tend to blame the particular flavour (XSD, RELAX NG, 
DTD) rather than
the whole class of technology. The litmus test is this: if you can say 
"My data is a simple tree
of information" without any qualms, then a grammar is probably perfectly 
workable for you.

But if you have reservations, such as "Well, we have a lot of internal 
links actually" or
"Well, it is really the serialization of a graph or data structure that 
is not really a tree at all" or
"Well, we have structures which are dependent on more than just the 
parent element and preceding
siblings" or "Well, we have inherited constraints such as inclusions, 
exclusions, attributes of an element
that can also appear on any of the descendants" or "Well, we have 
structural requirements that
are scoped to the document rather than the particular context" or "Well, 
we are using standard
schemas that use wildcards but we have particular requirements for them" 
or "It's not a tree, but
multiple tree variants with some kind of selector attribute". Or, 
"Well, it's a tree but I really only
can define a portion of the constraints at this time: other people will 
be adopting and adjusting it".
Or "Well, it is a tree, but it serializes an ER database and we don't 
care whether one-to-one relationships
are represented using child or parent containment, attributes or ID 
links or keys, but only one."
Or "Well, it is a tree, but we need to justify the business reason 
why every constraint exists, otherwise
it is just a cost: grammars encourage us to have a lot of sequence 
constraints that our data does not
have and which add extra work when generating XML and extra tests when 
accepting XML."
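
To make a couple of these reservations concrete, here is a minimal sketch in
Python of a path-based check. It is not Schematron itself, only an imitation
of its rule/assert shape using the limited XPath support in the standard
library. The document, the element names (report, section, xref) and the
attribute names (id, ref, kind) are all hypothetical. The first rule is an
"internal links" constraint scoped to the whole document; the second is a
"selector attribute" constraint.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; all names are illustrative only.
DOC = """
<report>
  <section id="s1" kind="body"><xref ref="s2"/></section>
  <section id="s2" kind="summary"><xref ref="s9"/></section>
</report>
""".strip()

def run_rules(root, rules):
    """Apply (context_path, test, message) rules; collect failed assertions."""
    messages = []
    for context_path, test, message in rules:
        for node in root.findall(context_path):
            if not test(node):
                # Fill the message from the failing node's attributes.
                messages.append(message.format(**node.attrib))
    return messages

root = ET.fromstring(DOC)

# Every id declared anywhere in the document (document-scoped knowledge).
all_ids = {el.get("id") for el in root.iter() if el.get("id") is not None}

RULES = [
    # Internal link: every xref must point at an id that exists somewhere.
    (".//xref",
     lambda n: n.get("ref") in all_ids,
     "xref points to missing id {ref!r}"),
    # Selector attribute: a summary section may not contain an xref child.
    (".//section[@kind='summary']",
     lambda n: n.find("xref") is None,
     "summary section {id!r} must not contain an xref"),
]

messages = run_rules(root, RULES)
for m in messages:
    print(m)
```

Each rule names a context by path and then asserts something about it, so a
constraint can look anywhere in the document rather than only at the parent
and preceding siblings.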

IMHO, ultimately grammars will be a niche technology: an implementation 
technology or
optimization under the hood, a niche schema language when you have 
repeating structures that are
not tagged explicitly...perhaps even the internal format for an XML 
IDE.  I seem to be the only
person in the world who believes this, which must be embarrassing for 
everyone else. :-)

We can still have XML, namespaces, XSLT2, XQuery2, XS:datatypes and so 
on; but the
validation and type attribution can be built into the same XPath process 
that does the XSLT,
XSLT2 or XQuery2 rather than requiring a separate validation step.  Yet 
everyone knows
that XSD is a schema language that people use for everything *except* 
validation. But
grammars interpose a separate mapping step on modeling existing 
databases; I think this step is
basically fat that can and should be trimmed, in favour of a path-based 
system that allows
a direct mapping from the XML document to the data model in the kind of 
way I suggested
in the ER example in the previous email.
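
As a rough illustration of folding validation and type attribution into the
same pass that maps the document to a data model, here is a minimal sketch in
Python. It is an assumption-laden stand-in, not the ER example from the
previous email and not XSLT or XQuery: the orders/order/customer/total
vocabulary and the load_orders function are hypothetical, and the standard
library's ElementTree paths stand in for a real XPath engine.

```python
from decimal import Decimal, InvalidOperation
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are illustrative.
DOC = """
<orders>
  <customer id="c1"/>
  <order id="o1" customer="c1"><total>19.50</total></order>
  <order id="o2" customer="c7"><total>12</total></order>
  <order id="o3" customer="c1"><total>abc</total></order>
</orders>
""".strip()

def load_orders(root):
    """Map order elements to typed rows, reporting problems in the same pass."""
    customers = {c.get("id") for c in root.findall("customer")}
    rows, problems = [], []
    for order in root.findall("order"):
        oid = order.get("id")
        # Type attribution: the total becomes a Decimal or is reported.
        try:
            total = Decimal(order.findtext("total", default=""))
        except InvalidOperation:
            problems.append(f"{oid}: total is not a decimal")
            continue
        # Validation: the customer key must resolve within this document.
        if order.get("customer") not in customers:
            problems.append(f"{oid}: unknown customer")
            continue
        rows.append({"id": oid, "customer": order.get("customer"), "total": total})
    return rows, problems

rows, problems = load_orders(ET.fromstring(DOC))
print(rows)
print(problems)
```

There is no separate schema-validation step here: the same paths that pull
values into the rows also decide whether those values are acceptable.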

Summary: For now, if you have a simple tree and have grammar skills or 
applications, use
RELAX NG or XSD and use Schematron to wallpaper any little differences. 
If you have something
a little more complicated, plan on having a Schematron step as well as 
whatever grammar
validation you have, just as standard procedure. If you need to validate 
and you have anything
complex going on, just forget grammars and start off with Schematron. 
Especially if you have
XSLT skills and don't want to waste your brain space learning a big fat 
standard like XSD.

For the future...well, I think that if grammars are indeed fat, 
commercial logic and user
acceptability issues will drive grammars out in favour of 
query-implementable schema
languages. We are at the point in the investment cycle where the need to 
get a return
on the type-dependent technologies does not override the need to get a 
return on the
type-attributing technology; at some time in the future vendors will 
move from
"how can we monetize our investment in XSD by providing type-aware systems
like WS-* or XLinq or XQuery?" to "How can we monetize our investment in
WS-* and XLinq or XQuery by choosing a less intrusive schema 
technology?" I don't
know whether this is 2007 or 2017, but I think it will happen.

Cheers
Rick Jelliffe
References:
- RE: [xml-dev] Victory has been declared in the schema wars ...
  - From: "Costello, Roger L." <costello@mitre.org>