Re: [xml-dev] Victory has been declared in the schema wars ...
- From: Rick Jelliffe <rjelliffe@allette.com.au>
- To: xml-dev@lists.xml.org
- Date: Fri, 01 Dec 2006 15:31:31 +1100
Costello, Roger L. wrote:
> Hey Rick,
>
> In the Schematron 1.5 specification you write:
>
> "Which is not, of course, to say that grammars are not quite useful
> when appropriate."
>
> When should a path-based language (e.g., Schematron) be used? What
> should be the role of a path-based language in expressing constraints
> and validating XML instance documents?
>
> When should a grammar-based language (e.g., DTD, XSD, RNG) be used?
> What should be the role of a grammar-based language in expressing
> constraints and validating XML instance documents?
>
It depends on whether you mean "should" from the POV of passive
consumers of technology and standards and infrastructure or "should"
from the POV of active creators and pioneers of technology and
standards and infrastructure.
In the first case, "should" is limited by products and availability and
skillsets and capital cycles; no blue-sky-ing. You use whatever it takes
to get the job done reliably with what is available to you. By force of
habit, marketing and expectations, people will typically try a grammar
first, then Schematron to wallpaper the cracks. That is reasonable.
But I am more interested in the second "should": there is no need to say
"XSD is too fat" or "RELAX NG is too lean" or even "Schematron is just
right" because the motivation is not territorial or the achievement of
perfection but how to move on to a more effective basis than XSD and
grammars currently offer. People don't have enough energy after working
through the XSD issues to build the more complex WS-* systems that the
vendors have invested heavily in. The equation is simple: the fewer
brain cells occupied by figuring out the schema and WSDL, the more
available for integration tasks.
But there is little awareness that regular grammars can only represent
certain kinds of structures; people tend to blame the particular flavour
(XSD, RELAX NG, DTD) rather than the whole class of technology. The
litmus test is this: if you can say "My data is a simple tree of
information" without any qualms, then a grammar is probably perfectly
workable for you.
But a grammar may not be enough if you have reservations such as:
  - "Well, we have a lot of internal links actually."
  - "Well, it is really the serialization of a graph or data structure
    that is not really a tree at all."
  - "Well, we have structures which are dependent on more than just the
    parent element and preceding siblings."
  - "Well, we have inherited constraints such as inclusions, exclusions,
    attributes of an element that can also appear on any of the
    descendants."
  - "Well, we have structural requirements that are scoped to the
    document rather than the particular context."
  - "Well, we are using standard schemas that use wildcards but we have
    particular requirements for them."
  - "It's not a tree, but multiple tree variants with some kind of
    selector attribute."
  - "Well, it's a tree but I really only can define a portion of the
    constraints at this time: other people will be adopting and
    adjusting it."
  - "Well, it is a tree, but it serializes an ER database and we don't
    care whether one-to-one relationships are represented using child or
    parent containment, attributes or ID links or keys, but only one."
  - "Well, it is a tree, but we need to justify the business reason why
    every constraint exists, otherwise it is just a cost: grammars
    encourage us to have a lot of sequence constraints that our data
    does not have and which add extra work when generating XML and
    extra tests when accepting XML."
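To make two of those reservations concrete (internal links and
document-scoped requirements), here is a minimal Python sketch of
path-style checks. It is not Schematron itself, only a rough analogue
of its "select a context, assert a test, report a message" pattern
using the standard library. The element and attribute names (report,
section, xref, target, summary) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: all element and attribute names are invented.
DOC = """
<report>
  <section id="s1"><xref target="s2"/></section>
  <section id="s2"><xref target="s9"/></section>
  <summary/>
  <summary/>
</report>
"""

def check(root):
    """Return a list of failure messages; an empty list means no failures."""
    failures = []

    # Internal links: every xref/@target must match some @id in the document.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    for xref in root.iter("xref"):
        target = xref.get("target")
        if target not in ids:
            failures.append(f"xref target '{target}' does not resolve to any id")

    # Document-scoped requirement: at most one summary anywhere in the document,
    # regardless of which parent element it happens to sit under.
    count = len(root.findall(".//summary"))
    if count > 1:
        failures.append(f"expected at most one summary, found {count}")

    return failures

failures = check(ET.fromstring(DOC))
for message in failures:
    print(message)
```

Each check is stated against the whole document rather than against a
content model for one parent element, which is the point of the
path-based approach.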
IMHO, ultimately grammars will be a niche technology: an implementation
technology or optimization under the hood, a niche schema language when
you have repeating structures that are not tagged explicitly...perhaps
even the internal format for an XML IDE. I seem to be the only person in
the world who believes this, which must be embarrassing for everyone
else. :-)
We can still have XML, namespaces, XSLT2, XQuery2, XS:datatypes and so
on; but the validation and type attribution can be built into the same
XPath process that does the XSLT, XSLT2 or XQuery2 rather than requiring
a separate validation step. Yet everyone knows that XSD is a schema
language that people use for everything *except* validation. But
grammars interpose a separate mapping step on modeling existing
databases; I think this step is basically fat that can and should be
trimmed, in favour of a path-based system that allows a direct mapping
from the XML document to the data model in the kind of way I suggested
in the ER example in the previous email.
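As a rough illustration of validating in the same pass that maps the
document to a data model, here is a Python sketch. It is not the ER
example from the earlier message, which is not reproduced here; the
names (company, dept, person, manager, manager-name, manager-ref) are
hypothetical. A one-to-one relationship may be written as a child
element, an attribute or an ID link, and the only constraint is that
exactly one of them is used.

```python
import xml.etree.ElementTree as ET

def manager_of(dept, root):
    """Map a <dept> to its manager's name, accepting any one representation."""
    found = []

    # 1. Child containment: <dept><manager name="..."/></dept>
    child = dept.find("manager")
    if child is not None:
        found.append(child.get("name"))

    # 2. Attribute: <dept manager-name="...">
    if dept.get("manager-name") is not None:
        found.append(dept.get("manager-name"))

    # 3. ID link: <dept manager-ref="p1"> pointing at <person id="p1" name="..."/>
    ref = dept.get("manager-ref")
    if ref is not None:
        person = root.find(f".//person[@id='{ref}']")
        if person is None:
            raise ValueError(f"manager-ref '{ref}' does not resolve")
        found.append(person.get("name"))

    # The validation lives in the mapping itself: exactly one form is allowed.
    if len(found) != 1:
        raise ValueError(
            f"dept '{dept.get('name')}': expected exactly one manager "
            f"representation, found {len(found)}"
        )
    return found[0]

# Hypothetical document using all three representations.
DOC = """
<company>
  <person id="p1" name="Ada"/>
  <dept name="Research" manager-ref="p1"/>
  <dept name="Sales"><manager name="Grace"/></dept>
  <dept name="Ops" manager-name="Linus"/>
</company>
"""

root = ET.fromstring(DOC)
model = {d.get("name"): manager_of(d, root) for d in root.iter("dept")}
print(model)
```

The mapping does not care which serialization was chosen, and no
separate schema-to-model step sits between the document and the
resulting dictionary.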
Summary: For now, if you have a simple tree and have grammar skills or
applications, use RELAX NG or XSD and use Schematron to wallpaper any
little differences. If you have something a little more complicated,
plan on having a Schematron step as well as whatever grammar validation
you have, just as standard procedure. If you need to validate and you
have anything complex going on, just forget grammars and start off with
Schematron. Especially if you have XSLT skills and don't want to waste
your brain space learning a big fat standard like XSD.
For the future...well, I think that if grammars are indeed fat,
commercial logic and user acceptability issues will drive grammars out
in favour of query-implementable schema languages. We are at the point
in the investment cycle where the need to get a return on the
type-dependent technologies does not yet override the need to get a
return on the type-attributing technology; at some time in the future
vendors will move from "How can we monetize our investment in XSD by
providing type-aware systems like WS-* or XLinq or XQuery?" to "How can
we monetize our investment in WS-* and XLinq or XQuery by choosing a
less intrusive schema technology?" I don't know whether this is 2007 or
2017, but I think it will happen.
Cheers
Rick Jelliffe