- To: <xml-dev@lists.xml.org>
- Subject: Schematron versus Godzilla ( was Re: [xml-dev] xml schema)
- From: "Rick Jelliffe" <ricko@allette.com.au>
- Date: Sun, 1 Sep 2002 15:20:31 +1000
- References: <8BD7226E07DDFF49AF5EF4030ACE0B7E07A971C9@red-msg-06.redmond.corp.microsoft.com>
From: "Dare Obasanjo" <dareo@microsoft.com>
> Interestingly enough the X.12 Reference Model for XML Design also
> prohibits using xs:appinfo elements, which means that practices like
> embedding Schematron in W3C XML Schema are against their rules. I find it
> rather ironic that you are touting the document as a guideline of how to
> use W3C XML Schema.
Err, it would be more ironic if I ever had the view that Schematron (or any other
schema language) was the ONE TRUE WAY :-)
There are many good reasons why people might decide not to use Schematron
in certain circumstances:
* it does not work in streaming mode: it may require some kind of tree and multiple passes
* it does not currently integrate with the WXS type system (though that may come)
* it is not supported by many commercial tools (though it has more open source
support than any other schema language)
* it is unconcerned with storage issues or defaulting
* people who don't know XPath but are expecting a DDL equivalent find XPaths
complicated and scary, and the ability to model many kinds of "business rules"
may be regarded as a weakness rather than a strength (i.e. they may feel
that Schematron is not really a "schema" language at all!)
* (the big one!) people are so caught up/freaked out by XML Schemas that
they fear that the additional stage of Schematron constraints only adds to
the conceptual complexity.
And, of course, there are many good reasons to use it:
* simpler
* more powerful
* clearer diagnostics
* validates different kinds of things than grammar-based schema languages do
* easier to use (for people who do have XPath expertise)
* easier to implement (in fact, trivial to implement)
* works with most mainstream XSLT tools (such as MSXML 4)
* less abstraction and more use of general-purpose constructs
* you only need to express the constraints that your other schema
languages (DTD, RELAX, WXS, etc.) don't already express, allowing
a "best of both worlds" approach where there is a standard
universal schema (in some domain) and "localized" versions
to suit the individual use cases; this simplifies management
* since an arbitrary data model need not form a
natural tree (though there is often some nice tree structure
that can be implied), there is little reason to expect that a tree-based
schema language (e.g. regular grammars) can fully or nicely
represent all the relations
* Schematron allows very idiomatic constraints to be expressed,
rather than imposing an idiom (contrast with XML Schemas,
which does not allow attributes to constrain content models)
* the phases mechanism, which supports progressive validation
* people may indeed want to validate some "business-rules"
at the same time as they validate static structural rules.
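To make the rule-based model concrete, here is a minimal Python sketch of how Schematron-style validation works: patterns group rules, each rule selects context nodes with a path and checks assertions against them, every failed assertion produces a human-readable diagnostic, and phases choose which patterns are active. It is an analogy, not Schematron itself: real Schematron schemas are written in XML with XPath tests and are typically run through XSLT, whereas here the path subset of Python's `xml.etree.ElementTree` selects contexts and plain Python functions stand in for XPath tests. The names `Assertion`, `Rule`, `Pattern`, `PHASES` and `validate`, and the `order`/`payment` vocabulary, are hypothetical. One deliberate simplification: every matching rule fires here, whereas in Schematron a node is the context of at most one rule per pattern.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable

@dataclass
class Assertion:
    test: Callable[[ET.Element], bool]  # stands in for an XPath test
    message: str                         # diagnostic reported when the test fails

@dataclass
class Rule:
    context: str                # ElementTree path selecting the context nodes
    asserts: list[Assertion]

@dataclass
class Pattern:
    name: str
    rules: list[Rule]

# Co-occurrence constraint: an attribute value constrains the element's content.
payment_pattern = Pattern("payment", [
    Rule(".//payment[@method='card']", [
        Assertion(lambda e: e.find("cardNumber") is not None,
                  "a card payment must contain a cardNumber"),
    ]),
    Rule(".//payment[@method='cash']", [
        Assertion(lambda e: e.find("cardNumber") is None,
                  "a cash payment must not contain a cardNumber"),
    ]),
])

structure_pattern = Pattern("structure", [
    Rule(".//order", [
        Assertion(lambda e: e.get("id") is not None, "every order needs an id"),
    ]),
])

# Phases: named subsets of patterns, allowing progressive validation.
PHASES = {
    "draft": [structure_pattern],
    "final": [structure_pattern, payment_pattern],
}

def validate(root: ET.Element, phase: str) -> list[str]:
    """Run the patterns active in `phase`; collect diagnostics for failed assertions."""
    report = []
    for pattern in PHASES[phase]:
        for rule in pattern.rules:
            for node in root.iterfind(rule.context):
                for a in rule.asserts:
                    if not a.test(node):
                        report.append(f"[{pattern.name}] {node.tag}: {a.message}")
    return report

doc = ET.fromstring(
    "<orders>"
    "<order id='1'><payment method='card'/></order>"
    "<order><payment method='cash'><cardNumber>4111</cardNumber></payment></order>"
    "</orders>"
)
print(validate(doc, "draft"))  # only the structural rules run
print(validate(doc, "final"))  # structural rules plus the co-occurrence rules
```

Running the "draft" phase reports only the missing `id`; the "final" phase adds the two payment diagnostics, which is the progressive-validation idea in miniature.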
Personally, I much prefer the DSDL approach of a simple
framework language invoking different little languages,
rather than the XML Schema-ish approach of <appinfo>.
The trouble with <appinfo> is that it assumes that other
schema languages are based on types or annotate the
conceptual structures in XML Schemas: there is no equivalent
in XML Schemas to a Schematron "pattern". So an embedded
Schematron schema (using <appinfo> annotations on element
declarations) may be useful and have practical advantages, but it is
only a fairly limited use of Schematron.
In the long run, I expect XML Schemas will move to
a more modular framework, align its grammars with
RELAX NG, factor out things that are niche requirements
(e.g. nil) into modules, factor out syntactic sugar, and allow
constraints expressed using fundamentally different paradigms
such as Schematron.
My understanding is that the XML Schema WG rather sees
the value of XML Schemas as lying in the components
and type lattice, far more than in the particulars of syntax
or organization. I cannot see why in the long run everyone
cannot win: the X.12 profile shows that, at least for a significant
bunch of users, XML Schemas goes too far, and that
layering XML Schemas to start with a RELAX NG-sized core
is enough for the 80/20 case. Let's take that as the core, then
use a modular framework so that the people who need
types can have them, the people who need nils can have
them, the people who need co-occurrence constraints can
have them, and the people who need patterns can have them.
So it is no bad reflection on Schematron if one organization
feels that it is not suitable for them, any more than it is
a bad reflection on HTML if I write something using DOCBOOK.
X.12 want to define the minimal WXS approach; it seems
very consistent to me that if they don't recommend user-defined
datatypes or substitution groups, that they also would not
recommend annotations. I see it as a matter of layering: they
are more or less saying "the bottom line is this conservative
subset of WXS" which is pretty much a RELAX NG
subset too.
The more conservative the XML Schema is, the more that
Schematron will emerge as being appropriate as a nice layer
for additional constraints, supplementing (and simplifying)
the WXS schemas. In the longer term, I think
Schematron fits into process-centered Software Engineering
practices such as "quality circles". There needs to be a
process in place whereby downstream constraints can
be fed back to upstream validators, or where upstream
detectors can feed forward to downstream implementers.
Maybe a best practice will emerge that software tools need to be
in place (from the outset!) to allow agility and rapid response to
feedback: this kind of constraint validation is very different from
the things that X.12 is concerned about. In other words, it is good to
go ahead with conservative WXS controlled by corporate or industry
standards, but also put in place a Schematron validator to allow
agile local constraints (such as "we don't accept this element"
or "this envelope must be addressed to us" or "invoices of that
amount are not supposed to come here" or "don't want to allow
content of more than a certain size because we have a
buffer overrun problem in our software, doh" or
"these jokers are sending bad data; we need to check it more"
or "these constraints are suitable for people sending data who
are setting up their systems, and need to get accurate diagnostics
before going live")
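The layering described above (a conservative shared schema for structure, plus locally maintained rules) can be sketched in a few lines of Python. This is a hypothetical illustration, not a Schematron schema: the element names (`envelope`, `to`, `invoice`, `amount`), the address and the limits are invented, and the shared industry schema is assumed to have already validated the document's structure, so these checks only encode local policies like the ones quoted in the paragraph.

```python
import xml.etree.ElementTree as ET

# Hypothetical local policy values; each receiving site would choose its own.
OUR_ADDRESS = "receiving@example.com"
MAX_INVOICE_AMOUNT = 10_000
MAX_TEXT_LENGTH = 256  # e.g. to work around a buffer limit in downstream software

# Each local rule: (context path, test applied to each context node, diagnostic).
# "." selects the document element itself; ".//x" selects descendants named x.
LOCAL_RULES = [
    (".", lambda e: e.findtext("to") == OUR_ADDRESS,
     "this envelope must be addressed to us"),
    (".//invoice", lambda e: float(e.findtext("amount", "0")) <= MAX_INVOICE_AMOUNT,
     "invoices of that amount are not supposed to come here"),
    (".//*", lambda e: len(e.text or "") <= MAX_TEXT_LENGTH,
     "content too large for our software"),
]

def check_local(root):
    """Return one diagnostic per (rule, node) pair whose test fails."""
    return [f"{node.tag}: {msg}"
            for path, test, msg in LOCAL_RULES
            for node in root.iterfind(path)
            if not test(node)]

envelope = ET.fromstring(
    "<envelope><to>someone-else@example.com</to>"
    "<invoice><amount>25000</amount></invoice>"
    "<invoice><amount>120</amount></invoice>"
    "</envelope>"
)
print(check_local(envelope))
```

Because the local rules live apart from the shared schema, a site can add, tighten or drop one (say, the size limit once the buffer problem is fixed) without touching the industry standard.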
Cheers
Rick Jelliffe