Re: [xml-dev] RNG vs. XSD : is the use of abstract types and polymorphism a good or bad thing for schemas for XML?
- From: Rick Jelliffe <rjelliffe@allette.com.au>
- To: "Costello, Roger L." <costello@mitre.org>, xml-dev@lists.xml.org
- Date: Thu, 15 Mar 2012 12:03:49 +1100
I like what Norm wrote. I'd add a couple of things.
XML came out of a desire to reduce the bloat of SGML; I think the
instigators of RELAX NG harboured the same desire for RELAX NG: use a
powerful model with a few constructs, add some nice sugar (RNC), and
re-use external standards (XSD datatypes). This is essentially a
pragmatic view, along the "wrong is right" lines. But the essential
use of the schema language was for validation.
The instigators of XSD had a different set of desires: superset of
DTDs, superset of inheritance, superset of datatypes plus support for
database keys, conversion to and from object systems, be able to model
families of schemas like XHTML, and allow some kinds of internal model
checking (abstract, complex type inheritance). The intent was a schema
language for all seasons. The essential use of the schema language was
type assignment, as in data binding: documents built from systems
generated from an XSD schema would necessarily be valid. A big set of
requirements leads to a big technology leads to a big chance that any
particular adoption of the technology beyond simple validation would
(have to) leave bits out, making it a gamble whether your system could
support a schema. (XSD 1.1 improves the bang per buck over XSD 1.0,
but increases the risk of not being supported.)
When I look at them, I can understand both approaches, from the angle
that the more complex a system is, the more that extra layers of
abstraction are required. RELAX NG has five kinds of modularity
(patterns, external ref, include, define, combine) while XSD has a
few more. So you would expect that, as your system gets more complex,
RELAX NG will run out of steam for modeling power sooner than XSD does.
However, as your system gets even more complex, XSD's modeling runs
out of steam too. For example, XSD brings little to the picture when
trying to reconcile dialects: it has no ability to say "this attribute
in schema A is the same as this element in schema B" for example.
So in order to cope with these you either need to abandon RELAX NG and
XSD or build extra layers on top. (The same goes for Schematron too.)
For example, the XBRL effort built modeling on top of XSD.
And the result? XSD starts with so much complexity that anything
built on top that provides a superset of XSD's capabilities is liable
to be bloated and require a lot of buck per bang. Again, XBRL is an
example. Contrast that with RELAX NG: starting from the smaller neater
base allows neater layering of modeling features above it.
This is not an abstract issue. In my job I have to look at schema
issues where modeling is necessary, but neither RELAX NG nor XSD
provides any assistance. The difference is that RELAX NG was designed
in the expectation of semi-custom extra layers, while XSD was designed
in the expectation of integration into a vendor's toolchain: once you
buy into the toolchain you are stuck/blessed with whatever modeling
capabilities the vendor provides.
Next, two minor quibbles.
First I'd note that things like XSD's abstract type feature can be
(easily?) fitted on top of RELAX NG, using a Schematron schema: you
just check that a particular set of pattern names (LHS) are not used
directly in an element declaration (RHS). (And remember that XSD's
complex type derivation is used for model checking as a layer, not as
a method of reducing the number of declarations.) And a check that a
RELAX NG grammar is not ambiguous (something like XSD's UPA) can be
added on top as well (though perhaps not with Schematron using XPath 1,
since it is not good at tracing arbitrary chains); IIRC there was a
Japanese system that featured this. As well as the distinction between
what two schema languages support directly (and how well), there is a
useful distinction between what can be layered on top of them (and how
well). Are these better done as a layer or monolithic? Are these
better provided as a standard layer or left to the market to decide?
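To make the abstract-pattern check concrete, here is a minimal sketch in
Python rather than Schematron (a swapped-in technique, purely for
illustration). It assumes the grammar is in the RELAX NG XML syntax, that
the set of "abstract" pattern names is supplied by the layer itself, and
that "used directly" means a <ref> that is an immediate child of an
<element>. The function name, the sample grammar and the pattern names
are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the RELAX NG XML syntax.
RNG = "{http://relaxng.org/ns/structure/1.0}"

def direct_uses_of_abstract(grammar_xml, abstract_names):
    """Return (element name, pattern name) pairs where an <element>
    refers directly to a pattern the layer treats as abstract.

    Assumption: "directly" means the <ref> is an immediate child of
    the <element>; refs reached through other defines are allowed.
    """
    root = ET.fromstring(grammar_xml)
    problems = []
    for element in root.iter(RNG + "element"):
        for child in element:
            if child.tag == RNG + "ref" and child.get("name") in abstract_names:
                problems.append((element.get("name"), child.get("name")))
    return problems

# Hypothetical grammar: "block.abstract" is meant to be abstract.
GRAMMAR = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="doc"/></start>
  <define name="doc">
    <element name="doc"><ref name="block.abstract"/></element>
  </define>
  <define name="block.abstract"><text/></define>
  <define name="para">
    <element name="para"><ref name="para.content"/></element>
  </define>
  <define name="para.content"><ref name="block.abstract"/></define>
</grammar>
"""

if __name__ == "__main__":
    for element_name, pattern_name in direct_uses_of_abstract(
            GRAMMAR, {"block.abstract"}):
        print(f"element {element_name!r} uses abstract pattern {pattern_name!r}")
```

In this sample the doc element is reported because it refers to
block.abstract directly, while para is fine because it only reaches it
through para.content. The grammar itself is unchanged; the rule lives
entirely in the extra layer.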
Secondly, the type derivation by single inheritance in XSD is not the
only game in town, as far as modeling goes. What complex type inheritance
does is say that X is also valid against base type Y and all the chain
of base types (with the wrinkle that for derivation by extension by
suffixation in XSD 1.0, you use (Y, *) as the base type). This kind
of extra check can be done on the schema, as I mentioned. But it can
also be done by validating the document separately with both X and Y.
And, indeed, this turns out to be a more powerful approach: for
example, there is a RELAX NG schema for HTML with one schema for
normal parent child validation, and another schema that handles
exclusion exceptions (e.g. that an html:a cannot have another html:a
underneath it at any level). To express this constraint in
XHTML 1.0 (unless there is some trick with <xsl:key> or something
obscure) you would have to have duplicate content types and, because
the global element declarations were in use, you would have to use
local declarations only. There are some modeling problems that are
better dealt with as parallel problems, or with multiple inheritance
(for which, in effect, the grammar needs to support ambiguity.)
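The parallel approach can be sketched as two independent checks that must
both pass. This is only an illustration in Python, not the RELAX NG
schemas mentioned above: the ALLOWED table is a hypothetical stand-in for
a normal parent/child schema, and the function names are made up for the
example.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical stand-in for ordinary parent/child validation:
# which child elements each element may contain.
ALLOWED = {
    XHTML + "p": {XHTML + "a", XHTML + "em"},
    XHTML + "a": {XHTML + "em"},
    XHTML + "em": {XHTML + "a", XHTML + "em"},
}

def parent_child_ok(root):
    """First 'schema': every child is allowed under its parent."""
    for parent in root.iter():
        allowed = ALLOWED.get(parent.tag, set())
        if any(child.tag not in allowed for child in parent):
            return False
    return True

def no_nested_anchors(root):
    """Second 'schema': no html:a below another html:a at any depth."""
    for anchor in root.iter(XHTML + "a"):
        for descendant in anchor.iter(XHTML + "a"):
            if descendant is not anchor:
                return False
    return True

def valid(xml_text):
    """A document is valid only if it passes both checks."""
    root = ET.fromstring(xml_text)
    return all(check(root) for check in (parent_child_ok, no_nested_anchors))

GOOD = '<p xmlns="http://www.w3.org/1999/xhtml"><a><em>x</em></a></p>'
BAD = '<p xmlns="http://www.w3.org/1999/xhtml"><a><em><a>x</a></em></a></p>'

if __name__ == "__main__":
    print(valid(GOOD), valid(BAD))
```

The BAD document passes the parent/child check, because each step
(a in p, em in a, a in em) is locally allowed; only the second pass
rejects it. Neither check needs duplicated content types to express the
exclusion.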
If your problem is coping with long-lived dialects, and mapping
between concepts and dialects, DTDs, XML Schemas and RELAX NG don't
provide any help. (Schematron's abstract patterns or XBRL or OWL are
nearer, I think.) I think XSD's marketing tends to give the false
promise that it is big enough to scale to large issues, when it is
big enough to get in the way. RELAX NG's marketing is much clearer
(indeed, a premise of ISO DSDL): solve the small problems first and
leave the complex modeling to other layers.
Finally, what is interesting to me is that for the last several years,
over a variety of very large projects, I am seeing RELAX NG Compact
being used by content analysts because of its terseness, then
converted to XSD as needed. This avoids some of the dangerous corners
of XSD, but I think the main reason is partly generational (people who grew
up on DTDs) and partly practical (being able to write a content model on
a single line, being able to see all the parts of a type on a single
screen, not scrabbling around diagrams, not needing to change tools or
views, not hanging around for the editing tool to suck in all the
schema and load it). So notional modeling power is one aspect, but
convenience and usability (given a set of experiences and
expectations) may trump power.
Cheers
Rick Jelliffe