From: "Jonathan Borden" <firstname.lastname@example.org>
> Off the top of my head that's a few. For example, we've long been told that
> XML DTDs dropped the "&" construct because of the cost/benefit ratio -- cost
> for the implementation that is, because this construct is rather intuitive
> for people designing document formats (translation: it makes expressing
> certain document constraints _easy_). Now it seems that RELAXNG has brought
> back the "&" at least in the form of "interleave" yet without any great
> increase in complexity indeed the language is simpler than XML Schema. So
> that is a _good thing_ right?
The reason that WXS only allows & at the top level is
concern about exponential blowout from allowing & anywhere.
Let us look at three ways of implementing it:
1) A vector of bits for the & group: as you find the structure you
set the bit. No danger of explosion.
2) A state machine. Enormous danger of blowout, especially
as & groups are repeated and nested.
3) The RELAX NG-style derivative approach. No blowout?
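To make option 1) concrete, here is a minimal sketch of the bit-vector idea for a flat & group in which every member is required exactly once. The function names and the flat-list representation of child elements are assumptions for illustration only, not how any particular WXS or RELAX NG validator is written. The second function gives the rough number of "which members have I seen so far" states that a naive state machine for the same group would have to distinguish, which is where the blowout in option 2) comes from.

```python
def validate_all_group(required, children):
    """Check a flat & group: each required name must occur exactly once, in any order.

    `required` is a list of element names in the group (hypothetical representation);
    `children` is the sequence of child element names found in the instance.
    """
    bit_for = {name: i for i, name in enumerate(required)}
    seen = 0  # one bit per group member
    for name in children:
        bit = bit_for.get(name)
        if bit is None:
            return False          # element is not part of the group
        mask = 1 << bit
        if seen & mask:
            return False          # member occurred twice
        seen |= mask              # "as you find the structure you set the bit"
    # valid only if every bit has been set
    return seen == (1 << len(required)) - 1


def subset_state_count(n):
    """Subsets of n members a naive automaton must tell apart: 2**n."""
    return 2 ** n


if __name__ == "__main__":
    print(validate_all_group(["title", "author", "date"], ["date", "title", "author"]))
    print(subset_state_count(10))
```

The bit vector grows linearly with the size of the group, whereas the subset count doubles with each added member, and that is before any repetition or nesting of & groups.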
AFAIK, at the time the WG made its decision, there was
no implementation of any validator using 3) for a markup
language. And the WG wanted to reduce
the risk/effect of unintended explosion and allow simple
IIRC, one of the things that swayed the WG was that,
as with banning ambiguous content models, this is
a technical choice that can be relaxed without invalidating
any previous schemas.
When considering WXS, there are perhaps six logical classes of decision:
1) decisions that cannot be changed
(type lattice, "simple" versus "complex", named types, PSVI?,
provide type system for XQuery, non-modular)
2) decisions that can be relaxed
(ambiguity, & groups)
3) decisions that are incoherent or mistaken and must be changed
4) decisions that are incomplete and will probably be completed sometime
5) decisions that are broken and should be changed
(type lattice, "simple" versus "complex", named types, PSVI,
provide type system for XQuery, non-modular ?)
6) decisions that are taken to prevent bloat
My experience is that the Schema WG is very keen on fixing 3), is
sympathetic to 2), and would like to do 4) as better approaches
and analyses emerge. I think we can expect WXS to improve
incrementally over the next few years, especially with cross-pollination
from DSDL and XQuery.
However, there seems to be no chance that 1) and 5) can be resolved, because
they go to the technical heart of WXS, in the minds of the leading
stakeholders in the Schema WG (if I may assume the turban of a
If they cannot be resolved, and the needs/worldview that drive both are not
going to go away either, then what? We should accept plurality, and
indeed embrace it: DSDL can help WXS! It can give the W3C Schema
WG a story to tell for features that are extraneous to its stakeholders'
requirements: the sixth category above.
For example, at the ISO DSDL meeting we had a suggested requirement
from a large European publishing house that we be able to validate that
a mixed content element in the Dutch language should only contain
Dutch characters. That is something that a publishing company may
well want to do: for example, so that they can guarantee that their
data can be repurposed to Dutch mobile phones or to fit in with
their fonts. But it was something turned down by WXS when it
was requested there.
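As a rough illustration of what such a constraint might look like in practice, here is a small sketch that walks a document and reports characters outside an allowed repertoire in text whose language is Dutch. Everything specific here is an assumption: the repertoire in DUTCH_CHARS is a made-up approximation (the publisher's real list was not given), the function name is hypothetical, and using xml:lang to decide which text is Dutch is only one possible convention. It is not DSDL syntax, just the check itself.

```python
import string
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical repertoire: ASCII plus accented letters commonly seen in Dutch text.
DUTCH_CHARS = set(
    string.ascii_letters + string.digits + string.punctuation + string.whitespace
    + "áàâäéèêëíìîïóòôöúùûüÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜ€"
)


def non_dutch_characters(xml_text):
    """Return the set of characters outside DUTCH_CHARS found in Dutch-language text."""
    offending = set()

    def check(text):
        if text:
            offending.update(ch for ch in text if ch not in DUTCH_CHARS)

    def walk(elem, inherited_lang):
        # xml:lang is inherited unless the element overrides it
        lang = elem.get(XML_LANG, inherited_lang)
        is_dutch = lang is not None and lang.lower().split("-")[0] == "nl"
        if is_dutch:
            check(elem.text)
        for child in elem:
            walk(child, lang)
            # text after a child belongs to this element's mixed content
            if is_dutch:
                check(child.tail)

    walk(ET.fromstring(xml_text), None)
    return offending


if __name__ == "__main__":
    doc = '<p xml:lang="nl">Goedemorgen <b>één</b> Привет</p>'
    print(sorted(non_dutch_characters(doc)))
```

Text inside a child element that declares a different xml:lang is skipped, while the text following that child is still checked against the parent's language.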
Should WXS add this now? I say no, because it would bloat WXS with publishing
requirements that are beyond what is needed for XQuery. Should ISO
DSDL have this? I would say yes, because we want to support
publishing and provide a good coverage of features for people whose
schema needs are not well expressible using WXS.
Similarly, WXS helps ISO DSDL by providing a full-bodied, well-oaked
schema language with strong character. ISO DSDL is not about forcing
anyone to give up WXS. Instead it is about providing a modular
framework which allows you to extend WXS (if that is your base
schema language) with other constraints, and a platform to help
you migrate over time between schema languages (e.g. from
DTDs to WXS to RELAX NG) or when different parts of documents
use different schema languages. (In the publishing industry, corporate
amalgamations and data-sharing are very common: there is no guarantee
that everyone will be using the same schema let alone the same