> I was wondering what were the advantages of mixing typechecking and
> validation in a schema language.
In the point of view (POV) of some standards-makers, "validity must mean validity"
on any platform. So there should be no subsets of validity, and therefore
no modularization. (Similarly, supersets/annotations are not of particular interest.)
If my document is validatable and valid on one system, it should be validatable
and valid on another.
W3C XML Schemas 1.x is highly influenced by this POV.
The unfortunate result is, of course, that a large integrated technology
is more difficult to document and express than a technology modularized from
the start into discrete little languages. (It is not the integration per se that
causes the problem: there is not really any difference between a standard
that specifies a lot of optional small modules and a standard that specifies
the same small modules but requires them. Instead it is that an integrated
system does not force the standards-makers to develop a framework/discipline
of modularity, and so leads to spaghetti specs.)
Indeed, insisting that "validity is validity" has paradoxically had the opposite
effect, because of the ratty compatibility of implementations in the first
few years of XSD: validity on one system is often not validity on another.
A diametrically opposed POV is that
* There are lots of different kinds of validity and validation, for different
industries and users (Olivier's point)
* Different uses require different schema language characteristics.
Schema languages can be divided into three basic kinds:
- those that can be streamed, such as RELAX NG and uniqueness-checking;
- those that require two passes (or a single document pass that saves
information and then checks that information afterwards), such as IDREF
or KEYREF systems;
- those that require access to the document tree, such as Schematron.
Applications have similar requirements: for example, for performance
an application may have to stream because the document is too large
for memory or because throughput must be maximized.
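To make the first two kinds concrete, here is a minimal sketch (not any particular schema language, and the element/attribute names are invented for illustration) contrasting a check that can be done in one streaming pass (ID uniqueness, reportable the moment a duplicate appears) with one that inherently needs saved information resolved later (IDREF-style references, whose targets may occur after the reference):

```python
import io
import xml.etree.ElementTree as ET

DOC = """<doc>
  <item id="a"/>
  <item id="b"/>
  <ref target="a"/>
  <ref target="c"/>
</doc>"""

def check(xml_text):
    seen_ids = set()
    pending_refs = []
    errors = []
    # Streaming pass: uniqueness violations can be reported immediately,
    # but reference targets may appear later in the document, so we can
    # only record the references here.
    for _, elem in ET.iterparse(io.StringIO(xml_text), events=("start",)):
        if "id" in elem.attrib:
            if elem.attrib["id"] in seen_ids:
                errors.append(f"duplicate id {elem.attrib['id']!r}")
            seen_ids.add(elem.attrib["id"])
        if "target" in elem.attrib:
            pending_refs.append(elem.attrib["target"])
    # Second "pass": resolve the saved references against the collected IDs.
    for target in pending_refs:
        if target not in seen_ids:
            errors.append(f"unresolved reference {target!r}")
    return errors
```

Running `check(DOC)` reports the dangling reference to `"c"` but no duplicate IDs; the point is that the second loop cannot be folded into the streaming one without buffering, which is exactly the distinction between the first and second kinds above.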
* We are more likely to get high-quality implementations fast by allowing
smaller languages, each of which can be implemented by a single motivated
person and then open-sourced.