From: "Murray Spork" <m.spork@qut.edu.au>
> I am however starting to question the value of doing an "all-at-once"
> validation - or if I shouldn't just do the validation in a 2 stage
> process - validate against Main first (ignoring any child elements of
> Stuff) - then extracting the Stuff element and validating it separately
> against its own schema. This is the question I intend to ask on
> xmlschema-dev - what approach do people think is better?
If you are a human validating documents, it is definitely better to
be able to validate in stages:
* in layers (e.g. well-formedness, then structures, then co-occurrence
constraints, then datatypes, then uniqueness, then references, to pick one possibility);
* in islands (e.g. my tables are all correct, then my prose sections are
all correct, then my metadata sections are all correct, then my
cross-references are all OK);
* by entity, or
* by severity level.
So your suggestion of validating some elements first, then others,
is good. You can do this with XML Schemas, but the language does not
provide any real support for it. (Contrast with Schematron, which
provides "phases" to give language-level support for validating
according to a test plan.)
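As a rough illustration of the two-stage idea, here is a minimal Python
sketch. It is not XML Schema or Schematron validation: the hand-written
checks stand in for whatever validator you actually use, and the
document, the Title and Item elements, the rules and the phase names
are all invented for the example. Only Main and Stuff come from the
question.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample document; only Main and Stuff come from the thread.
DOC = """<Main>
  <Title>Report</Title>
  <Stuff>
    <Item id="a"/>
    <Item id="a"/>
  </Stuff>
</Main>"""


def check_main(root):
    """Stage 1: check Main while ignoring whatever is inside Stuff."""
    pruned = copy.deepcopy(root)
    for stuff in list(pruned.iter("Stuff")):
        del stuff[:]  # children of Stuff are out of scope in this stage
    errors = []
    if pruned.tag != "Main":
        errors.append("root element must be Main")
    if pruned.find("Title") is None:
        errors.append("Main must contain a Title")
    if pruned.find("Stuff") is None:
        errors.append("Main must contain a Stuff")
    return errors


def check_stuff(root):
    """Stage 2: extract Stuff and check it on its own."""
    errors = []
    stuff = root.find("Stuff")
    if stuff is None:
        return errors
    seen = set()
    for item in stuff.findall("Item"):
        item_id = item.get("id")
        if item_id in seen:
            errors.append(f"duplicate Item id {item_id!r}")
        seen.add(item_id)
    return errors


# A "phase" here is just a named group of checks (an assumption of this
# sketch, loosely modelled on the idea of a test plan).
PHASES = {"main": [check_main], "stuff": [check_stuff]}


def validate(xml_text, phase):
    root = ET.fromstring(xml_text)  # well-formedness always comes first
    return [msg for check in PHASES[phase] for msg in check(root)]


print(validate(DOC, "main"))
print(validate(DOC, "stuff"))
```

The point of the design is that the "main" phase can pass while Stuff
is still under construction, and each phase reports only the problems
the user asked about.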
For programmers, if we implement systems that are at the limit
of our comprehension, we are begging for bugs and unmaintainability.
If we look at non-XML validators, we can see that features for
staged validation are important. For example, SP (i.e., for SGML)
provides a lot of options to customize which reports are generated.
Looking at Java validators, you can see that tools like
AntiC, JLint and ManMachine's wonderful metrics program
JStyle (not the indenter of the same name) provide
a good degree of user-selection of which problems are
reported.
But many of the XML libraries provide terrible support for
validation. Xerces 2 for Java did not even report line numbers
until recently, for example. And when errors are given, they
are directed at programmers or gurus. It is laughable to see
error messages with the word "null" in them; what on earth
is a normal user supposed to make of a programming term?
Often errors are incorrect anyway. A beta tester for our
upcoming product reported that, when faced with this
<!DOCTYPE x PUBLIC "xxx">
the error message says, in effect, that "a space is required before
the system identifier". But there is no system identifier there!
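For anyone who wants to see what their own parser says about this
declaration, here is a small Python sketch using the standard library's
expat-based ElementTree parser. XML 1.0 requires a system literal after
the public identifier, so the declaration is in error; the issue is
whether the message says so. The helper name and the "x.dtd" system
identifier are made up for the example, and the exact wording of the
message depends on the parser, so none is shown here.

```python
import xml.etree.ElementTree as ET

# The declaration from the bug report, plus a root element so that the
# DOCTYPE is the only problem.
BAD = '<!DOCTYPE x PUBLIC "xxx"><x/>'
# Same declaration with a (hypothetical) system identifier supplied.
GOOD = '<!DOCTYPE x PUBLIC "xxx" "x.dtd"><x/>'


def well_formedness_error(xml_text):
    """Return the parser's message, or None if the text parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


print(well_formedness_error(BAD))   # some parser-specific message
print(well_formedness_error(GOOD))  # None: the external DTD is not fetched
```

A message along the lines of "a system identifier is required after
the public identifier" would tell a normal user what to fix.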
I would say, apart from the understandable immaturity of XML libraries,
there are two causes promoting this problem. First, the Draconian
error policy of XML combined with the limitations of grammar-based
languages or validators (where it may be difficult to get back on track after
a parsing error has been found) tends to force people to validate
in document order. But that may not be the order in which the user
wants to work. Second, the focus on validation as a (contractual
act of) QA, of acceptance testing with a binary result, tends to
sideline the needs of people who need incremental validation.
With XML Schemas, it would be nice for validation
APIs to let us query the schema, for example, to ask "does this
element have more than one definition" (e.g. several local ones,
or a global and a local) and, if so, to allow an element to be
validated with the "or" union of all the types. You might need
this if you are validating an entity without the parent, and you
don't want to care about the state of construction of the parent,
for example.
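No such query API is assumed here, but the kind of question I mean can
be approximated by reading the schema document as plain XML. The
following Python sketch is only that: the sample schema, the function
names and the "global"/"local" labels are invented for illustration,
and it ignores imports, includes, references, substitution groups and
anonymous types (which show up with a type of None).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema in which Stuff has a global and a local declaration.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Stuff" type="xs:string"/>
  <xs:element name="Main">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Title" type="xs:string"/>
        <xs:element name="Stuff" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def element_declarations(schema_text):
    """Map each element name to a list of (scope, type) pairs."""
    root = ET.fromstring(schema_text)
    top_level = set(root)  # direct children of xs:schema are global
    found = defaultdict(list)
    for decl in root.iter(XS + "element"):
        name = decl.get("name")
        if name is None:
            continue  # skip references such as <xs:element ref="..."/>
        scope = "global" if decl in top_level else "local"
        found[name].append((scope, decl.get("type")))
    return dict(found)


def candidate_types(schema_text, name):
    """All declared types for a name: the 'or' union to validate against."""
    return [t for _, t in element_declarations(schema_text).get(name, [])]


decls = element_declarations(SCHEMA)
print(len(decls["Stuff"]) > 1)          # more than one definition?
print(candidate_types(SCHEMA, "Stuff"))
```

With the list of candidate types in hand, a tool could accept a
detached Stuff element if it is valid against any one of them, without
caring what state the parent is in.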
So, in general, I suggest using XML Schemas very conservatively,
and using Schematron for as many of the fiddly bits as possible.
Cheers
Rick Jelliffe