Re: [xml-dev] A Taxonomy of Deviance?
- From: "Rick Jelliffe" <rjelliffe@allette.com.au>
- To: "Eric van der Vlist" <vdv@dyomedea.com>
- Date: Sun, 10 Sep 2006 03:20:39 +1000 (EST)
Hi deviance fans:
There are seven mechanisms in common use:
* many systems have some notion of recoverable and unrecoverable error
when validating a document against a schema: formalizing the actual rules
would make a useful academic exercise.
* validating laxly, strictly or skippingly, from XSD's wildcard
processContents values: this governs how unknown elements, namespaces or
schemas are handled. (ISO DSDL has an analog in NVDL, the Namespace-based
Validation Dispatching Language.)
* feasible validity, which James Clark implemented in Jing at my
suggestion*: all components are treated as optional (in turn, this requires
the implementation to tolerate ambiguity: RELAX NG is fine with this, but
FSM-based XSD implementations would not be)
* derivation by restriction, from XSD: all documents valid against the
derived schema are valid against the base; this is a less extreme kind of
feasible validity
* partial content models, in which a content model must be correct in
sequence and cardinality as far as it goes (the DOM group suggested
adding something like this, IIRC {Boston?}), and some XML editors use it.
In practice most structured editors operate in this kind of mode.
* derivation by extension, from XSD: differences between elements only
appear at certain points (e.g. the end of a content model), which makes it
closely related to partial content models
* on the Schematron side, the main thing is a notion of "phases": this is
where you group your constraints to allow progressive validation. Since
Schematron validation does not halt on the first error (in any
implementation I know of) it is useful to have such a mechanism so that
you are not swamped by useless errors. Schematron also allows assertions
to have a "role" attribute, for categorizing the kinds of errors in any
kind of home-made or formal taxonomy of deviance (great expression!)
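The difference between full, partial and feasible validity can be sketched in a few lines of Python. This is a minimal sketch, not Jing's algorithm or any editor's: it assumes a hypothetical content model that is a plain sequence of particles with distinct element names (no choices or interleave), reads "as far as it goes" as "a correct prefix", and models feasible validity by making every particle optional, so that missing elements may be inserted anywhere.

```python
from typing import List, Optional, Tuple

# Hypothetical, simplified content model: a plain sequence of particles
# (element_name, min_occurs, max_occurs), max_occurs=None meaning unbounded.
# Names are assumed distinct within one model, so greedy matching is safe.
Particle = Tuple[str, int, Optional[int]]


def _match(model: List[Particle], children: List[str], allow_truncation: bool) -> bool:
    i = 0  # index of the next unconsumed child element name
    for name, lo, hi in model:
        count = 0
        while i < len(children) and children[i] == name and (hi is None or count < hi):
            i += 1
            count += 1
        if count < lo:
            # A required particle is short. Partial checking forgives this only
            # when the children simply stop here (the rest is "not typed yet").
            return allow_truncation and i == len(children)
    # Every child must have been consumed by some particle.
    return i == len(children)


def is_valid(model: List[Particle], children: List[str]) -> bool:
    return _match(model, children, allow_truncation=False)


def is_partially_valid(model: List[Particle], children: List[str]) -> bool:
    # Correct in sequence and cardinality "as far as it goes".
    return _match(model, children, allow_truncation=True)


def is_feasibly_valid(model: List[Particle], children: List[str]) -> bool:
    # Make every particle optional: missing elements may be inserted anywhere.
    relaxed = [(name, 0, hi) for name, _lo, hi in model]
    return _match(relaxed, children, allow_truncation=False)


# Hypothetical model: title, author+, body
article: List[Particle] = [("title", 1, 1), ("author", 1, None), ("body", 1, 1)]

for kids in (["title", "author", "body"], ["title", "author"],
             ["title", "body"], ["body", "title"]):
    print(kids, is_valid(article, kids),
          is_partially_valid(article, kids), is_feasibly_valid(article, kids))
```

For this model, a child list missing only its trailing body is partially valid, one missing the author between title and body is only feasibly valid, and one with body before title is neither.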
My company Topologi's tools support a notion of progressive validation:
* delimiter correct
* WF (with or without entity inclusion)
* feasible validity (DTD or RELAX NG)
* valid (DTD or schema or RELAX NG or Examplotron)
* extended (Schematron)
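A progressive validator can be sketched as an ordered list of checks: the document is reported at the last level it passes and later levels are not run, so the user is not swamped with errors from stages the document has not reached yet. The level names follow the list above; the check functions are hypothetical stand-ins, not Topologi's code. The "delimiter correct" test is a crude approximation, the "valid" test is a placeholder for a real DTD, XSD or RELAX NG validator, and the feasible-validity and Schematron levels are omitted.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

# Each level is (name, check); a check takes the raw text and returns True/False.
Level = Tuple[str, Callable[[str], bool]]


def delimiters_ok(text: str) -> bool:
    """Crude stand-in for 'delimiter correct': every '<' is closed by a '>'
    before the next '<'. Ignores comments, CDATA and '>' in attribute values."""
    inside_tag = False
    for ch in text:
        if ch == "<":
            if inside_tag:
                return False
            inside_tag = True
        elif ch == ">":
            inside_tag = False
    return not inside_tag


def well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def highest_level(text: str, levels: List[Level]) -> Optional[str]:
    """Run the levels in order and stop at the first failure."""
    reached = None
    for name, check in levels:
        if not check(text):
            break
        reached = name
    return reached


levels: List[Level] = [
    ("delimiter correct", delimiters_ok),
    ("well-formed", well_formed),
    # Hypothetical placeholder for schema validity: only runs once the text
    # is known to be well-formed, so fromstring cannot raise here.
    ("valid", lambda t: ET.fromstring(t).tag == "doc"),
]

print(highest_level("<doc><p>hi</doc>", levels))
```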
We support two methods for feasible validity: the first is content model
reinterpretation (using Jing) and the second is "usage schemas", where we
extract every possible XPath in a document set (say, three-step relative
XPaths rather than absolute ones) and validate that a document in progress
matches them. This allows you to sample a document set and then check that
a new document is marked up in the same way as the old ones; hence the name
"usage schema". It helps whenever kitchen-sink DTDs are in use or new staff
are doing the markup. (We are also reworking our tool for trimming DTDs
based on the usage schemas.)
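A usage schema along these lines might look like the following minimal sketch. The function names are hypothetical and this is not Topologi's implementation: it assumes each element is summarized by its last three ancestor-or-self element names (a relative path such as book/chapter/title), takes the union of those paths over a sample set, and flags any path in a new document that the samples never used.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set


def local_paths(root: ET.Element, depth: int = 3) -> Set[str]:
    """For every element, the path of its last `depth` ancestor-or-self
    names, e.g. 'chapter/section/title'."""
    paths: Set[str] = set()

    def walk(elem: ET.Element, ancestors: List[str]) -> None:
        chain = ancestors + [elem.tag]
        paths.add("/".join(chain[-depth:]))
        for child in elem:
            walk(child, chain)

    walk(root, [])
    return paths


def build_usage_schema(samples: Iterable[str], depth: int = 3) -> Set[str]:
    """Union of the paths used anywhere in a sample document set."""
    schema: Set[str] = set()
    for text in samples:
        schema |= local_paths(ET.fromstring(text), depth)
    return schema


def unusual_paths(text: str, schema: Set[str], depth: int = 3) -> List[str]:
    """Paths in a new document that never occur in the sample set."""
    return sorted(local_paths(ET.fromstring(text), depth) - schema)


samples = [
    "<book><chapter><title/><para/></chapter></book>",
    "<book><chapter><title/><para/><para/></chapter><appendix><para/></appendix></book>",
]
schema = build_usage_schema(samples)
print(unusual_paths("<book><chapter><title/><table/></chapter></book>", schema))
# → ['book/chapter/table']
```

Flagged paths are not necessarily errors; they are markup the sample set never used, which a kitchen-sink DTD would otherwise accept without comment.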
IIRC, Murata-san's research interests in forest automata and documents
were based on the idea that if your schema language is grounded in a
rigorous formalism, then it becomes possible, if not easy, to use set
operations and other related formalisms to categorize, combine and dissect
schemas and documents. RELAX NG comes out of this (interleave excepted,
IIRC).
In 2000/2001 I did a prototype of converting schemas into logic expressions
and validating with a logic system. (Actually, I converted schemas into
path lists, same as usage schemas, then converted those to logic.) Michael
Sperberg-McQueen explores logic ideas a lot further than I ever went, in a
paper at an Extreme Markup Languages conference. Logic systems open up a lot
of possibilities in this regard, for example moving into document repair.
Cheers
Rick Jelliffe
* I have a few notes from 1999 that may or may not be useful for
bibliographic purposes: "Weak Validation"
http://xml.ascc.net/en/utf-8/weakvalid.html has the basics of feasible
validity, "Richer Anonymous Content Types"
http://xml.ascc.net/en/utf-8/anonymous.html has the basics of XSD's ALL
(called UNIQUE), and "Validate This! Content Models on Different Targets"
http://xml.ascc.net/en/utf-8/OtherValid.html has the basics of RELAX NG's
incorporation of text and attributes into content models.