From: "Ronald Bourret" <email@example.com>
> Joe English wrote:
> > The general idea of Infoset augmentation is I think very useful,
> > but I'm starting to think that doing it as part of validation
> > is not a good idea.
> I definitely agree. Interest in an augmented infoset is orthogonal to
> validation. You presumably want an augmented infoset for meta-data
> driven programming. This has nothing to do with whether you believe your
> document is valid or needs to be checked.
> Also orthogonal is the addition of data (as opposed to metadata). The
> easiest example is adding default values. This has nothing to do with
> data types (the most common form of augmentation) or validation.
But surely there is a pragmatic consideration too: rather than doing
three traversals of data, one can kill three birds with one stone.
So perhaps the issue is not so much that specs (such as DTDs and
XML Schemas) bundle data-typing, value augmentation and validation
together, as an insult against the God of Layering, but that the specs
do not bring out that these three functions are Visitors (factoring out
uniqueness & keyrefs for the sake of argument).
Much of the talk of layering seems to be based on the desirability of pipelined
sequences of data processors rather than visitor-based layering.
But since most XML processing is just based on namespace+name
(with a little requiring the parent element,
(And, to prevent flames, I note that pipes which use the parsed event stream,
ESIS-style, can achieve much of the efficiency of multiple visitors on a tree.)
There is a case to be made that, for implementability reasons, it is actually
good to bundle together as many orthogonal functions as can act as
visitors on the same traversal of the infoset. That a particular technology
selects a particular set of orthogonal functions as some kind of nice
package is the choice of the designer; orthogonality means that functions
should be able to be presented, specified and implemented without
regard to each other, not that a technology should not bundle different
kinds of functionality together.
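
To make the visitor point concrete, here is a rough sketch in Python of three
independent functions sharing one traversal. It is only an illustration, not
how any DTD or XML Schema processor is implemented: the class names, the
traverse() helper and the little lookup tables are all invented for the
example, and only the standard xml.etree.ElementTree module is assumed.

```python
import xml.etree.ElementTree as ET


class TypeAnnotator:
    """Data-typing: record a type name for elements we know about."""

    def __init__(self, types):
        self.types = types        # hypothetical table: tag -> type name
        self.annotations = {}     # element -> type name, kept outside the tree

    def visit(self, elem, parent):
        if elem.tag in self.types:
            self.annotations[elem] = self.types[elem.tag]


class DefaultFiller:
    """Value augmentation: supply attribute defaults that are missing."""

    def __init__(self, defaults):
        self.defaults = defaults  # hypothetical table: tag -> {attr: value}

    def visit(self, elem, parent):
        for name, value in self.defaults.get(elem.tag, {}).items():
            elem.attrib.setdefault(name, value)


class Validator:
    """Validation: check each child against its parent's permitted tags."""

    def __init__(self, allowed):
        self.allowed = allowed    # hypothetical table: tag -> set of child tags
        self.errors = []

    def visit(self, elem, parent):
        if parent is not None and parent.tag in self.allowed:
            if elem.tag not in self.allowed[parent.tag]:
                self.errors.append(f"{elem.tag} not allowed in {parent.tag}")


def traverse(elem, visitors, parent=None):
    """Single depth-first walk; every visitor sees every element once."""
    for visitor in visitors:
        visitor.visit(elem, parent)
    for child in elem:
        traverse(child, visitors, elem)


doc = ET.fromstring("<order><item/><price>12</price><note/></order>")
typer = TypeAnnotator({"price": "decimal"})
filler = DefaultFiller({"item": {"qty": "1"}})
checker = Validator({"order": {"item", "price"}})
traverse(doc, [typer, filler, checker])
```

Each visitor is written without reference to the others, so any one can be
dropped from the list; bundling them costs nothing in orthogonality and saves
two walks over the tree.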
Schematron is interesting in this regard, because it specifies
neither the traversal order/mechanism for getting nodes,
nor the function (validation, augmentation, data-typing) to
be performed on selected nodes. Instead it filters nodes
and provides the parameters that the particular visitor can use.
The particular Schematron implementation decides whether
to validate (the most common use), data-type (using the
role attribute) or to transform/augment (e.g. schematron-rdf which
generates an RDF version of rules linked to the instance
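
The separation described in this paragraph can be sketched roughly as follows.
This is not Schematron syntax or any real Schematron implementation: Rule,
run() and the two handlers are hypothetical names, and the context is an
ElementTree path (a small subset of XPath) rather than a full XPath pattern.
The point is only that the rules select nodes and carry parameters, while the
handler decides what is done with them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    context: str                        # path selecting the nodes of interest
    test: Callable[[ET.Element], bool]  # assertion evaluated on each node
    role: str                           # free parameter for the handler
    message: str                        # free parameter for the handler


def run(root, rules, handler):
    """Filter nodes per rule; pass node, rule and test result to the handler."""
    for rule in rules:
        for node in root.findall(rule.context):
            handler(node, rule, rule.test(node))


rules = [
    Rule(".//price",
         lambda e: (e.text or "").strip().isdigit(),
         "integer",
         "price must be a whole number"),
]
root = ET.fromstring("<order><price>12</price><price>abc</price></order>")

# One handler validates: it reports the message when the test fails.
failures = []
def validate(node, rule, passed):
    if not passed:
        failures.append(rule.message)

# Another handler data-types: it uses the role as a type label when it passes.
types = {}
def data_type(node, rule, passed):
    if passed:
        types[node] = rule.role

run(root, rules, validate)
run(root, rules, data_type)
```

The same rule set drives both handlers unchanged, which is the sense in which
the function performed is left to the implementation.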