Lists Home |
Date Index |
Discussion on how to "validate an update" / "revalidate after update".
Brian Candler wrote:
>On Wed, Jul 28, 2004 at 10:12:20AM -0700, Jeff Greif wrote:
>>Unfortunately, the 'minimum set of checks' depends upon what you're
>Precisely - I'm trying to establish what would be a good thing to validate
>against for this purpose :-)
My guess is Structural constraints as given by DTDs and W3C XML Schemas
are slightly easier to revalidate.
For Relax NG, it depends on the kind of schema - it can be easy, but it
can also be hard.
For crazy scientists, the former two are top-down deterministically
recognizable, the latter not.
The simplest example is maybe this:
<f><a/>...<a/></f><b/> is valid
<f><b/>...<b/></f><a/> is valid
everything else is not.
This description (some might call it co-occurrence constraint) is not
top-down deterministically recognizable: when you check the children of
f, you have to remember that you need to check a b later, something
which deterministic top-down tree automata cannot. instead
nondeterministic ones (like the ones that correspond to Relax, they just
"guess" what they need to expect. The implementation must then be
bottom-up, i.e. from leaves to root)
Whether inserting an <a/> under f is ok, depends on the node to the
right of the f. These dependencies can get arbitrarily nested, making
revalidation of the whole tree necessary.
>>A feasible conceptual mechanism (leaving aside performance problems on large
>>files) would be to update a PSVI-decorated DOM, and revalidate (using DOM
>>Level 3 APIs) the nodes rooting the altered subtrees.
>I had not come across DOM validation until now. The spec mentions W3C Schema
>and DTD almost in passing; is it intended to work with any validation
>mechanism, or just these two? For example, with Schematron, it seems quite
>difficult to determine what are the valid children you could add at this
>point in the tree.
As I see it, having a DOM tree is conceptually equivalent to having
whatever other sort of tree (though DOM is quite bloated).
Here's the idea from above, a bit elaborated: Assume your DOM is valid
before the update.
Your validator should then have visited every node, carrying some
information down there.
Modify it so that it stores this information "in" every node that it visits.
Now, when you do the update, you replace the "validation information" in
every node from the path to the root by "unknown".
When you revalidate the whole tree, you will not need to descend in
The rest depends on the validation mechanism:
DTDs are based on regular expression (even deterministic), so all you
need to store is the state number of a (deterministic) finite state
XML Schema is a bit trickier, but essentially the same.
Relax NG is based on tree regular expressions, so you will store the
configuration of a finite state tree automaton of some sort.
Assuming the thing you inserted was valid, in the former two cases, it
is enough to run the dfa on the "rest" of the sequence.
For Relax NG, you might (depending on your schema and your validation
implementation) in the worst case need to traverse great portions of the
entire tree, because of the "non-local" constraints you need to check.
If you use somehing like pushdown hedge automata, you might get away
with only traversing the right-upper half, but you need to store a pair
of a state and a list of states in every node, which can be expensive
for deeply nested XML trees.