Lists Home |
Date Index |
- To: <firstname.lastname@example.org>
- Subject: For those who wanted to know the algorithm for checking maliciousschemas..
- From: Murali Mani <mani@CS.UCLA.EDU>
- Date: Fri, 3 Oct 2003 07:20:27 -0700 (PDT)
This is one of the algorithms.. I explain for DTDs, but it can be modified
to be used for any schema language that I would classify as based on
regular tree grammars, including W3C XML-Schema, RELAX-NG and its family
and so on..
---------- Forwarded message ----------
Date: Fri, 3 Oct 2003 06:26:50 -0700 (PDT)
From: Murali Mani <email@example.com>
Subject: Re: [xml-dev] Managing Innovation
This problem is in the same league as the one which Henry Thompson
mentioned. Henry Thompson showed once that if we use fixed values for
IDREFs, then checking whether a non-empty document exists is NP-complete.
I will show how to determine whether given a DTD and a doctype declaration
R, we can determine whether the DTD will produce only infinite documents
with root R. I use an approach similar (rather identical) to "reachability
in a directed graph"
First, for each element, you have already calculated the set of elements
that *must* occur as children. This is possible using quite simple
algorithms which look at the content model for each element. I can give an
algorithm for this also if you want.
Basically, what you would already have is,
<!ELEMENT a (b, c?)>
then you know that a must have b as a child.
Suppose you have that for every element. Let us denote this set for any
element A, as NecChildSet (A)
Now for each element, we find the set of descendant elements that the
element *must* have. For an element, A, I will call this set of descendant
elements as NecDescSet (A)
You can compute NecDescSet (A) using a recursive algorithm as:
NecDescSet (A) = NecChildSet (A) UNION
NecDescSet (B), for every B \in NecChildSet (A)
Once you have NecDescSet (A) for every element in DTD, you can find
malicious elements. These are elements whose NecDescSet containts itself.
Now suppose the doctype declaration says root is R, now we know that the
DTD produces non-infinite documents with root R, *if and only if*
NecDescSet (R) does not contain any malicious elements.