[
Lists Home |
Date Index |
Thread Index
]
Steven Ericsson Zenith wrote:
> I am very happy to see this discussion - but I am moved to argue against
> this assertion approach.
>
> First, from any formal point of view, unbounded and non-terminating
> recursive types are undesirable - they are, essentially, "irrational" in
> every sense of that word. "unbounded" simply cannot - in any pragmatic
> sense involving finite resource systems - mean "infinite."
That's exactly backwards. From a formal point of view, unbounded and
unlimited depth recursion is highly desirable. If it were an issue, why
doesn't the Internet set an upper bound on session length, why don't
relational databases require you to specify a maximum number of rows per
table, etc.?
From a practical point of view, a session can't go on forever, a table
can't have an infinite number of rows and you can't instantiate an
infinite size document. But this has little to do with data modeling.
> I don't understand why a validator taking into account maxOccurs=N
> should be inefficient - someone needs to explain that to me. How hard
> can that be?
Let's write <element name="a" minOccurs="1" maxOccurs="4"> using the
shorthand "a{1,4}".
This is equivalent to "a a? a? a?" in standard regular expression
notation. So an expression like "a{0,30000}" has 30,000 terms when
unfolded! If you try to implement this expression with a state
transition matrix, as some do, you will find yourself with 30,000^2 terms.
If you try to implement the expression as a deterministic finite state
machine, it might seem less problematic. A minimal DFM for the
expression is:
Match Success Fail
Start a 1 -
1 a 2 Next
2 a 3 Next
3 a Next Next
So for the expression "a{0,30000}" you 'only' have 30,000 states. But
producing a DFM from an ambiguous expression is not trivial; algorithms
can require exponential time, space or both.
If you try to use derivative parsing and your parser operates by
unfolding the expression (as Jing does) you will not only have 30,000
terms in the original pattern but the expression can expand
combinatorially as the parser matches to allow for the possibility it
might have to "back up", e.g., in the expression "a{0,30000} b |
a{0,29999} c".
Finally, you can implement this as a non-deterministic FSM with some
constant times 30,000 states. But parsing with an NFM requires
backtracking, which in general can require exponential time.
Special-case hackery can do quite a bit better, e.g., using explicit
counters. A DFM with counters has only one state which can be re-entered
29,999 times. I described an analogous approach in my note on bounded
constraint derivative parsing a couple of years ago. But these
approaches aren't much used in practice today, primarily because of the
lack of formal methods for deriving FSM with counters and/or lack of
widespread knowledge.
Bob Foster
http://xmlbuddy.com/
> I can imagine allowing recursive types but that they be constrained in
> any instance of the type. So I would suggest that the term "unbounded"
> be an specifiable constant in a 1.1 schema. I would like to see the
> depth constraints on recursion as a part of a broader constraints system
> in 1.1.
>
> Sorry that I have not been paying close attention - I have been busy.
>
> With respect,
> Steven
> --
> Dr. Steven Ericsson-Zenith
|