Bob Foster wrote:
That's
exactly backwards. From a formal point of view, unbounded and unlimited
depth recursion is highly desirable. If it were an issue, why doesn't
the Internet set an upper bound on session length, why don't relational
databases require you to specify a maximum number of rows per table,
etc.?
The same applies in these cases - the
behavior of relational database implementation and the Internet are not
authoritative and certainly not ideal formally.
From a
practical point of view, a session can't go on forever, a table can't
have an infinite number of rows and you can't instantiate an infinite
size document. But this has little to do with data modeling.
What is your point, exactly? The same as
above applies here.
Thanks for your explanation of the parsing issue. It isn't clear to me
still - forgive me for being dense. You are talking about validating
the schema, or the instance? Why are you doing these expansions?
With respect,
Steven
> I don't understand why a validator taking into account maxOccurs=N
> should be inefficient - someone needs to explain that to me. How
hard
> can that be?
Let's write <element name="a" minOccurs="1" maxOccurs="4"> using
the shorthand "a{1,4}".
This is equivalent to "a a? a? a?" in standard regular expression
notation. So an expression like "a{0,30000}" has 30,000 terms when
unfolded! If you try to implement this expression with a state
transition matrix, as some do, you will find yourself with 30,000^2
terms.
If you try to implement the expression as a deterministic finite state
machine, it might seem less problematic. A minimal DFM for the
expression is:
Match Success Fail
Start a 1 -
1 a 2 Next
2 a 3 Next
3 a Next Next
So for the expression "a{0,30000}" you 'only' have 30,000 states. But
producing a DFM from an ambiguous expression is not trivial; algorithms
can require exponential time, space or both.
If you try to use derivative parsing and your parser operates by
unfolding the expression (as Jing does) you will not only have 30,000
terms in the original pattern but the expression can expand
combinatorially as the parser matches to allow for the possibility it
might have to "back up", e.g., in the expression "a{0,30000} b |
a{0,29999} c".
Finally, you can implement this as a non-deterministic FSM with some
constant times 30,000 states. But parsing with an NFM requires
backtracking, which in general can require exponential time.
Special-case hackery can do quite a bit better, e.g., using explicit
counters. A DFM with counters has only one state which can be
re-entered 29,999 times. I described an analogous approach in my note
on bounded constraint derivative parsing a couple of years ago. But
these approaches aren't much used in practice today, primarily because
of the lack of formal methods for deriving FSM with counters and/or
lack of widespread knowledge.
Bob Foster
http://xmlbuddy.com/
> I can imagine allowing recursive types but that they be
constrained in
> any instance of the type. So I would suggest that the term
"unbounded"
> be an specifiable constant in a 1.1 schema. I would like to see
the
> depth constraints on recursion as a part of a broader constraints
system
> in 1.1.
>
> Sorry that I have not been paying close attention - I have been
busy.
>
> With respect,
> Steven
> --
> Dr. Steven Ericsson-Zenith
|