OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help



   Re: [xml-dev] [Summary] Constrain the Number of Occurrences of Elementsi

[ Lists Home | Date Index | Thread Index ]

Bob Foster wrote:
That's exactly backwards. From a formal point of view, unbounded and unlimited depth recursion is highly desirable. If it were an issue, why doesn't the Internet set an upper bound on session length, why don't relational databases require you to specify a maximum number of rows per table, etc.?
The same applies in these cases - the behavior of relational database implementation and the Internet are not authoritative and certainly not ideal formally.
From a practical point of view, a session can't go on forever, a table can't have an infinite number of rows and you can't instantiate an infinite size document. But this has little to do with data modeling.
What is your point, exactly? The same as above applies here.

Thanks for your explanation of the parsing issue.  It isn't clear to me still - forgive me for being dense.  You are talking about validating the schema, or the instance?  Why are you doing these expansions?

With respect,

> I don't understand why a validator taking into account maxOccurs=N
> should be inefficient - someone needs to explain that to me.  How hard
> can that be?

Let's write <element name="a" minOccurs="1" maxOccurs="4"> using the shorthand "a{1,4}".

This is equivalent to "a a? a? a?" in standard regular expression notation. So an expression like "a{0,30000}" has 30,000 terms when unfolded! If you try to implement this expression with a state transition matrix, as some do, you will find yourself with 30,000^2 terms.

If you try to implement the expression as a deterministic finite state machine, it might seem less problematic. A minimal DFM for the expression is:

       Match     Success   Fail
Start    a          1       -
1        a          2      Next
2        a          3      Next
3        a         Next    Next

So for the expression "a{0,30000}" you 'only' have 30,000 states. But producing a DFM from an ambiguous expression is not trivial; algorithms can require exponential time, space or both.

If you try to use derivative parsing and your parser operates by unfolding the expression (as Jing does) you will not only have 30,000 terms in the original pattern but the expression can expand combinatorially as the parser matches to allow for the possibility it might have to "back up", e.g., in the expression "a{0,30000} b | a{0,29999} c".

Finally, you can implement this as a non-deterministic FSM with some constant times 30,000 states. But parsing with an NFM requires backtracking, which in general can require exponential time.

Special-case hackery can do quite a bit better, e.g., using explicit counters. A DFM with counters has only one state which can be re-entered 29,999 times. I described an analogous approach in my note on bounded constraint derivative parsing a couple of years ago. But these approaches aren't much used in practice today, primarily because of the lack of formal methods for deriving FSM with counters and/or lack of widespread knowledge.

Bob Foster

> I can imagine allowing recursive types but that they be constrained in
> any instance of the type.  So I would suggest that the term "unbounded"
> be an specifiable constant in a 1.1 schema. I would like to see the
> depth constraints on recursion as a part of a broader constraints system
> in 1.1.
> Sorry that I have not been paying close attention - I have been busy.
> With respect,
> Steven
> --
> Dr. Steven Ericsson-Zenith


News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 2001 XML.org. This site is hosted by OASIS