OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

 


 

   Re: [xml-dev] [Summary] Constrain the Number of Occurrences of Elementsi

[ Lists Home | Date Index | Thread Index ]

Steven Ericsson Zenith wrote:
 > I am very happy to see this discussion - but I am moved to argue against
 > this assertion approach.
 >
 > First, from any formal point of view, unbounded and non-terminating
 > recursive types are undesirable - they are, essentially, "irrational" in
 > every sense of that word.  "unbounded" simply cannot - in any pragmatic
 > sense involving finite resource systems - mean "infinite."

That's exactly backwards. From a formal point of view, unbounded and 
unlimited depth recursion is highly desirable. If it were an issue, why 
doesn't the Internet set an upper bound on session length, why don't 
relational databases require you to specify a maximum number of rows per 
table, etc.?

 From a practical point of view, a session can't go on forever, a table 
can't have an infinite number of rows and you can't instantiate an 
infinite size document. But this has little to do with data modeling.

 > I don't understand why a validator taking into account maxOccurs=N
 > should be inefficient - someone needs to explain that to me.  How hard
 > can that be?

Let's write <element name="a" minOccurs="1" maxOccurs="4"> using the 
shorthand "a{1,4}".

This is equivalent to "a a? a? a?" in standard regular expression 
notation. So an expression like "a{0,30000}" has 30,000 terms when 
unfolded! If you try to implement this expression with a state 
transition matrix, as some do, you will find yourself with 30,000^2 terms.

If you try to implement the expression as a deterministic finite state 
machine, it might seem less problematic. A minimal DFM for the 
expression is:

        Match     Success   Fail
Start    a          1       -
1        a          2      Next
2        a          3      Next
3        a         Next    Next

So for the expression "a{0,30000}" you 'only' have 30,000 states. But 
producing a DFM from an ambiguous expression is not trivial; algorithms 
can require exponential time, space or both.

If you try to use derivative parsing and your parser operates by 
unfolding the expression (as Jing does) you will not only have 30,000 
terms in the original pattern but the expression can expand 
combinatorially as the parser matches to allow for the possibility it 
might have to "back up", e.g., in the expression "a{0,30000} b | 
a{0,29999} c".

Finally, you can implement this as a non-deterministic FSM with some 
constant times 30,000 states. But parsing with an NFM requires 
backtracking, which in general can require exponential time.

Special-case hackery can do quite a bit better, e.g., using explicit 
counters. A DFM with counters has only one state which can be re-entered 
29,999 times. I described an analogous approach in my note on bounded 
constraint derivative parsing a couple of years ago. But these 
approaches aren't much used in practice today, primarily because of the 
lack of formal methods for deriving FSM with counters and/or lack of 
widespread knowledge.

Bob Foster
http://xmlbuddy.com/

 > I can imagine allowing recursive types but that they be constrained in
 > any instance of the type.  So I would suggest that the term "unbounded"
 > be an specifiable constant in a 1.1 schema. I would like to see the
 > depth constraints on recursion as a part of a broader constraints system
 > in 1.1.
 >
 > Sorry that I have not been paying close attention - I have been busy.
 >
 > With respect,
 > Steven
 > --
 > Dr. Steven Ericsson-Zenith






 

News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 2001 XML.org. This site is hosted by OASIS