xml-dev - Re: [xml-dev] [Summary] Constrain the Number of Occurrences of Elements

Re: [xml-dev] [Summary] Constrain the Number of Occurrences of Elements

[ Lists Home | Date Index | Thread Index ]

To: Steven Ericsson Zenith <steven@semeiosis.com>
Subject: Re: [xml-dev] [Summary] Constrain the Number of Occurrences of Elements in your XML Schema
From: Steven Ericsson Zenith <steven@semeiosis.com>
Date: Mon, 8 Aug 2005 20:10:11 -0700
Cc: "Roger L. Costello" <costello@mitre.org>,Bob Foster <bob@objfac.com>,"'Rick Jelliffe'" <ricko@allette.com.au>,"'XML Developers List'" <xml-dev@lists.xml.org>
In-reply-to: <42F80965.1070103@semeiosis.com>
References: <200508081157.j78BvfS15116@smtp-bedford.mitre.org> <42F79807.400@semeiosis.com> <42F7CFB4.9020506@objfac.com> <42F80965.1070103@semeiosis.com>

To be clear here. I consider termination to be a desirable formal property of description in data models and programs. I also consider formal pragmatics an essential ingredient of computable systems. Hence, I do not "disapprove" of recursive types (perhaps this is consistent with Bob's data model request) but seek the expression of a pragmatic constraint on the depth of an instance.

Steven

On Aug 8, 2005, at 6:39 PM, Steven Ericsson Zenith wrote:

Bob Foster wrote:

That's exactly backwards. From a formal point of view, unbounded and unlimited depth recursion is highly desirable. If it were an issue, why doesn't the Internet set an upper bound on session length, why don't relational databases require you to specify a maximum number of rows per table, etc.?

The same applies in these cases - the behavior of relational database implementation and the Internet are not authoritative and certainly not ideal formally.

From a practical point of view, a session can't go on forever, a table can't have an infinite number of rows and you can't instantiate an infinite size document. But this has little to do with data modeling.

What is your point, exactly? The same as above applies here.

Thanks for your explanation of the parsing issue. It isn't clear to me still - forgive me for being dense. You are talking about validating the schema, or the instance? Why are you doing these expansions?

With respect,
Steven

> I don't understand why a validator taking into account maxOccurs=N
> should be inefficient - someone needs to explain that to me. How hard
> can that be?

Let's write <element name="a" minOccurs="1" maxOccurs="4"> using the shorthand "a{1,4}".

This is equivalent to "a a? a? a?" in standard regular expression notation. So an expression like "a{0,30000}" has 30,000 terms when unfolded! If you try to implement this expression with a state transition matrix, as some do, you will find yourself with 30,000^2 terms.

If you try to implement the expression as a deterministic finite state machine, it might seem less problematic. A minimal DFM for the expression is:

       Match     Success   Fail
Start    a          1       -
1        a          2      Next
2        a          3      Next
3        a         Next    Next

So for the expression "a{0,30000}" you 'only' have 30,000 states. But producing a DFM from an ambiguous expression is not trivial; algorithms can require exponential time, space or both.

If you try to use derivative parsing and your parser operates by unfolding the expression (as Jing does) you will not only have 30,000 terms in the original pattern but the expression can expand combinatorially as the parser matches to allow for the possibility it might have to "back up", e.g., in the expression "a{0,30000} b | a{0,29999} c".

Finally, you can implement this as a non-deterministic FSM with some constant times 30,000 states. But parsing with an NFM requires backtracking, which in general can require exponential time.

Special-case hackery can do quite a bit better, e.g., using explicit counters. A DFM with counters has only one state which can be re-entered 29,999 times. I described an analogous approach in my note on bounded constraint derivative parsing a couple of years ago. But these approaches aren't much used in practice today, primarily because of the lack of formal methods for deriving FSM with counters and/or lack of widespread knowledge.

Bob Foster
http://xmlbuddy.com/

> I can imagine allowing recursive types but that they be constrained in
> any instance of the type. So I would suggest that the term "unbounded"
> be an specifiable constant in a 1.1 schema. I would like to see the
> depth constraints on recursion as a part of a broader constraints system
> in 1.1.
>
> Sorry that I have not been paying close attention - I have been busy.
>
> With respect,
> Steven
> --
> Dr. Steven Ericsson-Zenith

References:
- [Summary] Constrain the Number of Occurrences of Elements in your XML Schema
  - From: "Roger L. Costello" <costello@mitre.org>
- Re: [xml-dev] [Summary] Constrain the Number of Occurrences of Elementsin your XML Schema
  - From: Steven Ericsson Zenith <steven@semeiosis.com>
- Re: [xml-dev] [Summary] Constrain the Number of Occurrences of Elementsin your XML Schema
  - From: Bob Foster <bob@objfac.com>
- Re: [xml-dev] [Summary] Constrain the Number of Occurrences of Elementsin your XML Schema
  - From: Steven Ericsson Zenith <steven@semeiosis.com>

Prev by Date: Re: [xml-dev] [Summary] Constrain the Number of Occurrences of Elementsin your XML Schema
Next by Date: Word table to XML
Previous by thread: Re: [xml-dev] [Summary] Constrain the Number of Occurrences of Elementsin your XML Schema
Next by thread: RE: [xml-dev] [Summary] Constrain the Number of Occurrences of Elements in your XML Schema
Index(es):
- Date
- Thread