   Re: [xml-dev] [Summary] Constrain the Number of Occurrences of Elementsi


I agree that Greg Hunt made some good points about "operational 
constraints," but patching up your solution with Schematron doesn't 
address them.

Still to be discussed:

- What different constraints might be suitable for models of the data 
store vs. models of transactions against the store?

- Different kinds of transaction (e.g., the traditional "batch" and 
"interactive") might impose different constraints.

- What constraints might be better expressed as implementation bounds 
vs. data model? E.g., depth of recursion or sheer number of elements may 
be a problem regardless of element type.
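To make "implementation bounds" concrete, here is a rough sketch of a check that limits total element count and nesting depth without looking at element types at all. The function name check_bounds and the max_elements and max_depth parameters are made up for illustration; they are not part of any schema language or validator.

```python
import io
import xml.etree.ElementTree as ET


class LimitExceeded(Exception):
    """Raised when a document exceeds an implementation bound."""


def check_bounds(source, max_elements, max_depth):
    """Stream through an XML document and enforce type-independent limits.

    max_elements and max_depth are hypothetical implementation bounds,
    not values taken from any schema.
    """
    depth = 0
    seen = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            seen += 1
            if seen > max_elements:
                raise LimitExceeded(f"more than {max_elements} elements")
            if depth > max_depth:
                raise LimitExceeded(f"nesting deeper than {max_depth}")
        else:
            depth -= 1
            elem.clear()  # release this element's text and children
    return seen


doc = b"<Bookstore><Book/><Book/><Book/></Bookstore>"
print(check_bounds(io.BytesIO(doc), max_elements=10, max_depth=5))
```

Because the limits are checked on each start event, the check can stop as soon as a bound is crossed rather than after the whole document has been read.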

Bob Foster
http://xmlbuddy.com/

Roger L. Costello wrote:
 > Hi Folks,
 >
 > Outstanding discussion!  Many thanks for all the comments.  I think that
 > this is an important issue.  Below I have tried to summarize the
 > discussion (it doesn't include the most recent comments).  Also, at the
 > bottom of this message I have a proposal for getting the best of both
 > viewpoints by using a combination of XML Schemas and Schematron.  /Roger
 >
 >
 >     Issue
 >
 > /Should unbounded occurrences be permitted in an XML Schema?/
 >
 >
 >     Two Approaches to Allowing an Unbounded Number of Occurrences
 >
 > There are two approaches in XML Schemas for allowing an unbounded number
 > of occurrences. Below I discuss these two approaches. Following that I
 > discuss two viewpoints on whether unbounded number of occurrences should
 > or should not be permitted. I then discuss the advantages and
 > disadvantages of each viewpoint. Finally, I propose a compromise of the
 > two viewpoints.
 >
 >
 >     Allowing Unbounded Occurrences using maxOccurs
 >
 > The first approach to allowing an unbounded number of occurrences is to
 > explicitly state that you want an unbounded number of occurrences by
 > using maxOccurs="unbounded". For example, the following declaration says
 > that Bookstore can contain an unbounded number of Book elements:
 >
 > <element name="Bookstore">
 >     <complexType>
 >         <sequence>
 >             <element name="Book" type="..." *maxOccurs="unbounded"*/>
 >         </sequence>
 >     </complexType>
 > </element>
 >
 >
 >     Allowing Unbounded Occurrences using a Recursive Expression
 >
 > The second approach to allowing an unbounded number of occurrences is
 > less obvious. Unboundedness occurs implicitly when you create a
 > recursive structure. In the following example there is no limit to the
 > depth of the Section elements. That is, a Section can contain a Section
 > which contains a Section which contains a Section ...
 >
 > <element name="Section" type="SectionType"/>
 >
 > <complexType name="SectionType">
 >     <sequence>
 >         <element name="Title" type="..."/>
 >         <element name="Section" type="SectionType" minOccurs="0"/>
 >     </sequence>
 > </complexType>
 >
 >
 >     Comparison of the Two Approaches
 >
 > Both of the above approaches allow an unbounded number of occurrences.
 > Let's compare the two approaches:
 >
 >    1. *Explicit vs Implicit:* With the first approach you explicitly
 >       state that you are allowing an unbounded number of occurrences.
 >       With the second approach unboundedness is implicit. Although it is
 >       obvious in the example presented, in a large Schema where the
 >       recursion extends through many complexTypes it may not be obvious
 >       that an unbounded number of occurrences are being allowed.
 >
 >    2. *Ability to "throttle back" on the Number of Occurrences:* With
 >       the first approach it is easy to reduce the number of occurrences.
 >       If you don't want an unbounded number of occurrences, and want,
 >       say, only 10 occurrences then you simply specify maxOccurs="10".
 >       With the second approach there is no means to control the depth of
 >       the recursion. There is no means to say, "Section elements cannot
 >       be nested more than 10 levels deep".
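Since the schema cannot cap the recursion, a depth limit would have to be checked outside it. The following is only a sketch of such a check, assuming Section elements without a namespace; the name max_section_depth is hypothetical.

```python
import xml.etree.ElementTree as ET


def max_section_depth(root):
    """Return the largest number of Section elements on any root-to-leaf path."""
    deepest = 0
    stack = [(root, 0)]  # explicit stack, so deep documents do not hit Python's recursion limit
    while stack:
        elem, depth = stack.pop()
        if elem.tag == "Section":
            depth += 1
        deepest = max(deepest, depth)
        stack.extend((child, depth) for child in elem)
    return deepest


doc = (
    "<Section><Title>A</Title>"
    "<Section><Title>B</Title>"
    "<Section><Title>C</Title></Section>"
    "</Section></Section>"
)
print(max_section_depth(ET.fromstring(doc)) <= 10)
```

A caller could reject any document for which the returned depth exceeds 10, which is the constraint the recursive declaration has no way to state.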
 >
 >
 >     Permit Unbounded Occurrences or Constrain the Occurrences?
 >
 > Now that we have seen the two approaches for allowing an unbounded
 > number of occurrences we return to the main issue: when designing an XML
 > Schema should you permit an unbounded number of occurrences? There are
 > two viewpoints on this issue. One viewpoint is that you should design
 > your Schemas to permit an unbounded number of occurrences. The other
 > viewpoint is that you should not permit an unbounded number of
 > occurrences.
 >
 > To keep the discussion concrete, let's take the above Bookstore Schema
 > as the example. Suppose that it is decided that Bookstore should not
 > allow more than 30,000 Books. Should the Schema be designed to allow an
 > unbounded number of Books:
 >
 > <element name="Bookstore">
 >     <complexType>
 >         <sequence>
 >             <element name="Book" type="..." *maxOccurs="unbounded"*/>
 >         </sequence>
 >     </complexType>
 > </element>
 >
 > Or, should the Schema be designed to explicitly state 30,000 as the
 > maximum:
 >
 > <element name="Bookstore">
 >     <complexType>
 >         <sequence>
 >             <element name="Book" type="..." *maxOccurs="30000"*/>
 >         </sequence>
 >     </complexType>
 > </element>
 >
 >
 >     Viewpoint 1: Permit an Unbounded Number of Occurrences
 >
 > This viewpoint says that it's better to use maxOccurs="unbounded".
 >
 > There is a technical problem with setting maxOccurs="30000". Michael Kay
 > nicely summarizes the problem: "the classical algorithms for turning
 > grammars into finite state machines produce very inefficient machines
 > when there are occurrence limits that are large but finite. Many schema
 > processors break or consume seriously large amounts of memory if you
 > specify a maxOccurs value (other than unbounded) that's greater than a
 > couple of hundred." In other words, many Schema validators may choke if
 > you specify maxOccurs="30000".
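A toy model may help show why a large finite limit is costly. The sketch below unrolls a bounded repetition of Book into one state per permitted count, which is a simplified stand-in for the classical construction Michael Kay describes and not the implementation of any particular validator. The function names are invented for this illustration.

```python
def unrolled_machine(max_occurs):
    """Transition table for at most max_occurs Book elements, built by unrolling.

    State i means "i Book elements seen so far"; every state is accepting.
    """
    table = {count: {"Book": count + 1} for count in range(max_occurs)}
    table[max_occurs] = {}  # no further Book is allowed from the last state
    return table


def unbounded_machine():
    """Transition table for maxOccurs="unbounded": one state that loops on Book."""
    return {0: {"Book": 0}}


def accepts(table, children):
    """Run the child element names through the table, starting at state 0."""
    state = 0
    for name in children:
        if name not in table[state]:
            return False
        state = table[state][name]
    return True


print(len(unrolled_machine(30000)))  # 30001 states
print(len(unbounded_machine()))      # 1 state
print(accepts(unrolled_machine(2), ["Book", "Book", "Book"]))  # False
```

In this model the table grows linearly with the limit for a single element. A validator that kept a counter instead of unrolling would not pay this cost, but that is a different design from the one sketched here.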
 >
 > If the Bookstore wants, at a later date, to expand to accommodate, say,
 > 35,000 Books then no change will be needed to the Schema.
 >
 > The example being considered is just one element. A Schema is, of
 > course, usually comprised of many elements. Suppose that each element is
 > constrained as precisely as possible. Then a document may be rejected
 > because one element exceeded its constraints while all the others were
 > within their constraints.
 >
 >
 >     Viewpoint 2: Constrain the Number of Occurrences
 >
 > This viewpoint says that it's better to use maxOccurs="30000".
 >
 > It is important to distinguish between *theoretical constraints* and
 > *practical constraints*. Theoretically, a Bookstore may have an
 > unbounded number of Books, but for performance/capacity/security reasons
 > this Bookstore can only handle 30,000 Books.
 >
 > Another example of the difference between theoretical and practical
 > limits: theoretically an HTML document may have an infinite number of
 > <p> elements, but in practice a browser supports only a specific maximum.
 >
 > Every system has practical constraints. There are many possible reasons
 > for the constraint such as performance, capacity, or security
 > constraints. The particulars are irrelevant. What is relevant, however,
 > is that it is guaranteed that no system is infinite and consequently all
 > systems have practical constraints.
 >
 > Expressing the practical limits in an XML Schema is especially
 > important for Service Level Agreements (SLAs).
 >
 > So, there are theoretical constraints and there are practical
 > constraints. And as we've seen, typically the two are not identical. The
 > purpose of an XML Schema is to express constraints. Which constraints
 > should a Schema express - theoretical constraints or practical
 > constraints? Viewpoint 2 says that a Schema should express the practical
 > constraints. Thus, for the Bookstore example, set maxOccurs="30000".
 >
 > See Greg Hunt's messages for an excellent discussion of this viewpoint.
 >
 >
 >     Viewpoint 1: Permit an Unbounded Number of Occurrences
 >
 >     *
 >
 >
 >           Advantages
 >
 >           o *Schema Validator Efficiency:* Schema validators are more
 >             efficient when you use maxOccurs="unbounded" than when you
 >             set maxOccurs to a large number
 >           o *Accommodates Growth:* as a system is expanded to support
 >             larger amounts of data, there is no need to change the
 >             Schema.
 >     *
 >
 >
 >           Disadvantages
 >
 >           o *Pushes the Constraint-Checking Problem to Another Part of
 >             the System:* if the practical limits are not expressed in
 >             the Schema, then they have to be expressed somewhere else -
 >             somewhere less visible, less maintainable, and with less
 >             tool support.
 >           o *Vulnerable to Denial of Service Attack:* an XML guard depends
 >             upon an XML Schema to indicate whether an XML instance
 >             document should be allowed to pass. It will be unable to
 >             detect a denial of service attack since the Schema sets a
 >             theoretical limit and not a practical limit.
 >
 >
 >     Viewpoint 2: Constrain the Number of Occurrences
 >
 >     *
 >
 >
 >           Advantages
 >
 >           o *Constraints are Visible, Maintainable, and with Tool
 >             Support:* since the practical limits are expressed in the
 >             Schema, they are visible, maintainable, and have tool
 >             support.
 >           o *Prevent Denial of Service Attack:* an XML guard depends upon
 >             an XML Schema to indicate whether an XML instance document
 >             should be allowed to pass. It will be able to detect a
 >             denial of service attack since the Schema sets a practical
 >             limit and not a theoretical limit.
 >     *
 >
 >
 >           Disadvantages
 >
 >           o *Schema Validator is Inefficient:* Schema validators are not
 >             efficient when you set maxOccurs to a large number
 >           o *Requires Periodic Update:* as a system is expanded to
 >             support larger amounts of data, the Schema will need to be
 >             updated.
 >           o *Exposing System Limits:* by setting maxOccurs="30000" you
 >             are giving information to people about the limitations of
 >             your system.
 >
 >
 >     Proposal: Constraining Data without the Validator Inefficiencies
 >
 > As was described above, current implementations of Schema validators
 > are very inefficient when you specify a large number for the value of
 > maxOccurs. So, even if you want to express practical limits you cannot,
 > due to this Schema validator implementation problem.
 >
 > Here is a proposal to avoid the inefficiency: express the theoretical
 > limits using XML Schemas and express the practical limits using
 > Schematron assertions. Let's consider the Bookstore example. Below I
 > use XML Schemas to indicate that Bookstore is comprised of an
 > unbounded number of Books, and I use Schematron to indicate that, in
 > practice, the number of Books must not exceed 30,000:
 >
 > <element name="Bookstore">
 >     <complexType>
 >         <sequence>
 >             <element name="Book" type="..." *maxOccurs="unbounded"*>
 >                 *<schematron:assert test="count(Book) &lt;= 30000"/>*
 >             </element>
 >         </sequence>
 >     </complexType>
 > </element>
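The effect of the assertion can be illustrated without a Schematron processor. The sketch below is plain Python standing in for the rule, assuming it is evaluated with Bookstore as the context node, that elements carry no namespace, and that structural validation against the schema has already passed. The names MAX_BOOKS and within_practical_limit are hypothetical.

```python
import xml.etree.ElementTree as ET

MAX_BOOKS = 30000  # practical limit from the Bookstore example


def within_practical_limit(bookstore, limit=MAX_BOOKS):
    """Mirror the assertion count(Book) <= limit with Bookstore as context."""
    return len(bookstore.findall("Book")) <= limit


root = ET.fromstring("<Bookstore><Book/><Book/></Bookstore>")
print(within_practical_limit(root))           # True
print(within_practical_limit(root, limit=1))  # False
```

Keeping the limit in one named constant means a later move to 35,000 Books touches only that value and leaves the structural schema unchanged.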
 >
 > Comments?
 >
 >
 >






 


Copyright 2001 XML.org. This site is hosted by OASIS