   Re: [xml-dev] Constrain the Number of Occurrences of Elements in your XML


Um, have you noticed the consequences of setting maxOccurs="30000" in 
today's validators? I've seen out-of-memory errors with maxOccurs="1000".

There is a way to avoid the quadratic blowup (probably more than one). I 
talked about one in:

http://jroller.com/comments/bobfoster/FullSpeedAhead/derivatives_of_bounded_repitition

and I believe C. M. Sperberg-McQueen is giving a presentation at the 
next Extreme that covers the topic. But with today's validators, setting 
maxOccurs to a large number is really not good advice.
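
To make the cost concrete, here is a rough sketch in Python. It is not how any particular validator works, and not necessarily the approach in the post linked above. It assumes a naive implementation that allocates one automaton state per permitted occurrence, and contrasts that with a particle that only carries its remaining min/max counters. The names unrolled_states, Repeat and validate are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


def unrolled_states(max_occurs: int) -> List[str]:
    """Naive expansion: one state per permitted occurrence (plus a start state)."""
    return [f"Book#{i}" for i in range(max_occurs + 1)]


@dataclass(frozen=True)
class Repeat:
    """Counter-based particle: `name` must occur between lo and hi times."""
    name: str
    lo: int
    hi: int

    def step(self, element: str) -> Optional["Repeat"]:
        """Consume one child element; return what remains, or None if invalid."""
        if element != self.name or self.hi == 0:
            return None
        return Repeat(self.name, max(self.lo - 1, 0), self.hi - 1)

    def accepts_end(self) -> bool:
        """The content may end here only if no required occurrences remain."""
        return self.lo == 0


def validate(children: List[str], particle: Repeat) -> bool:
    """Check a list of child element names against a single Repeat particle."""
    state: Optional[Repeat] = particle
    for child in children:
        state = state.step(child)
        if state is None:
            return False
    return state.accepts_end()
```

With maxOccurs="30000", unrolled_states builds 30001 entries before a single instance has been read, while Repeat("Book", 1, 30000) stays the same size whatever the bound is.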

Bob Foster
http://xmlbuddy.com/

Roger L. Costello wrote:
 > Hi Folks,
 >
 > Below I have jotted down a few thoughts regarding XML Schemas which
 > permit an unbounded number of occurrences.  Namely, I recommend against
 > using maxOccurs="unbounded" in an XML Schema.  I am interested in
 > hearing your thoughts on this.  /Roger
 >
 >
 >
 >   Constrain the Number of Occurrences of Elements in your XML Schema
 >
 > *by Roger L. Costello*
 > August 5, 2005
 >
 >
 >     Constrain your Data!
 >
 > In this message I will argue that you should never create XML Schemas
 > that permit an unbounded number of occurrences.
 >
 > There are two ways in XML Schemas to permit an unbounded number of
 > occurrences. The first way is to explicitly state that you are
 > permitting an unbounded number of occurrences. For example, this
 > declaration says that Bookstore can contain an unbounded number of Book
 > elements:
 >
 > <element name="Bookstore">
 >     <complexType>
 >         <sequence>
 >             <element name="Book" type="..." maxOccurs="unbounded"/>
 >         </sequence>
 >     </complexType>
 > </element>
 >
 > The second way of permitting an unbounded number of occurrences is less
 > obvious. Unboundedness occurs implicitly when you create a recursive
 > structure. In this example there is no limit to the depth of the Section
 > elements. That is, a Section can contain a Section which contains a
 > Section which contains a Section ...
 >
 > <element name="Section" type="SectionType"/>
 >
 > <complexType name="SectionType">
 >     <sequence>
 >         <element name="Title" type="..."/>
 >         <element name="Section" type="SectionType" minOccurs="0"/>
 >     </sequence>
 > </complexType>
 >
 > Both of the above forms permit an unbounded number of occurrences. I
 > recommend that you never use either form. That is, never declare an
 > element with maxOccurs="unbounded", and never declare a recursive
 > structure. Below I explain why.
 >
 >
 >     Writing a Journal Article? Your Word Count is Limited!
 >
 > The situation with specifying the number of occurrences of an element in
 > an XML Schema is analogous to the situation with specifying the number
 > of words authors can use in an article.
 >
 > Suppose that you want to write an article for a journal. How many words
 > can you use in your article? All journals have an upper limit on the
 > number of words that you can use. Why don't the journals set the word
 > limit to unbounded? Answer: there are editors that have to check the
 > articles for correctness, readability, etc. The editors have limited
 > resources (i.e., time). Thus, it is necessary to limit the length of the
 > article. Perhaps at a later date the journal will increase the word
 > limit (perhaps after hiring more full-time editors). But there is always
 > a definite upper limit. Journals never allow articles of unbounded
 > length, because their resources are limited.
 >
 >
 >     Error! Infinite Loop!
 >
 > The situation with specifying the number of occurrences of an element in
 > an XML Schema is analogous to an infinite loop in programming languages.
 > Why are infinite loops deemed "bad" in programming languages, yet
 > unbounded occurrences are embraced in data?
 >
 > Let's see why infinite loops are bad in programming languages. Suppose
 > that a program has a loop, and a computer begins to process the loop. It
 > requires a certain amount of resources (memory, cpu cycles) for the
 > computer to perform one iteration. Two iterations will require a bit
 > more resources. Three iterations require still more. ... Infinite
 > iterations require infinite resources. Thus, infinite loops are bad
 > because they require infinite resources.
 >
 > The situation is analogous with data. Consider the Bookstore declaration
 > above. It declares that an unbounded number of Book elements are
 > permitted within Bookstore. A program that must process XML instances
 > conforming to the declaration must have the necessary resources (memory,
 > cpu cycles). To process one Book element will require a certain amount
 > of resources. To process a second Book element will require a bit more
 > resources. A third book will require still more resources. ... Infinite
 > Books require infinite resources. Even though XML instance documents are
 > always finite, the schema indicates that there is a "potential" for an
 > infinite number of Book elements. A program that is designed to process
 > "any" XML instance document that conforms to the schema must therefore
 > have an infinite amount of resources.
 >
 >
 >     Okay, then what Value should I use for maxOccurs?
 >
 > "Suppose that I anticipate that Bookstore will never have more than
 > 30,000 Books, so I set maxOccurs='30000'. After some time the
 > requirements change and Bookstore now needs to be able to hold 35,000
 > Books. Won't I have to change the Schema every time my needs change?
 > Wouldn't it be easier if I simply declared maxOccurs='unbounded'?"
 >
 > Answer: yes, you will need to change the Schema whenever your
 > requirements change. Yes, it is easier to simply declare
 > maxOccurs='unbounded'. But don't do it! The number that you use for
 > maxOccurs should be as big as your programs are willing and able to cope
 > with, and no more. If at some point the number of actual books exceeds
 > that number then you must either (1) extend your program's resources to
 > handle the expanded number, or (2) refuse to allow more books.
 >
 >
 >     Recap
 >
 >    1. Don't use maxOccurs="unbounded"
 >    2. Don't use recursive constructions
 >    3. Set maxOccurs to a number no larger than your programs can handle
 >       with the resources you have available
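
As a rough illustration of the second option in the maxOccurs discussion above (refusing input beyond what a program can cope with), a consumer can enforce its own limits while parsing, whatever the schema says. The sketch below is in Python and uses the standard library's streaming parser. MAX_BOOKS, MAX_DEPTH, LimitExceeded and count_books are hypothetical names, and the code assumes Book elements with no namespace.

```python
import io
import xml.etree.ElementTree as ET

MAX_BOOKS = 30000  # hypothetical limit, mirroring maxOccurs="30000"
MAX_DEPTH = 16     # hypothetical cap on element nesting (recursive structures)


class LimitExceeded(Exception):
    """Raised when a document exceeds what this program is prepared to handle."""


def count_books(source, max_books=MAX_BOOKS, max_depth=MAX_DEPTH):
    """Stream through `source`, counting Book elements and enforcing both limits."""
    depth = 0
    books = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth > max_depth:
                raise LimitExceeded(f"nesting deeper than {max_depth}")
            if elem.tag == "Book":
                books += 1
                if books > max_books:
                    raise LimitExceeded(f"more than {max_books} Book elements")
        else:
            depth -= 1
            elem.clear()  # drop the element's text and children once it is closed
    return books


if __name__ == "__main__":
    doc = b"<Bookstore><Book/><Book/><Book/></Bookstore>"
    print(count_books(io.BytesIO(doc)))
```

The check fails as soon as a limit is crossed, so an oversized document is rejected without being read to the end.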






 
