Um, have you noticed the consequences of setting maxOccurs="30000" in
today's validators? I've seen out-of-memory errors with maxOccurs="1000".
There is a way to avoid the quadratic blowup (probably more than one). I
talked about one in:
http://jroller.com/comments/bobfoster/FullSpeedAhead/derivatives_of_bounded_repitition
and I believe C. M. Sperberg-McQueen is giving a presentation at the
next Extreme that covers the topic, but right now, replacing "unbounded"
with a large finite maxOccurs is really not good advice.
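To see where a quadratic cost can come from, consider one naive way a validator might compile maxOccurs="n": expand the particle into n optional copies (Book? Book? ... Book?) and build a position (Glushkov) automaton. Because every copy is optional, each position may be followed by every later position, so the follow relation alone has n(n-1)/2 transitions. This is an illustration of one possible expansion, not a description of any particular validator, and the function names are hypothetical. The last function is a swapped-in alternative (a single counter), not necessarily the derivative-based technique from the linked post.

```python
from typing import Dict, Iterable, Set


def follow_sets(n: int) -> Dict[int, Set[int]]:
    """Follow sets of the position automaton for n optional copies of one particle.

    Every copy is optional, so position i can be followed by any position j > i.
    Only call this for small n: it materialises every transition.
    """
    return {i: set(range(i + 1, n)) for i in range(n)}


def transition_count(n: int) -> int:
    """Closed form for the number of transitions built by follow_sets(n)."""
    return n * (n - 1) // 2


def within_bounds(names: Iterable[str], particle: str,
                  min_occurs: int, max_occurs: int) -> bool:
    """Check a run of children with one counter instead of n automaton states."""
    count = 0
    for name in names:
        if name != particle:
            return False  # only the repeated particle is allowed in this run
        count += 1
        if count > max_occurs:
            return False  # reject as soon as the upper bound is exceeded
    return count >= min_occurs


# For small n the explicit construction agrees with the closed form.
assert sum(len(s) for s in follow_sets(5).values()) == transition_count(5)
print(transition_count(1000))   # 499500
print(transition_count(30000))  # 449985000
print(within_bounds(["Book"] * 3, "Book", 0, 30000))  # True
```

The counter needs constant state whatever the bound is, while the expanded automaton grows with the square of it (not counting edges from the start state).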
Bob Foster
http://xmlbuddy.com/
Roger L. Costello wrote:
> Hi Folks,
>
> Below I have jotted down a few thoughts regarding XML Schemas which
> permit an unbounded number of occurrences. Namely, I recommend against
> using maxOccurs="unbounded" in an XML Schema. I am interested in
> hearing your thoughts on this. /Roger
>
>
>
> Constrain the Number of Occurrences of Elements in your XML Schema
>
> *by Roger L. Costello*
> August 5, 2005
>
>
> Constrain your Data!
>
> In this message I will argue that you should never create XML Schemas
> that permit an unbounded number of occurrences.
>
> There are two ways in XML Schemas to permit an unbounded number of
> occurrences. The first way is to explicitly state that you are
> permitting an unbounded number of occurrences. For example, this
> declaration says that Bookstore can contain an unbounded number of Book
> elements:
>
> <element name="Bookstore">
>   <complexType>
>     <sequence>
>       <element name="Book" type="..." maxOccurs="unbounded"/>
>     </sequence>
>   </complexType>
> </element>
>
> The second way of permitting an unbounded number of occurrences is less
> obvious. Unboundedness occurs implicitly when you create a recursive
> structure. In this example there is no limit to the depth of the Section
> elements. That is, a Section can contain a Section which contains a
> Section which contains a Section ...
>
> <element name="Section" type="SectionType"/>
>
> <complexType name="SectionType">
>   <sequence>
>     <element name="Title" type="..."/>
>     <element name="Section" type="SectionType" minOccurs="0"/>
>   </sequence>
> </complexType>
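A program consuming such documents can impose its own depth limit even when the schema does not. A minimal sketch, assuming Python's standard xml.etree.ElementTree, un-namespaced element names, and a hypothetical `limit` chosen by the application: stream the document and reject it once Section nesting exceeds the limit.

```python
import io
import xml.etree.ElementTree as ET


def check_section_depth(xml_bytes: bytes, limit: int, tag: str = "Section") -> int:
    """Return the deepest nesting of `tag`; raise ValueError if it exceeds `limit`."""
    depth = 0
    deepest = 0
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if elem.tag != tag:
            continue  # only Section elements affect the nesting depth
        if event == "start":
            depth += 1
            deepest = max(deepest, depth)
            if depth > limit:
                raise ValueError(f"{tag} nested {depth} deep; limit is {limit}")
        else:
            depth -= 1
    return deepest


doc = b"<Section><Title>A</Title><Section><Title>B</Title></Section></Section>"
print(check_section_depth(doc, limit=5))  # 2
```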
>
> Both of the above forms permit an unbounded number of occurrences. I
> recommend that you never use either form. That is, never declare an
> element with maxOccurs="unbounded", and never declare a recursive
> structure. Below I explain why.
>
>
> Writing a Journal Article? Your Word Count is Limited!
>
> The situation with specifying the number of occurrences of an element in
> an XML Schema is analogous to the situation with specifying the number
> of words authors can use in an article.
>
> Suppose that you want to write an article for a journal. How many words
> can you use in your article? All journals have an upper limit on the
> number of words that you can use. Why don't the journals set the word
> limit to unbounded? Answer: there are editors who have to check the
> articles for correctness, readability, etc. The editors have limited
> resources (i.e., time). Thus, it is necessary to limit the length of the
> article. Perhaps at a later date the journal will increase the word
> limit (perhaps they hire some full-time editors). But they always have a
> definite upper limit. They never allow articles of unbounded length,
> because their resources are limited.
>
>
> Error! Infinite Loop!
>
> The situation with specifying the number of occurrences of an element in
> an XML Schema is analogous to an infinite loop in programming languages.
> Why are infinite loops deemed "bad" in programming languages, yet
> unbounded occurrences are embraced in data?
>
> Let's see why infinite loops are bad in programming languages. Suppose
> that a program has a loop, and a computer begins to process the loop. It
> requires a certain amount of resources (memory, CPU cycles) for the
> computer to perform one iteration. Two iterations will require a bit
> more resources. Three iterations require still more. ... Infinite
> iterations require infinite resources. Thus, infinite loops are bad
> because they require infinite resources.
>
> The situation is analogous with data. Consider the Bookstore declaration
> above. It declares that an unbounded number of Book elements are
> permitted within Bookstore. A program that must process XML instances
> conforming to the declaration must have the necessary resources (memory,
> CPU cycles). Processing one Book element requires a certain amount
> of resources. Processing a second Book element requires a bit more
> resources. A third Book requires still more resources. ... Infinitely many
> Books require infinite resources. Even though XML instance documents are
> always finite, the schema indicates that there is a "potential" for an
> infinite number of Book elements. A program that is designed to process
> "any" XML instance document that conforms to the schema must therefore
> have an infinite amount of resources.
>
>
> Okay, then what Value should I use for maxOccurs?
>
> "Suppose that I anticipate that Bookstore will never have more than
> 30,000 Books, so I set maxOccurs='30000'. After some time the
> requirements change and Bookstore now needs to be able to hold 35,000
> Books. Won't I have to change the Schema every time my needs change?
> Wouldn't it be easier if I simply declared maxOccurs='unbounded'?"
>
> Answer: yes, you will need to change the Schema whenever your
> requirements change. Yes, it is easier to simply declare
> maxOccurs='unbounded'. But don't do it! The number that you use for
> maxOccurs should be as big as your programs are willing and able to cope
> with, and no more. If at some point the number of actual books exceeds
> that number, then you must either (1) extend your program's resources to
> handle the expanded number, or (2) refuse to allow more books.
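Option (2), refusing to allow more books, can also be enforced by the consuming program itself. A minimal sketch, assuming Python's standard xml.etree.ElementTree and un-namespaced element names; `count_books` and `max_books` are hypothetical names, and the limit is whatever your program is able to cope with.

```python
import io
import xml.etree.ElementTree as ET


def count_books(xml_bytes: bytes, max_books: int, tag: str = "Book") -> int:
    """Count `tag` elements while streaming; refuse the document past `max_books`."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == tag:
            count += 1
            if count > max_books:
                raise ValueError(f"more than {max_books} {tag} elements")
            # Release the finished Book's content; the now-empty element
            # itself stays attached to its parent.
            elem.clear()
    return count


doc = b"<Bookstore>" + b"<Book>x</Book>" * 3 + b"</Bookstore>"
print(count_books(doc, max_books=30000))  # 3
```

Rejecting at the first element past the limit means the program never commits more resources than it has budgeted for.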
>
>
> Recap
>
> 1. Don't use maxOccurs="unbounded"
> 2. Don't use recursive constructions
> 3. Set maxOccurs no larger than the number your available resources
>    can handle