Constrain the Number of Occurrences of Elements in your XML Schema
by Roger L. Costello
August 5, 2005
Constrain your Data!
In this message I will argue that you should never create XML Schemas that
permit an unbounded number of occurrences.
There are two ways in XML Schemas to permit an unbounded number of
occurrences. The first way is to explicitly state that you are permitting an
unbounded number of occurrences. For example, this declaration says that
Bookstore can contain an unbounded number of Book elements:
<element name="Bookstore">
    <complexType>
        <sequence>
            <element name="Book" type="..." maxOccurs="unbounded"/>
        </sequence>
    </complexType>
</element>
The second way of permitting an unbounded number of occurrences is less
obvious. Unboundedness occurs implicitly when you create a recursive
structure. In this example there is no limit to the depth of the Section
elements. That is, a Section can contain a Section which contains a Section
which contains a Section ...
<element name="Section" type="SectionType"/>
<complexType name="SectionType">
    <sequence>
        <element name="Title" type="..."/>
        <element name="Section" type="SectionType" minOccurs="0"/>
    </sequence>
</complexType>
Both of the above forms permit an unbounded number of occurrences. I
recommend that you never use either form. That is, never declare an element
with maxOccurs="unbounded", and never declare a recursive structure. Below I
explain why.
Writing a Journal Article? Your Word Count is Limited!
The situation with specifying the number of occurrences of an element in an
XML Schema is analogous to the situation with specifying the number of words
authors can use in an article.
Suppose that you want to write an article for a journal. How many words can
you use in your article? All journals have an upper limit on the number of
words that you can use. Why don't the journals set the word limit to
unbounded? Answer: editors have to check each article for correctness,
readability, and so on, and editors have limited resources (i.e., time).
Thus, it is necessary to limit the length of the article. Perhaps at a later
date the journal will raise the word limit (say, after hiring more full-time
editors). But there is always a definite upper limit. Journals never allow
articles of unbounded length, because their resources are limited.
Error! Infinite Loop!
The situation with specifying the number of occurrences of an element in an
XML Schema is analogous to an infinite loop in programming languages. Why are
infinite loops deemed "bad" in programming languages, yet unbounded
occurrences are embraced in data?
Let's see why infinite loops are bad in programming languages. Suppose that
a program has a loop, and a computer begins to execute it. Performing one
iteration requires a certain amount of resources (memory, CPU cycles). Two
iterations require a bit more resources. Three
iterations require still more. ... Infinite iterations require infinite
resources. Thus, infinite loops are bad because they require infinite
resources.
The situation is analogous with data. Consider the Bookstore declaration
above. It declares that an unbounded number of Book elements is permitted
within Bookstore. A program that must process XML instances conforming to the
declaration must have the necessary resources (memory, CPU cycles). Processing
one Book element requires a certain amount of resources. Processing a second
Book element requires a bit more. A third Book requires still more. ...
Infinite Books require infinite resources.
Even though XML instance documents are always finite, the schema indicates
that there is a "potential" for an infinite number of Book elements. A program
that is designed to process "any" XML instance document that conforms to the
schema must therefore have an infinite amount of resources.
Okay, Then What Value Should I Use for maxOccurs?
"Suppose that I anticipate that Bookstore will never have more than 30,000
Books, so I set maxOccurs='30000'. After some time the requirements change and
Bookstore now needs to be able to hold 35,000 Books. Won't I have to change
the Schema every time my needs change? Wouldn't it be easier if I simply
declared maxOccurs='unbounded'?"
Answer: yes, you will need to change the Schema whenever your requirements
change. Yes, it is easier to simply declare maxOccurs='unbounded'. But don't
do it! The number that you use for maxOccurs should be as big as your programs
are willing and able to cope with, and no more. If at some point the number of
actual books exceeds that number, then you must either (1) extend your
program's resources to handle the larger number, or (2) refuse to allow more
books.
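As a concrete illustration, here is a minimal sketch of the Bookstore
declaration using the 30,000 limit from the question above. The xs: prefix
and the BookType definition (a Book holding only a Title) are assumptions
added so that the schema is complete; the original declaration leaves the
type of Book unspecified.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- Bookstore holds at least 1 and at most 30,000 Book elements -->
    <xs:element name="Bookstore">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Book" type="BookType" maxOccurs="30000"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <!-- Hypothetical placeholder type, not taken from the original -->
    <xs:complexType name="BookType">
        <xs:sequence>
            <xs:element name="Title" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>
```

If the requirement later grows to 35,000 Books, the only schema change is the
maxOccurs value, made once you have confirmed that your programs can cope
with the larger number.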
Recap
- Don't use maxOccurs="unbounded"
- Don't use recursive constructions
- Set maxOccurs to a number no larger than what your programs can handle
with the resources you have available
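One possible way to follow the second recap item is to unroll the recursion
into a fixed number of named levels. The sketch below assumes a maximum
nesting depth of three; that depth, the type names SectionLevel1Type through
SectionLevel3Type, the xs: prefix, and the use of xs:string for Title are all
illustrative assumptions, not taken from the original Section declaration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- Depth 1: the outermost Section -->
    <xs:element name="Section" type="SectionLevel1Type"/>

    <xs:complexType name="SectionLevel1Type">
        <xs:sequence>
            <xs:element name="Title" type="xs:string"/>
            <!-- Depth 2: optional nested Section -->
            <xs:element name="Section" type="SectionLevel2Type" minOccurs="0"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="SectionLevel2Type">
        <xs:sequence>
            <xs:element name="Title" type="xs:string"/>
            <!-- Depth 3: optional innermost Section -->
            <xs:element name="Section" type="SectionLevel3Type" minOccurs="0"/>
        </xs:sequence>
    </xs:complexType>

    <!-- The innermost level has no Section child, so nesting stops here -->
    <xs:complexType name="SectionLevel3Type">
        <xs:sequence>
            <xs:element name="Title" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>
```

The cost is one type per level, and raising the depth limit means adding
another type. That matches the advice above: the schema changes only when you
have decided your programs can handle more.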