You seem to be arguing that transient system limitations (memory constraints, CPU processing power, etc.) should make their way into the schemas of formal specifications. By that argument, what should the limit have been on <p> elements in HTML or <item> elements in RSS, based on a typical machine's processing power in the 1990s? Having put those limits in the schemas, would we have had to rev them every 18 months to account for Moore's law?
PITHY WORDS OF WISDOM
A meeting is an event at which the minutes are kept and the hours are lost.
From: Greg Hunt [mailto:firstname.lastname@example.org]
Sent: Fri 8/5/2005 8:51 PM
Subject: Re: [xml-dev] Constrain the Number of Occurrences of Elements in your XML Schema
I'd argue that there are two kinds of constraint here. One is the
theoretical data model, and I'm inclined to agree that too much detail
there is a bad thing. The second constraint is a statement of what the
actual system is willing to do, or can do. We need both kinds of
constraint but are forced to express them using the same attributes,
and that is where the problem arises: a conflict between system
specification and data model.
The second type of constraint, the practical capacity constraint, is
important if we are to distribute and/or enforce statements about
performance and capacity. For example, I may regard maxOccurs="3"
differently to maxOccurs="unbounded". Specifying unbounded may be true
for the data model when 3 is the real operational number. The example
of the number of <p> elements that Mozilla can handle illustrates this.
If, in some hypothetical system, I have some hard browser UI performance
requirement, that requirement will be on specified hardware with
specified software and I will feel free to specify the number of <p>s
based on that, regardless of the data model for the English language
saying that maxOccurs is unbounded for paragraphs. The idea that we
have to handle anything that is thrown at us means either that we have
no performance constraints, no software cost constraints, or that we
have to have any amount of capacity sitting around waiting for
arbitrarily large documents.
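
One way to keep the two kinds of constraint apart is to leave the data model unbounded and check the operational limit in a separate step. What follows is only a minimal sketch in Python, not a description of any real system: the limit of 3 and the names MAX_P_ELEMENTS and within_capacity are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical operational limit; the data model itself stays unbounded.
MAX_P_ELEMENTS = 3

def within_capacity(xml_text, tag="p", limit=MAX_P_ELEMENTS):
    """Return True if the document has at most `limit` occurrences of `tag`."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, including the root element itself.
    count = sum(1 for _ in root.iter(tag))
    return count <= limit

small = "<body><p>a</p><p>b</p></body>"
large = "<body>" + "<p>x</p>" * 4 + "</body>"
print(within_capacity(small))
print(within_capacity(large))
```

Note that this sketch parses the whole document before counting, so a real capacity check on very large inputs would more likely use a streaming parser.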
There was an example of image processing given earlier, talking about
TIFFs and file sizes, but in any production system there should be
something that says that the file format is TIFF AND also something that
says that the maximum size is 1GB (or whatever it is). Businesses
cannot undertake to process things that consume arbitrary amounts of
The problem of public standards is an interesting one and experience
with EDI is worth looking at. Typically the basic EDI message type is
constrained further, outside of the EDI specification, by a local,
partner specific profile, before it can be used for interchange. A few
months ago I was looking at the EDI profile for interchange with a very
large retail chain. This profile specified a subset of the formal
standard: they specified exactly how many occurrences of what were
acceptable and what literal values were acceptable where. This wasn't
theoretical data model stuff, this was "do it this way and we will do
business with you". Isn't this what we need to be able to support in
XML? If we don't put these limits in the schema, they just have to go
somewhere else, somewhere less visible, less maintainable, and with less
tool support. How SHOULD we do this if we aren't using the schema for
validation of these constraints?
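
For illustration, this is roughly what such limits look like when they live outside the schema, as a separate partner-profile check. It is a sketch under stated assumptions: the PROFILE structure, the element names (order, line, currency) and the function profile_violations are all hypothetical and do not come from any EDI standard or real trading-partner profile.

```python
import xml.etree.ElementTree as ET

# Hypothetical partner profile: per-element occurrence limits and allowed
# literal values, layered on top of whatever the base schema permits.
PROFILE = {
    "max_occurs": {"line": 2},
    "allowed_values": {"currency": {"USD", "CAD"}},
}

def profile_violations(xml_text, profile=PROFILE):
    """Return a list of human-readable violations of the partner profile."""
    root = ET.fromstring(xml_text)
    problems = []
    # Occurrence limits: count descendants with the given tag name.
    for tag, limit in profile["max_occurs"].items():
        count = len(root.findall(f".//{tag}"))
        if count > limit:
            problems.append(f"{tag}: {count} occurrences, limit is {limit}")
    # Literal values: element text must be one of the allowed strings.
    for tag, allowed in profile["allowed_values"].items():
        for elem in root.iter(tag):
            if elem.text not in allowed:
                problems.append(f"{tag}: value {elem.text!r} not allowed")
    return problems

order = "<order><currency>EUR</currency><line/><line/><line/></order>"
for problem in profile_violations(order):
    print(problem)
```

The profile here is plain data in application code, which is exactly the "less visible, less tool support" situation described above.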
Joe English wrote:
>Roger L. Costello wrote:
>>Below I have jotted down a few thoughts regarding XML Schemas which permit
>>an unbounded number of occurrences. Namely, I recommend against using
>>maxOccurs="unbounded" in an XML Schema. I am interested in hearing your
>>thoughts on this.
>My thoughts lead to the exact opposite conclusion:
>you should never use anything *except* maxOccurs="unbounded"
>(or maxOccurs="1") in a schema.
>With very few exceptions, any attempt to devise a suitable
>upper bound for any 'maxOccurs' value is bound to involve
>wild-ass-guessery. How many paragraphs should one allow
>in an HTML document? You can take this from a business logic
>standpoint ("what's the longest web page anyone is ever
>going to want to produce"), or a processing standpoint
>(how many <p> elements can Mozilla cope with? What about
>MSIE? Does your answer change depending on how old the
>user's computer is?), but you'll never be able to come up
>with a satisfactory number. Whatever number you choose
>will either be too large as a meaningful resource constraint,
>or it will be too small for some existing or future document.
>What you advocate is reminiscent of the QUANTITY and CAPACITY
>sections of the SGML declaration. These were a perpetual annoyance
>(the SGML declaration was the first thing that got dumped
>when XML was being designed), and as far as I know they
>never did anybody any good (i.e., they were never an accurate
>indication of how large a document any particular application
>could actually handle).
>>1. Don't use maxOccurs="unbounded"
>>2. Don't use recursive constructions
>>3. Set maxOccurs to a number no larger than the amount of resources you
>I'd argue the exact opposite, mostly because (3) is in
>practice impossible to answer, and rarely worth even trying
>to answer in the first place.
>There are only three sensible cardinalities: zero, one, and many.
>There are only four sensible cardinality constraints: mandatory,
>optional, mandatory+repeatable and optional+repeatable.
>(Corollary: "?", "+", and "*" operators as found in DTDs and Relax NG
>are far more appropriate than WXS' separate minOccurs= and maxOccurs=
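
The correspondence between the DTD / RELAX NG occurrence operators and the minOccurs/maxOccurs attribute pairs of W3C XML Schema can be written out as a small table. The sketch below does that in Python; the names OPERATORS and to_wxs_attributes are made up for illustration.

```python
# DTD / RELAX NG occurrence operators and the equivalent
# (minOccurs, maxOccurs) pairs in W3C XML Schema.
OPERATORS = {
    "":  (1, 1),            # mandatory (no operator)
    "?": (0, 1),            # optional
    "+": (1, "unbounded"),  # mandatory and repeatable
    "*": (0, "unbounded"),  # optional and repeatable
}

def to_wxs_attributes(operator):
    """Render the XML Schema attributes equivalent to one operator."""
    min_occurs, max_occurs = OPERATORS[operator]
    return f'minOccurs="{min_occurs}" maxOccurs="{max_occurs}"'

for op in OPERATORS:
    print(repr(op), "->", to_wxs_attributes(op))
```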
>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>initiative of OASIS <http://www.oasis-open.org>
>The list archives are at http://lists.xml.org/archives/xml-dev/
>To subscribe or unsubscribe from this list use the subscription