- To: "'XML Developers List'" <email@example.com>
- Subject: Re: [xml-dev] Constrain the Number of Occurrences of Elements in your XML Schema
- From: firstname.lastname@example.org
- Date: Fri, 05 Aug 2005 19:03:57 +0000
I'm going to come in here in agreement with those against the limits, although maybe for different reasons. Even though the intent may be good (to prevent processing overload, it seems), from what Roger said, limiting maxOccurs and recursion often won't accomplish this in real-life cases. Furthermore, the software needs to be able to deal with too-large data sets or documents no matter what the schema says, because someone may send you an oversized document anyway.
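To illustrate that last point, here is a rough sketch of a processor that protects itself without relying on the schema. It assumes Python and its standard xml.etree.ElementTree module; the names CappedReader, DocumentTooLarge and count_elements, and the byte budget, are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


class DocumentTooLarge(Exception):
    """Raised when the input exceeds the processor's own byte budget."""


class CappedReader:
    """File-like wrapper that refuses to hand out more than max_bytes."""

    def __init__(self, raw, max_bytes):
        self._raw = raw
        self._remaining = max_bytes

    def read(self, size=-1):
        if size is None or size < 0:
            size = 64 * 1024  # never read an unbounded amount at once
        chunk = self._raw.read(size)
        self._remaining -= len(chunk)
        if self._remaining < 0:
            raise DocumentTooLarge("input exceeds the processor's byte budget")
        return chunk


def count_elements(raw, max_bytes):
    """Stream-parse a binary file-like object and count its elements."""
    count = 0
    reader = CappedReader(raw, max_bytes)
    for _event, elem in ET.iterparse(reader, events=("end",)):
        count += 1
        elem.clear()  # drop the element's text and children once it has been counted
    return count
```

The limit here belongs to the processor and its environment, not to the document type, so it can be changed when the hardware changes without touching the schema.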
One of the problems with the limits is that it's really hard to know how big a valid document is going to be. At least, this is so for many practical documents that have lots of optional components, or that have PCDATA chunks of large but variable size. If you put in limits that restrict a single element, you could still have a document overload your processor because other parts of the document were too large or complex.
Conversely, if you restricted every part, then you might not be able to have a valid document when all the optional parts but one were very small, but that one was over the limit. In this case, the restriction would prevent you from using documents that were otherwise perfectly OK to process.
In other words, the combinatorial aspects of these limitations make them hard or impossible to apply to many practical schemas while still allowing pretty large or complex documents. Naturally, some simple schemas will allow this to be done, but Roger is talking about a general rule.
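A small sketch may make the mismatch concrete. It assumes Python, a hypothetical cap of 10 on an invented <item> element, and made-up helper names (occurrences_ok, make_doc); it checks only the occurrence count, the way a maxOccurs-style limit would.

```python
import xml.etree.ElementTree as ET

MAX_ITEMS = 10  # hypothetical maxOccurs-style cap on <item>


def occurrences_ok(xml_bytes, tag="item", limit=MAX_ITEMS):
    """Return True if the root has at most `limit` direct children named `tag`."""
    root = ET.fromstring(xml_bytes)
    return len(root.findall(tag)) <= limit


def make_doc(n_items, text_len):
    """Build a <list> document with n_items <item> children of text_len characters each."""
    body = "".join("<item>" + "x" * text_len + "</item>" for _ in range(n_items))
    return ("<list>" + body + "</list>").encode("utf-8")


small_but_rejected = make_doc(11, 1)          # 11 tiny items, well under 1 KB in total
huge_but_accepted = make_doc(2, 1_000_000)    # 2 items, roughly 2 MB in total

print(occurrences_ok(small_but_rejected), len(small_but_rejected))
print(occurrences_ok(huge_but_accepted), len(huge_but_accepted))
```

The occurrence limit rejects the tiny document and accepts the one that is thousands of times larger, which is the opposite of what a guard against overload is supposed to do.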
Second, the limits Roger mentioned are as much imposed by a particular processing implementation as anything else. You might as well say, don't have a data set too large for memory, because a processor can't sort it efficiently. Well, yes, but all the serious databases know how to sort efficiently using the disk when the data set is too large for memory.
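For anyone who hasn't seen the disk-based technique, here is a minimal sketch of an external merge sort. It assumes Python, input lines that contain no newline characters, and an invented function name (external_sort); it is not how any particular database does it.

```python
import heapq
import tempfile


def _read_run(run):
    """Yield the lines of one sorted run file without their trailing newline."""
    for line in run:
        yield line.rstrip("\n")


def external_sort(lines, chunk_size):
    """Yield `lines` in sorted order, buffering at most chunk_size lines at a time.

    Each full buffer is sorted in memory and spilled to a temporary file
    (a "run"); the sorted runs are then merged lazily from disk.
    """
    runs = []
    chunk = []

    def flush():
        chunk.sort()
        run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
        run.writelines(line + "\n" for line in chunk)
        run.seek(0)  # rewind so the merge step can read the run back
        runs.append(run)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    try:
        # heapq.merge keeps only one pending line per run in memory.
        yield from heapq.merge(*(_read_run(run) for run in runs))
    finally:
        for run in runs:
            run.close()
```

For example, list(external_sort(["pear", "apple", "fig", "kiwi", "date"], chunk_size=2)) spills three runs to disk and merges them back in order. The size of the data set is limited by disk space rather than memory, and nothing in the data format had to say so.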
What Roger is proposing is just like restricting the size of a data set to the anticipated memory size of the processing computer. Really, now, when was the last time you looked to see how many data elements your database can sort? Very few people ever need to check that and then work out how to deal with it.
Same for photo processing. Should we limit TIFF files to 1 GB by spec, because Photoshop might not be able to process one on a 1 GB computer? No, if I run into trouble because I don't have enough memory for my pictures, I make them smaller, get more memory, or don't try to process them. Why should it be different for schema definitions?
We all know there can be practical reasons for limiting sizes - for example, not processing a URL of indefinite length. In that example, too, the length is not constrained by the spec itself (i.e., the RFC), but practical software does impose a limit.
I conclude that these kinds of schema limits should not be mandated as a general rule, though they may be good things in particular circumstances.