I agree that Greg Hunt made some good points about "operational
constraints," but patching up your solution with Schematron doesn't
address them.
Still to be discussed:
- What different constraints might be suitable for models of the data
store vs. models of transactions against the store?
- Different kinds of transaction (e.g., the traditional "batch" and
"interactive") might impose different constraints.
- What constraints might be better expressed as implementation bounds
vs. data model? E.g., depth of recursion or sheer number of elements may
be a problem regardless of element type.
Bob Foster
http://xmlbuddy.com/
Roger L. Costello wrote:
> Hi Folks,
>
> Outstanding discussion! Many thanks for all the comments. I think that
> this is an important issue. Below I have tried to summarize the
> discussion (it doesn't include the most recent comments). Also, at the
> bottom of this message I have a proposal for getting the best of both
> viewpoints by using a combination of XML Schemas and Schematron. /Roger
>
>
> Issue
>
> /Should unbounded occurrences be permitted in an XML Schema?/
>
>
> Two Approaches to Allowing an Unbounded Number of Occurrences
>
> There are two approaches in XML Schemas for allowing an unbounded number
> of occurrences. Below I discuss these two approaches. Following that I
> discuss two viewpoints on whether an unbounded number of occurrences should
> or should not be permitted. I then discuss the advantages and
> disadvantages of each viewpoint. Finally, I propose a compromise of the
> two viewpoints.
>
>
> Allowing Unbounded Occurrences using maxOccurs
>
> The first approach to allowing an unbounded number of occurrences is to
> explicitly state that you want an unbounded number of occurrences by
> using maxOccurs="unbounded". For example, the following declaration says
> that Bookstore can contain an unbounded number of Book elements:
>
> <element name="Bookstore">
> <complexType>
> <sequence>
> <element name="Book" type="..." *maxOccurs="unbounded"*/>
> </sequence>
> </complexType>
> </element>
>
>
> Allowing Unbounded Occurrences using a Recursive Expression
>
> The second approach to allowing an unbounded number of occurrences is
> less obvious. Unboundedness occurs implicitly when you create a
> recursive structure. In the following example there is no limit to the
> depth of the Section elements. That is, a Section can contain a Section
> which contains a Section which contains a Section ...
>
> <element name="Section" type="SectionType"/>
>
> <complexType name="SectionType">
> <sequence>
> <element name="Title" type="..."/>
> <element name="Section" type="SectionType" minOccurs="0"/>
> </sequence>
> </complexType>
>
>
> Comparison of the Two Approaches
>
> Both of the above approaches allow an unbounded number of occurrences.
> Let's compare the two approaches:
>
> 1. *Explicit vs Implicit:* With the first approach you explicitly
> state that you are allowing an unbounded number of occurrences.
> With the second approach unboundedness is implicit. Although it is
> obvious in the example presented, in a large Schema where the
> recursion extends through many complexTypes it may not be obvious
> that an unbounded number of occurrences are being allowed.
>
> 2. *Ability to "throttle back" on the Number of Occurrences:* With
> the first approach it is easy to reduce the number of occurrences.
> If you don't want an unbounded number of occurrences, and want,
> say, only 10 occurrences then you simply specify maxOccurs="10".
> With the second approach there is no means to control the depth of
> the recursion. There is no means to say, "There cannot be more
> than 10 deep Section elements".
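>
> Outside XML Schema itself, a depth limit can be written as an XPath
> assertion. The following is only a sketch: it assumes ISO Schematron
> (the sch prefix and the pattern id are illustrative choices), that
> Section is in no namespace, and that the rule is run as a separate
> validation step alongside the Schema:
>
> ```xml
> <!-- Sketch only: the sch prefix and the pattern id are assumptions. -->
> <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
>   <sch:pattern id="section-depth">
>     <!-- Fires once for every Section element in the document. -->
>     <sch:rule context="Section">
>       <!-- ancestor-or-self counts this Section plus every enclosing one. -->
>       <sch:assert test="count(ancestor-or-self::Section) &lt;= 10">
>         Section elements must not be nested more than 10 deep.
>       </sch:assert>
>     </sch:rule>
>   </sch:pattern>
> </sch:schema>
> ```
>
> Here count(ancestor-or-self::Section) is the nesting depth of the
> current Section, so the assertion fails on the eleventh level and on
> every level below it.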
>
>
> Permit Unbounded Occurrences or Constrain the Occurrences?
>
> Now that we have seen the two approaches for allowing an unbounded
> number of occurrences we return to the main issue: when designing an XML
> Schema should you permit an unbounded number of occurrences? There are
> two viewpoints on this issue. One viewpoint is that you should design
> your Schemas to permit unbounded occurrences. The other viewpoint is
> that you should not permit an unbounded number of occurrences.
>
> To keep the discussion concrete, let's take the above Bookstore Schema
> as the example. Suppose that it is decided that Bookstore should not
> allow more than 30,000 Books. Should the Schema be designed to allow an
> unbounded number of Books:
>
> <element name="Bookstore">
> <complexType>
> <sequence>
> <element name="Book" type="..." *maxOccurs="unbounded"*/>
> </sequence>
> </complexType>
> </element>
>
> Or, should the Schema be designed to explicitly state 30,000 as the
> maximum:
>
> <element name="Bookstore">
> <complexType>
> <sequence>
> <element name="Book" type="..." *maxOccurs="30000"*/>
> </sequence>
> </complexType>
> </element>
>
>
> Viewpoint 1: Permit an Unbounded Number of Occurrences
>
> This viewpoint says that it's better to use maxOccurs="unbounded".
>
> There is a technical problem with setting maxOccurs="30000". Michael Kay
> nicely summarizes the problem: "the classical algorithms for turning
> grammars into finite state machines produce very inefficient machines
> when there are occurrence limits that are large but finite. Many schema
> processors break or consume seriously large amounts of memory if you
> specify a maxOccurs value (other than unbounded) that's greater than a
> couple of hundred." In other words, many Schema validators will choke if
> you specify maxOccurs="30000".
>
> If the Bookstore wants, at a later date, to expand to accommodate, say,
> 35,000 Books then no change will be needed to the Schema.
>
> The example being considered is just one element. A Schema is, of
> course, usually comprised of many elements. Suppose that each element is
> constrained as precisely as possible. Then a document may be rejected
> because one element exceeded its constraints while all the others were
> within their constraints.
>
>
> Viewpoint 2: Constrain the Number of Occurrences
>
> This viewpoint says that it's better to use maxOccurs="30000".
>
> It is important to distinguish between *theoretical constraints* and
> *practical constraints*. Theoretically, a Bookstore may have an
> unbounded number of Books, but for performance/capacity/security reasons
> this Bookstore can only handle 30,000 Books.
>
> Another example of the difference between theoretical and practical
> limits: theoretically an HTML document may have an infinite number of
> <p> elements, but in practice a browser supports only a specific maximum.
>
> Every system has practical constraints. There are many possible reasons
> for the constraint such as performance, capacity, or security
> constraints. The particulars are irrelevant. What is relevant, however,
> is that it is guaranteed that no system is infinite and consequently all
> systems have practical constraints.
>
> Expressing the practical limits in an XML Schema is especially
> important for Service Level Agreements (SLAs).
>
> So, there are theoretical constraints and there are practical
> constraints. And as we've seen, typically the two are not identical. The
> purpose of an XML Schema is to express constraints. Which constraints
> should a Schema express - theoretical constraints or practical
> constraints? Viewpoint 2 says that a Schema should express the practical
> constraints. Thus, for the Bookstore example, set maxOccurs="30000".
>
> See Greg Hunt's messages for an excellent discussion of this viewpoint.
>
>
> Viewpoint 1: Permit an Unbounded Number of Occurrences
>
> * Advantages
>
> o *Schema Validator Efficiency:* Schema validators are more
> efficient when you use maxOccurs="unbounded" than when you
> set maxOccurs to a large number
> o *Accommodates Growth:* as a system is expanded to support
> larger amounts of data, there is no need to change the
> Schema.
>
> * Disadvantages
>
> o *Pushes the Constraint-Checking Problem to Another Part of
> the System:* if the practical limits are not expressed in
> the Schema, then they have to be expressed somewhere else -
> somewhere less visible, less maintainable, and with less
> tool support.
> o *Vulnerable to Denial of Service Attack:* XML guards depend
> upon an XML Schema to indicate whether an XML instance
> document should be allowed to pass. They will be unable to
> detect a denial of service attack since the Schema sets a
> theoretical limit and not a practical limit.
>
>
> Viewpoint 2: Constrain the Number of Occurrences
>
> * Advantages
>
> o *Constraints are Visible, Maintainable, and Supported by
> Tools:* since the practical limits are expressed in the
> Schema, they are visible, maintainable, and supported by
> tools.
> o *Prevent Denial of Service Attack:* XML guards depend upon
> an XML Schema to indicate whether an XML instance document
> should be allowed to pass. They will be able to detect a
> denial of service attack since the Schema sets a practical
> limit and not a theoretical limit.
>
> * Disadvantages
>
> o *Schema Validator is Inefficient:* Schema validators are not
> efficient when you set maxOccurs to a large number
> o *Requires Periodic Update:* as a system is expanded to
> support larger amounts of data, the Schema will need to be
> updated.
> o *Exposing System Limits:* by setting maxOccurs="30000" you
> are giving information to people about the limitations of
> your system.
>
>
> Proposal: Constraining Data without the Validator Inefficiencies
>
> As described above, current implementations of Schema validators are
> very inefficient when you specify a large value for maxOccurs. So, even
> if you want to express practical limits, you cannot, due to this Schema
> validator implementation problem.
>
> Here is a proposal to avoid the inefficiency: express the theoretical
> limits using XML Schemas and express the practical limits using
> Schematron assertions. Let's consider the Bookstore example. Below I
> use XML Schemas to indicate that Bookstore is comprised of an
> unbounded number of Books, and I use Schematron to indicate that the
> number of Books must not exceed the practical limit of 30,000:
>
> <element name="Bookstore">
> <complexType>
> <sequence>
> <element name="Book" type="..." *maxOccurs="unbounded"*>
> *<schematron:assert test="count(Book) &lt;= 30000"/>*
> </element>
> </sequence>
> </complexType>
> </element>
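>
> The same limit can also be sketched as a standalone Schematron schema
> that is run next to the XML Schema. This is an illustration under
> assumptions, not a tested configuration: it uses the ISO Schematron
> namespace, the sch prefix and the pattern id are arbitrary, and
> Bookstore and Book are assumed to be in no namespace:
>
> ```xml
> <!-- Sketch only: the sch prefix and the pattern id are assumptions. -->
> <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
>   <sch:pattern id="bookstore-capacity">
>     <!-- The context is Bookstore, so count(Book) counts its Book children. -->
>     <sch:rule context="Bookstore">
>       <!-- "<" must be written as &lt; inside an XML attribute value. -->
>       <sch:assert test="count(Book) &lt;= 30000">
>         A Bookstore must not contain more than 30,000 Book elements.
>       </sch:assert>
>     </sch:rule>
>   </sch:pattern>
> </sch:schema>
> ```
>
> Keeping the practical limit in its own file means it can be raised to,
> say, 35,000 without touching the XML Schema.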
>
> Comments?
>
>
>