Re: XML Schemas: Best Practices
- From: Caroline Clewlow <cclewlow@eris.dera.gov.uk>
- To: "Roger L. Costello" <costello@mitre.org>
- Date: Wed, 17 Jan 2001 09:40:37 +0000
This is definitely a topic of interest to me - the two options you give
both have their advantages but I can see the point you make regarding the
higher level of control to be gained from using the 'any' element.
Allowing the use of a derived type risks giving too much scope
for virtually any number of elements to be added.
Sorry I'm not adding much to the discussion here... I just realised I've
done nothing more than agree with your suggestions!
As an aside - various papers I have looked at describe XML Schema as
having a 'closed' content model. Although these techniques do not make the
model strictly 'open', this degree of controlled extensibility
would seem to suggest a halfway house... perhaps semi-open ;-)
Regards
Caroline
"Roger L. Costello" wrote:
> Hi Folks,
>
> I would like to start on a new issue. I think that this issue will
> generate a lot of interest, as it is critical to designing robust
> schemas.
>
> Issue: What is Best Practice for creating extensible content models?
>
> Below I have jotted down some initial thoughts on this subject. I
> am sure that I have missed many techniques for creating extensible
> content models. What are your thoughts on this topic?
>
> Techniques for Creating Extensible Content Models
>
> [1] Use types to create extensible content models. Consider this
> schema snippet:
>
> <element name="BookCatalogue">
>   <complexType>
>     <sequence>
>       <element name="Book" minOccurs="0"
>                maxOccurs="unbounded">
>         <complexType>
>           <sequence>
>             <element name="Title" type="string"/>
>             <element name="Author" type="string"/>
>             <element name="Date" type="year"/>
>             <element name="ISBN" type="string"/>
>             <element name="Publisher" type="string"/>
>           </sequence>
>         </complexType>
>       </element>
>     </sequence>
>   </complexType>
> </element>
>
> This schema snippet dictates that in instance documents <Book> elements
> must always consist of exactly five elements: <Title>, <Author>,
> <Date>, <ISBN>, and <Publisher>. For example:
>
> <Book>
>   <Title>The First and Last Freedom</Title>
>   <Author>J. Krishnamurti</Author>
>   <Date>1954</Date>
>   <ISBN>0-06-064831-7</ISBN>
>   <Publisher>Harper &amp; Row</Publisher>
> </Book>
>
> The schema creates instance documents that are completely static and
> non extensible.
>
> On the other hand, consider this version of the schema, where I have
> defined Book's content model with a type definition:
>
> <complexType name="BookType">
>   <sequence>
>     <element name="Title" type="string"/>
>     <element name="Author" type="string"/>
>     <element name="Date" type="year"/>
>     <element name="ISBN" type="string"/>
>     <element name="Publisher" type="string"/>
>   </sequence>
> </complexType>
> <element name="BookCatalogue">
>   <complexType>
>     <sequence>
>       <element name="Book" type="c:BookType" minOccurs="0"
>                maxOccurs="unbounded"/>
>     </sequence>
>   </complexType>
> </element>
>
> Recall that via the mechanism of type substitutability, the contents
> of <Book> can be substituted by any type that derives from BookType.
> For example, if we create a type which derives from BookType:
>
> <complexType name="BookTypePlusReviewer">
>   <complexContent>
>     <extension base="c:BookType">
>       <sequence>
>         <element name="Reviewer" type="string"/>
>       </sequence>
>     </extension>
>   </complexContent>
> </complexType>
>
> then instance documents can create a <Book> element that
> contains a <Reviewer> element, along with the other five elements:
>
> <Book xsi:type="BookTypePlusReviewer">
>   <Title>My Life and Times</Title>
>   <Author>Paul McCartney</Author>
>   <Date>1998</Date>
>   <ISBN>94303-12021-43892</ISBN>
>   <Publisher>McMillin Publishing</Publisher>
>   <Reviewer>Roger Costello</Reviewer>
> </Book>
>
> In my example, I defined BookTypePlusReviewer within the same
> schema as BookType. In general, however, this may not be the case.
> Other schemas can import the BookCatalogue schema and define types
> which derive from BookType. Thus, the contents of Book may be
> extended, without modifying the BookCatalogue schema!
>
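A minimal sketch of such an importing schema follows. The two namespace URIs, the file name BookCatalogue.xsd, and the use of the final Recommendation's schema namespace are all assumptions made for illustration; none of them come from the original schema.

```xml
<?xml version="1.0"?>
<!-- Hypothetical extension schema. Both example.org URIs and the
     schemaLocation file name are assumptions, not part of the original. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:c="http://www.example.org/catalogue"
        targetNamespace="http://www.example.org/reviews">

  <!-- Pull in the BookCatalogue schema so that c:BookType is visible here -->
  <import namespace="http://www.example.org/catalogue"
          schemaLocation="BookCatalogue.xsd"/>

  <!-- Derived type: BookType's five elements followed by Reviewer -->
  <complexType name="BookTypePlusReviewer">
    <complexContent>
      <extension base="c:BookType">
        <sequence>
          <element name="Reviewer" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
</schema>
```

An instance document would then select the derived type through xsi:type, using a prefix bound to the extension schema's namespace.
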
> This type substitutability mechanism is a powerful extensibility
> mechanism. However, it suffers from two problems:
>
> [1] Location Restricted Extensibility: The extensibility is restricted
> to appending elements onto the end of the content model
> (after the <Publisher> element). What if we wanted to extend
> <Book> by adding elements to the beginning (before <Title>), or in
> the middle, etc? We can't do it with this mechanism.
>
> [2] Unexpected Extensibility: If you look at the declaration for Book:
>
> <element name="Book" type="c:BookType" minOccurs="0"
>          maxOccurs="unbounded"/>
>
> and the definition for BookType:
>
> <complexType name="BookType">
>   <sequence>
>     <element name="Title" type="string"/>
>     <element name="Author" type="string"/>
>     <element name="Date" type="year"/>
>     <element name="ISBN" type="string"/>
>     <element name="Publisher" type="string"/>
>   </sequence>
> </complexType>
>
> it is easy to be fooled into thinking that in instance documents the
> <Book> elements will always contain just <Title>, <Author>, <Date>,
> <ISBN>, and <Publisher>. It is easy to forget that someone could
> extend the content model using the type substitutability mechanism.
> Extensibility is unexpected! Consequently, if you write a program to
> process BookCatalogue instance documents, you may forget to take into
> account the fact that a <Book> element may contain more than five
> children.
>
> It would be nice if there were a way to explicitly flag places where
> extensibility may occur: "hey, instance documents may extend <Book> at
> this point, so be sure to write your code taking this possibility into
> account." In addition, it would be nice if we could extend Book's
> content model at locations other than just the end ... The <any>
> element gives us these capabilities beautifully:
>
> <element name="BookCatalogue">
>   <complexType>
>     <sequence>
>       <element name="Book" minOccurs="0"
>                maxOccurs="unbounded">
>         <complexType>
>           <sequence>
>             <element name="Title" type="string"/>
>             <element name="Author" type="string"/>
>             <element name="Date" type="year"/>
>             <element name="ISBN" type="string"/>
>             <element name="Publisher" type="string"/>
>             <any namespace="##any" minOccurs="0"/>
>           </sequence>
>         </complexType>
>       </element>
>     </sequence>
>   </complexType>
> </element>
>
> In this version of the schema I have made explicit the fact that after
> the <Publisher> element any well-formed XML element may occur and
> the XML element may come from any namespace.
>
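For illustration, here is a hypothetical instance fragment that this content model is meant to admit. The r: prefix and its namespace URI are assumptions. One hedge: processContents on <any> defaults to "strict", so a validator will normally also want a declaration for the extra element; adding processContents="lax" or processContents="skip" to the <any> is what makes "any well-formed XML element" literally true.

```xml
<!-- Hypothetical fragment: r:Reviewer and its namespace URI are made up
     for illustration and do not appear in the original schema. -->
<Book xmlns:r="http://www.example.org/reviews">
  <Title>The First and Last Freedom</Title>
  <Author>J. Krishnamurti</Author>
  <Date>1954</Date>
  <ISBN>0-06-064831-7</ISBN>
  <Publisher>Harper &amp; Row</Publisher>
  <!-- Matched by the <any> wildcard that follows Publisher in the schema -->
  <r:Reviewer>Roger Costello</r:Reviewer>
</Book>
```

Because the wildcard has minOccurs="0" and the default maxOccurs of 1, a Book with no extra element, or with exactly one, fits the model; two extra elements would not.
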
> Note that I could have put the <any> element within a BookType:
>
> <complexType name="BookType">
>   <sequence>
>     <element name="Title" type="string"/>
>     <element name="Author" type="string"/>
>     <element name="Date" type="year"/>
>     <element name="ISBN" type="string"/>
>     <element name="Publisher" type="string"/>
>     <any namespace="##any" minOccurs="0" maxOccurs="1"/>
>   </sequence>
> </complexType>
>
> and then declared Book to be of type BookType:
>
> <element name="Book" type="c:BookType" minOccurs="0"
>          maxOccurs="unbounded"/>
>
> However, then we are back to the "unexpected extensibility" problem.
> Namely, after the <Publisher> element any well-formed XML element
> may occur. After that, a derived type could add anything else.
>
> Thus, I chose not to use a type so that I could control the
> extensibility.
>
> There is another way to control the extensibility and still use a type.
> I can use the BookType and add a block attribute to Book:
>
> <element name="Book" type="c:BookType" block="#all"
>          minOccurs="0" maxOccurs="unbounded"/>
>
> The block attribute prohibits derived types from being used in
> Book's content model. I prefer this latter way of controlling
> extensibility to the in-line version because it creates a reusable
> component (BookType), and yet we still have control over the
> extensibility.
>
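To make the effect concrete, here is a sketch of an instance fragment that a validator should reject once block="#all" is present on the Book declaration. The catalogue namespace URI and the choice of the final Recommendation's xsi namespace are assumptions; BookTypePlusReviewer is the derived type defined earlier in the message.

```xml
<!-- Assumed namespace URIs throughout; only the type names come from the message. -->
<!-- Expected to be invalid: block="#all" on the Book declaration forbids
     substituting a derived type for BookType via xsi:type. -->
<Book xmlns:c="http://www.example.org/catalogue"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:type="c:BookTypePlusReviewer">
  <Title>My Life and Times</Title>
  <Author>Paul McCartney</Author>
  <Date>1998</Date>
  <ISBN>94303-12021-43892</ISBN>
  <Publisher>McMillin Publishing</Publisher>
  <Reviewer>Roger Costello</Reviewer>
</Book>
```

Dropping the xsi:type attribute and the Reviewer element leaves a Book that uses BookType directly, which the blocked declaration still accepts.
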
> With the <any> element we have complete control over where, and how
> much, extensibility we want to allow. For example, suppose that we
> want to allow at most two new elements at the top of
> Book's content model. Here's how to specify that using the <any>
> element:
>
> <complexType name="BookType">
>   <sequence>
>     <any namespace="##any" minOccurs="0" maxOccurs="2"/>
>     <element name="Title" type="string"/>
>     <element name="Author" type="string"/>
>     <element name="Date" type="year"/>
>     <element name="ISBN" type="string"/>
>     <element name="Publisher" type="string"/>
>   </sequence>
> </complexType>
>
> Note how I have placed the <any> element at the top of the content
> model, and have set maxOccurs="2". Thus, in instance documents the
> <Book> content will always end with <Title>, <Author>, <Date>, <ISBN>,
> and <Publisher>. Prior to that, up to two well-formed XML elements may
> occur.
>
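One caveat, offered as a sketch rather than a definitive fix: a leading wildcard with namespace="##any" can also match <Title> itself, so a validator that enforces the Unique Particle Attribution rule may reject this content model as ambiguous. Assuming the enclosing schema declares a targetNamespace, restricting the wildcard to other namespaces avoids the overlap:

```xml
<!-- Variant of the BookType above. Assumes the enclosing <schema>
     declares a targetNamespace; that assumption is not in the original. -->
<complexType name="BookType">
  <sequence>
    <!-- ##other: only elements from a namespace other than the
         target namespace can match, so Title is never ambiguous -->
    <any namespace="##other" minOccurs="0" maxOccurs="2"/>
    <element name="Title" type="string"/>
    <element name="Author" type="string"/>
    <element name="Date" type="year"/>
    <element name="ISBN" type="string"/>
    <element name="Publisher" type="string"/>
  </sequence>
</complexType>
```

The trade-off is that extension elements must then come from a different namespace than the catalogue's own.
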
> I must admit that I am biased towards using the <any> element as a
> mechanism for achieving content model extensibility. It provides much
> greater control for where extensibility occurs and how much occurs. In
> addition, I like the fact that it alerts me to where extensibility may
> occur, so I can write my programs to process the content model
> appropriately. I don't like surprises in my data.
>
> What are your thoughts on this topic? I am sure that in my bias, I
> am missing some disadvantages of using the <any> element. Can you
> think of any disadvantages? What other techniques are there for
> extending content models? /Roger