Re: XML Schemas: Best Practices

From: Caroline Clewlow <cclewlow@eris.dera.gov.uk>
To: "Roger L. Costello" <costello@mitre.org>
Date: Wed, 17 Jan 2001 09:40:37 +0000
This is definitely a topic of interest to me - the two options you give
both have their advantages but I can see the point you make regarding the
higher level of control to be gained from using the 'any' element.
Allowing the use of a derived type risks giving too much scope, since
virtually any number of elements could be added.

Sorry I'm not adding much to the discussion here... I just realised I've
done nothing more than agree with your suggestions!

As an aside - various papers I have looked at describe XML Schema as
having a 'closed' content model.  Although these techniques do not make the
model strictly 'open', this level of extensibility, and of control over it,
would seem to suggest a halfway house... perhaps semi-open ;-)

Regards

Caroline

"Roger L. Costello" wrote:

> Hi Folks,
>
> I would like to start on a new issue.  I think that this issue will
> generate a lot of interest, as it is critical to designing robust
> schemas.
>
> Issue: What is Best Practice for creating extensible content models?
>
> Below I have jotted down some initial thoughts on this subject.  I
> am sure that I have missed many techniques for creating extensible
> content models. What are your thoughts on this topic?
>
> Techniques for Creating Extensible Content Models
>
> [1] Use types to create extensible content models.  Consider this
>     schema snippet:
>
>     <element name="BookCatalogue">
>         <complexType>
>              <sequence>
>                  <element name="Book" minOccurs="0"
>                           maxOccurs="unbounded">
>                      <complexType>
>                          <sequence>
>                              <element name="Title" type="string"/>
>                              <element name="Author" type="string"/>
>                              <element name="Date" type="year"/>
>                              <element name="ISBN" type="string"/>
>                              <element name="Publisher" type="string"/>
>                          </sequence>
>                      </complexType>
>                  </element>
>             </sequence>
>         </complexType>
>     </element>
>
> This schema snippet dictates that in instance documents each <Book>
> element must consist of exactly five elements: <Title>, <Author>,
> <Date>, <ISBN>, and <Publisher>.  For example:
>
>      <Book>
>           <Title>The First and Last Freedom</Title>
>           <Author>J. Krishnamurti</Author>
>           <Date>1954</Date>
>           <ISBN>0-06-064831-7</ISBN>
>           <Publisher>Harper &amp; Row</Publisher>
>      </Book>
>
> With this schema, the content of <Book> in instance documents is
> completely static and non-extensible.
>
> On the other hand, consider this version of the schema, where I have
> defined Book's content model with a type definition:
>
>      <complexType name="BookType">
>         <sequence>
>             <element name="Title" type="string"/>
>             <element name="Author" type="string"/>
>             <element name="Date" type="year"/>
>             <element name="ISBN" type="string"/>
>             <element name="Publisher" type="string"/>
>         </sequence>
>     </complexType>
>     <element name="BookCatalogue">
>         <complexType>
>              <sequence>
>                  <element name="Book" type="c:BookType" minOccurs="0"
>                           maxOccurs="unbounded"/>
>             </sequence>
>         </complexType>
>     </element>
>
> Recall that via the mechanism of type substitutability, the contents
> of <Book> can be substituted by any type that derives from BookType.
> For example, if we create a type which derives from BookType:
>
>     <complexType name="BookTypePlusReviewer">
>         <complexContent>
>             <extension base="c:BookType" >
>                 <sequence>
>                     <element name="Reviewer" type="string"/>
>                 </sequence>
>             </extension>
>         </complexContent>
>     </complexType>
>
> then instance documents can create a <Book> element that
> contains a <Reviewer> element, along with the other five elements:
>
>         <Book xsi:type="BookTypePlusReviewer">
>              <Title>My Life and Times</Title>
>              <Author>Paul McCartney</Author>
>              <Date>1998</Date>
>              <ISBN>94303-12021-43892</ISBN>
>              <Publisher>McMillin Publishing</Publisher>
>              <Reviewer>Roger Costello</Reviewer>
>         </Book>
>
> In my example, I defined BookTypePlusReviewer within the same
> schema as BookType.  In general, however, this may not be the case.
> Other schemas can import the BookCatalogue schema and define types
> which derive from BookType.  Thus, the contents of Book may be
> extended, without modifying the BookCatalogue schema!
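>
> As a rough sketch of that scenario, a separate schema might look
> something like the following.  The namespace URIs, the file name
> BookCatalogue.xsd, and the type name ReviewedBookType are all
> hypothetical, and the schema namespace shown is the one from the
> final Recommendation:
>
>     <schema xmlns="http://www.w3.org/2001/XMLSchema"
>             targetNamespace="http://www.example.org/reviews"
>             xmlns:c="http://www.example.org/catalogue">
>         <!-- pull in the BookCatalogue schema; it is not modified -->
>         <import namespace="http://www.example.org/catalogue"
>                 schemaLocation="BookCatalogue.xsd"/>
>         <!-- derive a new type from the imported BookType -->
>         <complexType name="ReviewedBookType">
>             <complexContent>
>                 <extension base="c:BookType">
>                     <sequence>
>                         <element name="Reviewer" type="string"/>
>                     </sequence>
>                 </extension>
>             </complexContent>
>         </complexType>
>     </schema>
>
> An instance document would then select the derived type with
> xsi:type, in the same way as the <Book> example above.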
>
> This type substitutability mechanism is a powerful extensibility
> mechanism.  However, it suffers from two problems:
>
> [1] Location Restricted Extensibility: The extensibility is restricted
>     to appending elements onto the end of the content model
>     (after the <Publisher> element).  What if we wanted to extend
>     <Book> by adding elements to the beginning (before <Title>), or in
>     the middle, etc?  We can't do it with this mechanism.
>
> [2] Unexpected Extensibility: If you look at the declaration for Book:
>
>      <element name="Book" type="c:BookType" minOccurs="0"
>               maxOccurs="unbounded"/>
>
> and the definition for BookType:
>
>      <complexType name="BookType">
>         <sequence>
>             <element name="Title" type="string"/>
>             <element name="Author" type="string"/>
>             <element name="Date" type="year"/>
>             <element name="ISBN" type="string"/>
>             <element name="Publisher" type="string"/>
>         </sequence>
>     </complexType>
>
> it is easy to be fooled into thinking that in instance documents the
> <Book> elements will always contain just <Title>, <Author>, <Date>,
> <ISBN>, and <Publisher>.  It is easy to forget that someone could
> extend the content model using the type substitutability mechanism.
> Extensibility is unexpected! Consequently, if you write a program to
> process BookCatalogue instance documents, you may forget to take into
> account the fact that a <Book> element may contain more than five
> children.
>
> It would be nice if there was a way to explicitly flag places where
> extensibility may occur: "hey, instance documents may extend <Book> at
> this point, so be sure to write your code taking this possibility into
> account."  In addition, it would be nice if we could extend Book's
> content model at locations other than just the end ... The <any>
> element gives us these capabilities beautifully:
>
>     <element name="BookCatalogue">
>         <complexType>
>              <sequence>
>                  <element name="Book" minOccurs="0"
>                           maxOccurs="unbounded">
>                      <complexType>
>                          <sequence>
>                              <element name="Title" type="string"/>
>                              <element name="Author" type="string"/>
>                              <element name="Date" type="year"/>
>                              <element name="ISBN" type="string"/>
>                              <element name="Publisher" type="string"/>
>                              <any namespace="##any" minOccurs="0"/>
>                          </sequence>
>                      </complexType>
>                  </element>
>             </sequence>
>         </complexType>
>     </element>
>
> In this version of the schema I have made explicit the fact that after
> the <Publisher> element any well-formed XML element may occur and
> the XML element may come from any namespace.
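>
> For illustration, an instance like the following would then be
> acceptable.  This is only a sketch: the <Rating> element and its
> namespace are made up, and it assumes the wildcard is declared with
> processContents="lax" or "skip" (with the default, "strict", the
> validator must also be able to find a declaration for the extra
> element):
>
>      <Book>
>           <Title>The First and Last Freedom</Title>
>           <Author>J. Krishnamurti</Author>
>           <Date>1954</Date>
>           <ISBN>0-06-064831-7</ISBN>
>           <Publisher>Harper &amp; Row</Publisher>
>           <!-- extension element, matched by the <any> wildcard -->
>           <Rating xmlns="http://www.example.org/ratings">5</Rating>
>      </Book>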
>
> Note that I could have put the <any> element within a BookType:
>
>      <complexType name="BookType">
>         <sequence>
>             <element name="Title" type="string"/>
>             <element name="Author" type="string"/>
>             <element name="Date" type="year"/>
>             <element name="ISBN" type="string"/>
>             <element name="Publisher" type="string"/>
>             <any namespace="##any" minOccurs="0" maxOccurs="1"/>
>         </sequence>
>     </complexType>
>
> and then declared Book to be of type BookType:
>
>     <element name="Book" type="c:BookType" minOccurs="0"
>              maxOccurs="unbounded"/>
>
> However, then we are back to the "unexpected extensibility" problem.
> Namely, after the <Publisher> element any well-formed XML element
> may occur, and after that anything could be present, because a type
> derived from BookType may append further elements.
>
> Thus, I chose not to use a type so that I could control the
> extensibility.
>
> There is another way to control the extensibility and still use a type.
> I can use the BookType and add a block attribute to Book:
>
>     <element name="Book" type="c:BookType" block="#all"
>              minOccurs="0" maxOccurs="unbounded"/>
>
> The block attribute prohibits derived types from being used in
> Book's content model. I prefer this latter way of controlling
> extensibility over the in-line version because it creates a reusable
> component (BookType), and yet we still have control over the
> extensibility.
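>
> To sketch the effect (assuming the BookTypePlusReviewer type from
> earlier is still defined), an instance such as this one should now
> be rejected by a validator, because block="#all" rules out xsi:type
> substitution by types derived through either extension or
> restriction:
>
>         <!-- no longer valid: derived types are blocked on Book -->
>         <Book xsi:type="BookTypePlusReviewer">
>              <Title>My Life and Times</Title>
>              <Author>Paul McCartney</Author>
>              <Date>1998</Date>
>              <ISBN>94303-12021-43892</ISBN>
>              <Publisher>McMillin Publishing</Publisher>
>              <Reviewer>Roger Costello</Reviewer>
>         </Book>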
>
> With the <any> element we have complete control over where, and how
> much, extensibility we want to allow.  For example, suppose that we
> want to allow at most two new elements at the top of
> Book's content model.  Here's how to specify that using the <any>
> element:
>
>      <complexType name="BookType">
>         <sequence>
>             <any namespace="##any" minOccurs="0" maxOccurs="2"/>
>             <element name="Title" type="string"/>
>             <element name="Author" type="string"/>
>             <element name="Date" type="year"/>
>             <element name="ISBN" type="string"/>
>             <element name="Publisher" type="string"/>
>         </sequence>
>     </complexType>
>
> Note how I have placed the <any> element at the top of the content
> model, and have set maxOccurs="2".  Thus, in instance documents the
> <Book> content will always end with <Title>, <Author>, <Date>, <ISBN>,
> and <Publisher>.  Prior to that, up to two well-formed XML elements
> may occur.
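>
> A sketch of a conforming instance, with two made-up elements from a
> hypothetical namespace placed ahead of <Title>:
>
>      <Book xmlns:x="http://www.example.org/extra">
>           <!-- the two leading elements are matched by <any> -->
>           <x:Edition>2nd</x:Edition>
>           <x:Format>Paperback</x:Format>
>           <Title>The First and Last Freedom</Title>
>           <Author>J. Krishnamurti</Author>
>           <Date>1954</Date>
>           <ISBN>0-06-064831-7</ISBN>
>           <Publisher>Harper &amp; Row</Publisher>
>      </Book>
>
> One caveat, offered as an assumption rather than something tested
> against every processor: because namespace="##any" can also match
> <Title> itself, a validator may flag a leading wildcard like this as
> ambiguous (the Unique Particle Attribution rule).  Using
> namespace="##other" for the leading wildcard would likely avoid
> that, at the cost of admitting only elements from other namespaces.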
>
> I must admit that I am biased towards using the <any> element as a
> mechanism for achieving content model extensibility.  It provides much
> greater control for where extensibility occurs and how much occurs.  In
> addition, I like the fact that it alerts me to where extensibility may
> occur, so I can write my programs to process the content model
> appropriately.  I don't like surprises in my data.
>
> What are your thoughts on this topic?  I am sure that in my bias, I
> am missing some disadvantages of using the <any> element.  Can you
> think of any disadvantages? What other techniques are there for
> extending content models?  /Roger