Re: XML Schemas: Best Practices

From: "Roger L. Costello" <costello@mitre.org>
To: xml-dev@lists.xml.org
Date: Fri, 05 Jan 2001 09:41:38 -0500

"Arnold, Curt" wrote:

> ... projection pattern ... aggregation pattern ... decorator pattern

After reading your message, Curt, I studied and implemented the design
patterns - projection, aggregation, and decorator.  I discovered that
each pattern could be implemented using each of the three methods that
I described in Wednesday's message.

Implementing the three methods for each pattern proved very
useful - it brought clarity to the issue.  In implementing
each of the patterns I found the same question arising: What is the
Best Practice for implementing a container element whose content is
variable?

- For the projection pattern the question was how to implement
variable content consisting of specialized as well as generic elements.

- For the aggregation pattern the question was how to implement
specialized variable content that is embedded within a generic
element.

- For the decorator pattern the question was how to implement
specialized variable content that contains a generic element.

As I see it, the patterns are an instance document Best Practice
issue (what's the best way to design an instance document), whereas 
the 3 implementation methods are a schema Best Practice issue (what's
the best way to design a schema). Thus, for this discussion I would
like to focus on the methods rather than on the patterns.

Below I have summarized the three methods and incorporated the 
excellent points that Curt and Len made on the pros and cons of each 
method.  There are several questions remaining, which I have 
interspersed in the summary.

SUMMARY

Problem Statement. Design an XML Schema for a container element
(Catalogue) whose content is variable (Book, or Magazine, or ...).

    <Catalogue>
        - variable content -
    </Catalogue>

Ideally, the components in the variable content section may come
from disjoint sources, i.e., from other, independently developed
schemas.

Example of <Catalogue> containing variable content:

    <Catalogue>
        <Book> ... </Book>
        <Magazine> ... </Magazine>
        <Book> ... </Book>
    </Catalogue>   

Below are three methods for implementing Catalogue.

******************************************************************
Method 1. Use an abstract element and element substitution to
implement variable content.

Method Description:

There are four XML Schema concepts that must be understood for
implementing this method:
 
- an element can be declared abstract.

- abstract elements cannot be instantiated in instance documents.  

- in instance documents the abstract element must be replaced by
  non-abstract elements that are in a substitutionGroup with
  the abstract element.

- elements may be in the substitutionGroup with the abstract element
  iff their type is the same as, or derives from, the abstract
  element's type.

Method Implementation:

Declare an abstract element (Publication):

    <element name="Publication" abstract="true" 
             type="c:PublicationType"/>

Declare the container element (Catalogue) to have as its contents the 
abstract element:
 

    <element name="Catalogue">
        <complexType>
            <sequence>
                <element ref="c:Publication" maxOccurs="unbounded"/>
            </sequence>
        </complexType>
    </element>

Declare the elements that are to be in the variable content section 
(Book and Magazine) and put them in a substitutionGroup with the 
abstract element:

    <element name="Book" substitutionGroup="c:Publication" 
             type="c:BookType"/>
    <element name="Magazine" substitutionGroup="c:Publication" 
             type="c:MagazineType"/>

In order for Book and Magazine to substitute for Publication, BookType 
and MagazineType must derive from PublicationType.  Here are the type 
definitions:

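PublicationType - the base type.  Author is optional (minOccurs="0")
so that a derived type can legally restrict it away; a restriction may
only narrow what the base type already allows:

    <complexType name="PublicationType">
        <sequence>
            <element name="Title" type="string"/>
            <element name="Author" type="string"
                     minOccurs="0" maxOccurs="unbounded"/>
            <element name="Date" type="year"/>
        </sequence>
    </complexType>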

BookType - extends PublicationType by adding two new elements, ISBN
and Publisher:

    <complexType name="BookType">
        <complexContent>
            <extension base="c:PublicationType" >
                <sequence>
                    <element name="ISBN" type="string"/>
                    <element name="Publisher" type="string"/>
                </sequence>
            </extension>
        </complexContent>
    </complexType>

MagazineType - restricts PublicationType by dropping the Author 
element:

    <complexType name="MagazineType">
        <complexContent>
            <restriction base="c:PublicationType">
                <sequence>
                    <element name="Title" type="string"/>
                    <element name="Author" type="string" 
                             minOccurs="0" maxOccurs="0"/>
                    <element name="Date" type="year"/>
                </sequence>
            </restriction>
        </complexContent>
    </complexType>
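
For reference, here is a minimal sketch of how the declarations above
might be packaged into one schema document.  It is only an
illustration: the target namespace URI (http://www.example.org/catalogue)
is made up, and the XML Schema namespace shown is the one from the
final W3C Recommendation, which is an assumption about the reader's
tooling (note that in the final Recommendation the draft type "year"
is spelled "gYear"):

    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/catalogue"
            xmlns:c="http://www.example.org/catalogue"
            elementFormDefault="qualified">

        <!-- the Publication, Catalogue, Book and Magazine element
             declarations and the three type definitions go here -->

    </schema>

An instance document with hypothetical values might then look like
this.  Book carries the two extra elements from BookType, and Magazine
has no Author:

    <Catalogue xmlns="http://www.example.org/catalogue">
        <Book>
            <Title>A Sample Book</Title>
            <Author>A. N. Author</Author>
            <Date>1999</Date>
            <ISBN>0-000-00000-0</ISBN>
            <Publisher>Sample Press</Publisher>
        </Book>
        <Magazine>
            <Title>A Sample Magazine</Title>
            <Date>2000</Date>
        </Magazine>
    </Catalogue>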

Method Advantages:

- This method allows you to easily extend the set of elements that 
  may be used in the variable content section simply by adding new 
  elements to the abstract element's substitutionGroup.
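
As a sketch of that extensibility, a hypothetical Journal element could
be added without touching the Catalogue declaration at all (Journal,
JournalType and Volume are made-up names used only for illustration):

    <!-- new element joins the substitutionGroup headed by Publication -->
    <element name="Journal" substitutionGroup="c:Publication"
             type="c:JournalType"/>

    <!-- its type must derive from PublicationType -->
    <complexType name="JournalType">
        <complexContent>
            <extension base="c:PublicationType">
                <sequence>
                    <element name="Volume" type="string"/>
                </sequence>
            </extension>
        </complexContent>
    </complexType>

Once these declarations are in the schema, <Journal> may appear inside
<Catalogue> wherever <Book> or <Magazine> may.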

Method Disadvantages:

- The types of the elements that are to be used in the variable
  content section must all descend from the abstract element's type.
  Further, the elements must be in a substitutionGroup with the
  abstract element.  These requirements severely restrict the
  usefulness of this method.  The variable content section cannot
  contain an element whose type does not derive from the abstract
  element's type, or which is not in a substitutionGroup with the
  abstract element - as would typically be the case with independently
  developed components.  For example, suppose another schema author
  creates a "Newspaper" element whose type does not descend from
  PublicationType and which is not in the substitutionGroup with
  Publication.  Then <Catalogue> would not be able to contain the
  <Newspaper> element.  The elements in the variable content section
  are all tied to the same type hierarchy tree, so they are dependent
  on one another and tightly coupled.

- Oftentimes the variable content section will contain elements that 
  are conceptually related but structurally vastly different. The base 
  type (the abstract element's type) should contain items common to all
  the variable content elements.  To allow for elements that may be 
  very dissimilar the base type would need to have very little 
  structure. This defeats the purpose of inheritance.
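
To make the Newspaper example in the first disadvantage concrete, here
is a sketch of what such an independently developed schema might look
like.  The namespace URI and the Masthead and Edition elements are
invented for illustration, and the XML Schema namespace is assumed to
be the one from the final W3C Recommendation:

    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/newspaper"
            xmlns:n="http://www.example.org/newspaper"
            elementFormDefault="qualified">

        <!-- stands alone: not derived from c:PublicationType -->
        <complexType name="NewspaperType">
            <sequence>
                <element name="Masthead" type="string"/>
                <element name="Edition" type="string"/>
            </sequence>
        </complexType>

        <!-- no substitutionGroup attribute, so Newspaper cannot
             stand in for c:Publication -->
        <element name="Newspaper" type="n:NewspaperType"/>

    </schema>

Under method 1 a <Catalogue> containing <n:Newspaper> would be invalid,
and the only remedy is to change the Newspaper schema itself.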

Questions:

- In the second disadvantage above I state: "This defeats the purpose 
  of inheritance."  This seems like a very weak statement.  Can you
  provide a stronger statement telling why it is bad that the base
  type has little structure?

- Have you noticed that I like to name things?  Well, I would like to 
  put a name to this method (and to all three methods).  Any 
  suggestions?

******************************************************************
Method 2. Use a repeatable <choice> element to achieve variable
content.

Method Description:

This method is quite straightforward - simply list within a <choice> 
element all the components which can appear in the variable content 
section, and embed the <choice> element in the container element.

Method Implementation:

Declare within a <choice> element all the elements that may appear in 
the variable content section (Book, Magazine).  Embed the <choice> 
element within the container element (Catalogue):

    <element name="Catalogue">
        <complexType>
            <choice minOccurs="0" maxOccurs="unbounded">
                <element ref="c:Book"/>
                <element ref="c:Magazine"/>
            </choice>
        </complexType>
    </element>

    <element name="Book" type="c:BookType"/>
    <element name="Magazine" type="c:MagazineType"/>

Method Advantages:

- The elements in the variable content section do not need a common 
  type ancestry.  Thus, the variable content section can contain
  dissimilar, independent, loosely coupled elements.
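
As a sketch of this advantage, suppose a Newspaper element is declared
in a separate, independently developed schema.  The namespace URI, the
n: prefix and the file name Newspaper.xsd below are all hypothetical,
and xmlns:n must be declared on the enclosing <schema> element.  The
Catalogue schema can import that schema and simply list the foreign
element in the <choice>:

    <import namespace="http://www.example.org/newspaper"
            schemaLocation="Newspaper.xsd"/>

    <element name="Catalogue">
        <complexType>
            <choice minOccurs="0" maxOccurs="unbounded">
                <element ref="c:Book"/>
                <element ref="c:Magazine"/>
                <!-- no shared base type or substitutionGroup needed -->
                <element ref="n:Newspaper"/>
            </choice>
        </complexType>
    </element>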

Method Disadvantages:

- The <choice> element allows you to group together dissimilar
  elements.  While that has been touted as an advantage, it is really
  a double-edged sword.  The elements in the variable content section
  have no type hierarchy to bind them together, to provide coherence
  among the elements.

- With method 1 you can easily extend the set of elements that may be
  used in the variable content section by creating a new element and
  putting it in the substitutionGroup with the abstract element.
  Instance documents could then start using the new element
  immediately.  With method 2, in addition to creating the new
  element, you must also list it in the <choice> element, i.e., edit
  the Catalogue declaration itself.  So method 2 requires a two-step
  process to add a new element to the set of elements available in
  the variable content section.  This is a bit more error prone.

Questions:

- I am not sure that I believe the last sentence: "This is a bit more 
  error prone."  Do you?  

- Curt, you stated in your message that the disadvantage of this method
  is, "does not let people to extend your schema easily."  Can you 
  please elaborate on what you mean by this?

- Again, I would like to see a name for this method.  Suggestions?

******************************************************************
Method 3. Use an abstract type and type substitution to achieve
variable content.

Method Description:

There are three XML Schema concepts that must be understood for
implementing this method:
 
- a complexType can be declared abstract.  

- an element declared to be of an abstract type cannot appear in
  instance documents with that type (the element itself can appear,
  but its content cannot be validated against the abstract type).

- in instance documents the element with the abstract type must use
  xsi:type to name a non-abstract type which derives from the
  abstract type; its content is then that of the derived type.

Method Implementation:

Define an abstract base type (PublicationType).  Author is optional
(minOccurs="0") so that MagazineType can legally restrict it away:

    <complexType name="PublicationType" abstract="true">
        <sequence>
            <element name="Title" type="string"/>
            <element name="Author" type="string"
                     minOccurs="0" maxOccurs="unbounded"/>
            <element name="Date" type="year"/>
        </sequence>
    </complexType>

Declare the container element (Catalogue) to contain a base element
(Publication), which is of the abstract base type:

    <element name="Catalogue">
        <complexType>
            <sequence>
                <element name="Publication" type="c:PublicationType" 
                         minOccurs="0" maxOccurs="unbounded"/>
            </sequence>
        </complexType>
    </element>

In instance documents, the content of <Publication> can only be of a
non-abstract type which derives from PublicationType, such as BookType
or MagazineType (we saw these type definitions in Method 1 above).

With this method instance documents will look different from what we
saw with the above two methods.  Namely, <Catalogue> will not contain
variable content.  Instead, it will always contain the same element
(Publication).  However, that element will contain variable content:

    <Catalogue>
        <Publication xsi:type="BookType"> ... </Publication>
        <Publication xsi:type="MagazineType"> ... </Publication>
        <Publication xsi:type="BookType"> ... </Publication>
    </Catalogue>
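
A fuller sketch of such an instance document follows.  The value of
xsi:type is the name of a type (BookType, MagazineType), not of an
element.  The target namespace URI and all data values are made up,
the schema is assumed to set elementFormDefault="qualified", and the
xsi namespace shown is the one from the final W3C Recommendation:

    <Catalogue xmlns="http://www.example.org/catalogue"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <Publication xsi:type="BookType">
            <Title>A Sample Book</Title>
            <Author>A. N. Author</Author>
            <Date>1999</Date>
            <ISBN>0-000-00000-0</ISBN>
            <Publisher>Sample Press</Publisher>
        </Publication>
        <Publication xsi:type="MagazineType">
            <Title>A Sample Magazine</Title>
            <Date>2000</Date>
        </Publication>
    </Catalogue>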

Method Advantages:

- Similar benefits to method 1.  Namely, this method allows you to
  easily extend the set of content types that may be used in the
  variable content section simply by creating new types which derive
  from the abstract base type.
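
As a sketch, a hypothetical JournalType (JournalType and Volume are
made-up names used only for illustration) could be added to the schema
like this, with no change to the Catalogue declaration:

    <complexType name="JournalType">
        <complexContent>
            <extension base="c:PublicationType">
                <sequence>
                    <element name="Volume" type="string"/>
                </sequence>
            </extension>
        </complexContent>
    </complexType>

Instance documents could then select the new type through xsi:type:

    <Publication xsi:type="JournalType"> ... </Publication>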

Method Disadvantages:

- Similar weaknesses to method 1.  Namely, all types must descend from
  the abstract type.  This requirement prohibits the use of types
  which do not descend from the abstract type, as would typically be
  the situation when the type is in another, independently developed 
  schema.  

- This method has the additional weakness of not being as "clean"
  as the other methods in the instance documents, e.g.,
  <Publication xsi:type="BookType"> is not as clean as <Book>
 
Questions:

- The second disadvantage listed above is mighty weak.  "Clean" is
  subjective.  Can you think of a stronger statement?

- Name for this method?

Wrap-up Questions:

What would be your recommendation for "Best Practice for implementing 
a container element that is to be comprised of variable content?"  
Which of the above methods would you recommend using? 

Based upon the above discussion I am tempted to recommend: "use 
method 2 - repeatable <choice> element - because it enables the 
variable content section to contain components from disjoint sources". 
I feel that this benefit outweighs its disadvantages. What are 
your thoughts on this?

This is a pretty cool issue.  Thanks a lot Curt and Len for shedding 
light on the pitfalls and advantages of each method!  /Roger