Re: XML Schemas: Best Practices
- From: "Roger L. Costello" <costello@mitre.org>
- To: xml-dev@lists.xml.org
- Date: Fri, 05 Jan 2001 09:41:38 -0500
"Arnold, Curt" wrote:
> ... projection pattern ... aggregation pattern ... decorator pattern
After reading your message Curt, I studied and implemented the design
patterns - projection, aggregation, and decorator. I discovered that
each pattern could be implemented using the three methods that I
described in Wednesday's message.
Implementing the three methods for each pattern proved to be very
useful - it brought clarity to the issue. In implementing
each of the patterns I found the same question arising: What is the
Best Practice for implementing a container element that is to be
comprised of variable content?
- For the projection pattern the question was how to implement
variable content comprised of specialized as well as generic elements.
- For the aggregation pattern the question was how to implement
specialized variable content that was embedded within a generic
element.
- For the decorator pattern the question was how to implement
specialized variable content which contained a generic element.
As I see it, the patterns are an instance document Best Practice
issue (what's the best way to design an instance document), whereas
the 3 implementation methods are a schema Best Practice issue (what's
the best way to design a schema). Thus, for this discussion I would
like to focus on the methods rather than on the patterns.
Below I have summarized the three methods and incorporated the
excellent points that Curt and Len made on the pros and cons of each
method. There are several questions remaining, which I have
interspersed in the summary.
SUMMARY
Problem Statement. Design an XML Schema for a container element
(Catalogue) which is to be comprised of variable content (Book,
or Magazine, or ...)
<Catalogue>
- variable content -
</Catalogue>
Ideally, the components in the variable content section may come
from disjoint sources, i.e., from other, independently developed
schemas.
Example of <Catalogue> containing variable content:
<Catalogue>
<Book> ... </Book>
<Magazine> ... </Magazine>
<Book> ... </Book>
</Catalogue>
Below are three methods for implementing Catalogue.
******************************************************************
Method 1. Use an abstract element and element substitution to
implement variable content.
Method Description:
There are four XML Schema concepts that must be understood for
implementing this method:
- an element can be declared abstract.
- abstract elements cannot be instantiated in instance documents.
- in instance documents the abstract element must be substituted by
non-abstract elements which are in a substitutionGroup with
the abstract element.
- elements may be in the substitutionGroup with the abstract element
if and only if their type is the same as, or derives from, the
abstract element's type.
Method Implementation:
Declare an abstract element (Publication):
<element name="Publication" abstract="true"
type="c:PublicationType"/>
Declare the container element (Catalogue) to have as its contents the
abstract element:
<element name="Catalogue">
<complexType>
<sequence>
<element ref="c:Publication" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
Declare the elements that are to be in the variable content section
(Book and Magazine) and put them in a substitutionGroup with the
abstract element:
<element name="Book" substitutionGroup="c:Publication"
type="c:BookType"/>
<element name="Magazine" substitutionGroup="c:Publication"
type="c:MagazineType"/>
In order for Book and Magazine to substitute for Publication, BookType
and MagazineType must derive from PublicationType. Here are the type
definitions:
PublicationType - the base type:
<complexType name="PublicationType">
<sequence>
<element name="Title" type="string"/>
<element name="Author" type="string" minOccurs="0" maxOccurs="unbounded"/>
<element name="Date" type="gYear"/>
</sequence>
</complexType>
BookType - extends PublicationType by adding two new elements, ISBN
and Publisher:
<complexType name="BookType">
<complexContent>
<extension base="c:PublicationType" >
<sequence>
<element name="ISBN" type="string"/>
<element name="Publisher" type="string"/>
</sequence>
</extension>
</complexContent>
</complexType>
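Because BookType is derived by extension, the elements it adds are appended after the inherited sequence, so a Book's children must appear in the order below. This is an illustrative fragment with hypothetical values, with namespace declarations omitted:

```xml
<!-- Hypothetical values; namespace declarations omitted for brevity. -->
<Book>
  <!-- Inherited from PublicationType -->
  <Title>An Example Book</Title>
  <Author>A. Author</Author>
  <Author>B. Author</Author>
  <Date>2000</Date>
  <!-- Appended by BookType's extension -->
  <ISBN>0-000-00000-0</ISBN>
  <Publisher>Example Press</Publisher>
</Book>
```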
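MagazineType - restricts PublicationType by dropping the Author
element. Note that a restriction may not lower an element's minOccurs
below its value in the base type, so this derivation is valid only if
Author is declared with minOccurs="0" in PublicationType: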
<complexType name="MagazineType">
<complexContent>
<restriction base="c:PublicationType">
<sequence>
<element name="Title" type="string"/>
<element name="Author" type="string"
minOccurs="0" maxOccurs="0"/>
<element name="Date" type="gYear"/>
</sequence>
</restriction>
</complexContent>
</complexType>
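For reference, here is a sketch of how the Method 1 declarations might be assembled into a single schema document. The target namespace URI http://www.example.org/catalogue is a hypothetical placeholder bound to the c prefix, the XML Schema namespace is the one from the final Recommendation, and Author is made optional (with Date typed as gYear) so that the MagazineType restriction is legal:

```xml
<?xml version="1.0"?>
<!-- Method 1 assembled into one schema document (sketch).
     The target namespace URI is a hypothetical placeholder. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:c="http://www.example.org/catalogue"
        targetNamespace="http://www.example.org/catalogue"
        elementFormDefault="qualified">

  <!-- Abstract head of the substitution group -->
  <element name="Publication" abstract="true" type="c:PublicationType"/>

  <!-- Container: one or more members of the Publication group -->
  <element name="Catalogue">
    <complexType>
      <sequence>
        <element ref="c:Publication" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>

  <!-- Members that may substitute for Publication -->
  <element name="Book" substitutionGroup="c:Publication" type="c:BookType"/>
  <element name="Magazine" substitutionGroup="c:Publication" type="c:MagazineType"/>

  <complexType name="PublicationType">
    <sequence>
      <element name="Title" type="string"/>
      <!-- Optional so that MagazineType can restrict Author away -->
      <element name="Author" type="string" minOccurs="0" maxOccurs="unbounded"/>
      <element name="Date" type="gYear"/>
    </sequence>
  </complexType>

  <complexType name="BookType">
    <complexContent>
      <extension base="c:PublicationType">
        <sequence>
          <element name="ISBN" type="string"/>
          <element name="Publisher" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>

  <complexType name="MagazineType">
    <complexContent>
      <restriction base="c:PublicationType">
        <sequence>
          <element name="Title" type="string"/>
          <element name="Author" type="string" minOccurs="0" maxOccurs="0"/>
          <element name="Date" type="gYear"/>
        </sequence>
      </restriction>
    </complexContent>
  </complexType>
</schema>
```

An instance document would then declare the same namespace, e.g. <Catalogue xmlns="http://www.example.org/catalogue">, and may list Book and Magazine children in any order; a bare <Publication> child would be rejected because that element is abstract.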
Method Advantages:
- This method allows you to easily extend the set of elements that
may be used in the variable content section simply by adding new
elements to the abstract element's substitutionGroup.
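To illustrate this one-step extension, a hypothetical Journal element (its JournalType and Volume child are assumptions, not part of the original schema) could be added with these two top-level declarations inside the <schema> element; the Catalogue declaration does not change:

```xml
<!-- Hypothetical new member type; must derive from PublicationType -->
<complexType name="JournalType">
  <complexContent>
    <extension base="c:PublicationType">
      <sequence>
        <element name="Volume" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<!-- Joining the substitution group is the only step needed -->
<element name="Journal" substitutionGroup="c:Publication" type="c:JournalType"/>
```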
Method Disadvantages:
- The types of the elements used in the variable content section must
all descend from the abstract element's type. Further, the elements
must be in a substitutionGroup with the abstract element. These
requirements severely restrict the usefulness of this method. The
variable content section cannot contain an element whose type does
not derive from the abstract element's type, or which is not in the
abstract element's substitutionGroup - as would typically be the case
with independently developed components. For example, suppose another
schema author creates a "Newspaper" element whose type does not
descend from PublicationType and which is not in the substitutionGroup
with Publication. Then <Catalogue> cannot contain the <Newspaper>
element. The elements in the variable content section are all tied
to the same type hierarchy, so they are dependent and coupled.
- Oftentimes the variable content section will contain elements that
are conceptually related but structurally vastly different. The base
type (the abstract element's type) should contain items common to all
the variable content elements. To allow for elements that may be
very dissimilar, the base type would need to have very little
structure. This defeats the purpose of inheritance.
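The Newspaper case from the first disadvantage might look like this in the other author's schema. The namespace URI and the content of NewspaperType are hypothetical:

```xml
<?xml version="1.0"?>
<!-- Independently developed schema (sketch); namespace URI and
     NewspaperType content are hypothetical. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:n="http://www.example.org/news"
        targetNamespace="http://www.example.org/news"
        elementFormDefault="qualified">

  <!-- Not in the c:Publication substitution group -->
  <element name="Newspaper" type="n:NewspaperType"/>

  <!-- Does not derive from c:PublicationType -->
  <complexType name="NewspaperType">
    <sequence>
      <element name="Masthead" type="string"/>
      <element name="Edition" type="date"/>
    </sequence>
  </complexType>
</schema>
```

Since Newspaper is not a member of the c:Publication substitution group, and NewspaperType does not derive from c:PublicationType and so could not legally join it, a validator would reject an n:Newspaper child of <Catalogue> under the Method 1 schema.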
Question:
- In the second disadvantage above I state: "This defeats the purpose
of inheritance." This seems like a very weak statement. Can you
provide a stronger statement telling why it is bad that the base
type has little structure?
- Have you noticed that I like to name things? Well, I would like to
put a name to this method (and to all three methods). Any
suggestions?
******************************************************************
Method 2. Use a repeatable <choice> element to achieve variable
content.
Method Description:
This method is quite straightforward - simply list within a <choice>
element all the components which can appear in the variable content
section, and embed the <choice> element in the container element.
Method Implementation:
Declare within a <choice> element all the elements that may appear in
the variable content section (Book, Magazine). Embed the <choice>
element within the container element (Catalogue):
<element name="Catalogue">
<complexType>
<choice minOccurs="0" maxOccurs="unbounded">
<element ref="c:Book"/>
<element ref="c:Magazine"/>
</choice>
</complexType>
</element>
<element name="Book" type="c:BookType"/>
<element name="Magazine" type="c:MagazineType"/>
Method Advantages:
- The elements in the variable content section do not need a common
type ancestry. Thus, the variable content section can contain
dissimilar, independent, loosely coupled elements.
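As a sketch of this advantage, the Method 2 <choice> can reference the hypothetical Newspaper element from another author's schema directly, by importing its namespace. The namespace URIs and the schemaLocation value are assumptions, and BookType and MagazineType are assumed to be defined in this schema as in Method 1:

```xml
<?xml version="1.0"?>
<!-- Method 2 schema (sketch). Hypothetical namespace URIs and
     schemaLocation. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:c="http://www.example.org/catalogue"
        xmlns:n="http://www.example.org/news"
        targetNamespace="http://www.example.org/catalogue"
        elementFormDefault="qualified">

  <!-- import must precede the other top-level declarations -->
  <import namespace="http://www.example.org/news"
          schemaLocation="news.xsd"/>

  <element name="Catalogue">
    <complexType>
      <choice minOccurs="0" maxOccurs="unbounded">
        <element ref="c:Book"/>
        <element ref="c:Magazine"/>
        <!-- Independently developed; type unrelated to PublicationType -->
        <element ref="n:Newspaper"/>
      </choice>
    </complexType>
  </element>

  <element name="Book" type="c:BookType"/>
  <element name="Magazine" type="c:MagazineType"/>

  <!-- complexType definitions for BookType and MagazineType go here -->
</schema>
```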
Method Disadvantages:
- The <choice> element allows you to group together dissimilar
elements. While that has been touted as an advantage, it is really
a double-edged sword. The elements in the variable content section
have no type hierarchy to bind them together and provide coherence
among them.
- With method 1 you can easily extend the set of elements that may be
used in the variable content section by creating a new element and
putting it in the substitutionGroup with the abstract element.
Instance documents can then use the new element immediately.
With method 2, in addition to creating the new element, you must
also list it in the <choice> element. So method 2 requires a
two-step process for adding a new element to the set of elements
available in the variable content section. This is a bit more
error prone.
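The two steps look like this for a hypothetical Journal element (JournalType is an assumed type defined elsewhere, which need not derive from PublicationType); both are top-level declarations inside the Method 2 <schema> element:

```xml
<!-- Step 1: declare the new (hypothetical) element -->
<element name="Journal" type="c:JournalType"/>

<!-- Step 2: list it in the Catalogue's choice -->
<element name="Catalogue">
  <complexType>
    <choice minOccurs="0" maxOccurs="unbounded">
      <element ref="c:Book"/>
      <element ref="c:Magazine"/>
      <element ref="c:Journal"/>
    </choice>
  </complexType>
</element>
```

Forgetting step 2 leaves a valid schema in which Journal is declared but can never appear inside <Catalogue>.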
Questions:
- I am not sure that I believe the last sentence: "This is a bit more
error prone." Do you?
- Curt, you stated in your message that the disadvantage of this method
is, "does not let people to extend your schema easily." Can you
please elaborate on what you mean by this?
- Again, I would like to see a name for this method. Suggestions?
******************************************************************
Method 3. Use an abstract type and type substitution to achieve
variable content.
Method Description:
There are three XML Schema concepts that must be understood for
implementing this method:
- a complexType can be declared abstract.
- an element declared to be of an abstract type cannot have its content
instantiated in instance documents (the element can be instantiated,
but its content may not).
- in instance documents the element with the abstract type must have
its content substituted by content from a non-abstract type which
derives from the abstract type.
Method Implementation:
Define an abstract base type (PublicationType):
<complexType name="PublicationType" abstract="true">
<sequence>
<element name="Title" type="string"/>
<element name="Author" type="string" minOccurs="0" maxOccurs="unbounded"/>
<element name="Date" type="gYear"/>
</sequence>
</complexType>
Declare the container element (Catalogue) to contain a base element
(Publication), which is of the abstract base type:
<element name="Catalogue">
<complexType>
<sequence>
<element name="Publication" type="c:PublicationType"
minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
In instance documents, the content of <Publication> can only be of a
non-abstract type which derives from PublicationType, such as BookType
or MagazineType (we saw these type definitions in Method 1 above),
and each <Publication> must name that type with an xsi:type attribute.
With this method, instance documents look different from those of the
above two methods. Namely, <Catalogue> does not contain variable
content. Instead, it always contains the same element (Publication).
However, that element contains variable content:
<Catalogue>
<Publication xsi:type="BookType"> ... </Publication>
<Publication xsi:type="MagazineType"> ... </Publication>
<Publication xsi:type="BookType"> ... </Publication>
</Catalogue>
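Because xsi:type takes a QName, an instance document needs the XML Schema instance namespace declared, and the type names must resolve to the schema's target namespace. A sketch assuming a hypothetical target namespace of http://www.example.org/catalogue, elementFormDefault="qualified", and a PublicationType in which Author is optional (otherwise the MagazineType restriction is not valid):

```xml
<?xml version="1.0"?>
<!-- Assumes a Method 3 schema with the hypothetical target namespace
     below and elementFormDefault="qualified". -->
<Catalogue xmlns="http://www.example.org/catalogue"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- Unprefixed QName resolves against the default namespace -->
  <Publication xsi:type="BookType">
    <Title>An Example Book</Title>
    <Author>A. Author</Author>
    <Date>2000</Date>
    <ISBN>0-000-00000-0</ISBN>
    <Publisher>Example Press</Publisher>
  </Publication>
  <Publication xsi:type="MagazineType">
    <Title>An Example Magazine</Title>
    <Date>2000</Date>
  </Publication>
  <!-- Invalid without xsi:type, because PublicationType is abstract:
       <Publication>...</Publication> -->
</Catalogue>
```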
Method Advantages:
- Similar benefits to method 1. Namely, this method allows you to
easily extend the set of elements that may be used in the variable
content section simply by creating new types which derive from the
abstract base type.
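For example, a hypothetical JournalType (with an assumed Volume element) would make a new kind of Publication available without touching the Catalogue declaration:

```xml
<!-- Hypothetical new type; no new element declaration is needed -->
<complexType name="JournalType">
  <complexContent>
    <extension base="c:PublicationType">
      <sequence>
        <element name="Volume" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
```

Instance documents could then write <Publication xsi:type="JournalType">, with Volume following Title, Author and Date.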
Method Disadvantages:
- Similar weaknesses to method 1. Namely, all types must descend from
the abstract type. This requirement prohibits the use of types
which do not descend from the abstract type, as would typically be
the situation when the type is in another, independently developed
schema.
- This method has the additional weakness of not being as "clean"
as the other methods in the instance documents, e.g.,
<Publication xsi:type="BookType"> is not as clean as <Book>
Questions:
- The second disadvantage listed above is mighty weak. "Clean" is
subjective. Can you think of a stronger statement?
- Name for this method?
Wrap-up Questions:
What would be your recommendation for "Best Practice for implementing
a container element that is to be comprised of variable content?"
Which of the above methods would you recommend using?
Based upon the above discussion I am tempted to recommend: "use
method 2 - repeatable <choice> element - because it enables the
variable content section to contain components from disjoint sources".
I feel that this benefit outweighs its disadvantages. What are
your thoughts on this?
This is a pretty cool issue. Thanks a lot Curt and Len for shedding
light on the pitfalls and advantages of each method! /Roger