Re: XML Schemas: Best Practices
- From: "Roger L. Costello" <costello@mitre.org>
- To: xml-dev@lists.xml.org
- Date: Wed, 03 Jan 2001 15:28:31 -0500
Hi Folks,
Just before the holidays I sent out a message requesting that we
develop a set of use cases for element substitution. Curt Arnold
and Len Bullard convinced me that there is a bigger picture to this
issue.
Thus, I would like to try to address that bigger issue. [Curt, Len,
please help me out and let me know if I have captured the bigger
issue.] Here it is...
Issue: What is the Best Practice in designing for variable
(substitutable) content? [I believe that this is called polymorphic
substitution. Is that correct?]
EXAMPLE. To demonstrate this issue, suppose that we wish to design
a Catalogue element to contain variable content - Books and/or
Magazines. Here's a sample instance document where Catalogue
contains a Book then a Magazine then another Book:
<?xml version="1.0"?>
<Catalogue>
<Book>
<Title>Illusions The Adventures of a
Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>
<Magazine>
<Title>Natural Health</Title>
<Date>1999</Date>
</Magazine>
<Book>
<Title>The First and Last Freedom</Title>
<Author>J. Krishnamurti</Author>
<Date>1954</Date>
<ISBN>0-06-064831-7</ISBN>
<Publisher>Harper &amp; Row</Publisher>
</Book>
</Catalogue>
As you can see, the content of <Catalogue> is variable - it contains
a mixture of <Book> and <Magazine> elements. The issue is: how do we
design Catalogue to support variable content?
XML Schema gives us several methods for achieving variable content. I
would like us to expose those methods and elucidate their pros and
cons.
Below I get into all the details of each method. Before doing so, I
will briefly summarize the methods and questions to be addressed.
BRIEF SUMMARY
Method 1. Use an abstract element and element substitution to achieve
variable content.
Method 2. Use a repeatable <choice> element to achieve variable content.
Method 3. Use an abstract type and type substitution to achieve
variable content.
Question [A]
Does each of these methods represent a design pattern? What name
would you give to each of these patterns? Would the following names
be appropriate?
Method 1. Curt Arnold used the term "Projection" in an earlier message.
Would this be the correct term for this pattern? [I don't know what
a projection is. Can someone please explain?]
Method 2. Also in his earlier message Curt used the term "Aggregation".
Would this be the correct term for this pattern?
Method 3. Curt also used the term "Decorator". Would this be the
correct term for this pattern?
Question [B]
What are the advantages and disadvantages of each method? When would
one method be preferred over another?
Question [C]
I have been able to think of only these 3 methods for achieving
variable content. There may be other methods. Can you think of
other methods?
Question [D]
This fourth question I will save till after the detailed discussion ...
Now let's examine in detail the methods for achieving variable content.
DETAILED DISCUSSION
Before looking at Method 1, observe in the above instance document
the nature of the elements which are contained in Catalogue - they
are all Publication type elements, i.e., a Book Publication or a
Magazine Publication. So, in implementing this variable content we
should be able to take advantage of inheritance...
****************************************************************
Method 1. Use an abstract element and element substitution to achieve
variable content.
Here's how to implement this method for our Catalogue example:
Declare the Catalogue element to contain an abstract Publication
element:
<element name="Catalogue">
<complexType>
<sequence>
<element ref="c:Publication" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
<element name="Publication" abstract="true"
type="c:PublicationType"/>
Recall that abstract elements cannot appear in instance documents.
Only non-abstract elements that are in the substitution group of
the abstract element can appear in instance documents. Next, I declare
Book and Magazine, and put them in Publication's substitution group
(via the substitutionGroup attribute):
<element name="Book" substitutionGroup="c:Publication"
type="c:BookType"/>
<element name="Magazine" substitutionGroup="c:Publication"
type="c:MagazineType"/>
These two elements - Book and Magazine - are substitutable for
the abstract Publication element, and thus can appear in the
content of Catalogue.
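To make the effect of abstract="true" concrete, here is a sketch of an instance fragment, assuming the Method 1 schema at the bottom of this message and the http://www.catalogue.org namespace. A validator should accept the Book but reject the literal Publication element.

```xml
<Catalogue xmlns="http://www.catalogue.org">
  <!-- Accepted: Book is a member of Publication's substitution group -->
  <Book>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
  </Book>
  <!-- Rejected: Publication is declared abstract="true", so it may not
       appear itself, whatever its content; only members of its
       substitution group may appear in its place -->
  <Publication>
    <Title>Natural Health</Title>
    <Date>1999</Date>
  </Publication>
</Catalogue>
```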
Note that the Publication element is of type PublicationType, Book
is of type BookType, and Magazine is of type MagazineType. A
requirement for an element to be substitutable for another element
(called the "head" element) is that the element have the same type
as the head element, or a type that is derived from the head
element's type. Thus, in order for Book and Magazine to substitute
for Publication, BookType and MagazineType must derive from
PublicationType. Here are the type definitions:
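To illustrate the type-derivation requirement, here is a sketch using two hypothetical elements, Newspaper and Pamphlet (plus a hypothetical Edition child), none of which are part of the original example. Newspaper can join Publication's substitution group because NewspaperType extends PublicationType; Pamphlet cannot, because string is not derived from PublicationType.

```xml
<!-- Hypothetical type: not part of the original Catalogue example -->
<complexType name="NewspaperType">
  <complexContent>
    <extension base="c:PublicationType">
      <sequence>
        <element name="Edition" type="string"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<!-- Allowed: NewspaperType derives from PublicationType -->
<element name="Newspaper" substitutionGroup="c:Publication"
         type="c:NewspaperType"/>

<!-- Not allowed: string is not derived from PublicationType, so a
     schema processor should report an error for this declaration -->
<element name="Pamphlet" substitutionGroup="c:Publication"
         type="string"/>
```

Note that, under this method, the Catalogue declaration would not need to change for Newspaper to become legal content.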
PublicationType - the base type:
<complexType name="PublicationType">
<sequence>
<element name="Title" type="string"/>
<element name="Author" type="string" minOccurs="0" maxOccurs="unbounded"/>
<element name="Date" type="year"/>
</sequence>
</complexType>
BookType - extends PublicationType by adding two new elements, ISBN
and Publisher:
<complexType name="BookType">
<complexContent>
<extension base="c:PublicationType" >
<sequence>
<element name="ISBN" type="string"/>
<element name="Publisher" type="string"/>
</sequence>
</extension>
</complexContent>
</complexType>
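MagazineType - restricts PublicationType by dropping the Author
element (its maxOccurs is set to 0). Because a restriction may only
narrow the base type's occurrence ranges, this is legal only if Author
is optional (minOccurs="0") in PublicationType: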
<complexType name="MagazineType">
<complexContent>
<restriction base="c:PublicationType">
<sequence>
<element name="Title" type="string"/>
<element name="Author" type="string" minOccurs="0"
maxOccurs="0"/>
<element name="Date" type="year"/>
</sequence>
</restriction>
</complexContent>
</complexType>
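Because Author's maxOccurs is 0 in MagazineType, a validator should reject a Magazine that carries an Author. Two fragments for comparison (the author name in the second one is made up for illustration):

```xml
<!-- Accepted: a Magazine with Title and Date only -->
<Magazine>
  <Title>Natural Health</Title>
  <Date>1999</Date>
</Magazine>

<!-- Rejected: MagazineType allows at most 0 Author elements -->
<Magazine>
  <Title>Natural Health</Title>
  <Author>Jane Doe</Author>
  <Date>1999</Date>
</Magazine>
```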
The complete schema for Method 1 is shown at the bottom of this
message.
Summary. In this first method we see that - through the combination
of declaring Catalogue to contain an abstract Publication element,
and declaring Book and Magazine to be substitutable for
Publication - we are able to achieve variable content for
Catalogue.
Questions:
[1.A] Does this method represent a general pattern for achieving
variable content? If so, what name would you give to this pattern?
Would you call this "projection"?
[1.B] What are the pros and cons of this method (pattern)?
****************************************************************
Method 2. Use a repeatable <choice> element to create variable content
Here's how to implement this second method:
Declare the Catalogue element to contain multiple occurrences
of either Book or Magazine:
<element name="Catalogue">
<complexType>
<choice minOccurs="0" maxOccurs="unbounded">
<element ref="c:Book"/>
<element ref="c:Magazine"/>
</choice>
</complexType>
</element>
<element name="Book" type="c:BookType"/>
<element name="Magazine" type="c:MagazineType"/>
BookType and MagazineType are defined the same way as in Method 1.
The complete schema for this method is shown at the bottom of this
message.
This method has a direct analogue in DTDs:
<!ELEMENT Catalogue (Book | Magazine)*>
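For comparison with Method 1, here is a sketch of what supporting a hypothetical Newspaper element (with an equally hypothetical NewspaperType, assumed to be defined elsewhere in the schema) would take under this method: the Catalogue declaration itself has to be edited. On the other hand, nothing here requires NewspaperType to derive from PublicationType, since the choice lists element names rather than relying on substitution.

```xml
<!-- Hypothetical: adding Newspaper means editing Catalogue's choice.
     DTD equivalent: <!ELEMENT Catalogue (Book | Magazine | Newspaper)*> -->
<element name="Catalogue">
  <complexType>
    <choice minOccurs="0" maxOccurs="unbounded">
      <element ref="c:Book"/>
      <element ref="c:Magazine"/>
      <element ref="c:Newspaper"/>
    </choice>
  </complexType>
</element>

<!-- NewspaperType is assumed to be defined elsewhere in the schema -->
<element name="Newspaper" type="c:NewspaperType"/>
```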
Questions:
[2.A] What name would you give to this method for achieving variable
content? Would you call this "aggregation"?
[2.B] What are the pros and cons of this method?
****************************************************************
Method 3. Use an abstract type and type substitution to achieve
variable content
With this method we define PublicationType to be abstract. We declare
Catalogue to contain Publication, which is of type PublicationType:
<element name="Catalogue">
<complexType>
<sequence>
<element name="Publication" type="c:PublicationType"
minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
In instance documents, each <Publication> element must be given a
non-abstract type that derives from PublicationType, such as BookType
or MagazineType (we saw these type definitions in Method 1 above).
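A sketch of how this plays out in an instance document, assuming (as in the complete Method 3 schema) that PublicationType is declared with abstract="true", and that these fragments sit inside a Catalogue whose default namespace is http://www.catalogue.org and which binds the xsi prefix. The concrete type is chosen with the xsi:type attribute; a Publication without one should be rejected by a validator.

```xml
<!-- Rejected: no xsi:type, so the element would have the abstract
     PublicationType itself as its type -->
<Publication>
  <Title>Natural Health</Title>
  <Date>1999</Date>
</Publication>

<!-- Accepted: xsi:type names a concrete type derived from PublicationType -->
<Publication xsi:type="MagazineType">
  <Title>Natural Health</Title>
  <Date>1999</Date>
</Publication>
```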
With this method, instance documents look different from those of the
first two methods. Namely, <Catalogue> contains multiple <Publication>
elements, each carrying an xsi:type attribute that names the actual
(derived) type of that element:
<?xml version="1.0"?>
<Catalogue>
<Publication xsi:type="BookType">
<Title>Illusions The Adventures of a
Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Publication>
<Publication xsi:type="MagazineType">
<Title>Natural Health</Title>
<Date>1999</Date>
</Publication>
<Publication xsi:type="BookType">
<Title>The First and Last Freedom</Title>
<Author>J. Krishnamurti</Author>
<Date>1954</Date>
<ISBN>0-06-064831-7</ISBN>
<Publisher>Harper &amp; Row</Publisher>
</Publication>
</Catalogue>
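The sample above leaves out namespace declarations. In a namespaced document the xsi:type value is a QName, so it has to resolve to a type in the http://www.catalogue.org namespace. A sketch (the extra c: prefix binding on the instance is my own addition for illustration): an unprefixed name resolves through the default namespace, while a prefixed name uses an explicitly bound prefix, so both Publication elements below should name the same MagazineType.

```xml
<Catalogue xmlns="http://www.catalogue.org"
           xmlns:c="http://www.catalogue.org"
           xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance">
  <!-- Unprefixed QName: resolved against the default namespace -->
  <Publication xsi:type="MagazineType">
    <Title>Natural Health</Title>
    <Date>1999</Date>
  </Publication>
  <!-- Prefixed QName: resolved through the c: prefix binding -->
  <Publication xsi:type="c:MagazineType">
    <Title>Natural Health</Title>
    <Date>1999</Date>
  </Publication>
</Catalogue>
```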
The complete schema for this method is shown at the bottom of this
message.
Questions:
[3.A] What name would you give to this method for achieving variable
content?
[3.B] What are the pros and cons of this method?
This completes the discussion of the three methods. Are we missing
any? Can you think of other methods for implementing variable
(substitutable) content?
Now for the question alluded to above ...
Question [D]
I don't feel that I have hit the nail on the head, i.e.,
my issue description above is not very clear (I don't think that
you really know what the issue is until you read through the examples.
That's not good.)
Here are some other ideas for describing the issue:
- What is the Best Practice for creating substitutable content?
-> I don't think that this is any better than the original version.
- What is the Best Practice for dealing with inheritance?
-> No, that is too vague and doesn't really capture the issue.
Too many holidays ... I don't seem to be able to come up with a
crisp description of the issue. Can you help?
Below are the complete schemas for each of the three methods.
Following that are the instance documents. /Roger
P.S. I look forward to working with you in 2001 to develop a complete
set of Best Practices for XML Schemas.
****************************************************************
Method 1 Schema:
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
targetNamespace="http://www.catalogue.org"
elementFormDefault="qualified"
xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:schemaLocation=
"http://www.w3.org/2000/10/XMLSchema
http://www.w3.org/2000/10/XMLSchema.xsd"
xmlns:c="http://www.catalogue.org">
<complexType name="PublicationType">
<sequence>
<element name="Title" type="string"/>
<element name="Author" type="string" minOccurs="0" maxOccurs="unbounded"/>
<element name="Date" type="year"/>
</sequence>
</complexType>
<complexType name="BookType">
<complexContent>
<extension base="c:PublicationType" >
<sequence>
<element name="ISBN" type="string"/>
<element name="Publisher" type="string"/>
</sequence>
</extension>
</complexContent>
</complexType>
<complexType name="MagazineType">
<complexContent>
<restriction base="c:PublicationType">
<sequence>
<element name="Title" type="string"/>
<element name="Author" type="string" minOccurs="0"
maxOccurs="0"/>
<element name="Date" type="year"/>
</sequence>
</restriction>
</complexContent>
</complexType>
<element name="Publication" abstract="true"
type="c:PublicationType"/>
<element name="Book" substitutionGroup="c:Publication"
type="c:BookType"/>
<element name="Magazine" substitutionGroup="c:Publication"
type="c:MagazineType"/>
<element name="Catalogue">
<complexType>
<sequence>
<element ref="c:Publication" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
</schema>
****************************************************************
Method 2 Schema:
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
targetNamespace="http://www.catalogue.org"
elementFormDefault="qualified"
xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:schemaLocation=
"http://www.w3.org/2000/10/XMLSchema
http://www.w3.org/2000/10/XMLSchema.xsd"
xmlns:c="http://www.catalogue.org">
<complexType name="PublicationType">
<sequence>
<element name="Title" type="string"/>
<element name="Author" type="string" minOccurs="0" maxOccurs="unbounded"/>
<element name="Date" type="year"/>
</sequence>
</complexType>
<complexType name="BookType">
<complexContent>
<extension base="c:PublicationType" >
<sequence>
<element name="ISBN" type="string"/>
<element name="Publisher" type="string"/>
</sequence>
</extension>
</complexContent>
</complexType>
<complexType name="MagazineType">
<complexContent>
<restriction base="c:PublicationType">
<sequence>
<element name="Title" type="string"/>
<element name="Author" type="string" minOccurs="0"
maxOccurs="0"/>
<element name="Date" type="year"/>
</sequence>
</restriction>
</complexContent>
</complexType>
<element name="Book" type="c:BookType"/>
<element name="Magazine" type="c:MagazineType"/>
<element name="Catalogue">
<complexType>
<choice minOccurs="0" maxOccurs="unbounded">
<element ref="c:Book"/>
<element ref="c:Magazine"/>
</choice>
</complexType>
</element>
</schema>
****************************************************************
Method 3 Schema:
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
targetNamespace="http://www.catalogue.org"
elementFormDefault="qualified"
xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:schemaLocation=
"http://www.w3.org/2000/10/XMLSchema
http://www.w3.org/2000/10/XMLSchema.xsd"
xmlns:c="http://www.catalogue.org">
<complexType name="PublicationType" abstract="true">
<sequence>
<element name="Title" type="string"/>
<element name="Author" type="string" minOccurs="0" maxOccurs="unbounded"/>
<element name="Date" type="year"/>
</sequence>
</complexType>
<complexType name="BookType">
<complexContent>
<extension base="c:PublicationType" >
<sequence>
<element name="ISBN" type="string"/>
<element name="Publisher" type="string"/>
</sequence>
</extension>
</complexContent>
</complexType>
<complexType name="MagazineType">
<complexContent>
<restriction base="c:PublicationType">
<sequence>
<element name="Title" type="string"/>
<element name="Author" type="string" minOccurs="0"
maxOccurs="0"/>
<element name="Date" type="year"/>
</sequence>
</restriction>
</complexContent>
</complexType>
<element name="Catalogue">
<complexType>
<sequence>
<element name="Publication" type="c:PublicationType"
minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
</schema>
****************************************************************
Instance Document for Method 1 & 2:
<?xml version="1.0"?>
<Catalogue xmlns="http://www.catalogue.org"
xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:schemaLocation=
"http://www.catalogue.org
Catalogue.xsd">
<Book>
<Title>Illusions The Adventures of a
Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>
<Magazine>
<Title>Natural Health</Title>
<Date>1999</Date>
</Magazine>
<Book>
<Title>The First and Last Freedom</Title>
<Author>J. Krishnamurti</Author>
<Date>1954</Date>
<ISBN>0-06-064831-7</ISBN>
<Publisher>Harper &amp; Row</Publisher>
</Book>
</Catalogue>
****************************************************************
Instance Document for Method 3:
<?xml version="1.0"?>
<Catalogue xmlns="http://www.catalogue.org"
xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:schemaLocation=
"http://www.catalogue.org
Catalogue.xsd">
<Publication xsi:type="BookType">
<Title>Illusions The Adventures of a
Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Publication>
<Publication xsi:type="MagazineType">
<Title>Natural Health</Title>
<Date>1999</Date>
</Publication>
<Publication xsi:type="BookType">
<Title>The First and Last Freedom</Title>
<Author>J. Krishnamurti</Author>
<Date>1954</Date>
<ISBN>0-06-064831-7</ISBN>
<Publisher>Harper &amp; Row</Publisher>
</Publication>
</Catalogue>