xml-dev - Re: XML Schemas: Best Practices

From: "Roger L. Costello" <costello@mitre.org>
To: xml-dev@lists.xml.org
Date: Wed, 18 Oct 2000 15:54:59 -0400
Hi Folks,

[I have created an online version of yesterday's discussion of Global
versus Local.  It is at: http://www.xfront.com/GlobalVersusLocal.html. 
I will be updating it with today's excellent messages.]

Recall yesterday we examined two polar opposite design approaches for
dealing with the issue of when to declare an element/type globally and
when to declare it locally.  The two design approaches are briefly
described here:

(1) Create a single component which contains nested components (boxes
within boxes)
(2) Create individual components and aggregate them together (separate
boxes)  [Thanks to Toivo for the box analogy!]

We also discussed the characteristics of each design approach.  The most
noteworthy characteristic of each design is:

(1) The first design approach facilitates hiding (localizing) namespace
complexities within the schema
(2) The second design approach facilitates component reuse.

After sending the message yesterday I realized that the discussion
tended to imply that you could have one or the other but not both (i.e.,
your schema could hide namespace complexities or it could have component
reuse, but not both).  Today Caroline Clewlow pointed out that you can
have both [thanks Caroline!].  Let's consider the Book example again:

    <Book>
        <Title>Illusions</Title>
        <Author>Richard Bach</Author>
    </Book>

How can we hide (localize) the namespaces of Title and Author, and yet
also have component reuse?

Caroline's suggestion is to create a global type definition and nest the
Title and Author declarations within it:

    <complexType name="Publication">
        <sequence>
            <element name="Title" type="string"
                            minOccurs="1" maxOccurs="1"/> 
            <element name="Author" type="string"
                            minOccurs="1" maxOccurs="1"/>
        </sequence>
    </complexType>

    <element name="Book" type="cat:Publication"/>

With this design approach we obtain both the benefits of:

- hiding (localizing) the namespace complexity of Title and Author
- reuse of the Publication type
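
To make the second benefit concrete, here is a minimal sketch of another
element reusing the same type.  Magazine is a hypothetical name (it does
not appear in the original example), and the cat prefix is assumed to be
bound to the schema's targetNamespace:

    <!-- Hypothetical second global element; reuses Publication as-is -->
    <element name="Magazine" type="cat:Publication"/>

A Magazine element would then contain the same locally declared Title
and Author children as Book.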

We can carry this idea a step further.  We can obtain even greater reuse
(and still retain the benefit of hiding namespace complexity) by
creating type definitions for Title and Author:

Third Design:

    <simpleType name="Title">
        <restriction base="string">
            <!-- A book title such as "Illusions": any non-empty string -->
            <minLength value="1"/>
        </restriction>
    </simpleType>

    <simpleType name="Name">
        <restriction base="string">
            <minLength value="1"/> 
        </restriction>
    </simpleType>

    <complexType name="Publication">
        <sequence>
            <element name="Title" type="cat:Title"
                            minOccurs="1" maxOccurs="1"/> 
            <element name="Author" type="cat:Name"
                            minOccurs="1" maxOccurs="1"/>
        </sequence>
    </complexType>

    <element name="Book" type="cat:Publication"/>

This design has:

- maximized reuse (there are four reusable components - the Title type,
the Name type, the Publication type, and the Book element)

- maximized the potential to hide (localize) namespaces  [note how I
phrased this: "maximize the potential …"  Whether, in fact, the
namespaces of Title and Author are hidden or exposed, is determined by
the elementFormDefault "switch"].
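
As a rough sketch of what reusing the Name type might look like, here is
a hypothetical Anthology type (Anthology and Editor are illustrative
names, not part of the original example).  Both local elements share the
cat:Name type, yet their declarations stay nested inside the type:

    <complexType name="Anthology">
        <sequence>
            <!-- Two local elements sharing one reusable simple type -->
            <element name="Editor" type="cat:Name"/>
            <element name="Author" type="cat:Name" maxOccurs="unbounded"/>
        </sequence>
    </complexType>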

This is quite nice!  I see general design guidelines emerging …

- Design your schema to maximize the potential for hiding (localizing)
namespace complexities.  

- Use elementFormDefault to act as a switch in controlling namespace
exposure - if you want element namespaces exposed in instance documents,
simply turn the elementFormDefault switch to "on" (i.e., set
elementFormDefault="qualified"); if you don't want element namespaces
exposed in instance documents, simply turn the elementFormDefault switch
to "off" (i.e., set elementFormDefault="unqualified").

- Design your schema to maximize reuse.

- Use type definitions as the main form of component reuse.  Place
element declarations within type definitions - in so doing, you maximize
the potential for hiding (localizing) namespace complexities.
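
The elementFormDefault switch can be illustrated with a small sketch.
The namespace URI http://www.example.org/catalogue is an assumption made
for illustration, and the schema namespace shown is the one from the
final XML Schema Recommendation (earlier drafts used a different URI):

    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/catalogue"
            xmlns:cat="http://www.example.org/catalogue"
            elementFormDefault="unqualified">  <!-- the "switch" -->

        <complexType name="Publication">
            <sequence>
                <!-- Local declarations: affected by the switch -->
                <element name="Title" type="string"/>
                <element name="Author" type="string"/>
            </sequence>
        </complexType>

        <!-- Global declaration: always namespace-qualified -->
        <element name="Book" type="cat:Publication"/>
    </schema>

With the switch "off" (unqualified), only the global Book element
carries the namespace in an instance document:

    <cat:Book xmlns:cat="http://www.example.org/catalogue">
        <Title>Illusions</Title>
        <Author>Richard Bach</Author>
    </cat:Book>

With the switch "on" (qualified), the schema body stays the same but
Title and Author must be qualified as well:

    <cat:Book xmlns:cat="http://www.example.org/catalogue">
        <cat:Title>Illusions</cat:Title>
        <cat:Author>Richard Bach</cat:Author>
    </cat:Book>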

Let's now compare this design approach (which I have called the Third
Design) with the design approach that we presented yesterday, which I
called the Second Design (the separate boxes design).  Recall the
example that was given yesterday:

Second Design:

    <element name="Title" type="string"/>

    <element name="Author" type="string"/>

    <complexType name="Publication">
        <sequence>
            <element ref="cat:Title" 
                     minOccurs="1" maxOccurs="1"/> 
            <element ref="cat:Author" 
                     minOccurs="1" maxOccurs="1"/>
        </sequence>
    </complexType>

    <element name="Book" type="cat:Publication"/>

The Second Design approach maximizes reuse but it has absolutely no
potential for namespace hiding, because globally declared elements must
always be namespace-qualified in instance documents, whatever the value
of elementFormDefault.  You might argue, "hey, suppose that I want to
expose namespaces in instance documents (and we have seen cases where
this is desired), so this is a good design for me."  Let's think about
this for a moment.  What if at a later date you change your mind and
wish to hide namespaces (what if your users hate seeing all those
namespace qualifiers in the instance documents)?  You will need to
redesign your schema (possibly scrapping it and starting over).  Better
to adopt the Third Design approach.  That way you control whether
namespaces are hidden or exposed by simply setting the value of
elementFormDefault.  No redesign of your schema is needed as you switch
from exposing to hiding, or vice versa.
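
In instance-document terms, a sketch of what the Second Design always
requires looks like this (the namespace URI is an illustrative
assumption).  Because Title and Author are global declarations pulled in
by ref, they stay qualified even if elementFormDefault is set to
"unqualified":

    <cat:Book xmlns:cat="http://www.example.org/catalogue">
        <!-- Referenced global elements are always qualified -->
        <cat:Title>Illusions</cat:Title>
        <cat:Author>Richard Bach</cat:Author>
    </cat:Book>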

[That said …  your particular project may need to sacrifice the ability
to turn on/off namespace exposure because you require instance documents
to be able to use synonyms/aliases.  In such circumstances the Second
Design approach may be the only viable alternative.]
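
Assuming that synonyms/aliases are implemented with substitution groups
(the substitutionGroup attribute of the final XML Schema Recommendation;
this reading is an assumption, and Writer is a hypothetical element
name), a sketch might look like this.  Substitution groups only work
between global element declarations, which is why the Second Design is
needed:

    <element name="Author" type="string"/>

    <!-- Hypothetical alias: may appear wherever cat:Author is referenced -->
    <element name="Writer" substitutionGroup="cat:Author"/>

An instance document could then use <cat:Writer> in place of
<cat:Author> inside Book.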

Here is a brief summary of the characteristics of the Third Design
approach.

Third Design Characteristics:

[1] Maximum reuse. The primary components of reuse are type definitions.

[2] Maximum potential for namespace hiding. Element declarations are
nested within the types, so elementFormDefault determines whether their
namespaces are hidden or exposed.

This list of characteristics needs to be expanded on.  Can someone give
more of the characteristics of this Third Design approach?

[We really need to give names to these design approaches.  Here are the
suggestions which have been put forward:

First Design (the boxes within boxes design):  
    - The Compact Design, or 
    - The Hierarchical Design, or
    - The Russian Doll Design, or
    - Composition

Second Design (the separate boxes design): 
    - The Global Scope Design, or 
    - The Componentized Design, or
    - The Salami Design, or
    - The Shared Aggregation Design

Third Design (the maximize reuse and namespace hiding design):  
    - ???

Do you like the ones suggested?  If so, which ones?  If not, what do you
suggest? (I am eager to see what names people come up with for the Third
Design approach!) Let me know your preferences.  Otherwise, yours truly
will select a name.]
…

In today's mail, Jon Cleaver pointed out a difference between the First
and Second Design approaches in terms of their impact on component
changes.  I have summarized his comments below [Jon, let me know if I
have not accurately captured your comments]:

First Design: (the boxes within boxes design)

Localized change impact.  With this design approach there is, for all
practical purposes, just a single component.  The fact that the
component contains other components is somewhat irrelevant to other
schemas since those inner components are not accessible (reusable). 
Hence, if the Book component changes (i.e., the components within it
change) it will have a relatively limited impact since there is only
"one" component changing.  

Second Design: (the separate boxes design)

Far-reaching change impact.  Because there are so many reusable
components, a change to Book is more likely to affect many schemas,
since its components may be used in many places and in many schemas.
…

In this message we have examined a third design approach for dealing
with the issue of when to declare elements/types globally versus when to
declare them locally.  Here are some things that need further
discussion:

- What are your thoughts on this third design approach?  
- What are the pros and cons of this third design approach?  
- Do you prefer it to the other two design approaches?  
- What name would you give to the third design approach?

[We have covered a lot of material over the last two days!  Is it all
making sense?  Are there enough examples that the concepts are clear, or
should more examples be given?  For example, does everyone know what it
means when I say that elementFormDefault can behave like a "switch",
turning on or off namespace exposure in instance documents?  This might
not be clear to everyone since this is my own, concocted terminology. 
Let me know if I should explain this further.]

Great discussions!!!  Thanks!!! /Roger