xml-dev - Re: XML Schemas: Best Practices

From: "Roger L. Costello" <costello@mitre.org>
To: xml-dev@lists.xml.org
Date: Tue, 17 Oct 2000 14:25:19 -0400

Hi Folks,

I would like to move on to the next schema design issue:

Issue: When should an element or type be declared global versus when
should it be declared local?

[Recall that a component (element, complexType, or simpleType) is
"global" if it is an immediate child of <schema>, whereas it is "local"
if it is not an immediate child of <schema>, i.e., it is nested within
another component.]

If someone were to ask you, "In general, when should an element or type
be declared global versus when should it be declared local?", what
advice would you give them?

A month ago I would have answered, "as a general rule, make things
global".  However, after the discussions that we have had, I would have
a very different answer today.

Example.  Below is a snippet of an instance document.  Let's explore the
different design strategies for defining <Book> and its components.

    <Book>
        <Title>Illusions</Title>
        <Author>Richard Bach</Author>
    </Book>

One design approach is to mirror the instance document - declare a Book
element and within it declare a Title element followed by an Author
element:

First Design: 

    <element name="Book"> 
        <complexType> 
            <sequence> 
                <element name="Title" type="string" 
                         minOccurs="1" maxOccurs="1"/> 
                <element name="Author" type="string"
                         minOccurs="1" maxOccurs="1"/>
            </sequence> 
        </complexType> 
    </element>

That's one end of the design spectrum.  At the other end of the design
spectrum: we disassemble the above instance document into its individual
components, define each component, and then assemble them together:

Second Design:

    <element name="Title" type="string"/>

    <element name="Author" type="string"/>

    <complexType name="Publication">
        <sequence>
            <element ref="cat:Title" 
                     minOccurs="1" maxOccurs="1"/> 
            <element ref="cat:Author" 
                     minOccurs="1" maxOccurs="1"/>
        </sequence>
    </complexType>

    <element name="Book" type="cat:Publication"/>

These two approaches represent opposite ends of the design spectrum.
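
For completeness, here is a sketch of the <schema> wrapper that the
Second Design presupposes.  The cat prefix has to be bound to the
schema's targetNamespace for ref="cat:Title" and type="cat:Publication"
to resolve.  The namespace URI http://www.example.org/catalogue is a
made-up placeholder, and the XML Schema namespace shown is the one from
the final Recommendation (drafts of the spec used different URIs):

    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/catalogue"
            xmlns:cat="http://www.example.org/catalogue">

        <!-- hypothetical wrapper; the global declarations of the
             Second Design (Title, Author, Publication, Book) go here -->
        <element name="Title" type="string"/>

    </schema>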

For this issue, I like to think in terms of boxes, where a box
represents an element or type.  Thus,

- The first design approach corresponds to having a single box, and it
has nested within it boxes, which in turn have boxes nested within them,
and so on.  
- The second design approach corresponds to having many separate boxes
which are composed together.  The composition of the boxes creates the
whole.

I believe that it will be useful to give each of the two design
approaches a name:
 
- What name would you give to the design strategy where the components
(i.e., element declarations and type definitions) are nested within each
other?  It is the "xxxxx" design approach for schema construction.
- What name would you give to the design strategy where components are
defined individually and then composed together?  It is the "yyyyy"
design approach for schema construction.

Let's examine the characteristics of each of the two design approaches. 
(Perhaps the characteristics will yield insights into appropriate names
for the two design approaches?)

First Design Characteristics:

[1] Opaque content. The content of Book is opaque to other schemas, and
to other parts of the same schema. The impact of this is that none of
the types or elements within Book are reusable.

[2] Localized scope. The region of the schema where the Title and Author
element declarations are applicable is localized to within the Book
element.  The impact of this is that if the schema has set
elementFormDefault="unqualified" then the namespaces of Title and Author
are hidden (localized) within the schema.

[3] Compact.  Everything is bundled together into a tidy, single unit.
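
To make characteristic [2] concrete, here is a minimal sketch of the
First Design placed in a schema with elementFormDefault="unqualified".
The targetNamespace URI and the cat prefix are assumptions chosen for
illustration only:

    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/catalogue"
            elementFormDefault="unqualified">
        <element name="Book">
            <complexType>
                <sequence>
                    <!-- local declarations: no namespace in instances -->
                    <element name="Title" type="string"/>
                    <element name="Author" type="string"/>
                </sequence>
            </complexType>
        </element>
    </schema>

A conforming instance then qualifies only the global Book element;
Title and Author appear without any namespace:

    <cat:Book xmlns:cat="http://www.example.org/catalogue">
        <Title>Illusions</Title>
        <Author>Richard Bach</Author>
    </cat:Book>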

Second Design Characteristics:

[1] Transparent content. The components which make up Book are visible
to other schemas, and to other parts of the same schema.  The impact of
this is that the types and elements within Book are reusable.

[2] Global scope. All components have global scope.  The impact of this
is that, irrespective of the value of elementFormDefault, the namespaces
of Title and Author will be exposed in instance documents.

[3] Verbose. Everything is laid out and clearly visible.
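
As a sketch of characteristics [1] and [2], assume the Second Design
lives in a schema whose targetNamespace is bound to the cat prefix (the
URI below is a placeholder).  Because Title and Author are global, they
must be namespace-qualified in every instance, whatever
elementFormDefault says:

    <cat:Book xmlns:cat="http://www.example.org/catalogue">
        <cat:Title>Illusions</cat:Title>
        <cat:Author>Richard Bach</cat:Author>
    </cat:Book>

The same global components can be reused elsewhere.  Magazine and
Review below are hypothetical elements, not part of the original
example:

    <!-- reuses the global Publication type as-is -->
    <element name="Magazine" type="cat:Publication"/>

    <!-- reuses only the global Title element inside a new content model -->
    <element name="Review">
        <complexType>
            <sequence>
                <element ref="cat:Title"/>
                <element name="Comment" type="string"/>
            </sequence>
        </complexType>
    </element>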

I am sure that there are other characteristics that I am missing.  Can
you please help to list them?

As I see it, the major tradeoff between the two design approaches is:

- The first design approach facilitates hiding (localizing) namespace
complexities.
- The second design approach facilitates component reuse.

(I find it interesting that this issue is relating back to the issue we
discussed earlier on when to hide (localize) namespace complexities
within the schema versus when to expose the namespaces in instance
documents.)

Here's a summary of things to be resolved for this issue:

(1) What name do we give to (what I have been calling) First Design?
(2) What name do we give to (what I have been calling) Second Design?
(3) What are the other characteristics of the two design approaches?
(4) What do you see as the main tradeoffs in the two design approaches?

Thanks!  /Roger
