xml-dev - XML Schemas: Best Practices

XML Schemas: Best Practices
From: "Roger L. Costello" <costello@mitre.org>
To: xml-dev@lists.xml.org
Date: Fri, 15 Sep 2000 15:25:12 -0400
Hi Folks,

I would like to see if we can collectively come up with a set of "best
practices" in designing XML Schemas.   I realize that the specifics of
designing a schema are heavily dependent upon the task at hand.  
However, I firmly believe that there are guidelines that can be employed
in creating a schema, and those guidelines hold true irrespective of the
specific task.  It is this set of guidelines that I am hoping we can
shed some light upon. 

I would like to get things started by listing some of the things that
must be considered in designing a schema.  It is by no means an
exhaustive list.  For example, it doesn't address when to block a type
from derivation, when to create a schema without a namespace, when to
make an element or a type abstract, etc.  Nonetheless, it is a start to
some hopefully useful discussions.  

First, a quick list of the issues:

[1] Element versus Type Reuse
[2] Local versus Global
[3] elementFormDefault - to qualify or not to qualify
[4] Evolvability/versioning
[5] One namespace versus many namespaces (import versus include)
[6] Capturing semantics of elements and types

Now, details of each issue:

[1] Element versus Type Reuse: from my own experience in building
schemas I have found that it is oftentimes not obvious whether to
declare something as an element and then reuse that element, or to
declare it as a type and reuse the type.  Let's consider the two cases
by looking at an example:

Element Reuse

   - Declare an element for reuse:

      |<element name="Elevation">
      |   <simpleType base="integer">
      |      <minInclusive value="-1290"/>
      |      <maxInclusive value="29028"/>
      |   </simpleType>
      |</element>

   - Reusing the element:

      |<element name="Boston">
      |   <complexType>
      |      <sequence>
      |         <element ref="city:Elevation"/>
      |      </sequence>
      |   </complexType>
      |</element>

Type Reuse

   - Declare a type for reuse:

      |<simpleType name="Elevation" base="integer">
      |   <minInclusive value="-1290"/>
      |   <maxInclusive value="29028"/>
      |</simpleType>

   - Reusing the type:

      |<element name="Boston">
      |   <complexType>
      |      <sequence>
      |         <element name="Elevation"  type="city:Elevation"/>
      |      </sequence>
      |   </complexType>
      |</element>

Which is preferred - declare Elevation as an element and reuse that
element, or declare Elevation as a type and reuse the type?  Here are
some things to consider:

- Declaring it as an element will allow equivClasses to be created, thus
enabling the Elevation element to be substituted by members of the
equivClass.
- Declaring it as a type will allow derived types to be created, thus
enabling the Elevation type to be substituted by derived types.
- Someone once said that XML Schemas is a "type-based system".  I am not
sure what that means, but perhaps it means that the idea behind XML
Schemas is to reuse types?
- In programming languages, types are typically the items that get
reused.  Does that apply to XML Schemas, or not?
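
A rough sketch of the first two points, in the same draft syntax as the
examples above.  The names Height and LowElevation and the value 500 are
made up for illustration.  The first fragment goes with the Element Reuse
case (city:Elevation is the global element) and assumes that a member
declared without a type of its own takes the type of Elevation; the second
goes with the Type Reuse case (city:Elevation is the named type).

   - An equivClass member that may appear wherever Elevation is referenced:

      |<element name="Height" equivClass="city:Elevation"/>

   - A type derived from the Elevation type:

      |<simpleType name="LowElevation" base="city:Elevation">
      |   <maxInclusive value="500"/>
      |</simpleType>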

What are your thoughts on type versus element reuse?  What guidelines
would you recommend to someone struggling to decide whether he/she
should make an item as an element or as a type?

[2] Local versus Global: when should an element or type be declared
globally versus when should it be nested within something else (i.e.,
local)?  Again, let's take an example:

- Everything Global

      |<element name="Book" type="cat:Listing"/>
      |<complexType name= "Listing">
      |   <sequence>
      |      <element ref="cat:Title"/> 
      |      <element ref="cat:Author"/>
      |   </sequence>
      |</complexType>
      |<element name="Title" type="string"/> 
      |<element name="Author" type="string"/>

- Everything Local

      |<element name="Book">
      |   <complexType>
      |      <sequence>
      |         <element name="Title" type="string"/> 
      |         <element name="Author" type="string"/>
      |      </sequence>
      |   </complexType>
      |</element>

What guidance can we provide a schema designer in deciding whether or
not to "hide" a type or element (by nesting it)?  Someone once asked me
when it would be desirable to make an element or type local. I was hard
pressed to think of a situation.  Thus, I was not able to provide
guidance on when to use elements/types locally.  It is easy to see the
benefit of declaring elements/types globally - they can be reused, not
only within a schema but also across schemas.  It is not so easy for me
to see the benefit of hiding elements/types.  Can someone provide
guidance on this issue?  Does the OO encapsulation principle apply to
XML Schemas?  If so, why?  If not, why not?

[3] elementFormDefault - to qualify or not to qualify:
elementFormDefault is an attribute of <schema>.  It dictates which
elements must be namespace-qualified in instance documents: a value of
"qualified" means that all elements, local as well as global, are
namespace-qualified in the instance document, whereas a value of
"unqualified" means that only globally declared elements are
namespace-qualified.  Personally, I find that for simplicity it is
easiest to use "qualified" and then use a default namespace declaration
in the instance document.  The advantages of using "unqualified" are not
clear to me.  In other words, I would not be able to provide good
guidance on when to use "unqualified".  If someone asked you to list the
scenarios in which it would be desirable to use "unqualified", what
guidance would you give?
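
To illustrate the difference, here is a sketch of how an instance of the
"Everything Local" Book schema from [2] might look under each setting.
The namespace URI and the element content are hypothetical.

   - elementFormDefault="qualified" (default namespace declaration):

      |<Book xmlns="http://www.example.org/catalogue">
      |   <Title>A Sample Title</Title>
      |   <Author>A. N. Author</Author>
      |</Book>

   - elementFormDefault="unqualified" (only the global Book is qualified):

      |<cat:Book xmlns:cat="http://www.example.org/catalogue">
      |   <Title>A Sample Title</Title>
      |   <Author>A. N. Author</Author>
      |</cat:Book>

Note that in the unqualified case a default namespace declaration on Book
would not work, because it would pull Title and Author into the namespace
as well.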

[4] Evolvability/versioning: in today's rapidly changing marketplace,
there is no question that schemas will need to change (evolve).  What
guidance do you provide a schema designer in engineering his/her schema
to support change?  When a schema is changed, how do you indicate that
it is a new version - with a new namespace?

I have thought quite a bit about schema evolution.  At the end of this
message I expound on the subject.

As for versioning, that is something on which I would be hard pressed to
provide guidance.  When a new version of a schema is created, what
techniques should one use to signify the new version?  One idea is to
create a new namespace for the new version.  Another idea is to simply
change the version attribute on <schema>.  How would you indicate a new
version?
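
The two ideas might look as follows.  This is only a sketch: the namespace
URIs and the version number are hypothetical, and only the attributes
relevant to versioning are shown on <schema>.

   - A new namespace for the new version:

      |<schema targetNamespace="http://www.example.org/catalogue/v2">
      |   <!-- declarations of the new version -->
      |</schema>

   - The same namespace, with a new value for the version attribute:

      |<schema targetNamespace="http://www.example.org/catalogue"
      |        version="2.0">
      |   <!-- declarations of the new version -->
      |</schema>

With the first approach, instance documents must be changed to use the new
namespace before they conform to the new version.  With the second,
existing instance documents are untouched, but nothing in an instance
document says which version it was written against.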

[5] One namespace versus many namespaces (import versus include): I
think that in a typical project many schemas will be created.   A
question will then arise, "shall we define one namespace for all the
schemas or shall we create a different namespace for each schema?"  What
are the tradeoffs in creating multiple namespaces versus a single
namespace?  What guidance would you give someone starting on a project
that will create multiple schemas - create a namespace for each
schema or one umbrella namespace?
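
As a sketch of what the two choices look like in a schema (the file names
and namespace URIs are hypothetical): schemas that share one namespace are
pulled together with include, whereas a schema in a different namespace is
brought in with import.

   - One umbrella namespace:

      |<schema targetNamespace="http://www.example.org/catalogue">
      |   <include schemaLocation="Author.xsd"/>
      |</schema>

   - A namespace for each schema:

      |<schema targetNamespace="http://www.example.org/catalogue"
      |        xmlns:per="http://www.example.org/person">
      |   <import namespace="http://www.example.org/person"
      |           schemaLocation="Person.xsd"/>
      |</schema>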

[6] Capturing semantics of elements and types: a schema creates
elements, defines the relationships between the elements, and defines
the datatypes of the elements.  However, that by itself doesn't define
the semantics of the elements.  For example, consider this element
declaration:

      |<element name="jdkdsfjkds">
      |   <simpleType base="string">
      |      <pattern value="[a-zA-Z]+\d"/>
      |   </simpleType>
      |</element>

Does this tell you the meaning of "jdkdsfjkds"?  Probably not. 
Something more is needed.  What guidelines would you give someone
wishing to document the semantics of the items created in a schema?  

Here are some guidelines that Mary Pulvermacher sent to me:

"Our current thinking is to capture as much of the semantics as possible
in the XML schema itself.  We plan to do this by using the XML Schema
provided annotation element and having a convention that every element
or attribute has an annotation that provides information on the
meaning.  Of course this is not perfect but it does carry some
advantages. 

- The XML schema will capture the data structure, meta-data and
relationships between the elements. 
- Use of strong typing will capture much of the data content.
- The annotations can capture definitions and other explanatory
information
- The structure of the "definitions" will always be consistent with the
structure used in the schema since they are linked.
- Since the schema itself is an XML document, we can use XSL to
transform this information into a format suitable for human
consumption."
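
Following that convention, the Elevation element from [1] might be
annotated as sketched below.  The sketch assumes the documentation child
of annotation, and the wording of the text is only an illustration (the
original example does not say what the values are measured against).

      |<element name="Elevation">
      |   <annotation>
      |      <documentation>
      |         Height of the city relative to sea level.  Negative
      |         values are below sea level.
      |      </documentation>
      |   </annotation>
      |   <simpleType base="integer">
      |      <minInclusive value="-1290"/>
      |      <maxInclusive value="29028"/>
      |   </simpleType>
      |</element>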

Do you have any other thoughts on capturing the semantics of elements
and types created by a schema?  What guidance would you give to someone
wishing to capture the semantics of the elements and types?
--------------------------------------------------------------------

Some thoughts on enabling schema evolution (expansion of [4] above)

In today's rapidly changing market static schemas will be less
commonplace, as the market pushes schemas to quickly support new
capabilities.  For example, consider the cellphone industry.  Clearly,
this is a rapidly evolving market.  Any schema that the cellphone
community creates will soon become obsolete as hardware/software changes
extend the cellphone capabilities.  For the cellphone community rapid
evolution of a cellphone schema is not just a nicety, the market demands
it!

Suppose that the cellphone community gets together and creates a schema,
cellphone.xsd.  Imagine that every week NOKIA  sends out to the various
vendors an instance document (conforming to cellphone.xsd), detailing
its current product set.   Now suppose that a few months after
cellphone.xsd is agreed upon NOKIA makes some breakthroughs in their
cellphones - they create new memory, call, and display features, none of
which are supported by cellphone.xsd.  To gain a market advantage NOKIA
will want to get information about these new capabilities to its vendors
ASAP.  Further, they will have little motivation to wait for  the next
meeting of the cellphone community to consider upgrades to
cellphone.xsd.  They need results NOW. How does open content help?  
That is described next.

Suppose that the cellphone schema is declared "open", that is, it allows
instance documents to contain elements beyond those it declares.
Immediately NOKIA can extend its instance documents to incorporate data
about the new features.  How does this change impact the vendor
applications that receive the instance documents?  The answer is - not at
all.  In the worst case, the vendor's application will simply skip over
the new elements.  More likely, however, the vendors are showing the
cellphone features in a list box and these new features will be
automatically captured with the other features.  Let's stop and think
about what has been just described …  Without modifying the cellphone
schema and without touching the vendor's applications, information about
the new NOKIA features has been instantly disseminated to the
marketplace!  Open content in the cellphone schema is the enabler for
this rapid dissemination.
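
A sketch of what such an extended instance document could look like.  All
element names, values, and namespace URIs here are hypothetical, since
cellphone.xsd is not spelled out in this message; the point is only that
the new data travels alongside the agreed-upon elements.

      |<Cellphone xmlns="http://www.example.org/cellphone"
      |           xmlns:nk="http://www.example.org/nokia-extensions">
      |   <Model>Model-A</Model>
      |   <Memory>100 entries</Memory>
      |   <!-- new features, not yet in cellphone.xsd -->
      |   <nk:VoiceDialing>yes</nk:VoiceDialing>
      |   <nk:ColorDisplay>yes</nk:ColorDisplay>
      |</Cellphone>

A vendor application written against the original schema reads Model and
Memory as before and can skip the elements it does not recognize.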

Clearly some types of instance document extensions may require
modification to the vendor's applications.  Recognize, however, that
the vendors are free to upgrade their applications in their own time.
The applications do not need to be upgraded before changes can be
introduced into instance documents.  At the very worst, the vendor's
applications will simply skip over the extensions.  And, of course,
those vendors do not need to upgrade in lock-step.

To wrap up this example … suppose that several months later the
cellphone community reconvenes to discuss enhancements to the schema.
The new features that NOKIA first introduced into the marketplace are
then officially added into the schema.  This completes the cycle.
Changes to the instance documents have driven the evolution of the
schema.