xml-dev - Re: XML Schemas: Best Practices

From: "Roger L. Costello" <costello@mitre.org>
To: xml-dev@lists.xml.org
Date: Wed, 20 Sep 2000 17:13:15 -0400

Hi Folks,

Over the last couple of days I have had the good fortune to talk with
some very bright people about one of the issues that was raised here:
what guidelines can be given with regard to whether an element should
be declared locally versus globally?  During the discussions some
excellent points were made about the benefits of hiding schema
complexity by using local elements in combination with setting
elementFormDefault="unqualified".  Below I have done my best to
describe those points.

Hiding Namespace Complexities using Local Element Declarations and
elementFormDefault="unqualified"

First some notes:

(a) A typical schema will utilize elements and types from many different
namespaces.  
(b) It is desirable to shield instance documents from the intricacies of
schemas.  

One such schema intricacy that we would like to hide is the namespaces
of all the different components being used by a schema.  Oftentimes it
is irrelevant to the instance document where the schema obtained its
components.  We would like such things to be kept hidden in the
schema.  By declaring elements locally and by setting
elementFormDefault="unqualified" we can prevent the schema namespace
complexities from sneaking into instance documents.  How to do this is
shown next.

Example.  Consider this schema for <camera>, where the <body> element is
defined in the Nikon schema, the <lens> element is defined in the
Olympus schema, and the <manual_adaptor> element is defined in the
Pentex schema:

<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/1999/XMLSchema"
        targetNamespace="http://www.camera.org"
        elementFormDefault="unqualified"
        xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
        xsi:schemaLocation=
                        "http://www.w3.org/1999/XMLSchema
                         http://www.w3.org/1999/XMLSchema.xsd"
        xmlns:nikon="http://www.nikon.com"
        xmlns:olympus="http://www.olympus.com"
        xmlns:pentex="http://www.pentex.com">
    <import namespace="http://www.nikon.com"
            schemaLocation="Nikon.xsd"/>
    <import namespace="http://www.olympus.com"
            schemaLocation="Olympus.xsd"/>
    <import namespace="http://www.pentex.com"
            schemaLocation="Pentex.xsd"/>
    <element name="camera">
        <complexType>
            <sequence>
                <element ref="nikon:body" minOccurs="1" 
                         maxOccurs="1"/>
                <element ref="olympus:lens" minOccurs="1"
                         maxOccurs="1"/>
                <element ref="pentex:manual_adaptor" minOccurs="1" 
                         maxOccurs="1"/>
            </sequence>
        </complexType>
    </element>
</schema>

Here's an example of a conforming instance document:

<?xml version="1.0"?>
<my:camera xmlns:my="http://www.camera.org"
               xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
               xsi:schemaLocation= "http://www.camera.org Camera.xsd">
        <body>Ergonomically designed casing for easy handling</body>
        <lens>300mm zoom, 1.2 f-stop</lens>
        <manual_adaptor>1/10,000 sec to 100 sec</manual_adaptor>
</my:camera>

The instance document is simple.  There are no namespace qualifiers
cluttering up the document, except for the one on camera (which is okay
because it shows the namespace for the document as a whole).  The
instance document simply shows the components of camera - camera is
composed of body, lens, and manual_adaptor.  The fact that the schema
gets these three components from different namespaces is irrelevant and
hidden within the schema.

Consider now what the instance document would look like if the schema
had declared elementFormDefault= "qualified".  Recall that
elementFormDefault="qualified" means that in the instance document all
elements must be qualified.  Look at the resulting instance document:

<?xml version="1.0"?>
<my:camera xmlns:my="http://www.camera.org"
              xmlns:nikon="http://www.nikon.com"
              xmlns:olympus="http://www.olympus.com"
              xmlns:pentex="http://www.pentex.com"
              xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
              xsi:schemaLocation="http://www.camera.org Camera.xsd">
        <nikon:body>Ergonomically designed casing for easy
                    handling</nikon:body>
        <olympus:lens>300mm zoom, 1.2 f-stop</olympus:lens>
        <pentex:manual_adaptor>1/10,000 sec to
                     100 sec</pentex:manual_adaptor>
</my:camera>

This instance document is much more complex - it explicitly shows where
all of the components come from.  The complexities of the schema are
thus sneaking into the instance document. 

Note, then, that to hide namespace complexity it is not simply a matter
of declaring elements locally, but it is also important that
elementFormDefault be set appropriately (to the value of 
"unqualified").
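
One caveat on the camera schema shown earlier: it pulls in body, lens,
and manual_adaptor with ref=, and ref= points at a global element
declaration.  As the quoted message below notes, "unqualified" still
leaves global items namespace-qualified, so for the unqualified
instance document to validate the three children would need to be
declared locally.  Here is a minimal sketch of how that might look,
assuming each imported schema exposes a named type.  The type names
nikon:body_type, olympus:lens_type, and pentex:manual_adaptor_type are
hypothetical and do not come from the schemas discussed above.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/1999/XMLSchema"
        targetNamespace="http://www.camera.org"
        elementFormDefault="unqualified"
        xmlns:nikon="http://www.nikon.com"
        xmlns:olympus="http://www.olympus.com"
        xmlns:pentex="http://www.pentex.com">
    <import namespace="http://www.nikon.com"
            schemaLocation="Nikon.xsd"/>
    <import namespace="http://www.olympus.com"
            schemaLocation="Olympus.xsd"/>
    <import namespace="http://www.pentex.com"
            schemaLocation="Pentex.xsd"/>
    <element name="camera">
        <complexType>
            <sequence>
                <!-- Local declarations: name= plus type=, not ref=.
                     The type names below are hypothetical. -->
                <element name="body" type="nikon:body_type"/>
                <element name="lens" type="olympus:lens_type"/>
                <element name="manual_adaptor"
                         type="pentex:manual_adaptor_type"/>
            </sequence>
        </complexType>
    </element>
</schema>
```

With this arrangement the namespaces of the imported types appear only
in the schema, while the instance document carries a prefix on camera
alone.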

Let's try to summarize the principles that have been described here:

[1] Schema authors should design their schemas such that the
complexities of the schema do not show up in the instance documents. 
One such complexity is the namespaces of the schema components (i.e.,
where all the components come from).
[2] The combination of declaring elements locally and setting
elementFormDefault="unqualified" can be used to hide where all the
components come from (i.e., the namespaces).   Thus, that schema
complexity is not transferred to the instance document.

This discussion has argued for keeping hidden in the schema the location
(namespace) of the components.  However, there are scenarios where it is
desirable to make such namespaces explicit, i.e., we want the instance
documents to explicitly show where the components come from.  Would
anyone care to make a case for that?

/Roger

"Roger L. Costello" wrote:
> 
> Hi Folks,
> 
> I would like to see if we can collectively come up with a set of "best
> practices" in designing XML Schemas.   I realize that the specifics of
> designing a schema are heavily dependent upon the task at hand.
> However, I firmly believe that there are guidelines that can be employed
> in creating a schema, and those guidelines hold true irrespective of the
> specific task.  It is this set of guidelines that I am hoping we can
> shed some light upon.
> 
> I would like to get things started by listing some of the things that
> must be considered in designing a schema.  It is by no means an
> exhaustive list.  For example, it doesn't address when to block a type
> from derivation, when to create a schema without a namespace, when to
> make an element or a type abstract, etc.  Nonetheless, it is a start to
> some hopefully useful discussions.
> 
> First, a quick list of the issues:
> 
> [1] Element versus Type Reuse
> [2] Local versus Global
> [3] elementFormDefault - to qualify or not to qualify
> [4] Evolvability/versioning
> [5] One namespace versus many namespaces (import versus include)
> [6] Capturing semantics of elements and types
> 
> Now, details of each issue:
> 
> [1] Element versus Type Reuse: from my own experience in building
> schemas I have found that it is oftentimes not obvious whether to
> declare something as an element and then reuse that element, or to
> declare it as a type and reuse the type.  Let's consider the two cases
> by looking at an example:
> 
> Element Reuse
> 
>    - Declare an element for reuse:
> 
>       |<element name="Elevation">
>       |   <simpleType base="integer">
>       |      <minInclusive value="-1290"/>
>       |      <maxInclusive value="29028"/>
>       |   </simpleType>
>       |</element>
> 
>    - Reusing the element:
> 
>       |<element name="Boston">
>       |   <complexType>
>       |      <sequence>
>       |         <element ref="city:Elevation"/>
>       |      </sequence>
>       |   </complexType>
>       |</element>
> 
> Type Reuse
> 
>    - Declare a type for reuse:
> 
>       |<simpleType name="Elevation" base="integer">
>       |   <minInclusive value="-1290"/>
>       |   <maxInclusive value="29028"/>
>       |</simpleType>
> 
>    - Reusing the type:
> 
>       |<element name="Boston">
>       |   <complexType>
>       |      <sequence>
>       |         <element name="Elevation"  type="city:Elevation"/>
>       |      </sequence>
>       |   </complexType>
>       |</element>
> 
> Which is preferred - declare Elevation as an element and reuse that
> element, or declare Elevation as a type and reuse the type?  Here are
> some things to consider:
> 
> - Declaring it as an element will allow equivClasses to be created, thus
> enabling the Elevation element to be substituted by members of the
> equivClass.
> - Declaring it as a type will allow derived types to be created, thus
> enabling the Elevation type to be substituted by derived types.
> - Someone once said that XML Schemas is a "type-based system".  I am not
> sure what that means, but perhaps it means that the idea behind XML
> Schemas is to reuse types?
> - In programming languages types are typically the items that get
> reused.  Does that apply to XML Schemas, or not?
> 
> What are your thoughts on type versus element reuse?  What guidelines
> would you recommend to someone struggling to decide whether he/she
> should declare an item as an element or as a type?
> 
> [2] Local versus Global: when should an element or type be declared
> globally versus when should it be nested within something else (i.e.,
> local)?  Again, let's take an example:
> 
> - Everything Global
> 
>       |<element name="Book" type="cat:Listing"/>
>       |<complexType name= "Listing">
>       |   <sequence>
>       |      <element ref="cat:Title"/>
>       |      <element ref="cat:Author"/>
>       |   </sequence>
>       |</complexType>
>       |<element name="Title" type="string"/>
>       |<element name="Author" type="string"/>
> 
> - Everything Local
> 
>       |<element name="Book">
>       |   <complexType>
>       |      <sequence>
>       |         <element name="Title" type="string"/>
>       |         <element name="Author" type="string"/>
>       |      </sequence>
>       |   </complexType>
>       |</element>
> 
> What guidance can we provide a schema designer in deciding whether or
> not to "hide" a type or element (by nesting it)?  Someone once asked me
> when it would be desirable to make an element or type local. I was hard
> pressed to think of a situation.  Thus, I was not able to provide
> guidance on when to use elements/types locally.  It is easy to see the
> benefit of declaring elements/types globally - they can be reused, not
> only within a schema but also across schemas.  It is not so easy for me
> to see the benefit of hiding elements/types.  Can someone provide
> guidance on this issue?  Does the OO encapsulation principle apply to
> XML Schemas?  If so, why?  If not, why not?
> 
> [3] elementFormDefault - to qualify or not to qualify:
> elementFormDefault is an attribute of <schema>.  It is used to dictate
> what elements are to be namespace-qualified in instance documents: a
> value of  "qualified" means that everything is namespace-qualified in
> the instance document, whereas a value of "unqualified" means that only
> global items are namespace-qualified.  Personally, I find that for
> simplicity it is easiest to use "qualified" and then in the instance
> document use a default namespace declaration.  The advantages of using
> "unqualified" are not really clear to me.  In other words, I would not
> be able to provide good guidance on when to use "unqualified".  If
> someone asked you to list the scenarios when it would be desirable to
> use "unqualified" what guidance would you give?
> 
> [4] Evolvability/versioning: in today's rapidly changing marketplace,
> there is no question that schemas will need to change (evolve).  What
> guidance do you provide a schema designer in engineering his/her schema
> to support change?  When a schema is changed, how do you indicate that
> it is a new version - with a new namespace?
> 
> I have thought quite a bit about schema evolution.  At the end of this
> message I expound on this subject.
> 
> As for versioning, that is something that I would be hard pressed to
> provide guidance upon.  When a new version of a schema is created, what
> techniques should one use to signify the new version?  One idea is to
> create a new namespace for the new version.  Another idea is to simply
> change the version attribute on <schema>.  How would you indicate a new
> version?
> 
> [5] One namespace versus many namespaces (import versus include): I
> think that in a typical project many schemas will be created.   A
> question will then arise, "shall we define one namespace for all the
> schemas or shall we create a different namespace for each schema?"  What
> are the tradeoffs in creating multiple namespaces versus a single
> namespace?  What guidance would you give someone starting on a project
> that will create multiple namespaces - create a namespace for each
> schema or one umbrella namespace?
> 
> [6] Capturing semantics of elements and types: a schema creates
> elements, defines the relationships between the elements, and defines
> the datatypes of the elements.  However, that by itself doesn't define
> the semantics of the elements.  For example, consider this element
> declaration:
> 
> <element name="jdkdsfjkds">
>     <simpleType base="string">
>         <pattern value="[a-zA-Z]+\d"/>
>     </simpleType>
> </element>
> 
> Does this tell you the meaning of "jdkdsfjkds"?  Probably not.
> Something more is needed.  What guidelines would you give someone
> wishing to document the semantics of the items created in a schema?
> 
> Here are some guidelines that Mary Pulvermacher sent to me:
> 
> "Our current thinking is to capture as much of the semantics as possible
> in the XML schema itself.  We plan to do this by using the XML Schema
> provided annotation element and having a convention that every element
> or attribute has an annotation that provides information on the
> meaning.  Of course this is not perfect but it does carry some
> advantages.
> 
> - The XML schema will capture the data structure, meta-data and
> relationships between the elements.
> - Use of strong typing will capture much of the data content.
> - The annotations can capture definitions and other explanatory
> information.
> - The structure of the "definitions" will always be consistent with the
> structure used in the schema since they are linked.
> - Since the schema itself is an XML document, we can use XSL to
> transform this information into a format suitable for human
> consumption."
> 
> Do you have any other thoughts on capturing the semantics of elements
> and types created by a schema?  What guidance would you give to someone
> wishing to capture the semantics of the elements and types?
> --------------------------------------------------------------------
> 
> Some thoughts on enabling schema evolution (expansion of [4] above)
> 
> In today's rapidly changing market static schemas will be less
> commonplace, as the market pushes schemas to quickly support new
> capabilities.  For example, consider the cellphone industry.  Clearly,
> this is a rapidly evolving market.  Any schema that the cellphone
> community creates will soon become obsolete as hardware/software changes
> extend the cellphone capabilities.  For the cellphone community rapid
> evolution of a cellphone schema is not just a nicety, the market demands
> it!
> 
> Suppose that the cellphone community gets together and creates a schema,
> cellphone.xsd.  Imagine that every week NOKIA  sends out to the various
> vendors an instance document (conforming to cellphone.xsd), detailing
> its current product set.   Now suppose that a few months after
> cellphone.xsd is agreed upon NOKIA makes some breakthroughs in their
> cellphones - they create new memory, call, and display features, none of
> which are supported by cellphone.xsd.  To gain a market advantage NOKIA
> will want to get information about these new capabilities to its vendors
> ASAP.  Further, they will have little motivation to wait for  the next
> meeting of the cellphone community to consider upgrades to
> cellphone.xsd.  They need results NOW. How does open content help?
> That is described next.
> 
> Suppose that the cellphone schema is declared "open".  Immediately NOKIA
> can extend its instance documents to incorporate data  about the new
> features.  How does this change impact the vendor applications that
> receive the instance documents?  The answer is - not  at all.  In the
> worst case, the vendor's application will simply skip over the new
> elements.  More likely, however, the vendors are showing the cellphone
> features in a list box and these new features will be
> automatically captured with the other features.  Let's stop and think
> about what has been just described …  Without modifying the cellphone
> schema and without touching the vendor's applications, information about
> the new NOKIA features has been instantly disseminated to the
> marketplace!  Open content in the cellphone schema is the enabler for
> this rapid dissemination.
> 
> Clearly some types of instance document extensions may require
> modification to the vendor's applications.  Recognize, however, that
> the vendors are free to upgrade their applications in their own time.
> The applications do not need to be upgraded before changes can be
> introduced into instance documents.  At the very worst, the vendor's
> applications will simply skip over the extensions.  And, of course,
> those vendors do not need to upgrade in lock-step.
> 
> To wrap up this example … suppose that several months later the
> cellphone community reconvenes to discuss enhancements to the schema.
> The new features that NOKIA first introduced into the marketplace are
> then officially added into the schema.  Thus completes the cycle.
> Changes to the instance documents have driven the evolution of the
> schema.
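
The quoted message does not say how a schema is declared "open".  One
possible way to get that effect, offered only as a sketch, is an <any>
wildcard at the end of the content model.  The listing below uses the
namespace and attribute spellings of the final XML Schema
Recommendation (2001), which may differ from the draft used elsewhere
in this thread.  The target namespace and the element names cellphone,
model, and memory are hypothetical.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.example.org/cellphone"
        elementFormDefault="qualified">
    <element name="cellphone">
        <complexType>
            <sequence>
                <!-- Features agreed upon by the community
                     (hypothetical names). -->
                <element name="model" type="string"/>
                <element name="memory" type="string"/>
                <!-- Open content: zero or more extra elements from
                     any namespace.  "lax" validates an extension only
                     when a declaration for it can be found. -->
                <any namespace="##any" processContents="lax"
                     minOccurs="0" maxOccurs="unbounded"/>
            </sequence>
        </complexType>
    </element>
</schema>
```

Under this sketch an instance document could append new feature
elements after memory without cellphone.xsd being changed, which is
the behavior the cellphone scenario relies on.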