xml-dev - Re: XML Schemas: Best Practices

From: "Roger L. Costello" <costello@mitre.org>
To: xml-dev@lists.xml.org
Date: Fri, 22 Sep 2000 10:45:15 -0400
Hi Folks,

I am delighted to see the responses to my last message.  Clearly people
are thinking about this issue and have strong feelings about hiding
namespace complexities in the schema versus making namespaces explicit
in instance documents.  This is good!  Now let's see if we can distill
out some general guidelines on when to hide and when to make explicit.

Based upon some of the responses I can see that I did not do a very
satisfactory job in motivating when you would want to hide the namespace
complexities.  So let's quickly address that again, and then move on to
guidelines for when it is desirable to make namespaces explicit in
instance documents.

Recall the camera example that was presented.  By designing the schema
so that body, lens, and manual_adaptor are children of camera (i.e.,
local elements), and by setting elementFormDefault="unqualified", we
enable the creation of a class of instance documents that are pretty
straightforward to read and write.  An example of one such instance
document was presented:

<?xml version="1.0"?>
<my:camera xmlns:my="http://www.camera.org" … >
        <body>Ergonomically designed casing for easy handling</body>
        <lens>300mm zoom, 1.2 f-stop</lens>
        <manual_adaptor>1/10,000 sec to 100 sec</manual_adaptor>
</my:camera>

Recall that the schema imported the declaration of the body element from
the nikon schema, the lens element from the olympia schema, and the
manual_adaptor element from the pentex schema.  Looking at the instance
document above, one would never realize this.  Such complexities are
localized to the schema.  Thus, we say that the schema has been designed
in such a fashion that its complexities are "hidden" from the instance
document.

Several people responded to this design approach, arguing that it is
good, and perhaps necessary, to qualify body, lens, and manual_adaptor.
Below I show the instance document with all elements qualified with a
namespace:

<?xml version="1.0"?>
<my:camera xmlns:my="http://www.camera.org"
              xmlns:nikon="http://www.nikon.com"
              xmlns:olympia="http://www.olympia.com"
              xmlns:pentex="http://www.pentex.com" …>
        <nikon:body>Ergonomically designed casing for easy
                    handling</nikon:body>
        <olympia:lens>300mm zoom, 1.2 f-stop</olympia:lens>
        <pentex:manual_adaptor>1/10,000 sec to
                     100 sec</pentex:manual_adaptor>
</my:camera>

This instance document makes explicit that the body element comes from
the nikon namespace, the lens element comes from the olympia namespace,
and the manual_adaptor element comes from the pentex namespace.

Thus, we come to two fundamental questions: 

[1] When does it make sense to design a schema to hide the namespace
complexities from instance documents? 

[2] When does it make sense to design a schema to force instance
documents to make explicit the namespaces of their elements?

The latter question will be answered in the next section.  For now, let's
try to characterize the systems for which it makes sense to hide the
namespace complexities in the schema.

As I compare the two versions of the instance document above, the first
thing that strikes me is the difference in readability.  The first
version is much easier to read.  The namespaces in the second version -
both the namespace declarations and the qualifiers on each element - are
very confusing to an average fellow like myself.

So, I come to the first characteristic:

"For systems where readability is of utmost importance, design the schema
to hide the namespace complexities."

I can well imagine writing an application to process the camera instance
document such that it (the application) does not care what namespace the
body element comes from, what namespace the lens element comes from, or
what namespace the manual_adaptor element comes from.  Such complexities
are irrelevant to the application.  The application just cares that the
camera element contains a body element with the proper type of data, a
lens element with the proper type of data, and a manual_adaptor element
with the proper type of data.  Knowledge of the namespaces that the body,
lens, and manual_adaptor elements belong to provides no additional
information to the application.  At best, the namespaces are a
distraction to the application.  If at some point the application does
find it necessary to know what namespace an element is associated with,
then it will simply look it up in the schema.
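
To make this concrete, here is a minimal sketch of such a
namespace-agnostic application, written in Python with the standard
xml.etree.ElementTree module.  The names local_name and describe_camera
are my own illustrative choices, and the elided attributes of the
instance document are simply left out; treat it as one possible approach
rather than a prescribed one.

```python
import xml.etree.ElementTree as ET

# Camera instance in the "hidden namespaces" style: only the root is qualified.
CAMERA_XML = """<?xml version="1.0"?>
<my:camera xmlns:my="http://www.camera.org">
    <body>Ergonomically designed casing for easy handling</body>
    <lens>300mm zoom, 1.2 f-stop</lens>
    <manual_adaptor>1/10,000 sec to 100 sec</manual_adaptor>
</my:camera>"""


def local_name(tag):
    """Drop the '{namespace-uri}' part that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def describe_camera(xml_text):
    """Return {local element name: text}, ignoring namespaces entirely."""
    root = ET.fromstring(xml_text)
    return {local_name(child.tag): child.text for child in root}


parts = describe_camera(CAMERA_XML)
print(parts["lens"])
```

Because the function keys on local names only, it would process the
fully qualified version of the camera document in the same way.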

This brings me to the second characteristic:

"For systems where knowledge of the namespaces of the elements provides
no additional information, design the schema to hide the namespace
complexities."

Those are the two characteristics that I see.  Do you see any further
characterizing features?

Before moving on to when it makes sense to make the namespaces explicit
in instance documents, I would like to pause and address Richard
Lanyon's concern.  Richard's concern is (paraphrasing): 

"Okay Roger, let's suppose that it makes sense to localize the
complexities to the schema.  An author of an instance document will
still have to read the schema, and understand it, to write the instance
document.  Correct?  How have we hidden the complexities of the
schema?"  

Let me see if I can address this concern satisfactorily:

[1] An instance document is written once but processed by many systems
(write once, read many).  All those systems which process the document
are shielded from the complexities of the schema.

[2] In the not-too-distant future there will be tools that read a schema
and provide a template for the instance document author to fill in.  The
tool will understand the schema and shield the author from needing to
understand it.

I hope that answers your concern satisfactorily, Richard.  If anyone else
has anything to add to this, please join in.


Now let's move on to characterizing those systems for which it makes
sense to design a schema to force instance documents to make explicit
the namespaces of their elements.

First recall the techniques a schema uses to force instance documents to
expose the namespaces of their elements.

[1] Use elementFormDefault="qualified" to Force the Use of Namespace
Qualifiers

Len Bullard sketched out a schema for a 3D rendering system.  Let me
refer to that as the "video-game" schema.  Let's see how to design that
schema so that it forces instance documents to use namespace qualifiers
on its elements: 

<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/1999/XMLSchema"
        targetNamespace="http://www.video-game.org"
        elementFormDefault="qualified"
        xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
        xsi:schemaLocation=
                        "http://www.w3.org/1999/XMLSchema
                         http://www.w3.org/1999/XMLSchema.xsd"
        xmlns:design-works="http://www.design-works.com"
        xmlns:disney="http://www.disney.com"
        xmlns:mci="http://www.mci.com">
    <import namespace="http://www.design-works.com"
                  schemaLocation="DesignWorks.xsd"/>
    <import namespace="http://www.disney.com"
                  schemaLocation="Disney.xsd"/>
    <import namespace="http://www.mci.com"
                  schemaLocation="MCI.xsd"/>
    <element name="video-game">
        <complexType>
            <sequence>
                <element ref="design-works:geometry" minOccurs="1"
                                maxOccurs="1"/>
                <element ref="design-works:lighting" minOccurs="1"
                                maxOccurs="1"/>
                <element ref="disney:character" minOccurs="1"
                                maxOccurs="1"/>
                <element ref="mci:voice" minOccurs="1"
                                maxOccurs="1"/>
            </sequence>
        </complexType>
    </element>
</schema>

The most important part of this schema is the attribute
elementFormDefault="qualified".  That attribute forces instance
documents to qualify all elements:

<?xml version="1.0"?>
<video-game xmlns="http://www.video-game.org"
          xmlns:design-works="http://www.design-works.com"
          xmlns:disney="http://www.disney.com"
          xmlns:mci="http://www.mci.com"
          xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
          xsi:schemaLocation="http://www.video-game.org VideoGame.xsd">
  <design-works:geometry>24m x 71m</design-works:geometry>
  <design-works:lighting>Shadow in foreground, light in back</design-works:lighting>
  <disney:character>…</disney:character>
  <mci:voice>Digitized voice</mci:voice>
</video-game>
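
With the namespaces exposed like this, an application can read each
element's origin straight from the instance document, with no schema
lookup.  Here is a rough sketch in Python (standard library only).  The
instance is trimmed to two children for brevity, and split_qname is a
helper name I made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the qualified video-game instance.
VIDEO_GAME_XML = """<video-game xmlns="http://www.video-game.org"
            xmlns:design-works="http://www.design-works.com"
            xmlns:mci="http://www.mci.com">
  <design-works:geometry>24m x 71m</design-works:geometry>
  <mci:voice>Digitized voice</mci:voice>
</video-game>"""


def split_qname(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # unqualified element


root = ET.fromstring(VIDEO_GAME_XML)
# Map each child's local name to the namespace it is declared in.
origins = {}
for child in root:
    uri, local = split_qname(child.tag)
    origins[local] = uri
print(origins)
```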

[2] Declare Elements Globally to Force the Use of Namespace Qualifiers

Global elements must be qualified in instance documents regardless of
whether elementFormDefault has the value of "qualified" or
"unqualified".  Thus, we could reorganize the above schema to make all
the elements global.  [Interestingly, for the video-game schema I don't
see how to make geometry, lighting, and voice global.  Any thoughts?]

Now it is time to answer the question: what characterizes systems for
which it makes sense to design the schema so that instance documents are
forced to display the namespaces for each element?  

One quick answer is:

"For systems where knowledge of the namespaces DOES provide additional
information, design the schema to force exposure of namespaces in
instance documents."

However, this leaves me a bit empty.  When does "knowledge of the
namespaces provide additional information"?  That is the question which
must be answered. 

Suppose that an application will process the geometry element
differently if it's associated with design-works versus some other
namespace.  I could imagine for marketing purposes such preferential
treatment may occur.  When else?  What are your thoughts on this?
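
As a sketch of what such namespace-dependent processing might look like,
here is a small Python example (standard library only).  The
"preferred" versus "generic" rendering policy, the function name
render_geometry, and the http://www.example.org/other namespace are all
assumptions of mine for illustration.

```python
import xml.etree.ElementTree as ET

DESIGN_WORKS_NS = "http://www.design-works.com"


def render_geometry(elem):
    """Pick a rendering path from the element's namespace (hypothetical policy)."""
    # ElementTree stores qualified tags as '{namespace-uri}localname'.
    if elem.tag == f"{{{DESIGN_WORKS_NS}}}geometry":
        return "preferred renderer: " + elem.text.strip()
    return "generic renderer: " + elem.text.strip()


preferred = ET.fromstring(
    f'<dw:geometry xmlns:dw="{DESIGN_WORKS_NS}"> 24m x 71m</dw:geometry>'
)
other = ET.fromstring(
    '<o:geometry xmlns:o="http://www.example.org/other"> 24m x 71m</o:geometry>'
)
print(render_geometry(preferred))
print(render_geometry(other))
```

Note that the decision keys on the namespace URI, not on the prefix the
instance author happened to choose.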

Clearly namespaces are great for dealing with name collisions.  In the
video-game example I don't have multiple elements with the same name. 
If I did, however, and they came from different namespaces then it is
easy to imagine that we would want to design the schema to force
instance documents to expose the namespaces so that applications could
easily distinguish the elements.
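
Here is a hedged illustration of the collision case in Python (standard
library only).  The scene document below is hypothetical: I am assuming,
purely for the sake of the example, that both design-works and disney
define an element named lighting.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: two vendors each define an element named "lighting".
SCENE_XML = """<scene xmlns:design-works="http://www.design-works.com"
       xmlns:disney="http://www.disney.com">
  <design-works:lighting>Shadow in foreground, light in back</design-works:lighting>
  <disney:lighting>Soft glow on the character</disney:lighting>
</scene>"""

# Prefixes used by the application; they need not match the document's.
NS = {
    "dw": "http://www.design-works.com",
    "dis": "http://www.disney.com",
}

root = ET.fromstring(SCENE_XML)
scene_light = root.find("dw:lighting", NS).text
character_light = root.find("dis:lighting", NS).text
print(scene_light)
print(character_light)
```

Had both lighting elements been unqualified, the application would have
had only their position in the document to tell them apart.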

Let's try rephrasing the above characterization given this new
information:

"For systems where knowledge of the namespaces does provide additional
information, design the schema to force exposure of namespaces in
instance documents.  Knowledge of namespaces may enable applications
to:
- perform namespace-dependent processing, and
- distinguish between elements with the same name."

Okay, that's enough for now.  Your turn.  What are your thoughts on any
of this?  What guidelines would you provide someone who asks you:
"Should I design my schema to hide the namespace complexities, or should
I design it to force instance documents to expose the namespaces of
their elements?"

/Roger