xml-dev - RE: XML Schemas: Best Practices

From: rsanford <rsanford@nolimitsystems.com>
To: xml-dev@lists.xml.org
Date: Wed, 20 Sep 2000 16:57:34 -0500
is there any way you can put this info onto a web page
and send out the url?

rjsjr

> -----Original Message-----
> From: Roger L. Costello [mailto:costello@mitre.org]
> Sent: Wednesday, September 20, 2000 4:13 PM
> To: xml-dev@lists.xml.org
> Cc: costello@mitre.org; Pulvermacher,Mary K.; Heller,Mark J.;
> JohnSc@crossgain.com; Ripley,Michael W.
> Subject: Re: XML Schemas: Best Practices
>
>
> Hi Folks,
>
> Over the last couple of days I have had the good fortune to talk with
> some very bright people about one of the issues that was raised here.
> Namely, we discussed: what guidelines can be given with regards to
> whether an element should be declared locally versus globally?  During
> the discussions some excellent points were made about the benefits of
> hiding schema complexity by using local elements in combination with
> setting elementFormDefault="unqualified". Below I have done my best to
> describe the points that were made during the discussions.
>
> Hiding Namespace Complexities using Local Element Declarations and
> elementFormDefault="unqualified"
>
> First some notes:
>
> (a) A typical schema will utilize elements and types from many different
> namespaces.
> (b) It is desirable to shield instance documents from the intricacies of
> schemas.
>
> One such schema intricacy that we would like to hide is the namespaces
> of all the different components being used by a schema.  Oftentimes it
> is irrelevant to the instance document where the schema obtained its
> components.  We would like such things to be kept hidden in the
> schema.  By declaring elements locally and by setting
> elementFormDefault="unqualified" we can prevent the schema namespace
> complexities from sneaking into instance documents.  How to do this is
> shown next.
>
> Example.  Consider this schema for <camera>, where the <body> element is
> defined in the Nikon schema, the <lens> element is defined in the
> Olympus schema, and the <manual_adaptor> element is defined in the Pentex
> schema:
>
> <?xml version="1.0"?>
> <schema xmlns="http://www.w3.org/1999/XMLSchema"
>         targetNamespace="http://www.camera.org"
>         elementFormDefault="unqualified"
>         xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
>         xsi:schemaLocation=
>                         "http://www.w3.org/1999/XMLSchema
>                          http://www.w3.org/1999/XMLSchema.xsd"
>         xmlns:nikon="http://www.nikon.com"
>         xmlns:olympus="http://www.olympus.com"
>         xmlns:pentex="http://www.pentex.com">
>     <import namespace="http://www.nikon.com"
>                   schemaLocation="Nikon.xsd"/>
>     <import namespace="http://www.olympus.com"
>                   schemaLocation="Olympus.xsd"/>
>     <import namespace="http://www.pentex.com"
>                   schemaLocation="Pentex.xsd"/>
>     <element name="camera">
>         <complexType>
>             <sequence>
>                 <element ref="nikon:body" minOccurs="1"
>                          maxOccurs="1"/>
>                 <element ref="olympus:lens" minOccurs="1"
>                          maxOccurs="1"/>
>                 <element ref="pentex:manual_adaptor" minOccurs="1"
>                          maxOccurs="1"/>
>             </sequence>
>         </complexType>
>     </element>
> </schema>
>
> Here's an example of a conforming instance document:
>
> <?xml version="1.0"?>
> <my:camera xmlns:my="http://www.camera.org"
>                xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
>                xsi:schemaLocation= "http://www.camera.org Camera.xsd">
>         <body>Ergonomically designed casing for easy handling</body>
>         <lens>300mm zoom, 1.2 f-stop</lens>
>         <manual_adaptor>1/10,000 sec to 100 sec</manual_adaptor>
> </my:camera>
>
> The instance document is simple.  There are no namespace qualifiers
> cluttering up the document, except for the one on camera (which is okay
> because it shows the namespace for the document as a whole).  The
> instance document simply shows the components of camera - camera is
> comprised of body, lens, and manual_adaptor.  The fact that the schema
> gets these three components from different namespaces is irrelevant and
> hidden within the schema.
>
> Consider now what the instance document would look like if the schema
> had declared elementFormDefault= "qualified".  Recall that
> elementFormDefault="qualified" means that in the instance document all
> elements must be qualified.  Look at the resulting instance document:
>
> <?xml version="1.0"?>
> <my:camera xmlns:my="http://www.camera.org"
>               xmlns:nikon="http://www.nikon.com"
>               xmlns:olympus="http://www.olympus.com"
>               xmlns:pentex="http://www.pentex.com"
>               xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
>               xsi:schemaLocation="http://www.camera.org Camera.xsd">
>         <nikon:body>Ergonomically designed casing for easy
>                     handling</nikon:body>
>         <olympus:lens>300mm zoom, 1.2 f-stop</olympus:lens>
>         <pentex:manual_adaptor>1/10,000 sec to
>                      100 sec</pentex:manual_adaptor>
> </my:camera>
>
> This instance document is much more complex - it explicitly shows where
> all of the components come from.  The complexities of the schema are
> thus sneaking into the instance document.
>
> Note, then, that to hide namespace complexity it is not simply a matter
> of declaring elements locally, but it is also important that
> elementFormDefault be set appropriately (to the value of
> "unqualified").
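>
> A caveat on the mechanics: an element pulled in with ref= is a global
> declaration, and under the final XML Schema 1.0 Recommendation global
> elements are always namespace-qualified in instance documents,
> whatever elementFormDefault says.  One way to get unqualified children
> such as <body> and <lens> is to declare those elements locally and
> take only their types from the imported schemas.  The sketch below
> shows that variant.  It assumes the Recommendation's namespace and
> syntax rather than the draft used in this message, and the type names
> body_type, lens_type, and manual_adaptor_type are hypothetical.
>
> ```xml
> <?xml version="1.0"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
>            targetNamespace="http://www.camera.org"
>            elementFormDefault="unqualified"
>            xmlns:nikon="http://www.nikon.com"
>            xmlns:olympus="http://www.olympus.com"
>            xmlns:pentex="http://www.pentex.com">
>     <xs:import namespace="http://www.nikon.com"
>                schemaLocation="Nikon.xsd"/>
>     <xs:import namespace="http://www.olympus.com"
>                schemaLocation="Olympus.xsd"/>
>     <xs:import namespace="http://www.pentex.com"
>                schemaLocation="Pentex.xsd"/>
>     <xs:element name="camera">
>         <xs:complexType>
>             <xs:sequence>
>                 <!-- Local declarations: unqualified in instances.
>                      The *_type names are hypothetical. -->
>                 <xs:element name="body" type="nikon:body_type"/>
>                 <xs:element name="lens" type="olympus:lens_type"/>
>                 <xs:element name="manual_adaptor"
>                             type="pentex:manual_adaptor_type"/>
>             </xs:sequence>
>         </xs:complexType>
>     </xs:element>
> </xs:schema>
> ```
>
> With this variant only <my:camera> carries a prefix in the instance;
> the three vendor namespaces appear in the schema alone.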
>
> Let's try to summarize the principles that have been described here:
>
> [1] Schema authors should design their schemas such that the
> complexities of the schema do not show up in the instance documents.
> One such complexity is the namespaces of the schema components (i.e.,
> where all the components come from).
> [2] The combination of declaring elements locally and setting
> elementFormDefault="unqualified" can be used to hide where all the
> components come from (i.e., the namespaces).   Thus, that schema
> complexity is not transferred to the instance document.
>
> This discussion has argued for keeping hidden in the schema the location
> (namespace) of the components.  However, there are scenarios where it is
> desirable to make such namespaces explicit, i.e., we want the instance
> documents to explicitly show where the components come from.  Would
> anyone care to make a case for that?
>
> /Roger
>
> "Roger L. Costello" wrote:
> >
> > Hi Folks,
> >
> > I would like to see if we can collectively come up with a set of "best
> > practices" in designing XML Schemas.   I realize that the specifics of
> > designing a schema are heavily dependent upon the task at hand.
> > However, I firmly believe that there are guidelines that can be employed
> > in creating a schema, and those guidelines hold true irrespective of the
> > specific task.  It is this set of guidelines that I am hoping we can
> > shed some light upon.
> >
> > I would like to get things started by listing some of the things that
> > must be considered in designing a schema.  It is by no means an
> > exhaustive list.  For example, it doesn't address when to block a type
> > from derivation, when to create a schema without a namespace, when to
> > make an element or a type abstract, etc.  Nonetheless, it is a start to
> > some hopefully useful discussions.
> >
> > First, a quick list of the issues:
> >
> > [1] Element versus Type Reuse
> > [2] Local versus Global
> > [3] elementFormDefault - to qualify or not to qualify
> > [4] Evolvability/versioning
> > [5] One namespace versus many namespaces (import versus include)
> > [6] Capturing semantics of elements and types
> >
> > Now, details of each issue:
> >
> > [1] Element versus Type Reuse: from my own experience in building
> > schemas I have found that it is oftentimes not obvious whether to
> > declare something as an element and then reuse that element, or to
> > declare it as a type and reuse the type.  Let's consider the two cases
> > by looking at an example:
> >
> > Element Reuse
> >
> >    - Declare an element for reuse:
> >
> >       |<element name="Elevation">
> >       |   <simpleType base="integer">
> >       |      <minInclusive value="-1290"/>
> >       |      <maxInclusive value="29028"/>
> >       |   </simpleType>
> >       |</element>
> >
> >    - Reusing the element:
> >
> >       |<element name="Boston">
> >       |   <complexType>
> >       |      <sequence>
> >       |         <element ref="city:Elevation"/>
> >       |      </sequence>
> >       |   </complexType>
> >       |</element>
> >
> > Type Reuse
> >
> >    - Declare a type for reuse:
> >
> >       |<simpleType name="Elevation" base="integer">
> >       |   <minInclusive value="-1290"/>
> >       |   <maxInclusive value="29028"/>
> >       |</simpleType>
> >
> >    - Reusing the type:
> >
> >       |<element name="Boston">
> >       |   <complexType>
> >       |      <sequence>
> >       |         <element name="Elevation"  type="city:Elevation"/>
> >       |      </sequence>
> >       |   </complexType>
> >       |</element>
> >
> > Which is preferred - declare Elevation as an element and reuse that
> > element, or declare Elevation as a type and reuse the type?  Here are
> > some things to consider:
> >
> > - Declaring it as an element will allow equivClasses to be created, thus
> > enabling the Elevation element to be substituted by members of the
> > equivClass.
> > - Declaring it as a type will allow derived types to be created, thus
> > enabling the Elevation type to be substituted by derived types.
> > - Someone once said that XML Schemas is a "type-based system".  I am not
> > sure what that means, but perhaps it means that the idea behind XML
> > Schemas is to reuse types?
> > - In programming languages, types are typically the items that get
> > reused.  Does that apply to XML Schemas, or not?
> >
> > What are your thoughts on type versus element reuse?  What guidelines
> > would you recommend to someone struggling to decide whether he/she
> > should make an item as an element or as a type?
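> >
> > To make the two substitution bullets concrete, here is a minimal
> > sketch.  It is written in the syntax of the final XML Schema 1.0
> > Recommendation, where the draft's equivClass is called
> > substitutionGroup.  The namespace http://www.city.org and the Height
> > element are assumptions made for illustration.
> >
> > ```xml
> > <?xml version="1.0"?>
> > <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >            xmlns:city="http://www.city.org"
> >            targetNamespace="http://www.city.org"
> >            elementFormDefault="qualified">
> >
> >     <!-- Type reuse: a named type that other types can derive from -->
> >     <xs:simpleType name="ElevationType">
> >         <xs:restriction base="xs:integer">
> >             <xs:minInclusive value="-1290"/>
> >             <xs:maxInclusive value="29028"/>
> >         </xs:restriction>
> >     </xs:simpleType>
> >
> >     <!-- Element reuse: a global element that heads a substitution
> >          group -->
> >     <xs:element name="Elevation" type="city:ElevationType"/>
> >
> >     <!-- Hypothetical member: Height may appear wherever
> >          city:Elevation is referenced -->
> >     <xs:element name="Height" type="city:ElevationType"
> >                 substitutionGroup="city:Elevation"/>
> >
> >     <xs:element name="Boston">
> >         <xs:complexType>
> >             <xs:sequence>
> >                 <xs:element ref="city:Elevation"/>
> >             </xs:sequence>
> >         </xs:complexType>
> >     </xs:element>
> > </xs:schema>
> > ```
> >
> > Element substitution is chosen by the element name in the instance,
> > while derived-type substitution is selected with an xsi:type
> > attribute.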
> >
> > [2] Local versus Global: when should an element or type be declared
> > globally versus when should it be nested within something else (i.e.,
> > local)?  Again, let's take an example:
> >
> > - Everything Global
> >
> >       |<element name="Book" type="cat:Listing"/>
> >       |<complexType name= "Listing">
> >       |   <sequence>
> >       |      <element ref="cat:Title"/>
> >       |      <element ref="cat:Author"/>
> >       |   </sequence>
> >       |</complexType>
> >       |<element name="Title" type="string"/>
> >       |<element name="Author" type="string"/>
> >
> > - Everything Local
> >
> >       |<element name="Book">
> >       |   <complexType>
> >       |      <sequence>
> >       |         <element name="Title" type="string"/>
> >       |         <element name="Author" type="string"/>
> >       |      </sequence>
> >       |   </complexType>
> >       |</element>
> >
> > What guidance can we provide a schema designer in deciding whether or
> > not to "hide" a type or element (by nesting it)?  Someone once asked me
> > when it would be desirable to make an element or type local. I was hard
> > pressed to think of a situation.  Thus, I was not able to provide
> > guidance on when to use elements/types locally.  It is easy to see the
> > benefit of declaring elements/types globally - they can be reused, not
> > only within a schema but also across schemas.  It is not so easy for me
> > to see the benefit of hiding elements/types.  Can someone provide
> > guidance on this issue?  Does the OO encapsulation principle apply to
> > XML Schemas?  If so, why?  If not, why not?
> >
> > [3] elementFormDefault - to qualify or not to qualify:
> > elementFormDefault is an attribute of <schema>.  It is used to dictate
> > which elements are to be namespace-qualified in instance documents: a
> > value of "qualified" means that everything is namespace-qualified in
> > the instance document, whereas a value of "unqualified" means that only
> > global items are namespace-qualified.  Personally, I find that for
> > simplicity it is easiest to use "qualified" and then in the instance
> > document use a default namespace declaration.  The advantages of using
> > "unqualified" are not really clear to me.  In other words, I would not
> > be able to provide good guidance on when to use "unqualified".  If
> > someone asked you to list the scenarios when it would be desirable to
> > use "unqualified" what guidance would you give?
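> >
> > As a rough illustration of the two settings, here are two instance
> > fragments for the "Everything Local" Book schema from [2].  The
> > namespace http://www.catalog.org and the element content are
> > placeholders assumed for this sketch.
> >
> > ```xml
> > <!-- elementFormDefault="qualified": a single default namespace
> >      declaration qualifies Book, Title, and Author alike -->
> > <Book xmlns="http://www.catalog.org">
> >     <Title>Example Title</Title>
> >     <Author>Example Author</Author>
> > </Book>
> >
> > <!-- elementFormDefault="unqualified": only the global Book element
> >      is qualified; the local Title and Author are in no namespace -->
> > <cat:Book xmlns:cat="http://www.catalog.org">
> >     <Title>Example Title</Title>
> >     <Author>Example Author</Author>
> > </cat:Book>
> > ```
> >
> > In the second fragment a prefix is needed on Book: a default
> > namespace declaration would also apply to Title and Author, which
> > must stay unqualified.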
> >
> > [4] Evolvability/versioning: in today's rapidly changing marketplace,
> > there is no question that schemas will need to change (evolve).  What
> > guidance do you provide a schema designer in engineering his/her schema
> > to support change?  When a schema is changed, how do you indicate that
> > it is a new version - with a new namespace?
> >
> > I have thought quite a bit about schema evolution.  At the end of this
> > message I expound quite a bit on this subject.
> >
> > As for versioning, that is something that I would be hard pressed to
> > provide guidance upon.  When a new version of a schema is created, what
> > techniques should one use to signify the new version?  One idea is to
> > create a new namespace for the new version.  Another idea is to simply
> > change the version attribute on <schema>.  How would you indicate a new
> > version?
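> >
> > The two ideas can be sketched as alternative schema headers.  The
> > syntax is that of the final XML Schema 1.0 Recommendation, and the
> > example.org namespaces and version numbers are assumptions made for
> > illustration.
> >
> > ```xml
> > <!-- Idea 1: mint a new namespace for the new version -->
> > <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >            targetNamespace="http://www.example.org/cellphone/v2">
> >     <!-- declarations omitted -->
> > </xs:schema>
> >
> > <!-- Idea 2: keep the namespace and change the version attribute -->
> > <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >            targetNamespace="http://www.example.org/cellphone"
> >            version="1.1">
> >     <!-- declarations omitted -->
> > </xs:schema>
> > ```
> >
> > The tradeoff: a new namespace forces every instance document and
> > importing schema to change, whereas the version attribute is purely
> > informational and is not checked during validation.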
> >
> > [5] One namespace versus many namespaces (import versus include): I
> > think that in a typical project many schemas will be created.   A
> > question will then arise, "shall we define one namespace for all the
> > schemas or shall we create a different namespace for each schema?"  What
> > are the tradeoffs in creating multiple namespaces versus a single
> > namespace?  What guidance would you give someone starting on a project
> > that will create multiple namespaces - create a namespace for each
> > schema or one umbrella namespace?
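> >
> > For reference, the mechanical difference between the two choices
> > looks roughly like this in the final XML Schema 1.0 Recommendation
> > syntax.  The file CameraParts.xsd is hypothetical; Nikon.xsd and the
> > namespaces are borrowed from the camera example.
> >
> > ```xml
> > <?xml version="1.0"?>
> > <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >            targetNamespace="http://www.camera.org">
> >
> >     <!-- One namespace: include pulls in a schema document that has
> >          the same targetNamespace (or none at all) -->
> >     <xs:include schemaLocation="CameraParts.xsd"/>
> >
> >     <!-- Many namespaces: import makes the components of a different
> >          namespace available for reference -->
> >     <xs:import namespace="http://www.nikon.com"
> >                schemaLocation="Nikon.xsd"/>
> >
> > </xs:schema>
> > ```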
> >
> > [6] Capturing semantics of elements and types: a schema creates
> > elements, defines the relationships between the elements, and defines
> > the datatypes of the elements.  However, that by itself doesn't define
> > the semantics of the elements.  For example, consider this element
> > declaration:
> >
> > <element name= "jdkdsfjkds">
> >     <simpleType base= "string">
> >         <pattern value= "[a-zA-Z]+\d"/>
> >    </simpleType>
> > </element>
> >
> > Does this tell you the meaning of "jdkdsfjkds"?  Probably not.
> > Something more is needed.  What guidelines would you give someone
> > wishing to document the semantics of the items created in a schema?
> >
> > Here are some guidelines that Mary Pulvermacher sent to me:
> >
> > "Our current thinking is to capture as much of the semantics as possible
> > in the XML schema itself.  We plan to do this by using the XML Schema
> > provided annotation element and having a convention that every element
> > or attribute has an annotation that provides information on the
> > meaning.  Of course this is not perfect but it does carry some
> > advantages.
> >
> > - The XML schema will capture the data structure, meta-data and
> > relationships between the elements.
> > - Use of strong typing will capture much of the data content.
> > - The annotations can capture definitions and other explanatory
> > information
> > - The structure of the "definitions" will always be consistent with the
> > structure used in the schema since they are linked.
> > - Since the schema itself is an XML document, we can use XSL to
> > transform this information into a format suitable for human
> > consumption."
> >
> > Do you have any other thoughts on capturing the semantics of elements
> > and types created by a schema?  What guidance would you give to someone
> > wishing to capture the semantics of the elements and types?
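> >
> > A minimal sketch of the annotation convention described above,
> > applied to the "jdkdsfjkds" element and written in the final XML
> > Schema 1.0 Recommendation syntax.  The documentation text is a
> > placeholder invented for this sketch, not a real definition.
> >
> > ```xml
> > <?xml version="1.0"?>
> > <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
> >     <xs:element name="jdkdsfjkds">
> >         <xs:annotation>
> >             <!-- Placeholder wording; only the description of the
> >                  pattern is taken from the declaration itself -->
> >             <xs:documentation xml:lang="en">
> >                 An identifier made of one or more ASCII letters
> >                 followed by a single digit.
> >             </xs:documentation>
> >         </xs:annotation>
> >         <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >                 <xs:pattern value="[a-zA-Z]+\d"/>
> >             </xs:restriction>
> >         </xs:simpleType>
> >     </xs:element>
> > </xs:schema>
> > ```
> >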
> > --------------------------------------------------------------------
> >
> > Some thoughts on enabling schema evolution (expansion of [4] above)
> >
> > In today's rapidly changing market static schemas will be less
> > commonplace, as the market pushes schemas to quickly support new
> > capabilities.  For example, consider the cellphone industry.  Clearly,
> > this is a rapidly evolving market.  Any schema that the cellphone
> > community creates will soon become obsolete as hardware/software changes
> > extend the cellphone capabilities.  For the cellphone community rapid
> > evolution of a cellphone schema is not just a nicety, the market demands
> > it!
> >
> > Suppose that the cellphone community gets together and creates a schema,
> > cellphone.xsd.  Imagine that every week NOKIA sends out to the various
> > vendors an instance document (conforming to cellphone.xsd), detailing
> > its current product set.  Now suppose that a few months after
> > cellphone.xsd is agreed upon NOKIA makes some breakthroughs in their
> > cellphones - they create new memory, call, and display features, none of
> > which are supported by cellphone.xsd.  To gain a market advantage NOKIA
> > will want to get information about these new capabilities to its vendors
> > ASAP.  Further, they will have little motivation to wait for the next
> > meeting of the cellphone community to consider upgrades to
> > cellphone.xsd.  They need results NOW. How does open content help?
> > That is described next.
> >
> > Suppose that the cellphone schema is declared "open".  Immediately NOKIA
> > can extend its instance documents to incorporate data about the new
> > features.  How does this change impact the vendor applications that
> > receive the instance documents?  The answer is - not at all.  In the
> > worst case, the vendor's application will simply skip over the new
> > elements.  More likely, however, the vendors are showing
> > the cellphone features in a list box and these new features will be
> > automatically captured with the other features.  Let's stop and think
> > about what has been just described …  Without modifying the cellphone
> > schema and without touching the vendor's applications, information about
> > the new NOKIA features has been instantly disseminated to the
> > marketplace!  Open content in the cellphone schema is the enabler for
> > this rapid dissemination.
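> >
> > The message does not say how a schema is declared "open".  One way
> > to approximate it in the final XML Schema 1.0 Recommendation is a
> > wildcard, as in the sketch below.  The namespace, the model element,
> > and the placement of the wildcard are all assumptions made for
> > illustration.
> >
> > ```xml
> > <?xml version="1.0"?>
> > <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >            targetNamespace="http://www.example.org/cellphone"
> >            elementFormDefault="qualified">
> >     <xs:element name="cellphone">
> >         <xs:complexType>
> >             <xs:sequence>
> >                 <!-- Content agreed on by the community -->
> >                 <xs:element name="model" type="xs:string"/>
> >                 <!-- Extension point: any number of elements from
> >                      other namespaces, validated only if a
> >                      declaration for them can be found -->
> >                 <xs:any namespace="##other" processContents="lax"
> >                         minOccurs="0" maxOccurs="unbounded"/>
> >             </xs:sequence>
> >         </xs:complexType>
> >     </xs:element>
> > </xs:schema>
> > ```
> >
> > A vendor could then add elements from its own namespace after
> > <model> and the instance would still validate against the unchanged
> > schema.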
> >
> > Clearly some types of instance document extensions may require
> > modification to the vendor's applications.  Recognize, however, that
> > the vendors are free to upgrade their applications in their own time.
> > The applications do not need to be upgraded before changes can be
> > introduced into instance documents.  At the very worst, the vendor's
> > applications will simply skip over the extensions.  And, of course,
> > those vendors do not need to upgrade in lock-step.
> >
> > To wrap up this example … suppose that several months later the
> > cellphone community reconvenes to discuss enhancements to the schema.
> > The new features that NOKIA first introduced into the marketplace are
> > then officially added into the schema.  This completes the cycle.
> > Changes to the instance documents have driven the evolution of the
> > schema.
>
Follow-Ups:
- Re: XML Schemas: Best Practices
  - From: tpassin@home.com
References:
- Re: XML Schemas: Best Practices
  - From: "Roger L. Costello" <costello@mitre.org>