OASIS Mailing List Archives

Re: XML Schemas: Best Practices



Hi Folks,

Since there was no response to the last issue raised (using XMLSchema as
the default namespace versus using the targetNamespace as the default),
I will assume that everyone is in agreement with the conclusion.
Shortly, I will post a summary of this issue on my web site.

I would like to start a new issue (we will get back to the extensibility
issue after this one).  I think that this issue will be quite
controversial.

Issue: what is Best Practice for checking constraints that are not
expressible by XML Schemas?

Example.  Consider this simple instance document:

<?xml version="1.0"?>
<root>
     <A>10</A>
     <B>20</B>
</root>

With XML Schemas we can check the following constraints:

- the root element contains a sequence of elements, A followed by B
- the A element contains an integer
- the B element contains an integer

In fact, here's the XML Schema to do this:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
        targetNamespace="http://www.test.org"
        xmlns="http://www.test.org"
        elementFormDefault="qualified">
    <xsd:element name="root">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="A" type="xsd:integer"/>
                <xsd:element name="B" type="xsd:integer"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

We cannot use XML Schemas to express this constraint:

- the value of A must be greater than the value of B

So what do we do to check this constraint?  As I see it, there are three
options:

(1) There are many other schema languages besides XML Schemas:

    - Schematron
    - TREX
    - RELAX

The first option is to study these languages to see if one of them
enables us to express this constraint.

(2) The second option is to write some code to check the constraint.

(3) The third option is to write a stylesheet to check the constraints. 
[Note: I got this idea from an article written by Rick Jelliffe.]  

For example, the following stylesheet checks instance documents to see
if the value of the A element is greater than the value of the B
element:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- A must be strictly greater than B, so equal values are
             invalid too -->
        <xsl:if test="/root/A &lt;= /root/B">
            <xsl:text>Schema is invalid</xsl:text>
        </xsl:if>
        <xsl:if test="/root/A &gt; /root/B">
            <xsl:text>Schema is valid</xsl:text>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

I ran this stylesheet on the above XML data and it generated the
message:

    Schema is invalid

That's exactly what we want.
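
For comparison, option (2) might look something like the sketch below,
written in Python using only its standard library.  The function name
a_greater_than_b is my own invention for illustration, and the sketch
assumes the instance document has already passed schema validation, so
that A and B are present and contain integers.

```python
import xml.etree.ElementTree as ET

# The instance document from the example above.
INSTANCE = """<?xml version="1.0"?>
<root>
     <A>10</A>
     <B>20</B>
</root>"""


def a_greater_than_b(xml_text):
    """Return True if the value of A is strictly greater than the value of B.

    Assumes the document is already schema-valid (A and B exist and
    hold integers); this function only checks the cross-element rule.
    """
    root = ET.fromstring(xml_text)
    a = int(root.findtext("A"))
    b = int(root.findtext("B"))
    return a > b


if __name__ == "__main__":
    print("valid" if a_greater_than_b(INSTANCE) else "invalid")
```

The check itself is a single comparison; most of the code is parsing
and plumbing, which is the part a stylesheet gets for free.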

Thus, with this third option, you check as many constraints as you can
using XML Schemas.  For the other constraints you write a stylesheet to
do the checking.  If both the schema validator and the XSL processor
generate a positive output then you know that your instance document is
valid.
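
The "both must pass" rule can be sketched as a small driver.  The
Python below is only an illustration under stated assumptions:
structure_ok is a hand-written stand-in for a real schema validator
(it knows only about this one example), cross_field_ok is a stand-in
for the stylesheet, and all three function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified lexical form of an integer: optional sign, then digits.
INTEGER = re.compile(r"[+-]?[0-9]+")


def structure_ok(root):
    """Stand-in for the schema validator: root contains exactly A then B,
    and each of them contains an integer."""
    if root.tag != "root" or [child.tag for child in root] != ["A", "B"]:
        return False
    return all(INTEGER.fullmatch((child.text or "").strip()) for child in root)


def cross_field_ok(root):
    """Stand-in for the stylesheet: the value of A must exceed the value of B."""
    return int(root.findtext("A")) > int(root.findtext("B"))


def instance_valid(xml_text):
    """An instance is valid only if both stages give a positive result."""
    root = ET.fromstring(xml_text)
    # The second check runs only when the first one has passed, so it
    # can safely assume that A and B exist and hold integers.
    return structure_ok(root) and cross_field_ok(root)
```

Running the structural stage first means the second stage never has to
defend against missing elements or non-numeric content.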

The combination of XML Schemas plus stylesheets provides a powerful
constraint-checking mechanism.

Those are the three options that I see for extending XML Schema's
constraint checking.  [Are there other options?]

Which option is Best Practice?  At the moment, I am inclined towards
option 3 (creating a stylesheet to check the constraints not expressible
using XML Schemas).  Here's my reasoning:

{1} Recall that this option says: "find another schema language, such as
Schematron, TREX, RELAX, which has the ability to express the constraint
desired".  I see two drawbacks with this option:

   a. Each of the languages has different capabilities.  I might end 
      up needing to use several of these languages to express all my
      constraints.

   b. My time is limited.  I really don't have the time to learn
      all these different languages.

{2} Recall that this option says: "write some code to check the
constraints that XML Schemas missed".  I see this option as less than
optimal.  Here's my reasoning:

    a. There are other XML technologies that can be used, so why 
       write code?

{3} Recall that this option says: "write a stylesheet to express the
additional constraints".  I like this approach for the following
reasons:
  
    a. I don't have to learn a bunch of different schema languages.

    b. XSLT/XPath is very powerful. I suspect that with a stylesheet
       I can express every constraint I might ever need.  [True?]

    c. XSLT/XPath is well supported, and will be supported for a long
       time.  I cannot be sure how long the authors of Schematron,
       TREX, and RELAX will continue to support their initiatives.

    d. I don't have to write any code.  I am able to leverage
       an existing XML technology.

I am eager to be convinced that Options {1} and {2} have benefits over
Option {3}, but at the moment I am struggling to see what they might
be.  I look forward to your comments.  /Roger