OASIS Mailing List Archives

 



Re: XML Schemas: Best Practices



Hi all,

"Roger L. Costello" wrote:

> Issue: what is Best Practice for checking constraints that are not
> expressable by XML Schemas?

I'll start off by saying that I much enjoy these Best Practice discussions
and I have learnt a lot by following them. The information on
http://www.xfront.com/BestPractices.html has helped me a lot with my work on
XML Schemas. As long as you keep your own problems in mind you can get
helpful hints on how to solve them even if you don't choose to use the "Best
Practice" method.

Now to the issue at hand.

> Example.  Consider this simple instance document:
>
> <?xml version="1.0"?>
> <root>
>      <A>10</A>
>      <B>20</B>
> </root>

> We cannot use XML Schemas to express this constraint:
>
> - the value of A must be greater than the value of B
>
> So what do we do to check this constraint?  As I see it, there are three
> options:
>
> (1) There are many other schema languages besides XML Schemas:
>
>     - Schematron
>     - TREX
>     - RELAX
>
> The first option is to study these languages to see if one of them
> enables us to express this constraint.

I've been experimenting with validation that can't be done with XML Schemas,
and the option I chose was Schematron. I had no particular reason for my
choice other than that I knew Schematron was capable of validating
constraints like the one in your example above. What surprised me was how
easy Schematron was to learn (if you already know XPath and XSLT). Since
Schematron uses XSLT as its validation engine (though maybe it doesn't have
to?), much of what you can do by writing your own stylesheet (option 3
below) can also be done using Schematron. If you're an XSLT expert, maybe
you don't mind writing your own validation stylesheets, but I found
Schematron much easier to use. Yes, you have to learn Schematron, but you
can learn the basics in a couple of hours.

> (2) The second option is to write some code to check the constraint.
>
> (3) The third option is to write a stylesheet to check the constraints.
> [Note: I got this idea from an article written by Rick Jelliffe.]
>
> For example, the following stylesheet checks instance documents to see
> if the contents of the A element is greater than the contents of the B
> element:
>
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 version="1.0">
>
>     <xsl:output method="text"/>
>
>     <xsl:template match="/">
>         <xsl:if test="/root/A &lt; /root/B">
>             <xsl:text>Schema is invalid</xsl:text>
>         </xsl:if>
>         <xsl:if test="/root/A &gt;= /root/B">
>             <xsl:text>Schema is valid</xsl:text>
>         </xsl:if>
>     </xsl:template>
> </xsl:stylesheet>

The above example would look like this in Schematron:

<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
   <sch:pattern name="A content greater than B content">
      <sch:rule context="root">
         <sch:assert test="A > B">Validation error: The content of A must
            be greater than the content of B</sch:assert>
      </sch:rule>
   </sch:pattern>
</sch:schema>

For basic validation, the four elements above are the only ones you need to
learn, which makes Schematron easy to use if you're familiar with XPath.
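
To illustrate how far those four elements go, here is a small variation on
the schema above. It is only a sketch: the two extra assertions and their
messages are my own additions for illustration, not part of Roger's example.
They rely on the XPath 1.0 rule that a non-numeric value converts to NaN and
that NaN is never equal to itself.

```xml
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
   <sch:pattern name="A and B are numbers and A is greater than B">
      <sch:rule context="root">
         <!-- number(X) = number(X) is false when X is missing or not a
              number, because NaN never equals NaN in XPath 1.0 -->
         <sch:assert test="number(A) = number(A)">Validation error: The
            content of A must be a number</sch:assert>
         <sch:assert test="number(B) = number(B)">Validation error: The
            content of B must be a number</sch:assert>
         <!-- The original constraint, unchanged apart from the escaped
              greater-than sign -->
         <sch:assert test="A &gt; B">Validation error: The content of A must
            be greater than the content of B</sch:assert>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```

Each additional constraint is just one more assert inside the rule, so the
schema grows without any new constructs to learn.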

> Thus, with this third option, you check as many constraints as you can
> using XML Schemas.  For the other constraints you write a stylesheet to
> do the checking.  If both the schema validator and the XSL processor
> generate a positive output then you know that your instance document is
> valid.

The way I solved this was to write a very simple XSLT stylesheet that finds
all the elements from the Schematron namespace within the <xsd:appinfo>
elements in my XSD schema. The stylesheet concatenates all this Schematron
information into a Schematron schema, which can then be used to validate
the things XSD cannot (NOTE: this idea came from Rick Jelliffe [1]).
Using the command-line version of XSV, Saxon and a simple batch file, I
could do all the validation in one step.
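
A minimal sketch of such an extraction stylesheet might look like the one
below. This is not my exact stylesheet, just the general shape of it. It
assumes the Schematron elements sit directly inside <xsd:appinfo>, and the
XML Schema namespace URI shown is an assumption that has to match whatever
namespace your XSD file actually declares.

```xml
<?xml version="1.0"?>
<!-- Sketch only: collects embedded Schematron elements from an XSD schema.
     The xsd namespace URI is an assumption; use the one your schema uses. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:sch="http://www.ascc.net/xml/schematron"
                exclude-result-prefixes="xsd">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <!-- One sch:schema wrapper for everything that is found -->
        <sch:schema>
            <!-- Copy every Schematron element that is a direct child of
                 any xsd:appinfo, in document order -->
            <xsl:copy-of select="//xsd:appinfo/sch:*"/>
        </sch:schema>
    </xsl:template>

</xsl:stylesheet>
```

Running this over the XSD file should produce a standalone Schematron
schema, which is then processed like any other Schematron schema.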

> Which option is Best Practice?  At the moment, I am inclined towards
> option 3 (creating a stylesheet to check the constraints not expressable
> using XML Schemas).  Here's my reasoning:
>
> {3} Recall that this option says: "write a stylesheet to express the
> additional constraints".  I like this approach for the following
> reasons:
>
>     a. I don't have to learn a bunch of different schema languages.

Schematron is very easy to learn.

>     b. XSLT/XPath is very powerful. I suspect that with a stylesheet
>        I can express every constraint I might ever need.  [True?]

Much of this is available through Schematron.

>     c. XSLT/XPath is well supported, and will be supported for a long
>        time.  I cannot be sure how long the authors of Schematron,
>        TREX, and RELAX will continue to support their initiatives.
>
>     d. I don't have to write any code.  I am able to leverage off
>        an existing XML technology.

But you still have to write an XSLT stylesheet, which for the inexperienced
can be just as complicated as writing code. With Schematron you don't have
to worry about template rules, since Schematron takes care of them for you.
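
To give an idea of what Schematron saves you from writing, here is a
hand-written sketch of roughly the kind of XSLT that the rule in my schema
above stands for. It is not the actual output of any Schematron
implementation, which will differ in its details; it only shows how the rule
context and the assertion map onto a template rule.

```xml
<?xml version="1.0"?>
<!-- Hand-written sketch, not real Schematron output -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

    <xsl:output method="text"/>

    <!-- Plays the role of the sch:rule with context="root" -->
    <xsl:template match="root">
        <!-- Plays the role of the sch:assert: an assertion reports its
             message when the test is NOT true, hence the not() -->
        <xsl:if test="not(A &gt; B)">
            <xsl:text>Validation error: The content of A must be greater than the content of B&#10;</xsl:text>
        </xsl:if>
        <!-- Keep walking the tree so that templates for other contexts
             get a chance to match -->
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Suppress the built-in rule that copies text nodes to the output -->
    <xsl:template match="text()"/>

</xsl:stylesheet>
```

With Schematron you only write the context and the test; the surrounding
template plumbing is generated.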

> I am eager to be convinced that Options {1} and {2} have benefits over
> Option {3}, but at the moment I am struggling to see what they might
> be.  I look forward to your comments.  /Roger

This was just a brief description of my experiences with Schematron. I
wouldn't call it a Best Practice, because I know almost nothing about the
other options, but I look forward to seeing comments about them as well.

Cheers,
/Eddie

[1] http://www.ascc.net/xml/resource/schematron/#overview