RE: [xml-dev] Schematron Best Practice: A Schematron schema's area of responsibility?

From: "Costello, Roger L." <costello@mitre.org>
To: <xml-dev@lists.xml.org>
Date: Tue, 17 Jul 2007 16:19:33 -0400

Excellent points Noah!

Let me try to summarize the points that have been made over the last
few weeks.

First, recall the issue: 

You are tasked with implementing a system's XML data validation
requirements.  For some data requirements there is only one XML
validation language that has the needed capability, so the selection of
language is clear.  For other requirements, however, there is a choice;
the requirement could be implemented by several XML validation
languages.  How do you decide which language to use?  What factors
should go into making the decision? Should multiple languages be used,
or is it best to stick with one language?

Example:

Suppose this XML instance document is representative of the type of
data that a system exchanges: 

<?xml version="1.0"?>
<Document classification="secret">
      <Para classification="unclassified">
           One if by land; two if by sea.
      </Para>
</Document>

And suppose the system's data requirements are:
1. The <Para> classification value cannot be more sensitive than the
<Document> classification value.
2. The <Document> element must have a classification attribute, whose
value is top-secret, secret, confidential, or unclassified.
3. The <Para> element must have a classification attribute, whose value
is top-secret, secret, confidential, or unclassified.

The first requirement is a co-constraint and cannot currently be
expressed using a grammar-based language. It must be expressed using
Schematron.
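
As a rough sketch only, the first requirement might be written in
Schematron along these lines.  It assumes that <Para> is a direct child
of <Document>, as in the sample instance, and that the hyphenated
spelling top-secret is used.  The pattern name is made up for
illustration, and the sch prefix is assumed to be bound to whichever
Schematron namespace the rest of the schema uses.

```xml
   <sch:pattern name="ParaNotAboveDocument">

      <!-- Assumes Para is a direct child of Document, as in the sample. -->
      <sch:rule context="Document/Para">

         <!-- For each Document level, list the Para levels allowed. -->
         <sch:assert test="../@classification='top-secret' or
                           (../@classification='secret' and
                              (@classification='secret' or
                               @classification='confidential' or
                               @classification='unclassified')) or
                           (../@classification='confidential' and
                              (@classification='confidential' or
                               @classification='unclassified')) or
                           (../@classification='unclassified' and
                              @classification='unclassified')">
             The classification of a Para must not be more sensitive
             than the classification of its Document.
         </sch:assert>

      </sch:rule>

   </sch:pattern>
```

This rule checks only the relative ordering.  When the <Document> is
top-secret it accepts any <Para> value, so the check that each value is
one of the four allowed values is left to requirements 2 and 3.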

For the next two requirements, however, there are alternative
solutions. Here's how the requirements may be implemented using XML
Schemas:

   <attribute name="classification">
       <simpleType>
           <!-- enumeration facets must sit inside a restriction -->
           <restriction base="string">
               <enumeration value="top-secret" />
               <enumeration value="secret" />
               <enumeration value="confidential" />
               <enumeration value="unclassified" />
           </restriction>
       </simpleType>
   </attribute>

Here's how the requirements may be implemented using Schematron:

   <sch:pattern name="Classifications"> 

      <sch:rule context="*[@classification]">

         <sch:assert test="@classification='top-secret' or
                           @classification='secret' or
                           @classification='confidential' or
                           @classification='unclassified'">
             The value of a classification must be one of top-secret,
             secret, confidential, or unclassified.
         </sch:assert>

      </sch:rule>

   </sch:pattern>

Both implementations seem equally plausible.  So how does one decide
which language to use? What factors should enter into the decision? 

Here are some factors that have been identified when trying to make a
decision.

[Bryan Rasmussen] "Questions to ask about languages when they are
equivalent in abilities are which one would be easiest to implement it
in, which one would be easiest to maintain and extend."

Question: Two different implementations of the classification attribute
data requirement are shown above. How do we determine whether the XML
Schema implementation will be easier or harder to maintain and extend
than the Schematron implementation?

[Rick Jelliffe] "Traceability of an implementation to its requirement
is terribly important."  Rick notes that Schematron provides a "see"
attribute on each assertion that can be used to connect the Schematron
implementation directly to the requirement it implements.
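
For illustration, here is a sketch of the classification rule with a
"see" attribute added.  The URL is a hypothetical placeholder for
wherever the requirements document actually lives, and the pattern name
is likewise invented.

```xml
   <sch:pattern name="ClassificationsTraceable">

      <sch:rule context="*[@classification]">

         <!-- see= points at the requirement this assertion implements;
              the URL below is a made-up example, not a real resource. -->
         <sch:assert test="@classification='top-secret' or
                           @classification='secret' or
                           @classification='confidential' or
                           @classification='unclassified'"
                     see="http://www.example.org/requirements#req-2">
             The value of a classification must be one of top-secret,
             secret, confidential, or unclassified.
         </sch:assert>

      </sch:rule>

   </sch:pattern>
```

A reader of the schema, or a tool that surfaces the attribute in its
report, can then follow the link from a failed assertion back to the
requirement text.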

[Noah Mendelsohn] "Let's say you decide to put some constraints in W3C
XML Schema and some in Schematron.  That can be a great approach, and
many people are happy with it, but there are compromises involved.  For
example, if a downstream tool (e.g. a databinding engine) wants to
reason about the constraints on element E, it may have to look at both
the Schematron rules and the XSD grammars together.  Not necessarily a
bad thing, but potentially a complication."

[Noah Mendelsohn] "I think Schematron is implicitly focused to a
significant degree on situations in which a human user (or maybe a text
log) will be the recipient of a report on how the instance fared with
respect to the Schema.  XSD is at least implicitly a bit more aimed at
scenarios in which the validation will be embedded in some larger
processing context, perhaps a database system, which will get its
validation reports through some API. The consistent means of providing
report "text" in Schematron seems particularly suited to providing
reports to human users, and is arguably a bit less convenient as the
basis for an interface between software layers."

[Dave Carver] Dave raised the issue of selecting an implementation
based upon whether it is geared towards a technical user versus a
business user.

What other factors should enter into deciding which XML validation
language(s) should be used to implement a system's data validation
requirements?

/Roger

-----Original Message-----
From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com] 
Sent: Tuesday, July 17, 2007 3:24 PM
To: Rick Jelliffe
Cc: Costello, Roger L.; xml-dev@lists.xml.org
Subject: Re: [xml-dev] Schematron Best Practice: A Schematron schema's
area of responsibility?

Rick Jelliffe writes:

> Discussing the sequencing of particular technologies outside their
> diagnostic context puts the cart before the horse.

Yes, exactly.  Furthermore, there are likely to be tradeoffs involved
in any decision you make.  Certain constraints are typically easier to
express in grammar-based constraint languages.  For example, saying
that element E must contain a sequence of A,B,C.  Certain constraints
are typically easier to express in a language like Schematron.

Fine, so you should use each for what it does well?  Often, but not
always.  Let's say you decide to put some constraints in W3C XML Schema
and some in Schematron.  That can be a great approach, and many people
are happy with it, but there are compromises involved.  For example, if
a downstream tool (e.g. a databinding engine) wants to reason about the
constraints on element E, it may have to look at both the Schematron
rules and the XSD grammars together.  Not necessarily a bad thing, but
potentially a complication.  (Indeed, that's one of the reasons for the
architecture of the new assertion rules proposed for W3C XML Schema
1.1; they are influenced heavily by Schematron, but they are not a
complete replacement.  They are, however, more directly integrated with
the XSD type system (in Schema 1.1, XPath-based constraints are on
complexTypes).  So, you'll still want to use Schematron or
Schematron+XSD to solve some problems, but for many of the simpler uses
of Schematron+XSD, Schema 1.1 provides a more tightly integrated
approach.)
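
To make the idea of an XPath-based constraint on a complexType
concrete, here is a sketch of how the Para-versus-Document
classification rule might look.  It uses the xs:assert syntax of the
XSD 1.1 Recommendation as later published, which may differ from the
proposal under discussion.  The type name ClassificationType and the
overall content model are assumptions based on the sample instance.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- Hypothetical shared type for the four classification values. -->
   <xs:simpleType name="ClassificationType">
      <xs:restriction base="xs:string">
         <xs:enumeration value="top-secret"/>
         <xs:enumeration value="secret"/>
         <xs:enumeration value="confidential"/>
         <xs:enumeration value="unclassified"/>
      </xs:restriction>
   </xs:simpleType>

   <xs:element name="Document">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="Para" maxOccurs="unbounded">
               <xs:complexType>
                  <xs:simpleContent>
                     <xs:extension base="xs:string">
                        <xs:attribute name="classification"
                                      type="ClassificationType"
                                      use="required"/>
                     </xs:extension>
                  </xs:simpleContent>
               </xs:complexType>
            </xs:element>
         </xs:sequence>
         <xs:attribute name="classification"
                       type="ClassificationType" use="required"/>
         <!-- Rank each value by its position in an ordered list; every
              Para must rank no higher than the Document itself. -->
         <xs:assert test="every $p in Para satisfies
            index-of(('unclassified','confidential','secret','top-secret'),
                     string($p/@classification))
            le
            index-of(('unclassified','confidential','secret','top-secret'),
                     string(@classification))"/>
      </xs:complexType>
   </xs:element>

</xs:schema>
```

Here the grammar, the enumerated values, and the co-constraint all live
in one schema, which is the integration being described.  The assertion
can see only the <Document> element and its descendants.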

The main point here is not to advocate Schema 1.1 assertions,
Schematron, XSD or any other combination of technologies.  Rather it's
to second what I take to be Rick's position:  which is you need to
think hard about use cases and success criteria in order to judge the
tradeoffs involved in adopting any combination of these technologies.

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
