Schematron Best Practice: Embed Schematron into a Grammar-Based Language

Schematron Best Practice: Embed Schematron into a Grammar-Based Language? Or Keep Separate?

From: "Costello, Roger L." <costello@mitre.org>
To: <xml-dev@lists.xml.org>
Date: Mon, 23 Jul 2007 07:24:38 -0400

Hi Folks,
 
I would like to begin the next Schematron Best Practice issue, stated
below.  I have made a start on addressing it, including a preliminary
recommendation.  I invite you to add to the lists of advantages and
disadvantages, and to enhance or modify the recommendation.
/Roger
 

ISSUE
 
You have a set of data validation requirements for your system.  You
have decided to implement the requirements using a combination of a
grammar-based language (e.g., RELAX NG or XML Schema) plus Schematron.
Should the Schematron implementation be embedded within the grammar
document, or should it be kept in a separate document from the grammar
document?
 

EXAMPLE

Suppose this XML instance document is representative of the type of
data that your system exchanges:

        <?xml version="1.0"?>
        <Document classification="secret">
              <Para classification="unclassified">
                   One if by land; two if by sea.
              </Para>
        </Document>
    

And suppose your system's data requirements are:

1. The <Para> classification value cannot be more sensitive than the
<Document> classification value (top-secret is more sensitive than
secret, which is more sensitive than confidential, which is more
sensitive than unclassified).
   
2. The <Document> element must have a classification attribute whose
value is one of top-secret, secret, confidential, or unclassified.
   
3. The <Para> element must have a classification attribute whose value
is one of top-secret, secret, confidential, or unclassified.

The first requirement will be implemented using Schematron.  The other
two requirements will be implemented using XML Schema.
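As a minimal sketch, the requirements might be implemented as follows.  The content model (a <Document> containing one or more text-only <Para> elements) is an assumption inferred from the example instance; the requirements themselves do not state it.  First, an XML Schema for requirements 2 and 3:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Requirements 2 and 3: the four permitted classification values. -->
  <xs:simpleType name="ClassificationType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="top-secret"/>
      <xs:enumeration value="secret"/>
      <xs:enumeration value="confidential"/>
      <xs:enumeration value="unclassified"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Assumed content model: one or more text-only Para children. -->
  <xs:element name="Document">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Para" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="classification" type="ClassificationType" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="classification" type="ClassificationType" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

And an ISO Schematron schema for requirement 1.  XPath 1.0 cannot compare strings by order (relational operators convert their operands to numbers), so this sketch derives a numeric rank from each label's position in a delimited string.  That trick is an illustrative choice, not something prescribed by the requirements:

```xml
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <!-- Levels in ascending sensitivity, '|'-delimited so that 'secret'
       is not also matched inside 'top-secret'. -->
  <sch:let name="levels" value="'|unclassified|confidential|secret|top-secret|'"/>

  <sch:pattern id="classification-ordering">
    <sch:rule context="Para">
      <!-- The length of the text preceding a level grows with its
           sensitivity, so it serves as a rank. -->
      <sch:let name="para-rank"
               value="string-length(substring-before($levels, concat('|', @classification, '|')))"/>
      <sch:let name="doc-rank"
               value="string-length(substring-before($levels, concat('|', ancestor::Document/@classification, '|')))"/>
      <sch:assert test="$para-rank &lt;= $doc-rank">
        A Para's classification (<sch:value-of select="@classification"/>)
        cannot be more sensitive than its Document's classification
        (<sch:value-of select="ancestor::Document/@classification"/>).
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

For the example instance, the Para (unclassified) gets rank 0 and the Document (secret) gets rank 26, so the assertion holds.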

There are two alternatives:

A. Create two documents: one document for the Schematron
implementation, and a second document for the XML Schema
implementation.

B. Create one document: the Schematron patterns, rules, and assertions
are embedded within <appinfo> elements in the XML Schema.
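A sketch of alternative B follows, assuming the ISO Schematron pattern is placed in the xs:appinfo of the <Document> element declaration (where in the schema the pattern goes is an illustrative choice).  An XML Schema validator does not act on appinfo content, so the embedded pattern still has to be extracted and run by a Schematron processor in a separate step.  The content model (one or more text-only <Para> children) is an assumption based on the example instance.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <!-- Requirements 2 and 3: permitted classification values. -->
  <xs:simpleType name="ClassificationType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="top-secret"/>
      <xs:enumeration value="secret"/>
      <xs:enumeration value="confidential"/>
      <xs:enumeration value="unclassified"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="Document">
    <xs:annotation>
      <xs:appinfo>
        <!-- Requirement 1, embedded.  A level's rank is the length of the
             text preceding it in a '|'-delimited list ordered by sensitivity. -->
        <sch:pattern id="classification-ordering">
          <sch:rule context="Para">
            <sch:let name="levels" value="'|unclassified|confidential|secret|top-secret|'"/>
            <sch:assert test="string-length(substring-before($levels, concat('|', @classification, '|')))
                              &lt;= string-length(substring-before($levels, concat('|', ancestor::Document/@classification, '|')))">
              A Para cannot be more sensitive than its Document.
            </sch:assert>
          </sch:rule>
        </sch:pattern>
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Para" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="classification" type="ClassificationType" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="classification" type="ClassificationType" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note how the grammar and the co-constraint share one file: changing to a different grammar language would mean moving the Schematron pattern as well.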


ADVANTAGES/DISADVANTAGES OF SEPARATE SCHEMATRON AND GRAMMAR DOCUMENTS

ADVANTAGES

1. The particular grammar language currently being used can be easily
replaced.  Thus, if XML Schema is currently being used, at a later date
you can easily replace it with RELAX NG without impacting the
Schematron schema.

2. Constraint checking can be done in stages, in a pipeline fashion.
It might be desirable for your system to check constraints in phases:
first grammar checking, then some other processing, then co-constraint
checking (using Schematron), then more processing, then data
cardinality checking (using Schematron), then more processing, and
finally algorithmic checking (using Schematron).

3. There may be a performance improvement.  If grammar checking is
done first and fails (i.e., reports errors), it may not be necessary
to run the Schematron validation at all, which saves time.
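The staged checking in advantages 2 and 3 could be sketched as a pipeline.  The example below uses XProc 1.0, a W3C pipeline language that postdates this post; its validation steps are optional steps that a given XProc processor may not implement.  The file names classification.xsd and classification.sch are hypothetical.

```xml
<?xml version="1.0"?>
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">

  <!-- Stage 1: grammar checking.  With assert-valid="true" an invalid
       document raises an error and the pipeline stops, so the
       Schematron stage below is never run. -->
  <p:validate-with-xml-schema assert-valid="true">
    <p:input port="schema">
      <p:document href="classification.xsd"/>
    </p:input>
  </p:validate-with-xml-schema>

  <!-- Stage 2: co-constraint checking with the separate Schematron
       schema.  Other processing steps could be inserted between stages. -->
  <p:validate-with-schematron assert-valid="true">
    <p:input port="schema">
      <p:document href="classification.sch"/>
    </p:input>
  </p:validate-with-schematron>

</p:pipeline>
```

Because the grammar and the Schematron schema are separate inputs, switching from XML Schema to RELAX NG (advantage 1) would mean changing only the first step, for example to p:validate-with-relax-ng with a .rng schema, while classification.sch stays untouched.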

DISADVANTAGES

1. There may be a performance degradation.  Running several
validations rather than a single validation may be more expensive,
for example if each validator parses the instance document separately.


ADVANTAGES/DISADVANTAGES OF SCHEMATRON EMBEDDED WITHIN A GRAMMAR
DOCUMENT

ADVANTAGES

1. There may be a performance improvement.  Running one validation
rather than several may save time.

DISADVANTAGES

1. Swapping out the particular grammar language that is currently being
used and replacing it with a different grammar language may be
difficult since the two are tightly intertwined.  

2. Constraint checking is a big-bang event.  All constraints --
grammar, co-constraints, cardinality, algorithmic -- are checked at
once. 

3. There may be a performance degradation.  Because all constraints
are checked together, you cannot skip the Schematron validation when
grammar validation fails.


RECOMMENDATION

For maximum flexibility and long-term maintainability, keep the
Schematron schema separate from the grammar schema.

Follow-Ups:
- Re: [xml-dev] Schematron Best Practice: Embed Schematron into a Grammar-Based Language? Or Keep Separate?
  - From: "bryan rasmussen" <rasmussen.bryan@gmail.com>
- Re: [xml-dev] Schematron Best Practice: Embed Schematron into a Grammar-Based Language? Or Keep Separate?
  - From: "Rick Jelliffe" <rjelliffe@allette.com.au>
- Re: [xml-dev] Schematron Best Practice: Embed Schematron into a Grammar-Based Language? Or Keep Separate?
  - From: "Fraser Goffin" <goffinf@googlemail.com>