The awesome power of Schematron + XPath 2.0 ... Able to express all my data requirements!
From: "Costello, Roger L." <costello@mitre.org>
To: <xml-dev@lists.xml.org>
Date: Wed, 24 Oct 2007 19:49:47 -0400
Hi Folks,

A few days ago Rick Jelliffe mentioned some of the new capabilities
that XPath 2.0 adds to Schematron.  

The things that he mentioned sounded very exciting to me, so I put
together what is for me a typical set of data requirements.  I then
implemented those data requirements using Schematron+XPath 2.0.  Then,
for comparison, I attempted to implement the same data requirements
using XML Schemas.  

It was a very enlightening experience.  Schematron+XPath 2.0 was able
to implement all of my data requirements (including all grammar
constraints). Conversely, XML Schemas was only able to implement the
grammar constraints (which are actually of lesser importance to me than
my other data requirements).  

Of course, this represents only one example; other examples must be
explored.  Nonetheless, the fact that Schematron+XPath 2.0 could
implement all of my (fairly extensive) data requirements is very
exciting. 

Below is my set of data requirements followed by the Schematron+XPath
2.0 implementation, as well as the XML Schema implementation.  Perhaps
you have similar data requirements?

Thanks Rick!

/Roger

-------------------------------------------------------------------------

HIGHLIGHTS OF WHAT I DISCOVERED

Schematron+XPath 2.0 was able to express:

- a security classification policy (data requirement #2)
- a reserved word filter (data requirement #3)
- data integrity checks, including a hashcode check (data requirement
#4)
- tracebacks from implementation to data requirements, for
accreditation purposes (data requirement #5)
- backward and forward compatibility in a safe fashion (data
requirement #6)
- validation in stages, e.g. perform a security classification check
first, and if it succeeds only then perform a reserved word check, etc
(data requirement #7)
- all grammar constraints (that are normally implemented using XSD or
RNG) (data requirements #1 and #8)

Conversely, XML Schemas was only able to express the grammar
constraints (data requirements, #1 and #8).  It was unable to express
the other data requirements (#2 - #7). 

-------------------------------------------------------------------------

SAMPLE XML INSTANCE DOCUMENT (i.e. SAMPLE DATA)

<?xml version="1.0" encoding="UTF-8"?>
<Document classification="secret">
    <NumParas>4</NumParas>
    <Para classification="unclassified">
          One if by land, two if by sea;
    </Para>
    <Para classification="confidential">
          And I on the opposite shore will be,
          Ready to ride and spread the alarm
    </Para>
    <Para classification="unclassified">
          Ready to ride and spread the alarm
          Through every Middlesex, village and farm,
    </Para>
    <Para classification="secret">
          For the country folk to be up and to arm.
    </Para>
    <Hash>304</Hash>
</Document>

-------------------------------------------------------------------------

DATA REQUIREMENTS

1. ** DOCUMENT ORGANIZATION **

1.1 The document is comprised of one or more paragraphs.

1.2 Each paragraph is labeled with a classification, which can be one
of top-secret, secret, confidential, or unclassified.

1.3 A paragraph's text must not exceed 200 characters in length, and
shall be comprised of only these characters: a-z, A-Z, 0-9, whitespace,
comma, period, colon, semi-colon.

1.4 The document has an overall classification, which can also be one
of top-secret, secret, confidential, or unclassified.

1.5 The information in the document may be ordered in any way the
author sees fit.


2. ** SECURITY CLASSIFICATION POLICY **

2.1 No paragraph may have a classification higher than the overall
document classification.
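
Requirement 2.1 is really an ordering on the four labels. As a minimal sketch (not the implementation given later in this message), XPath 2.0 sequences let the ordering be stated once and compared by position. The pattern id and the variable name levels are hypothetical; the sch prefix and queryBinding="xslt2" are assumed to be declared as in the Schematron schema in this message.

```xml
<sch:pattern id="SECURITY-CLASSIFICATION-POLICY-RANKED">

   <!-- Hypothetical variable: the labels, least to most sensitive -->
   <sch:let name="levels"
            value="('unclassified', 'confidential', 'secret', 'top-secret')"/>

   <sch:rule context="Para">

      <!-- index-of() gives each label's position in $levels; an unknown
           label yields the empty sequence, so the assertion fails -->
      <sch:assert test="index-of($levels, string(@classification)) le
                        index-of($levels, string(/Document/@classification))"
                  see="Data Requirement 2.1">
          A Para must not be labeled higher than the Document
      </sch:assert>

   </sch:rule>

</sch:pattern>
```

With the sample document, "secret" sits at position 3, so a Para at positions 1 to 3 passes and a top-secret Para would be reported.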


3. ** RESERVED WORD FILTER **

3.1 No paragraph may contain these reserved words: SCRIPT, FUNCTION.
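
A point to watch with requirement 3.1: a substring test also fires on words that merely contain a reserved word (DESCRIPTION contains SCRIPT). If whole words are what is meant, a sketch along these lines may be closer. It assumes that words are separated by non-word characters and that the match is case-sensitive, neither of which the requirement states.

```xml
<sch:rule context="Para">

   <!-- tokenize() splits the text on runs of non-word characters;
        the general comparison (=) is true if the word equals either
        reserved word -->
   <sch:assert test="every $word in tokenize(., '\W+') satisfies
                     not($word = ('SCRIPT', 'FUNCTION'))"
               see="Data Requirement 3.1">
       A Para must not contain the words SCRIPT or FUNCTION
   </sch:assert>

</sch:rule>
```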


4. ** DATA INTEGRITY CHECKS **

4.1 The document must contain a count of the number of paragraphs in
the document, and that count must match the actual number of
paragraphs.

4.2 The document must contain a hashcode, and that hashcode must match
the hash of the document.


5. ** ACCREDITATION **

5.1 For accreditation purposes an implementation of any one of these
requirements must reference the specific requirement that it is
implementing.


6. ** FUTURE REQUIREMENTS ** 

6.1 Additional future requirements must be backward and forward
compatible.


7. ** VALIDATION IN STAGES **

7.1 It must be possible to validate the data in stages, e.g. check the
data against the security policy and only perform the other checks if
it succeeds.


8. ** XML GRAMMAR **

8.1 The root element is <Document>.

8.2 <Document> has one attribute, classification, whose value can be
one of top-secret, secret, confidential, or unclassified.

8.3 <Document> is comprised of one <NumParas>, one or more <Para>, and
one <Hash>.  

8.3.1 These child elements may occur in any order.

8.4 Each <Para> has one attribute, classification, whose value can be
one of top-secret, secret, confidential, or unclassified.

8.5 The value of each <Para> is a string, constrained to a maximum of
200 characters, comprised of only these characters: a-z, A-Z, 0-9,
whitespace, comma, period, colon, semi-colon.

8.6 The value of <NumParas> is a nonNegativeInteger.

8.7 The value of <Hash> is a long.

-------------------------------------------------------------------------
 
XML SCHEMA IMPLEMENTATION

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">

    <xs:element name="Document">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="NumParas" type="xs:nonNegativeInteger"/>
                <xs:element name="Para" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:simpleContent>
                            <xs:extension base="paraType">
                                <xs:attribute name="classification"
                                              type="classificationLevels"
                                              use="required"/>
                            </xs:extension>
                        </xs:simpleContent>
                    </xs:complexType>
                </xs:element>
                <xs:element name="Hash" type="xs:long"/>
            </xs:sequence>
            <xs:attribute name="classification"
                          type="classificationLevels" use="required"/>
        </xs:complexType>
    </xs:element>
    <xs:simpleType name="classificationLevels">
        <xs:restriction base="xs:string">
            <xs:enumeration value="top-secret"/>
            <xs:enumeration value="secret"/>
            <xs:enumeration value="confidential"/>
            <xs:enumeration value="unclassified"/>
        </xs:restriction>
    </xs:simpleType>
    <xs:simpleType name="paraType">
        <xs:restriction base="xs:string">
            <xs:maxLength value="200"/>
            <xs:pattern value="[\sa-zA-Z0-9,;:\.]*"/>
        </xs:restriction>
    </xs:simpleType>
</xs:schema>

-------------------------------------------------------------------------

SCHEMATRON IMPLEMENTATION

<?xml version="1.0"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            xmlns:xs="http://www.w3.org/2001/XMLSchema"
            queryBinding="xslt2">

   <sch:let name="document-classification"
            value="/Document/@classification" />

   <sch:pattern id="SECURITY-CLASSIFICATION-POLICY">

      <sch:p>A Para's classification value cannot be more sensitive 
             than the Document's classification value.</sch:p> 

      <sch:rule context="Para[@classification='top-secret']">

         <sch:assert test="$document-classification='top-secret'"
                     see="Data Requirement 2.1">
             If there is a Para labeled "top-secret" then the Document
             must be labeled top-secret
         </sch:assert>

      </sch:rule>

      <sch:rule context="Para[@classification='secret']">

         <sch:assert test="($document-classification='top-secret') or
                           ($document-classification='secret')"
                     see="Data Requirement 2.1">
             If there is a Para labeled "secret" then the Document  
             must be labeled either secret or top-secret
         </sch:assert>

      </sch:rule>

      <sch:rule context="Para[@classification='confidential']">

         <sch:assert test="($document-classification='top-secret') or
                           ($document-classification='secret') or 
                           ($document-classification='confidential')"
                     see="Data Requirement 2.1">
             If there is a Para labeled "confidential" then the Document
             must be labeled either confidential, secret or top-secret
         </sch:assert>

      </sch:rule>

   </sch:pattern>

   <sch:pattern id="RESERVED-WORD-FILTER">

      <sch:p>These reserved words are not allowed anywhere in the
             document: SCRIPT, FUNCTION.</sch:p> 

      <sch:rule context="Document">

         <sch:assert test="count(//node()[contains(.,'SCRIPT')]) = 0
                           and
                           count(//node()[contains(.,'FUNCTION')]) = 0"
                     see="Data Requirement 3.1">
             The document must not contain the words SCRIPT or FUNCTION
         </sch:assert>

      </sch:rule>

   </sch:pattern>

   <sch:pattern id="DATA-INTEGRITY-CHECKS">

      <sch:p>The count value in the NumParas element must match 
             a count of the actual number of Para elements.
             And the hash value must match the value obtained
             by recomputing the hash on the current document.</sch:p> 

      <sch:rule context="NumParas">

         <sch:assert test=". = ../count(Para)"
                     see="Data Requirement 4.1">
             The count value in this element must match 
             a count of the actual number of Para elements
         </sch:assert>

      </sch:rule>

      <sch:rule context="Hash">

         <sch:assert test=". = sum(for $i in //*[not(*) and not(self::Hash)]
                                   return string-length($i))"
                     see="Data Requirement 4.2">
             The value of Hash must match the value that is obtained
             by recomputing the hash algorithm on the current document
         </sch:assert>

      </sch:rule>

   </sch:pattern>

   <sch:pattern id="GRAMMAR">

      <!-- Within a pattern, a node is tested only against the first rule
           whose context matches it, so every assertion about the root
           element lives in this one rule. -->
      <sch:rule context="/*">

         <sch:assert test="name() = 'Document'"
                     see="Data Requirement 8.1">
             The root element must be Document
         </sch:assert>

         <sch:assert test="(count(@*) = 1) and 
                           (name(@*) = 'classification') and
                           (($document-classification='top-secret') or
                           ($document-classification='secret') or 
                           ($document-classification='confidential') or
                           ($document-classification='unclassified'))"
                     see="Data Requirement 8.2">
             Document has one attribute, classification, whose value can
             be one of top-secret, secret, confidential, or unclassified.
         </sch:assert>

         <sch:assert test="(count(NumParas) = 1) and
                           (count(Para) &gt;= 1) and
                           (count(Hash) = 1) and
                           (count(*[name() != 'NumParas' and 
                                    name() != 'Para' and 
                                    name() != 'Hash']) = 0)"
                     see="Data Requirement 8.3 and 8.3.1">
             Document is comprised of one NumParas, one or more Para, and
             one Hash. These child elements may occur in any order.
         </sch:assert>

      </sch:rule>

      <sch:rule context="Para">

         <sch:assert test="(count(@*) = 1) and 
                           (name(@*) = 'classification') and
                           ((@classification='top-secret') or
                           (@classification='secret') or 
                           (@classification='confidential') or
                           (@classification='unclassified'))"
                     see="Data Requirement 8.4">
             Para has one attribute, classification, whose value can be
             one of top-secret, secret, confidential, or unclassified.
         </sch:assert>

         <sch:assert test="(. castable as xs:string) and
                           (string-length(.) &lt;= 200) and
                           (matches(., '^[\sa-zA-Z0-9,;:\.]*$'))"
                     see="Data Requirement 8.5">
             The value of a Para is a string, constrained to a maximum of
             200 characters, comprised of only these characters: a-z, A-Z,
             0-9, whitespace, comma, period, colon, semi-colon.
         </sch:assert>

      </sch:rule>

      <sch:rule context="NumParas">

         <sch:assert test=". castable as xs:nonNegativeInteger"
                     see="Data Requirement 8.6">
             The value of NumParas is a nonNegativeInteger.
         </sch:assert>

      </sch:rule>

      <sch:rule context="Hash">

         <sch:assert test=". castable as xs:long"
                     see="Data Requirement 8.7">
             The value of Hash is a long.
         </sch:assert>

      </sch:rule>

   </sch:pattern>

</sch:schema>
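
Requirement 7.1 (validation in stages) maps naturally onto Schematron phases. A minimal sketch, assuming the pattern ids used in the schema above; the phase ids are hypothetical. The sch:phase elements would sit after the sch:let and before the first sch:pattern.

```xml
<!-- Hypothetical phase ids; each sch:active names a pattern id
     from the schema above -->
<sch:phase id="stage-1-security">
   <sch:active pattern="SECURITY-CLASSIFICATION-POLICY"/>
</sch:phase>

<sch:phase id="stage-2-remaining-checks">
   <sch:active pattern="RESERVED-WORD-FILTER"/>
   <sch:active pattern="DATA-INTEGRITY-CHECKS"/>
   <sch:active pattern="GRAMMAR"/>
</sch:phase>
```

The validator would then be run once per phase, selecting the phase by whatever mechanism the implementation offers, and the second run skipped if the first reports a failure.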