Re: [xml-dev] The awesome power of Schematron + XPath 2.0 ... Able to ex

On 25/10/2007, Costello, Roger L. <costello@mitre.org> wrote:

Oops! Thanks Fraser. I will fix the bugs. Yes, Saxon-SA is required for using the expression "castable as xs:nonNegativeInteger".    /Roger

From: Fraser Goffin [mailto:goffinf@googlemail.com]
Sent: Thursday, October 25, 2007 7:56 AM
To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] The awesome power of Schematron + XPath 2.0 ... Able to express all my data requirements!

Hi Roger,

since I am thinking of moving both to ISO schematron and XSLT2, I gave your example above a try. For the present I am using :-

- SaxonB (v8.9)

- iso_schematron_skeleton_for_saxon.xsl

I encountered a couple of issues :-

1. When creating the implementation stylesheet Saxon emitted the warning :-

'Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor'

It nonetheless produced the stylesheet successfully, so not sure why ?

2. When running this stylesheet against the sample input a couple of errors occurred which I think are because :-

>> (string-length <= 200) and

should be :-

(string-length() <= 200) and

and :-

>> (matches(.,'[\sa-zA-Z0-9,;:\.]*))

should be (quote missing after the *) :-

(matches(.,'[\sa-zA-Z0-9,;:\.]*'))"

3. Neither the test for xs:nonNegativeInteger nor the one for xs:long would work with SaxonB; it said :-

'The type xs:nonNegativeInteger is not recognised by a basic XSLT processor'

Changing both to xs:integer was OK, so I don't know if this is a limitation of SaxonB which will go away when I switch to SaxonSA ?
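
If the derived types are unavailable in a basic (non-schema-aware) processor, one possible workaround is to test against xs:integer and add the range check by hand. The following is only a sketch under that assumption: the pattern id is made up for illustration, the sch:ns declaration is added so the xs prefix is bound inside the XPath expressions, and the xs:long bounds assume the processor's xs:integer can hold 19-digit values.

```xml
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">

   <!-- Binds the xs prefix for use inside the test expressions -->
   <sch:ns prefix="xs" uri="http://www.w3.org/2001/XMLSchema"/>

   <!-- Hypothetical pattern id; range checks on xs:integer stand in
        for xs:nonNegativeInteger and xs:long -->
   <sch:pattern id="NUMERIC-TYPES-BASIC-PROCESSOR">

      <sch:rule context="NumParas">
         <!-- if/then/else guards the cast, since the operands of "and"
              may be evaluated in any order -->
         <sch:assert test="if (. castable as xs:integer)
                           then xs:integer(.) ge 0
                           else false()"
                     see="Data Requirement 8.6">
             The value of NumParas is a nonNegativeInteger.
         </sch:assert>
      </sch:rule>

      <sch:rule context="Hash">
         <!-- Bounds are the value space of xs:long -->
         <sch:assert test="if (. castable as xs:integer)
                           then (xs:integer(.) ge -9223372036854775808) and
                                (xs:integer(.) le 9223372036854775807)
                           else false()"
                     see="Data Requirement 8.7">
             The value of Hash is a long.
         </sch:assert>
      </sch:rule>

   </sch:pattern>

</sch:schema>
```

With a schema-aware processor the original "castable as xs:nonNegativeInteger" and "castable as xs:long" tests are shorter and say the same thing.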

Regards

Fraser.

On 25/10/2007, Costello, Roger L. <costello@mitre.org> wrote:
Hi Folks,

A few days ago Rick Jelliffe mentioned some of the new capabilities
that XPath 2.0 adds to Schematron.

The things that he mentioned sounded very exciting to me, so I put
together what is for me a typical set of data requirements.  I then
implemented those data requirements using Schematron+XPath 2.0.  Then,
for comparison, I attempted to implement the same data requirements
using XML Schemas.

It was a very enlightening experience.  Schematron+XPath 2.0 was able
to implement all of my data requirements (including all grammar
constraints). Conversely, XML Schemas was only able to implement the
grammar constraints (which are actually of lesser importance to me than
my other data requirements).

Of course, this represents only one example; other examples must be
explored.  Nonetheless, the fact that Schematron+XPath 2.0 could
implement all of my (fairly extensive) data requirements is very
exciting.

Below is my set of data requirements followed by the Schematron+XPath
2.0 implementation, as well as the XML Schema implementation.  Perhaps
you have similar data requirements?

Thanks Rick!

/Roger

-------------------------------------------------------------------------

HIGHLIGHTS OF WHAT I DISCOVERED

Schematron+XPath 2.0 was able to express:

- a security classification policy (data requirement #2)
- a reserved word filter (data requirement #3)
- data integrity checks, including a hashcode check (data requirement
#4)
- tracebacks from implementation to data requirements, for
accreditation purposes (data requirement #5)
- backward and forward compatibility in a safe fashion (data
requirement #6)
- validation in stages, e.g. perform a security classification check
first, and if it succeeds only then perform a reserved word check, etc
(data requirement #7)
- all grammar constraints (that are normally implemented using XSD or
RNG) (data requirements #1 and #8)

Conversely, XML Schemas was only able to express the grammar
constraints (data requirements #1 and #8).  It was unable to express
the other data requirements (#2 - #7).

-------------------------------------------------------------------------

SAMPLE XML INSTANCE DOCUMENT (i.e. SAMPLE DATA)

<?xml version="1.0" encoding="UTF-8"?>
<Document classification="secret">
   <NumParas>4</NumParas>
   <Para classification="unclassified">
         One if by land, two if by sea;
   </Para>
   <Para classification="confidential">
         And I on the opposite shore will be,
         Ready to ride and spread the alarm
   </Para>
   <Para classification="unclassified">
         Ready to ride and spread the alarm
         Through every Middlesex, village and farm,
   </Para>
   <Para classification="secret">
         For the country folk to be up and to arm.
   </Para>
   <Hash>304</Hash>
</Document>

-------------------------------------------------------------------------

DATA REQUIREMENTS

1. ** DOCUMENT ORGANIZATION **

1.1 The document is comprised of one or more paragraphs.

1.2 Each paragraph is labeled with a classification, which can be one
of top-secret, secret, confidential, or unclassified.

1.3 A paragraph's text must not exceed 200 characters in length, and
shall be comprised of only these characters: a-z, A-Z, 0-9, whitespace,
comma, period, colon, semi-colon.

1.4 The document has an overall classification, which can also be one
of top-secret, secret, confidential, or unclassified.

1.5 The information in the document may be ordered in any way the
author sees fit.

2. ** SECURITY CLASSIFICATION POLICY **

2.1 No paragraph may have a classification higher than the overall
document classification.

3. ** RESERVED WORD FILTER **

3.1 No paragraph may contain these reserved words: SCRIPT, FUNCTION.

4. ** DATA INTEGRITY CHECKS **

4.1 The document must contain a count of the number of paragraphs in
the document, and that count must match the actual number of
paragraphs.

4.2 The document must contain a hashcode, and that hashcode must match
the hash of the document.

5. ** ACCREDITATION **

5.1 For accreditation purposes an implementation of any one of these
requirements must reference the specific requirement that it is
implementing.

6. ** FUTURE REQUIREMENTS **

6.1 Additional future requirements must be backward and forward
compatible.

7. ** VALIDATION IN STAGES **

7.1 It must be possible to validate the data in stages, e.g. check the
data against the security policy and only perform the other checks if
it succeeds.

8. ** XML GRAMMAR **

8.1 The root element is <Document>.

8.2 <Document> has one attribute, classification, whose value can be
one of top-secret, secret, confidential, or unclassified.

8.3 <Document> is comprised of one <NumParas>, one or more <Para>, and
one <Hash>.

8.3.1 These child elements may occur in any order.

8.4 Each <Para> has one attribute, classification, whose value can be
one of top-secret, secret, confidential, or unclassified.

8.5 The value of each <Para> is a string, constrained to a maximum of
200 characters, comprised of only these characters: a-z, A-Z, 0-9,
whitespace, comma, period, colon, semi-colon.

8.6 The value of <NumParas> is a nonNegativeInteger.

8.7 The value of <Hash> is a long.
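
Requirement 2.1 is at heart an ordering comparison between two classification levels. As a minimal sketch (not the implementation given in this message), XPath 2.0's index-of() over a sequence of levels could express it in a single rule. The variable name "levels" and the pattern id are illustrative assumptions.

```xml
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">

   <!-- Illustrative variable: classification levels, least sensitive first -->
   <sch:let name="levels"
            value="('unclassified', 'confidential', 'secret', 'top-secret')"/>

   <!-- Hypothetical pattern id -->
   <sch:pattern id="SECURITY-CLASSIFICATION-ORDERING">

      <sch:rule context="Para">
         <!-- index-of() gives each level's position in $levels; a missing
              or unknown level yields an empty sequence, so the comparison
              is not true and the assertion fails -->
         <sch:assert test="index-of($levels, string(@classification)) le
                           index-of($levels, string(/Document/@classification))"
                     see="Data Requirement 2.1">
             A Para's classification must not be higher than the
             Document's classification
         </sch:assert>
      </sch:rule>

   </sch:pattern>

</sch:schema>
```

Adding a new level then means editing the one sequence rather than adding a rule per level.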

-------------------------------------------------------------------------

XML SCHEMA IMPLEMENTATION

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

   <xs:element name="Document">
       <xs:complexType>
           <xs:sequence>
               <xs:element name="NumParas" type="xs:nonNegativeInteger"/>
               <xs:element name="Para" maxOccurs="unbounded">
                   <xs:complexType>
                       <xs:simpleContent>
                           <xs:extension base="paraType">
                               <xs:attribute name="classification"
                                             type="classificationLevels" use="required"/>
                           </xs:extension>
                       </xs:simpleContent>
                   </xs:complexType>
               </xs:element>
               <xs:element name="Hash" type="xs:long"/>
           </xs:sequence>
           <xs:attribute name="classification"
                         type="classificationLevels" use="required"/>
       </xs:complexType>
   </xs:element>
   <xs:simpleType name="classificationLevels">
       <xs:restriction base="xs:string">
           <xs:enumeration value="top-secret"/>
           <xs:enumeration value="secret"/>
           <xs:enumeration value="confidential"/>
           <xs:enumeration value="unclassified"/>
       </xs:restriction>
   </xs:simpleType>
   <xs:simpleType name="paraType">
       <xs:restriction base="xs:string">
           <xs:maxLength value="200"/>
           <xs:pattern value="[\sa-zA-Z0-9,;:\.]*"/>
       </xs:restriction>
   </xs:simpleType>
</xs:schema>
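
One difference worth keeping in mind when porting the paraType facets to Schematron: an xs:pattern must match the whole value, whereas XPath 2.0 matches() is true if any substring matches, and the pattern above, ending in *, matches the empty substring of every string. A sketch of an equivalent assertion therefore anchors the regular expression with ^ and $. The pattern id is a made-up name for this example.

```xml
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">

   <!-- Hypothetical pattern id -->
   <sch:pattern id="PARA-LENGTH-AND-CHARACTER-SET">

      <sch:rule context="Para">
         <!-- string-length(.) le 200 mirrors xs:maxLength;
              ^ and $ make matches() test the whole value, as xs:pattern does -->
         <sch:assert test="(string-length(.) le 200) and
                           matches(., '^[\sa-zA-Z0-9,;:\.]*$')"
                     see="Data Requirement 8.5">
             The value of a Para is at most 200 characters long and uses
             only a-z, A-Z, 0-9, whitespace, comma, period, colon, semi-colon.
         </sch:assert>
      </sch:rule>

   </sch:pattern>

</sch:schema>
```

Without the anchors, a Para containing a disallowed character such as "!" would still pass the matches() test.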

-------------------------------------------------------------------------

SCHEMATRON IMPLEMENTATION

<?xml version="1.0"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           queryBinding="xslt2">

  <sch:let name="document-classification"
           value="/Document/@classification" />

  <sch:pattern id="SECURITY-CLASSIFICATION-POLICY">

     <sch:p>A Para's classification value cannot be more sensitive
            than the Document's classification value.</sch:p>

     <sch:rule context="Para[@classification='top-secret']">

        <sch:assert test="$document-classification='top-secret'"
                    see="Data Requirement 2.1">
            If there is a Para labeled "top-secret" then the Document
            must be labeled top-secret
        </sch:assert>

     </sch:rule>

     <sch:rule context="Para[@classification='secret']">

        <sch:assert test="($document-classification='top-secret') or
                          ($document-classification='secret')"
                    see="Data Requirement 2.1">
            If there is a Para labeled "secret" then the Document
            must be labeled either secret or top-secret
        </sch:assert>

     </sch:rule>

     <sch:rule context="Para[@classification='confidential']">

        <sch:assert test="($document-classification='top-secret') or
                          ($document-classification='secret') or
                          ($document-classification='confidential')"
                    see="Data Requirement 2.1">
            If there is a Para labeled "confidential" then the Document
            must be labeled either confidential, secret or top-secret
        </sch:assert>

     </sch:rule>

  </sch:pattern>

  <sch:pattern id="RESERVED-WORD-FILTER">

     <sch:p>These reserved words are not allowed anywhere in the
            document: SCRIPT, FUNCTION.</sch:p>

     <sch:rule context="Document">

        <sch:assert test="count(//node()[contains(.,'SCRIPT')]) = 0
                          and
                          count(//node()[contains(.,'FUNCTION')]) = 0"
                    see="Data Requirement 3.1">
            The document must not contain the words SCRIPT or FUNCTION
        </sch:assert>

     </sch:rule>

  </sch:pattern>

  <sch:pattern id="DATA-INTEGRITY-CHECKS">

     <sch:p>The count value in the NumParas element must match
            a count of the actual number of Para elements.
            And the hash value must match the value obtained
            by recomputing the hash on the current document.</sch:p>

     <sch:rule context="NumParas">

        <sch:assert test=". = ../count(Para)"
                    see="Data Requirement 4.1">
            The count value in this element must match
            a count of the actual number of Para elements
        </sch:assert>

     </sch:rule>

     <sch:rule context="Hash">

        <sch:assert test=". = sum(for $i in //*[not(*) and not(self::Hash)]
                                  return string-length($i))"
                    see="Data Requirement 4.2">
            The value of Hash must match the value that is obtained
            by recomputing the hash algorithm on the current document
        </sch:assert>

     </sch:rule>

  </sch:pattern>

  <sch:pattern id="GRAMMAR">

     <sch:rule context="/*">

        <sch:assert test="name() = 'Document'"
                    see="Data Requirement 8.1">
            The root element must be Document
        </sch:assert>

        <sch:assert test="(count(@*) = 1) and
                          (name(@*) = 'classification') and
                          (($document-classification='top-secret') or
                          ($document-classification='secret') or
                          ($document-classification='confidential') or
                          ($document-classification='unclassified'))"
                    see="Data Requirement 8.2">
            Document has one attribute, classification, whose value can
            be one of top-secret, secret, confidential, or unclassified.
        </sch:assert>

     </sch:rule>

     <sch:rule context="/Document">

        <sch:assert test="(count(NumParas) = 1) and
                          (count(Para) >= 1) and
                          (count(Hash) = 1) and
                          (count(*[name() != 'NumParas' and
                                   name() != 'Para' and
                                   name() != 'Hash']) = 0)"
                    see="Data Requirement 8.3 and 8.3.1">
            Document is comprised of one NumParas, one or more Para, and
            one Hash. These child elements may occur in any order.
        </sch:assert>

     </sch:rule>

     <sch:rule context="Para">

        <sch:assert test="(count(@*) = 1) and
                          (name(@*) = 'classification') and
                          ((@classification='top-secret') or
                          (@classification='secret') or
                          (@classification='confidential') or
                          (@classification='unclassified'))"
                    see="Data Requirement 8.4">
            Para has one attribute, classification, whose value can be
            one of top-secret, secret, confidential, or unclassified.
        </sch:assert>

        <sch:assert test="(. castable as xs:string) and
                          (string-length <= 200) and
                          (matches(.,'[\sa-zA-Z0-9,;:\.]*))"
                    see="Data Requirement 8.5">
            The value of a Para is a string, constrained to a maximum of
            200 characters, comprised of only these characters: a-z, A-Z,
            0-9, whitespace, comma, period, colon, semi-colon.
        </sch:assert>

     </sch:rule>

     <sch:rule context="NumParas">

        <sch:assert test=". castable as xs:nonNegativeInteger"
                    see="Data Requirement 8.6">
            The value of NumParas is a nonNegativeInteger.
        </sch:assert>

     </sch:rule>

     <sch:rule context="Hash">

        <sch:assert test=". castable as xs:long"
                    see="Data Requirement 8.7">
            The value of Hash is a long.
        </sch:assert>

     </sch:rule>

  </sch:pattern>

</sch:schema>
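
For validation in stages (data requirement 7.1), ISO Schematron provides phases: each sch:phase names the patterns that are active when that phase is selected. The sketch below shows how two stages might be declared for the pattern ids used in the schema above. The phase ids are hypothetical, and the pattern bodies are cut down to a single rule each (with the reserved-word test condensed) to keep the example short.

```xml
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">

   <!-- Phase ids are hypothetical names chosen for this sketch.
        Phases are declared before the patterns they activate. -->
   <sch:phase id="stage-1-security">
      <sch:active pattern="SECURITY-CLASSIFICATION-POLICY"/>
   </sch:phase>

   <sch:phase id="stage-2-reserved-words">
      <sch:active pattern="RESERVED-WORD-FILTER"/>
   </sch:phase>

   <sch:pattern id="SECURITY-CLASSIFICATION-POLICY">
      <!-- Abbreviated: only the top-secret rule is repeated here -->
      <sch:rule context="Para[@classification='top-secret']">
         <sch:assert test="/Document/@classification='top-secret'"
                     see="Data Requirement 2.1">
             If there is a Para labeled "top-secret" then the Document
             must be labeled top-secret
         </sch:assert>
      </sch:rule>
   </sch:pattern>

   <sch:pattern id="RESERVED-WORD-FILTER">
      <sch:rule context="Document">
         <!-- Condensed form of the reserved-word test -->
         <sch:assert test="not(//text()[contains(., 'SCRIPT') or
                                        contains(., 'FUNCTION')])"
                     see="Data Requirement 3.1">
             The document must not contain the words SCRIPT or FUNCTION
         </sch:assert>
      </sch:rule>
   </sch:pattern>

</sch:schema>
```

Which phase runs is chosen when the validator is invoked, and how that is done depends on the implementation. Running the second stage only when the first reports no failures is left to whatever script or pipeline drives the validator.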
