Use Schematron to formally define key concepts

Hi Folks,

For many years, implementations of a certain XML Schema were made interoperable only through twice-yearly “bake-offs.” A bake-off was an event at which engineers from various vendors gathered in one room, with all their equipment, to test and re-program until their equipment interoperated on the test cases. Even for the baseline schema (the schema before it was extended), engineers would spend many hours trying to answer simple questions such as, “Can an application endpoint in state S send an XML message of type m?” Engineers would search the schema and other documents for clues, and argue over the meanings of elements and attributes like Biblical scholars. Certainty was rarely achieved.

XML Schemas are only slightly more formal than natural language descriptions. Users are right to distrust XML Schemas.

Use Schematron to formally define key concepts.

Let’s take an example. Consider the following excerpt from an XML Schema for a network protocol.

<xs:element name="node">
    <xs:complexType>
        <xs:sequence>
            
        </xs:sequence>
        <xs:attribute name="following" type="nodeId" use="required" />
        <xs:attribute name="id" type="nodeId" use="required" />
    </xs:complexType>
</xs:element>

The schema states that a node can follow another node. Follow in what sense?

If “following” means following pointers, then the statement is a tautology. If “following” means following in integer order on node identifiers, then it is more meaningful but still wrong – the successor of node 40 may be 5, which does not follow it in integer order (node identifiers come from a bounded set of natural numbers, and the identifiers wrap around from the highest number to zero).

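In fact, it is not useful to define “following” in such a circular ordering, because every identifier follows (and precedes) every other identifier. A more useful concept is that of “between”: Between(n1, n2, n3) holds when n2 lies strictly between n1 and n3, counting upward from n1 and wrapping around if necessary. It is defined by the following Schematron predicate (XSLT functions may be embedded in Schematron):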

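<!-- Returns the text 'true' if n2 lies strictly between n1 and n3 on the
     identifier ring; otherwise returns the empty sequence, which is false
     in a boolean test. -->
<xsl:function name="pred:Between">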
    <xsl:param name="n1" />
    <xsl:param name="n2" />
    <xsl:param name="n3" />

    <xsl:choose>
        <xsl:when test="number($n1) lt number($n3)">
            <xsl:if test="(number($n1) lt number($n2)) and
                                 (number($n2) lt number($n3))">true</xsl:if>
        </xsl:when>
        <xsl:otherwise>
            <xsl:if test="(number($n1) lt number($n2)) or
                                 (number($n2) lt number($n3))">true</xsl:if>
        </xsl:otherwise>
    </xsl:choose>

</xsl:function>
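
To try the predicate on its own, here is a minimal sketch of a standalone XSLT 2.0 stylesheet. It is not part of the Schematron schema. It restates Between with an explicit xs:boolean return type, and the named template "main" and the chosen test values are assumptions made for illustration.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:pred="predicate"
                exclude-result-prefixes="xs pred">

    <xsl:output method="text"/>

    <!-- Same logic as pred:Between above, restated to return xs:boolean
         (an assumption of this sketch, not the form used in the schema) -->
    <xsl:function name="pred:Between" as="xs:boolean">
        <xsl:param name="n1"/>
        <xsl:param name="n2"/>
        <xsl:param name="n3"/>
        <xsl:sequence select="
            if (number($n1) lt number($n3))
            then (number($n1) lt number($n2)) and (number($n2) lt number($n3))
            else (number($n1) lt number($n2)) or (number($n2) lt number($n3))"/>
    </xsl:function>

    <!-- Hypothetical entry point: prints one result per line -->
    <xsl:template name="main">
        <!-- No wrap-around: 5 < 29 < 40, so this prints true -->
        <xsl:text>Between(5, 29, 40) = </xsl:text>
        <xsl:value-of select="pred:Between(5, 29, 40)"/>
        <xsl:text>&#10;</xsl:text>

        <!-- Wrap-around from 40 to 5: 29 is neither above 40 nor below 5,
             so this prints false -->
        <xsl:text>Between(40, 29, 5) = </xsl:text>
        <xsl:value-of select="pred:Between(40, 29, 5)"/>
        <xsl:text>&#10;</xsl:text>

        <!-- Wrap-around from 40 to 5: 2 is below 5, so this prints true -->
        <xsl:text>Between(40, 2, 5) = </xsl:text>
        <xsl:value-of select="pred:Between(40, 2, 5)"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

</xsl:stylesheet>
```

With an XSLT 2.0 processor that supports a named initial template (Saxon, for example, accepts -it:main on its command line), no source document is needed.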

Here is a valid XML instance document:
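
A minimal instance consistent with the description that follows. It assumes that the root element is network (as in the XML Schema at the end of this message), that the network has exactly three nodes, and that node 29 is followed by node 40:

```xml
<network>
    <node id="5" following="29"/>
    <node id="29" following="40"/>
    <node id="40" following="5"/>
</network>
```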

For each node n1, there is no node between n1 and n1.following. For example, following node 5 is 29, and there are no nodes between them. Following node 40 is 5, and there are no nodes between them (this might seem counterintuitive, but check the Between predicate and you will see that 29 is not between 40 and 5. “Intuition” is often wrong – another reason that formal definitions are needed).

This is an invalid XML instance document:
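
A minimal instance consistent with the description that follows, under the same assumptions (root element network, three nodes). Only the following attribute of node 5 is known from the text. The other two values are assumptions:

```xml
<network>
    <node id="5" following="40"/>
    <node id="29" following="40"/>
    <node id="40" following="5"/>
</network>
```

This instance still satisfies the XML Schema, since every id and following value is a legal nodeId. Only the Schematron assertion rejects it.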

Following node 5 is 40, but there is a node between them (node 29).

We are ready to formally define the “following” concept:

<sch:pattern id="Formal-Definition-of-following">
    <sch:rule context="network">
        <sch:let name="nodes" value="node"/>
        <sch:assert test="
            every $n1 in $nodes, $n2 in $nodes, $n3 in $nodes satisfies
                if (pred:Disjoint($n1, $n2, $n3)) then
                    if (number($n2/@id) eq number($n1/@following)) then
                        not(pred:Between($n1/@id, $n3/@id, $n2/@id))
                    else true()
                else true()
            ">
            No third node falls between a node and its following node.
        </sch:assert>
    </sch:rule>
</sch:pattern>

Stated in words: “If n1 and n2 are distinct network nodes, and n2 is the successor of n1, then no third network node falls between them.”

An XML Schema without a Schematron schema is dangerously ambiguous.

/Roger

P.S.#1 Everything said about the dangerous ambiguity of XML Schemas also applies to UML.

P.S.#2 Acknowledgement: Some of the ideas presented herein, even some sentences, come from this fantastic paper by Pamela Zave (AT&T Labs, Princeton University): http://web2.research.att.com/export/sites/att_labs/people/Zave_Pamela/custom/wripe.pdf

Here is the complete Schematron schema:

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:pred="predicate"
            queryBinding="xslt2">

    <sch:ns uri="predicate" prefix="pred"/>

    
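    <!-- Returns the text 'true' if n2 lies strictly between n1 and n3 on the
         identifier ring; otherwise returns the empty sequence, which is false
         in a boolean test. -->
    <xsl:function name="pred:Between">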
        <xsl:param name="n1" />
        <xsl:param name="n2" />
        <xsl:param name="n3" />

        <xsl:choose>
            <xsl:when test="number($n1) lt number($n3)">
                <xsl:if test="(number($n1) lt number($n2)) and
                                    (number($n2) lt number($n3))">true</xsl:if>
            </xsl:when>
            <xsl:otherwise>
                <xsl:if test="(number($n1) lt number($n2)) or
                                    (number($n2) lt number($n3))">true</xsl:if>
            </xsl:otherwise>
        </xsl:choose>

    </xsl:function>

    
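    <!-- Returns the text 'true' if n1, n2 and n3 are three different nodes
         (compared by node identity, not by value); otherwise returns the
         empty sequence, which is false in a boolean test. -->
    <xsl:function name="pred:Disjoint">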
        <xsl:param name="n1" />
        <xsl:param name="n2" />
        <xsl:param name="n3" />

        <xsl:choose>
            <xsl:when test="$n1 is $n2" />
            <xsl:when test="$n1 is $n3" />
            <xsl:when test="$n2 is $n3" />
            <xsl:otherwise>true</xsl:otherwise>
        </xsl:choose>

    </xsl:function>

    
    <sch:pattern id="Formal-Definition-of-following">
        <sch:rule context="network">
            <sch:let name="nodes" value="node"/>
            <sch:assert test="
                every $n1 in $nodes, $n2 in $nodes, $n3 in $nodes satisfies
                    if (pred:Disjoint($n1, $n2, $n3)) then
                        if (number($n2/@id) eq number($n1/@following)) then
                            not(pred:Between($n1/@id, $n3/@id, $n2/@id))
                        else true()
                    else true()
                ">
                No third node falls between a node and its following node.
            </sch:assert>
        </sch:rule>
    </sch:pattern>

</sch:schema>

Here is the XML Schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="network">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="node" maxOccurs="unbounded" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:element name="node">
        <xs:complexType>
            <xs:sequence>
                
            </xs:sequence>
            <xs:attribute name="following" type="nodeId" use="required" />
            <xs:attribute name="id" type="nodeId" use="required" />
        </xs:complexType>
    </xs:element>

    <xs:simpleType name="nodeId">
        <xs:restriction base="xs:unsignedByte"/>
    </xs:simpleType>

</xs:schema>