Hi Folks,
For many years, interoperability among implementations of a certain XML Schema was achieved only through twice-yearly “bake-offs.” A bake-off was an event at which engineers from various vendors gathered in one room, with all their equipment, to test and re-program until their equipment interoperated on the test cases. Even for the baseline schema (the schema before it was extended), engineers would spend many hours trying to answer simple questions such as, “Can an application endpoint in state S send an XML message of type m?” Engineers would search the schema and other documents for clues, and argue over the meanings of elements and attributes like Biblical scholars. Certainty was rarely achieved.
XML Schemas are only slightly more formal than natural language descriptions. Users are right to distrust XML Schemas.
Use Schematron to formally define key concepts.
Let’s take an example. Consider the following excerpt from an XML Schema for a network protocol.
<xs:element name="node">
<xs:complexType>
<xs:sequence>
<!-- node stuff -->
</xs:sequence>
<xs:attribute name="following" type="nodeId" use="required" />
<xs:attribute name="id" type="nodeId" use="required" />
</xs:complexType>
</xs:element>
The schema states that a node can follow another node. Follow in what sense?
If “following” means following pointers, then the statement is a tautology. If “following” means following in integer order on node identifiers, then it is more meaningful but still wrong – the successor of node 40 may be 5, which does not follow it in integer order (node identifiers come from a bounded set of natural numbers, and the identifiers wrap around from the highest number to zero).
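The wrap-around can be made concrete with a small Python sketch (Python is used here only for illustration; it is not part of the schema). It assumes identifiers are xs:unsignedByte values, 0 through 255, as in the nodeId type of the XML Schema at the end of this message. The names MODULUS, successor and reachable_from are hypothetical helpers introduced for this example.

```python
# Assumption: node identifiers are xs:unsignedByte values (0..255).
MODULUS = 256


def successor(identifier: int) -> int:
    """Next identifier in wrap-around order: 255 is followed by 0."""
    return (identifier + 1) % MODULUS


def reachable_from(start: int) -> set:
    """Identifiers reached from start by repeatedly taking the successor."""
    reached = set()
    current = successor(start)
    while current != start:
        reached.add(current)
        current = successor(current)
    return reached


print(successor(255))             # 0
print(5 in reachable_from(40))    # True
print(40 in reachable_from(5))    # True
# Every identifier is reachable from every other identifier.
print(all(len(reachable_from(a)) == MODULUS - 1 for a in range(MODULUS)))  # True
```

Under this reading, 5 comes after 40 and 40 comes after 5, so an ordering-based notion of "following" cannot distinguish a correct successor from an incorrect one.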
In fact, it is not useful to define “following” in such an ordering because every identifier follows (and precedes) every other identifier. A more useful concept is that of “between,” defined by the following Schematron predicate (XSLT functions may be embedded in Schematron):
<!--
The predicate Between is true if and only if
argument n2 lies between arguments n1 and n3
-->
<xsl:function name="pred:Between">
<xsl:param name="n1" />
<xsl:param name="n2" />
<xsl:param name="n3" />
<xsl:choose>
<xsl:when test="number($n1) lt number($n3)">
<xsl:if test="(number($n1) lt number($n2)) and
(number($n2) lt number($n3))">true</xsl:if>
</xsl:when>
<xsl:otherwise>
<xsl:if test="(number($n1) lt number($n2)) or
(number($n2) lt number($n3))">true</xsl:if>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
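For readers who want to try the predicate without an XSLT 2.0 processor, here is a minimal Python transcription of the same two-branch logic. The function name between is an assumption made for this sketch, and it returns a Python boolean where the XSLT function returns either the text "true" or nothing.

```python
def between(n1: float, n2: float, n3: float) -> bool:
    """True if n2 lies strictly between n1 and n3 in wrap-around order."""
    if n1 < n3:
        # No wrap-around: n2 must lie in the open interval (n1, n3).
        return n1 < n2 and n2 < n3
    # The interval wraps past the highest identifier back to zero,
    # so n2 may lie above n1 or below n3.
    return n1 < n2 or n2 < n3


print(between(5, 29, 40))    # True
print(between(40, 29, 5))    # False
print(between(40, 2, 5))     # True
print(between(40, 200, 5))   # True
```

The last two calls show the wrap-around branch: both 2 and 200 lie on the arc that runs from 40 up through the highest identifier and around to 5, whereas 29 does not.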
Here is a valid XML instance document:
<network>
<node id="5" following="29"/>
<node id="29" following="40"/>
<node id="40" following="5"/>
</network>
For each node n1, there is no node between n1 and n1.following. For example, the node following node 5 is node 29, and no node lies between them. The node following node 40 is node 5, and no node lies between them either. This might seem counterintuitive, but check the Between predicate and you will see that 29 is not between 40 and 5: 40 is not less than 5, so the second branch applies, and neither 40 lt 29 nor 29 lt 5 holds. “Intuition” is often wrong, which is another reason that formal definitions are needed.
This is an invalid XML instance document:
<network>
<node id="5" following="40"/>
<node id="29" following="40"/>
<node id="40" following="5"/>
</network>
Following node 5 is 40, but there is a node between them (node 29).
We are ready to formally define the “following” concept:
<!--
Formal definition of the "following" concept.
let nodes = { nodes in the network }
all disj n1, n2, n3: node |
n2 = n1.following
=> ! Between[n1,n3,n2]
-->
<sch:pattern id="Formal-Definition-of-following" >
<sch:rule context="network">
<sch:let name="nodes" value="node"/>
<sch:assert test="
every $n1 in $nodes, $n2 in $nodes, $n3 in $nodes satisfies
if (pred:Disjoint($n1, $n2, $n3)) then
if (number($n2/@id) eq number($n1/@following)) then
not(pred:Between($n1/@id, $n3/@id, $n2/@id))
else true()
else true()
">
No third node falls between a node and its following node.
</sch:assert>
</sch:rule>
</sch:pattern>
“If n1 and n2 are distinct network nodes, and n2 is the successor of n1, then no third network node falls between them.”
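The rule can also be exercised outside a Schematron processor. The following Python sketch is an illustration under stated assumptions, not the Schematron implementation: it re-implements the Between logic as a hypothetical between function, uses itertools.permutations to stand in for pred:Disjoint, and reports offending triples through a hypothetical following_violations helper.

```python
import xml.etree.ElementTree as ET
from itertools import permutations


def between(n1, n2, n3):
    """True if n2 lies strictly between n1 and n3 in wrap-around order."""
    if n1 < n3:
        return n1 < n2 and n2 < n3
    return n1 < n2 or n2 < n3


def following_violations(network_xml: str) -> list:
    """Return (n1, n2, n3) id triples where n2 = n1.following but n3 lies between n1 and n2."""
    nodes = ET.fromstring(network_xml).findall("node")
    violations = []
    # permutations() yields ordered triples of three distinct nodes,
    # which plays the role of the Disjoint predicate.
    for n1, n2, n3 in permutations(nodes, 3):
        id1, id2, id3 = (int(n.get("id")) for n in (n1, n2, n3))
        if id2 == int(n1.get("following")) and between(id1, id3, id2):
            violations.append((id1, id2, id3))
    return violations


VALID = """<network>
<node id="5" following="29"/>
<node id="29" following="40"/>
<node id="40" following="5"/>
</network>"""

INVALID = """<network>
<node id="5" following="40"/>
<node id="29" following="40"/>
<node id="40" following="5"/>
</network>"""

print(following_violations(VALID))     # []
print(following_violations(INVALID))   # [(5, 40, 29)]
```

The single violation in the second document is the triple described above: node 40 is declared as following node 5, yet node 29 lies between them.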
An XML Schema without a Schematron schema is dangerously ambiguous.
/Roger
P.S.#1 Everything said about the dangerous ambiguity of XML Schemas also applies to UML.
P.S.#2 Acknowledgement: Some of the ideas presented herein, even some sentences, come from this fantastic paper by Pamela Zave (AT&T Labs, Princeton University): http://web2.research.att.com/export/sites/att_labs/people/Zave_Pamela/custom/wripe.pdf
Here is the complete Schematron schema:
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
queryBinding="xslt2">
<sch:ns uri="predicate" prefix="pred"/>
<!--
The predicate Between is true if and only if
argument n2 lies between arguments n1 and n3
-->
<xsl:function name="pred:Between">
<xsl:param name="n1" />
<xsl:param name="n2" />
<xsl:param name="n3" />
<xsl:choose>
<xsl:when test="number($n1) lt number($n3)">
<xsl:if test="(number($n1) lt number($n2)) and
(number($n2) lt number($n3))" >true</xsl:if>
</xsl:when>
<xsl:otherwise>
<xsl:if test="(number($n1) lt number($n2)) or
(number($n2) lt number($n3))" >true</xsl:if>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
<!--
The predicate Disjoint is true if and only if
all three arguments are different
-->
<xsl:function name="pred:Disjoint">
<xsl:param name="n1" />
<xsl:param name="n2" />
<xsl:param name="n3" />
<xsl:choose>
<xsl:when test="$n1 is $n2" />
<xsl:when test="$n1 is $n3" />
<xsl:when test="$n2 is $n3" />
<xsl:otherwise>true</xsl:otherwise>
</xsl:choose>
</xsl:function>
<!--
Formal definition of the "following" concept.
let nodes = { nodes in the network }
all disj n1, n2, n3: node |
n2 = n1.following
=> ! Between[n1,n3,n2]
-->
<sch:pattern id="Formal-Definition-of-following" >
<sch:rule context="network">
<sch:let name="nodes" value="node"/>
<sch:assert test="
every $n1 in $nodes, $n2 in $nodes, $n3 in $nodes satisfies
if (pred:Disjoint($n1, $n2, $n3)) then
if (number($n2/@id) eq number($n1/@following)) then
not(pred:Between($n1/@id, $n3/@id, $n2/@id))
else true()
else true()
">
No third node falls between a node and its following node.
</sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>
Here is the XML Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="network">
<xs:complexType>
<xs:sequence>
<xs:element ref="node" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="node">
<xs:complexType>
<xs:sequence>
<!-- node stuff -->
</xs:sequence>
<xs:attribute name="following" type="nodeId" use="required" />
<xs:attribute name="id" type="nodeId" use="required" />
</xs:complexType>
</xs:element>
<xs:simpleType name="nodeId">
<xs:restriction base="xs:unsignedByte"/>
</xs:simpleType>
</xs:schema>