Hi Folks,
For many years, interoperability of a certain XML Schema was achieved only by twice-yearly “bake-offs.” A bake-off was an event at which engineers from various vendors gathered
in one room, with all their equipment, to test and re-program until their equipment interoperated on the test cases. Even for the baseline schema (the schema before it was extended), engineers would spend many hours trying to get the answers to simple questions
such as, “Can an application endpoint in state S send an XML message of type
m?” Engineers would search the schema and other documents for clues, and argue the meanings of elements and attributes like Biblical scholars. Certainty was rarely achieved.
|
XML Schemas are only slightly more formal than natural language descriptions. Users are right to distrust XML Schemas.
Use Schematron to formally define key concepts.
Let’s take an example. Consider the following excerpt from an XML Schema for a network protocol.
<xs:element
name="node">
<xs:complexType>
<xs:sequence>
<!-- node stuff
-->
</xs:sequence>
<xs:attribute
name="following"
type="nodeId"
use="required"
/>
<xs:attribute
name="id"
type="nodeId"
use="required"
/>
</xs:complexType>
</xs:element>
The schema states that a node can follow another node. Follow in what sense?
If “following” means following pointers, then the statement is a tautology. If “following” means following in integer order on node identifiers, then it is more meaningful
but still wrong – the successor of node 40 may be 5, which does not follow it in integer order (node identifiers come from a bounded set of natural numbers, and the identifiers wrap around from the highest number to zero).
In fact, it is not useful to define “following” in such an ordering because every identifier follows (and precedes) every other identifier. A more useful concept is that
of “between,” defined by the following Schematron predicate (XSLT functions may be embedded in Schematron):
<!--
The predicate
Between is true if and only if
argument
n2 lies between arguments n1 and n3
-->
<xsl:function
name="pred:Between">
<xsl:param
name="n1"
/>
<xsl:param
name="n2"
/>
<xsl:param
name="n3"
/>
<xsl:choose>
<xsl:when
test="number($n1)
lt number($n3)">
<xsl:if
test="(number($n1)
lt number($n2)) and
(number($n2) lt number($n3))">true</xsl:if>
</xsl:when>
<xsl:otherwise>
<xsl:if
test="(number($n1)
lt number($n2)) or
(number($n2) lt number($n3))">true</xsl:if>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
Here is a valid XML instance document:
<network>
<node
id="5"
following="29"/>
<node
id="29"
following="40"/>
<node
id="40"
following="5"/>
</network>
For each node
n1, there is no node between n1 and n1.following. For example, following node
5 is 29, and there are no nodes between them. Following node 40 is
5, and there are no nodes between them (this might seem counterintuitive, but check the
Between predicate and you will see that 29 is not between 40 and
5. “Intuition” is often wrong – another reason that formal definitions are needed).
This is an invalid XML instance document:
<network>
<node
id="5"
following="40"/>
<node
id="29"
following="40"/>
<node
id="40"
following="5"/>
</network>
Following node
5 is 40, but there is a node between them (node 29).
We are ready to formally define the “following” concept:
<!--
Formal definition of the "following" concept.
let nodes = { nodes in the network }
all disj n1, n2, n3: node |
n2 = n1.following
=> ! Between[n1,n3,n2]
-->
<sch:pattern
id="Formal-Definition-of-following">
<sch:rule
context="network">
<sch:let
name="nodes"
value="node"/>
<sch:assert
test="
every $n1 in $nodes, $n2 in $nodes, $n3 in $nodes satisfies
if (pred:Disjoint($n1, $n2, $n3)) then
if (number($n2/@id) eq number($n1/@following)) then
not(pred:Between($n1/@id, $n3/@id, $n2/@id))
else true()
else true()
">
No third node falls between a node and its following node.
</sch:assert>
</sch:rule>
</sch:pattern>
“If
n1 and n2 are distinct network nodes, and n2 is the successor of
n1, then no third network node falls between them.”
An XML Schema without a Schematron schema is dangerously ambiguous.
/Roger
P.S.#1 Everything said about the dangerous ambiguity of XML Schemas also applies to UML.
P.S.#2 Acknowledgement: Some of the ideas presented herein, even some sentences, come from this fantastic paper by Pamela Zave (AT&T Labs, Princeton University):
http://web2.research.att.com/export/sites/att_labs/people/Zave_Pamela/custom/wripe.pdf
Here is the complete Schematron schema:
<sch:schema
xmlns:sch="http://purl.oclc.org/dsdl/schematron"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
queryBinding="xslt2">
<sch:ns
uri="predicate"
prefix="pred"/>
<!--
The predicate Between is true if and only if
argument n2 lies between arguments n1 and n3
-->
<xsl:function
name="pred:Between">
<xsl:param
name="n1"
/>
<xsl:param
name="n2"
/>
<xsl:param
name="n3"
/>
<xsl:choose>
<xsl:when
test="number($n1)
lt number($n3)">
<xsl:if
test="(number($n1)
lt number($n2)) and
(number($n2) lt number($n3))">true</xsl:if>
</xsl:when>
<xsl:otherwise>
<xsl:if
test="(number($n1)
lt number($n2)) or
(number($n2) lt number($n3))">true</xsl:if>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
<!--
The predicate Disjoint is true if and only if
all three arguments are different
-->
<xsl:function
name="pred:Disjoint">
<xsl:param
name="n1"
/>
<xsl:param
name="n2"
/>
<xsl:param
name="n3"
/>
<xsl:choose>
<xsl:when
test="$n1
is $n2"
/>
<xsl:when
test="$n1
is $n3"
/>
<xsl:when
test="$n2
is $n3"
/>
<xsl:otherwise>true</xsl:otherwise>
</xsl:choose>
</xsl:function>
<!--
Formal definition of the "following" concept.
let nodes = { nodes in the network }
all disj n1, n2, n3: node |
n2 = n1.following
=> ! Between[n1,n3,n2]
-->
<sch:pattern
id="Formal-Definition-of-following">
<sch:rule
context="network">
<sch:let
name="nodes"
value="node"/>
<sch:assert
test="
every $n1 in $nodes, $n2 in $nodes, $n3 in $nodes satisfies
if (pred:Disjoint($n1, $n2, $n3)) then
if (number($n2/@id) eq number($n1/@following)) then
not(pred:Between($n1/@id, $n3/@id, $n2/@id))
else true()
else true()
">
No third node falls between a node and its following node.
</sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>
Here is the XML Schema:
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element
name="network">
<xs:complexType>
<xs:sequence>
<xs:element
ref="node"
maxOccurs="unbounded"
/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element
name="node">
<xs:complexType>
<xs:sequence>
<!-- node stuff -->
</xs:sequence>
<xs:attribute
name="following"
type="nodeId"
use="required"
/>
<xs:attribute
name="id"
type="nodeId"
use="required"
/>
</xs:complexType>
</xs:element>
<xs:simpleType
name="nodeId">
<xs:restriction
base="xs:unsignedByte"/>
</xs:simpleType>
</xs:schema>