Hi Folks, As you know, a tree data structure can be represented in XML using parent-child relationships. Example: This tree data structure: can be represented in XML using these parent-child relationships:
<Books>
You might create an XML Schema to validate the XML.
But suppose that later you want the tree to look like this:
Or suppose you want the tree to categorize the books based on price:
Suddenly the XML Schema you created is obsolete. Every time you change the tree, you have to change the schema. Representing trees using parent-child relationships is inflexible. Alternatively, trees can be represented using a binary relation. Example: This tree: can be represented in XML using this binary relation:
<Document>
How do we validate that the binary relation represents a tree?
A binary relation represents a tree data structure iff the relation satisfies these constraints:
1. The root node is in the binary relation. In the example, the root node is Book.
2. There is no node above the root node (no node maps to the root). There is no <to> element whose contents is Book.
3. All nodes can be reached from the root node. From
Book we can reach Technical and
Non-Technical, from
Technical we can reach XML and
JSON, from
XML we can reach Definitive XML, from
JSON we can reach
Beginning JSON, and from Non-Technical we can reach
Illusions. Thus,
Book can reach all nodes.
4. No nodes map to itself (irreflexive).
5. No cycles.
6. All nodes are distinct (injective).
These constraints are easily expressed using Schematron (exercise left to the reader).
The tree is easily changed: simply change the binary relation. No change to Schematron needed.
Lesson Learned: The binary relation representation is a flexible way to represent trees.
Caveat: Of course, it is not wise to use binary relations exclusively. That would defeat the purpose of XML. Parent-child relationships are very useful. So, you might have a part of your XML document where great flexibility is needed. That would be a good use
of binary relations – embedded within a larger XML document. The binary relation contains volatile stuff; the other part contains the non-volatile stuff. This way you minimize the need for schema changes.
Comments?
/Roger |