Hi Folks,

For the past 4 months I have been working on a demo of a Vineyard in which Pickers move around, harvest ripe grapes, eat, and even die. In the process of building this demo I have learned some things regarding XML design, which I would like to share.

When I first started working on the demo I thought that the best way to design the XML for the Vineyard/Picker system was a "classical" highly structured, hierarchical design. In fact, this turned out to be the worst approach. It was rigid, and it made processing the information (e.g., moving the Pickers around, harvesting ripe grapes, eating, death) horribly complex. It also totally prohibited processing the Vineyard lots in parallel, which I wanted to be able to do. Here was my first design:

<vineyard>
    <tract num="1">
        <lot num="1">
            ...
        </lot>
        <lot num="2">
            ...
            <picker id="36"> <!-- Picker #36 on lot #2, tract #1 -->
                ...
            </picker>
        </lot>
        ...
        <lot num="50">...</lot>
    </tract>
    ...
    <tract num="50">...</tract>
</vineyard>

As you can see, this design is classical structured data:
- the vineyard is comprised of multiple tracts
- each tract is comprised of multiple lots
- a lot may contain a picker

Several thousand lines of XSLT code later I decided it was time to dump this design.

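To make the cost of nesting concrete, here is a minimal sketch of "moving" a Picker in the first design. It is written in Python with the standard-library ElementTree rather than XSLT, purely for illustration; the function name `move_picker_nested` and the two-lot sample document are hypothetical stand-ins, not part of the demo. The point is that a move means finding the picker's current parent, detaching it, and re-attaching it somewhere else in the tree.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the nested design (2 lots instead of 50).
NESTED = """
<vineyard>
  <tract num="1">
    <lot num="1"/>
    <lot num="2"><picker id="36"/></lot>
  </tract>
</vineyard>
"""

def move_picker_nested(vineyard, picker_id, tract_num, lot_num):
    """Detach a picker from the lot that holds it and attach it to the target lot."""
    target = vineyard.find(f"./tract[@num='{tract_num}']/lot[@num='{lot_num}']")
    if target is None:
        raise ValueError("no such lot")
    # ElementTree keeps no parent pointers, so every lot must be searched.
    for lot in vineyard.iter("lot"):
        picker = lot.find(f"./picker[@id='{picker_id}']")
        if picker is not None:
            lot.remove(picker)
            target.append(picker)
            return
    raise ValueError("no such picker")

nested_doc = ET.fromstring(NESTED)
move_picker_nested(nested_doc, 36, 1, 1)  # picker #36: lot #2 -> lot #1
```

Even in this toy form the move touches two different lots, which hints at why the nested design gets in the way of treating lots independently.
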
My next design "flattened" things out a bit. I put the Pickers physically after the tracts, and each Picker referenced the tract and lot that it resided on using a couple of "ref attributes" (tract-ref and lot-ref). This made "moving" the Pickers easy: simply adjust the references. Here was my second design:

<vineyard>
    <tract num="1">...</tract>
    <tract num="2">...</tract>
    ...
    <tract num="50">...</tract>
    <picker id="1">
        <location tract-ref="13" lot-ref="48"/>
        ...
    </picker>
    ...
    <picker id="400">
        <location tract-ref="21" lot-ref="4"/>
        ...
    </picker>
</vineyard>

With this design my XSLT code dropped from several thousand lines to about a thousand lines. However, this design was still too rigid, and made parallel processing of the lots impossible.

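As a rough illustration of why references make moves cheap, the sketch below (again Python with ElementTree, not the demo's XSLT) moves a Picker in the second design by rewriting two attribute values. The function name `move_picker_by_ref` and the cut-down sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the second design: pickers follow the tracts.
FLATTER = """
<vineyard>
  <tract num="13"><lot num="48"/></tract>
  <tract num="21"><lot num="4"/></tract>
  <picker id="1"><location tract-ref="13" lot-ref="48"/></picker>
</vineyard>
"""

def move_picker_by_ref(vineyard, picker_id, tract_num, lot_num):
    """Move a picker by updating its reference attributes; no element is relocated."""
    location = vineyard.find(f"./picker[@id='{picker_id}']/location")
    if location is None:
        raise ValueError("no such picker")
    location.set("tract-ref", str(tract_num))
    location.set("lot-ref", str(lot_num))

ref_doc = ET.fromstring(FLATTER)
move_picker_by_ref(ref_doc, 1, 21, 4)  # picker #1: tract 13/lot 48 -> tract 21/lot 4
```

The tree shape never changes here; only the `<location>` attributes do.
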
Here is the design that I finally arrived at. It is extremely flexible, amenable to parallel processing, and the code to manipulate it is very simple (a couple hundred lines of simple XSLT code).

<vineyard>
    <lot tract-num="23" lot-num="5">...</lot>
    <picker id="36">
        <location tract-ref="12" lot-ref="29"/>
        ...
    </picker>
    <lot tract-num="3" lot-num="24">...</lot>
    ...
    <lot tract-num="1" lot-num="49">...</lot>
</vineyard>

Each lot has 2 attributes (tract-num and lot-num) to identify its location.
Each picker has a location element that has 2 attributes (tract-ref and lot-ref) to identify the lot it resides on.

Notice that it is an extremely flat structure:
- a vineyard is comprised of lots and pickers (no more <tract> elements)

Notice that it is an extremely flexible structure:
- the order of the lots and pickers is irrelevant

With this design I can now process each lot on the vineyard in parallel. The other designs forced sequential processing.

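The following is a minimal sketch of what per-lot processing over the flat design could look like, in Python rather than XSLT. Several things in it are assumptions made for illustration: the harvest rule (each picker on a lot removes one unit of `<ripe-grapes>` per step), the placement of `<ripe-grapes>` as a direct child of `<lot>`, the helper names, and the use of a thread pool. Each task reads a shared, read-only picker count and writes only to its own `<lot>` element, so the tasks do not depend on one another.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

FLAT = """
<vineyard>
  <lot tract-num="23" lot-num="5"><ripe-grapes>10</ripe-grapes></lot>
  <picker id="36"><location tract-ref="12" lot-ref="29"/></picker>
  <lot tract-num="12" lot-num="29"><ripe-grapes>7</ripe-grapes></lot>
  <picker id="37"><location tract-ref="12" lot-ref="29"/></picker>
</vineyard>
"""

def pickers_per_lot(vineyard):
    """Count pickers by the (tract, lot) pair their <location> refers to."""
    counts = Counter()
    for loc in vineyard.iterfind("./picker/location"):
        counts[(loc.get("tract-ref"), loc.get("lot-ref"))] += 1
    return counts

def harvest_lot(lot, counts):
    """Hypothetical rule: each picker on the lot harvests one unit of ripe grapes."""
    key = (lot.get("tract-num"), lot.get("lot-num"))
    grapes = lot.find("ripe-grapes")
    remaining = max(0, int(grapes.text) - counts[key])
    grapes.text = str(remaining)  # only this lot's own element is modified
    return key, remaining

flat_doc = ET.fromstring(FLAT)
counts = pickers_per_lot(flat_doc)  # built once, then only read by the workers
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(lambda lot: harvest_lot(lot, counts),
                            flat_doc.findall("lot")))
```

The thread pool is there to show that the lots can be handed out independently, not to claim a speedup; in XSLT the analogous property is that a template matching `lot` needs nothing from its siblings' position in the document.
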
Here are some lessons I learned. I believe these lessons apply to all XML information structures where you have a requirement to evolve the information structure by moving the information (e.g., move the Picker around to different lots), changing the information values (e.g., a Picker harvests ripe grapes, thereby decreasing the value of <ripe-grapes> on a lot), and where parallel processing of the information is desired/needed. I don't know if these lessons apply everywhere.

1. How you structure your information in XML has a tremendous impact on the processing of the information.
2. Hierarchy makes processing information hard! There exists a relationship between hierarchy of information and the complexity of code to process the information. The relationship is roughly: the greater the hierarchy, the greater the complexity of code to process the information. (Some hierarchy is good, of course. But the amount of hierarchy that is good is probably much less than one might imagine, certainly less than I thought, as described above.)
3. Flat data is good data! Flatten out the hierarchy of your data. It makes the information flexible and easier to process.
4. Order hurts! Requiring a strict order of the information makes for a brittle design. It was only when I allowed the lots and pickers to occur in any order that the flexibility and simplicity kicked in.

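One way to see lesson 4 in practice: if code only ever asks "which lots exist?" and "where is each picker?", then two flat documents that list the same lots and pickers in different orders are indistinguishable to it. The sketch below, in Python with ElementTree, is an illustration under that assumption; the `snapshot` helper and both sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET

def snapshot(xml_text):
    """Reduce a flat vineyard to order-independent collections."""
    root = ET.fromstring(xml_text)
    lots = {(lot.get("tract-num"), lot.get("lot-num"))
            for lot in root.iterfind("lot")}
    pickers = {}
    for picker in root.iterfind("picker"):
        loc = picker.find("location")
        pickers[picker.get("id")] = (loc.get("tract-ref"), loc.get("lot-ref"))
    return lots, pickers

ORDER_A = """
<vineyard>
  <lot tract-num="23" lot-num="5"/>
  <picker id="36"><location tract-ref="12" lot-ref="29"/></picker>
  <lot tract-num="12" lot-num="29"/>
</vineyard>
"""

# Same lots and picker, different document order.
ORDER_B = """
<vineyard>
  <lot tract-num="12" lot-num="29"/>
  <lot tract-num="23" lot-num="5"/>
  <picker id="36"><location tract-ref="12" lot-ref="29"/></picker>
</vineyard>
"""

same_state = snapshot(ORDER_A) == snapshot(ORDER_B)
```

Because identity lives in the attributes rather than in position, a processing step is free to emit lots and pickers in whatever order is convenient.
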
Comments? /Roger