Hi Folks,
Designing data interchange formats can be quick, easy, and inexpensive.
The comments in this message apply to the following environment:
The XML is distributed to a broad spectrum of consumers,
and each consumer might perform widely different operations
on the XML.
I recommend making flat XML. Design your XML to just have a root element, containing a linear sequence of elements.
Be a markup minimalist.
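To make "flat" concrete, here is a minimal Python sketch that checks whether a document is just a root element containing a linear sequence of leaf elements. The function name is_flat and the sample element names (Root, a, b, Group) are placeholders invented for illustration; only the standard library is used.

```python
import xml.etree.ElementTree as ET


def is_flat(xml_text: str) -> bool:
    """Return True if every child of the root is a leaf element.

    "Flat" here means: one root element whose children contain
    no child elements of their own.
    """
    root = ET.fromstring(xml_text)
    # len(element) is the number of direct child elements.
    return all(len(child) == 0 for child in root)


# Hypothetical sample documents, for illustration only.
flat_doc = "<Root><a>1</a><b>2</b><a>3</a></Root>"
nested_doc = "<Root><Group><a>1</a><b>2</b></Group></Root>"

print(is_flat(flat_doc))
print(is_flat(nested_doc))
```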
Here are the reasons for my recommendation:
1. Consumers of flat XML can apply powerful parsing techniques to the linear sequence of elements. Thus, consumers can add whatever structure is appropriate for their particular applications to process the data efficiently.
2. Designing and implementing an XML Schema for flat XML is trivial: simply create an XML Schema with a sequential list of element declarations.
3. If, on the other hand, you were to design the XML to have lots of structure, it is a near certainty that the structure will not be suitable for many of your consumers' applications. Further, that structure is likely to hamper powerful parsing techniques.
The "flatter is better" philosophy may be summarized this way:
I can't predict how my consumers will want the data structured,
so I won't try to predict. I will let them apply their own structure
to the data.
Let's take an example. Suppose that you want to model a grape vineyard, with pickers scattered about on the various lots. The following XML is not flat. It's probably a design that most people would come up with. I assert it is a bad design.
<Vineyard>
<Lot id="1">
<ripe-grapes>4</ripe-grapes>
<Picker id="John">
<metabolism>2</metabolism>
<grape-wealth>20</grape-wealth>
</Picker>
</Lot>
<Lot id="2">
<ripe-grapes>3</ripe-grapes>
</Lot>
...
</Vineyard>
That design is well-suited to operations such as this:
Which Pickers are on Lot 23?
But it is horrible for operations such as this:
Move Picker John to Lot 2.
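To see what the move involves with the nested design, here is a minimal Python sketch using the standard library's ElementTree. The function name move_picker is hypothetical, and the sample is the nested XML above with the elided "..." left out. The consumer has to search the Lots to find the Picker's current parent, detach the whole Picker subtree, and re-attach it under the target Lot.

```python
import xml.etree.ElementTree as ET

# The nested design from above (the elided "..." part is omitted).
NESTED = """
<Vineyard>
  <Lot id="1">
    <ripe-grapes>4</ripe-grapes>
    <Picker id="John">
      <metabolism>2</metabolism>
      <grape-wealth>20</grape-wealth>
    </Picker>
  </Lot>
  <Lot id="2">
    <ripe-grapes>3</ripe-grapes>
  </Lot>
</Vineyard>
"""


def move_picker(root, picker_id, target_lot_id):
    """Detach a Picker from whichever Lot holds it and append it to the target Lot."""
    target = root.find(f"Lot[@id='{target_lot_id}']")
    if target is None:
        raise ValueError(f"no Lot with id {target_lot_id}")
    # ElementTree elements do not know their parent, so scan every Lot.
    for lot in root.findall("Lot"):
        picker = lot.find(f"Picker[@id='{picker_id}']")
        if picker is not None:
            lot.remove(picker)
            target.append(picker)
            return
    raise ValueError(f"no Picker with id {picker_id}")


root = ET.fromstring(NESTED)
move_picker(root, "John", "2")
print(ET.tostring(root, encoding="unicode"))
```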
Don't design XML that way. Design XML to be flat, like this:
<Vineyard>
<lot-number>1</lot-number>
<ripe-grapes>4</ripe-grapes>
<picker>John</picker>
<metabolism>2</metabolism>
<grape-wealth>20</grape-wealth>
<lot-number>2</lot-number>
<ripe-grapes>3</ripe-grapes>
...
</Vineyard>
That's a beautiful design. It lets consumers apply powerful parsing techniques to it. For instance, one consumer may parse it to generate the nested Lot/Picker structuring shown earlier. Another consumer may parse it to generate this radically different structuring:
<Lot id="1">
<ripe-grapes>4</ripe-grapes>
</Lot>
<Lot id="2">
<ripe-grapes>3</ripe-grapes>
</Lot>
<Picker id="John" locatedOn="1">
<metabolism>2</metabolism>
<grape-wealth>20</grape-wealth>
</Picker>
...
And another consumer may parse it to generate still another structuring.
Each consumer parses the flat XML to create a structuring that is well suited to its own application's processing.
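Here is a minimal Python sketch of how one consumer might parse the flat XML and produce both structurings. It rests on assumptions that the flat XML itself does not state: each lot-number starts a new lot, ripe-grapes belongs to the most recent lot, each picker starts a new picker located on the most recent lot, and metabolism and grape-wealth belong to the most recent picker. The sample numbers the second lot 2, to match the Lot ids in the structured examples. The function names (parse_flat, to_nested, to_separate) are hypothetical, and wrapping the second structuring in a Vineyard root is also an assumption.

```python
import xml.etree.ElementTree as ET

FLAT = """
<Vineyard>
  <lot-number>1</lot-number>
  <ripe-grapes>4</ripe-grapes>
  <picker>John</picker>
  <metabolism>2</metabolism>
  <grape-wealth>20</grape-wealth>
  <lot-number>2</lot-number>
  <ripe-grapes>3</ripe-grapes>
</Vineyard>
"""

PICKER_FIELDS = ("metabolism", "grape-wealth")


def parse_flat(xml_text):
    """Group the linear element sequence into lot and picker records."""
    lots, pickers = [], []
    lot = picker = None
    for el in ET.fromstring(xml_text):
        text = (el.text or "").strip()
        if el.tag == "lot-number":            # starts a new lot
            lot, picker = {"id": text}, None
            lots.append(lot)
        elif el.tag == "ripe-grapes" and lot is not None:
            lot["ripe-grapes"] = text
        elif el.tag == "picker" and lot is not None:  # starts a new picker
            picker = {"id": text, "locatedOn": lot["id"]}
            pickers.append(picker)
        elif el.tag in PICKER_FIELDS and picker is not None:
            picker[el.tag] = text
        else:
            raise ValueError(f"unexpected element <{el.tag}>")
    return lots, pickers


def _add_picker(parent, p, **attrs):
    """Append a Picker element with its child fields to parent."""
    p_el = ET.SubElement(parent, "Picker", id=p["id"], **attrs)
    for tag in PICKER_FIELDS:
        ET.SubElement(p_el, tag).text = p.get(tag)


def to_nested(lots, pickers):
    """First structuring: Pickers nested inside their Lot."""
    root = ET.Element("Vineyard")
    for lot in lots:
        lot_el = ET.SubElement(root, "Lot", id=lot["id"])
        ET.SubElement(lot_el, "ripe-grapes").text = lot.get("ripe-grapes")
        for p in pickers:
            if p["locatedOn"] == lot["id"]:
                _add_picker(lot_el, p)
    return root


def to_separate(lots, pickers):
    """Second structuring: Lots and Pickers as siblings, linked by locatedOn."""
    root = ET.Element("Vineyard")
    for lot in lots:
        lot_el = ET.SubElement(root, "Lot", id=lot["id"])
        ET.SubElement(lot_el, "ripe-grapes").text = lot.get("ripe-grapes")
    for p in pickers:
        _add_picker(root, p, locatedOn=p["locatedOn"])
    return root


lots, pickers = parse_flat(FLAT)
print(ET.tostring(to_nested(lots, pickers), encoding="unicode"))
print(ET.tostring(to_separate(lots, pickers), encoding="unicode"))
```

In the second structuring, moving John to Lot 2 amounts to changing the value of one locatedOn attribute.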
Are you creating XML "for the long haul"?
Are you creating XML "for a broad, diverse set of clients"?
Are you a manager who doesn't want to sink a lot of time and money into creating "the perfect XML Schema design"?
Then create flat XML.
Flatter is better!
For more info on parsing flat XML see my recent posts:
Recursive Descent Parsing for XML Developers: http://lists.xml.org/archives/xml-dev/201410/msg00017.html
Bottom-up Parsing for XML Developers: http://lists.xml.org/archives/xml-dev/201409/msg00016.html
Comments welcome.
/Roger