Re: [xml-dev] XML data interchange format: Flatter is better


From: Thomas Passin <list1@tompassin.net>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Wed, 29 Oct 2014 10:59:54 -0600

On 10/27/2014 1:37 PM, Costello, Roger L. wrote:

Hi Folks,

Designing data interchange formats can be quick, easy, and
inexpensive.

The comments made in this message are intended for this environment:

The XML is distributed to a broad spectrum of consumers, and each
consumer might perform widely different operations on the XML.

I recommend making flat XML. Design your XML to just have a root
element, containing a linear sequence of elements.

In my experience, working with a flat list of data elements like this is much harder than you make out. You have to create and track a lot of state yourself that the XML parser (or XSLT processor) would otherwise track for you, and you have to work out, and write code for, where each grouped section starts and ends. In your vineyard example, the "lot 1" data ends only when you reach the next <lot-number> element. Nothing else indicates which lot the ripe-grapes, picker, etc. belong to.

So the first thing your code is probably going to do is to associate the lots with their ripe-grapes, etc.
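To make that bookkeeping concrete, here is a minimal Python sketch of the regrouping step, using the standard xml.etree.ElementTree module. The function name group_by_lot and the dictionary layout are my own, purely for illustration, and I assume the second lot in the flat example is numbered 2. Note the two state variables the code has to maintain because the markup no longer carries the grouping.

```python
import xml.etree.ElementTree as ET

# Flat vineyard data; the second lot-number is assumed to be 2.
FLAT = """
<Vineyard>
  <lot-number>1</lot-number>
  <ripe-grapes>4</ripe-grapes>
  <picker>John</picker>
  <metabolism>2</metabolism>
  <grape-wealth>20</grape-wealth>
  <lot-number>2</lot-number>
  <ripe-grapes>3</ripe-grapes>
</Vineyard>
"""

def group_by_lot(flat_xml):
    """Rebuild the lot groupings from a flat element sequence (hypothetical helper)."""
    lots = []
    current_lot = None     # state: the lot we are implicitly "inside"
    current_picker = None  # state: the picker we are implicitly "inside"
    for elem in ET.fromstring(flat_xml):
        text = (elem.text or "").strip()
        if elem.tag == "lot-number":
            # A new lot-number is the only signal that the previous lot ended.
            current_lot = {"id": text, "ripe-grapes": None, "pickers": []}
            current_picker = None
            lots.append(current_lot)
        elif current_lot is None:
            raise ValueError(f"<{elem.tag}> appears before any <lot-number>")
        elif elem.tag == "ripe-grapes":
            current_lot["ripe-grapes"] = int(text)
        elif elem.tag == "picker":
            current_picker = {"id": text}
            current_lot["pickers"].append(current_picker)
        elif elem.tag in ("metabolism", "grape-wealth"):
            if current_picker is None:
                raise ValueError(f"<{elem.tag}> appears before any <picker>")
            current_picker[elem.tag] = int(text)
    return lots

lots = group_by_lot(FLAT)
```

Every consumer that cares about lots has to write some version of this loop, including the error cases, and has to agree with the producer on which element starts a group.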

If you don't need that association, you can still easily pull the information out of more hierarchical XML, using e.g. XSLT. OTOH, the more hierarchical levels there were in the unflattened XML, the harder it is to work out their relationships when you receive the flattened data.
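As a rough illustration of that first point, here is a sketch in Python rather than XSLT, using xml.etree.ElementTree and its limited XPath support. The nested document is the one from your example; the variable names are mine. The first two queries ignore the lot grouping entirely, and the third uses it.

```python
import xml.etree.ElementTree as ET

NESTED = """
<Vineyard>
  <Lot id="1">
    <ripe-grapes>4</ripe-grapes>
    <Picker id="John">
      <metabolism>2</metabolism>
      <grape-wealth>20</grape-wealth>
    </Picker>
  </Lot>
  <Lot id="2">
    <ripe-grapes>3</ripe-grapes>
  </Lot>
</Vineyard>
"""

root = ET.fromstring(NESTED)

# Ignore the lots: collect every picker, wherever it is nested.
picker_ids = [p.get("id") for p in root.iter("Picker")]

# Ignore the grouping: total ripe grapes across all lots.
total_ripe = sum(int(e.text) for e in root.iter("ripe-grapes"))

# Use the grouping when it is wanted: pickers on one particular lot.
on_lot_1 = [p.get("id") for p in root.findall("./Lot[@id='1']/Picker")]
```

None of these queries needs any state tracking on the consumer's side, since the nesting is simply skipped over when it is not wanted.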

To put it another way, when you flatten a hierarchical structure, you throw away knowledge you already have about the data. The data user will have to reconstruct some or all of it, which may or may not be feasible but will usually be harder than necessary.

Of course, in real life there's usually a place for everything, so there could be some cases for which a flattened design would be good. But not in general.

So I disagree with your recommendation.

Be a markup minimalist.

Here are the reasons for my recommendation:

1. Consumers of flat XML can apply powerful parsing techniques to the
linear sequence of elements. Thus, consumers can add whatever
structure is appropriate for their particular applications to process
the data efficiently.

2. XML Schema design and implementation of flat XML is trivial:
simply create an XML Schema with a sequential list of element
declarations.

3. If, on the other hand, you were to design the XML to have lots of
structure, it is a near certainty that the structure will not be
suitable for many of your consumers' applications. Further, its
structure is likely to hamper powerful parsing techniques.

The "flatter is better" philosophy may be summarized this way:

I can't predict how my consumers will want the data structured, so I
won't try to predict. I will let them apply their own structure to
the data.

Let's take an example. Suppose that you want to model a grape
vineyard, with pickers scattered about on the various lots. The
following XML is not flat. It's probably a design that most people
would come up with. I assert it is a bad design.

<Vineyard>
    <Lot id="1">
        <ripe-grapes>4</ripe-grapes>
        <Picker id="John">
            <metabolism>2</metabolism>
            <grape-wealth>20</grape-wealth>
        </Picker>
    </Lot>
    <Lot id="2">
        <ripe-grapes>3</ripe-grapes>
    </Lot>
    ...
</Vineyard>

That design is well-suited to operations such as this:

What Pickers are on lot 23?

But it is horrible for operations such as this:

Move Picker John to Lot 2.
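For reference, here is a minimal Python sketch of what that move involves against the nested design, using xml.etree.ElementTree. The function name move_picker is hypothetical, and the sketch assumes picker and lot ids are unique and contain no quote characters. ElementTree elements have no parent pointer, so the code searches each lot for the picker before detaching and reattaching it.

```python
import xml.etree.ElementTree as ET

NESTED = """
<Vineyard>
  <Lot id="1">
    <ripe-grapes>4</ripe-grapes>
    <Picker id="John">
      <metabolism>2</metabolism>
      <grape-wealth>20</grape-wealth>
    </Picker>
  </Lot>
  <Lot id="2">
    <ripe-grapes>3</ripe-grapes>
  </Lot>
</Vineyard>
"""

def move_picker(root, picker_id, target_lot_id):
    """Move one Picker element under a different Lot (hypothetical helper)."""
    target = root.find(f"./Lot[@id='{target_lot_id}']")
    if target is None:
        raise ValueError(f"no Lot with id {target_lot_id!r}")
    for lot in root.findall("./Lot"):
        picker = lot.find(f"./Picker[@id='{picker_id}']")
        if picker is not None:
            lot.remove(picker)     # detach from the current lot
            target.append(picker)  # attach, with its children, to the new lot
            return
    raise ValueError(f"no Picker with id {picker_id!r}")

root = ET.fromstring(NESTED)
move_picker(root, "John", "2")
pickers_on_lot_1 = [p.get("id") for p in root.findall("./Lot[@id='1']/Picker")]
pickers_on_lot_2 = [p.get("id") for p in root.findall("./Lot[@id='2']/Picker")]
```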

Don't design XML that way. Design XML to be flat, like this:

<Vineyard>
    <lot-number>1</lot-number>
    <ripe-grapes>4</ripe-grapes>
    <picker>John</picker>
    <metabolism>2</metabolism>
    <grape-wealth>20</grape-wealth>
    <lot-number>2</lot-number>
    <ripe-grapes>3</ripe-grapes>
    ...
</Vineyard>

That's a beautiful design. It enables powerful parsing techniques to
be applied to it. For instance, one consumer may parse it to generate
the above structuring. Another consumer may parse it to generate this
radically different structuring:

<Lot id="1">
    <ripe-grapes>4</ripe-grapes>
</Lot>
<Lot id="2">
    <ripe-grapes>3</ripe-grapes>
</Lot>
<Picker id="John" locatedOn="1">
    <metabolism>2</metabolism>
    <grape-wealth>20</grape-wealth>
</Picker>
...
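A consumer producing that structuring might look roughly like the following Python sketch, built on xml.etree.ElementTree. It is only an illustration under stated assumptions: the function name restructure is hypothetical, the second lot is assumed to be numbered 2, and the output is wrapped in a Vineyard root element (the listing above shows none) so that the result is a single well-formed tree.

```python
import xml.etree.ElementTree as ET

# Flat vineyard data; the second lot-number is assumed to be 2.
FLAT = """
<Vineyard>
  <lot-number>1</lot-number>
  <ripe-grapes>4</ripe-grapes>
  <picker>John</picker>
  <metabolism>2</metabolism>
  <grape-wealth>20</grape-wealth>
  <lot-number>2</lot-number>
  <ripe-grapes>3</ripe-grapes>
</Vineyard>
"""

def restructure(flat_xml):
    """Split flat data into sibling Lot and Picker elements (hypothetical helper)."""
    lots, pickers = [], []
    lot = None     # most recent Lot element seen
    picker = None  # most recent Picker element seen within that lot
    for elem in ET.fromstring(flat_xml):
        text = (elem.text or "").strip()
        if elem.tag == "lot-number":
            lot = ET.Element("Lot", id=text)
            picker = None
            lots.append(lot)
        elif lot is None:
            raise ValueError(f"<{elem.tag}> appears before any <lot-number>")
        elif elem.tag == "picker":
            # Record the lot as an attribute instead of nesting under it.
            picker = ET.Element("Picker", id=text, locatedOn=lot.get("id"))
            pickers.append(picker)
        elif elem.tag == "ripe-grapes":
            ET.SubElement(lot, elem.tag).text = text
        elif elem.tag in ("metabolism", "grape-wealth"):
            if picker is None:
                raise ValueError(f"<{elem.tag}> appears before any <picker>")
            ET.SubElement(picker, elem.tag).text = text
    out = ET.Element("Vineyard")  # assumed wrapper element
    out.extend(lots + pickers)
    return out

result = restructure(FLAT)
summary = [(e.tag, dict(e.attrib)) for e in result]
```

Calling ET.tostring(result, encoding="unicode") serializes the restructured tree if the XML text is needed.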

And another consumer may parse it to generate still another
structuring.

Each consumer parses the flat XML to create a structuring that is
well-suited to their particular application processing.

Are you creating XML "for the long haul"?

Are you creating XML "for a broad, diverse set of clients"?

Are you a manager and don't want to dump a lot of time and money into
creating "the perfect XML Schema design"?

Then create flat XML.

Flatter is better!

For more info on parsing flat XML see my recent posts:

Recursive Descent Parsing for XML Developers:
http://lists.xml.org/archives/xml-dev/201410/msg00017.html

Bottom-up Parsing for XML Developers:
http://lists.xml.org/archives/xml-dev/201409/msg00016.html

Comments welcome.

/Roger

References:
- XML data interchange format: Flatter is better
  - From: "Costello, Roger L." <costello@mitre.org>
