Better Design: "Flatter is Better" or "Nesting is Better"?
Table of Contents
- Synonyms of the terms "Nested Design" and "Flat Design"
- The Nested Design
- The Flat Design
- Example of an Application that is better-suited to an XML document that uses the Nested Design
- Example of an Application that is better-suited to an XML document that uses the Flat Design
- The Relationship between Applications and XML documents
- Example of an XML document whose Destiny is to have its Data Mapped into a Relational Database
- Example of an XML document whose Destiny is to have its Data Mapped into a Programming Data Structure
- Should you be Lavish or Spartan with the use of Markup?
- Should you attempt to "Futureproof" your XML?
- Conclusions
- Acknowledgements
Synonyms
Nested design: hierarchical design
Flat design: relational design, id/idref-based design
The Nested Design
The nested design is characterized by structuring data using parent-child
relationships.
Example: Consider a Vineyard made up of Lots, with Pickers scattered about on
the Lots. Here is how the Vineyard data might be structured using the nested
design:
<Vineyard>
    <Lot id="1">
        <ripe-grapes>4</ripe-grapes>
        <Picker>
            <name>John</name>
            <metabolism>2</metabolism>
            <grape-wealth>20</grape-wealth>
        </Picker>
    </Lot>
    <Lot id="2">
        <ripe-grapes>3</ripe-grapes>
    </Lot>
    ...
</Vineyard>
Two Lots are shown. Lot 1 has a Picker (John) on it. Note that the Picker
data is nested directly within the Lot.
The Flat Design
With the flat design there is minimal use of parent-child relationships.
Instead, two data fragments are related by having one fragment carry an
identifier and the other fragment reference that identifier. Here is how the
Vineyard data might be structured using the flat design:
<Vineyard>
    <Lot id="1">
        <ripe-grapes>4</ripe-grapes>
    </Lot>
    <Lot id="2">
        <ripe-grapes>3</ripe-grapes>
    </Lot>
    ...
    <Picker locatedOn="1">
        <name>John</name>
        <metabolism>2</metabolism>
        <grape-wealth>20</grape-wealth>
    </Picker>
    ...
</Vineyard>
This version also has Picker John on Lot 1. However, it accomplishes this
through the use of an id/idref mechanism: each Lot is uniquely identified, and
Picker John references (using the locatedOn attribute) the Lot that he is
on.
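One consequence of the id/idref mechanism is that a reference can point at a Lot that does not exist, something the nested design cannot express. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a hypothetical helper named dangling_pickers, a consumer might check the references like this:

```python
import xml.etree.ElementTree as ET

# Sample document modelled on the flat Vineyard example above.
FLAT = """
<Vineyard>
  <Lot id="1"><ripe-grapes>4</ripe-grapes></Lot>
  <Lot id="2"><ripe-grapes>3</ripe-grapes></Lot>
  <Picker locatedOn="1">
    <name>John</name>
    <metabolism>2</metabolism>
    <grape-wealth>20</grape-wealth>
  </Picker>
</Vineyard>
"""


def dangling_pickers(vineyard):
    """Return the names of Pickers whose locatedOn value matches no Lot id.

    Hypothetical helper for illustration; not part of any library.
    """
    lot_ids = {lot.get("id") for lot in vineyard.findall("Lot")}
    return [
        picker.findtext("name")
        for picker in vineyard.findall("Picker")
        if picker.get("locatedOn") not in lot_ids
    ]


# John references Lot 1, which exists, so no names are reported.
print(dangling_pickers(ET.fromstring(FLAT)))
```

A schema that declares id and locatedOn as ID and IDREF types could push this check into a validating parser instead; the sketch above makes no such assumption.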
Processing the Vineyard by an Application
Above we examined two ways of structuring the Vineyard data - using a nested
design and using a flat design. Suppose that an application receives a Vineyard
XML document. Depending on what the application wants to do with the data, one
design may be better suited than the other.
Application 1 - Find all the Pickers on a Lot
Suppose that the application wants to locate all the Pickers that are on a
Lot. For example, suppose that the application wants to locate all the Pickers
on Lot 1. Which design is better suited to this application?
Suitability of the Nested Design: Locating the Pickers on Lot 1 is
simply a matter of navigating to Lot 1 and then selecting the Picker children.
Suitability of the Flat Design: To locate the Pickers on Lot 1
requires examining each Picker and checking to see if its locatedOn attribute
references Lot 1.
Conclusion: The Nested Design is better suited to an application that
needs to locate Pickers on a Lot.
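The difference can be sketched in code. The following is only an illustration, assuming Python's standard xml.etree.ElementTree module; the function names and the trimmed sample documents are hypothetical. With the nested design the lookup is a single path expression, while with the flat design every Picker has to be examined:

```python
import xml.etree.ElementTree as ET

# Trimmed sample documents modelled on the Vineyard examples.
NESTED = """
<Vineyard>
  <Lot id="1">
    <ripe-grapes>4</ripe-grapes>
    <Picker><name>John</name></Picker>
  </Lot>
  <Lot id="2"><ripe-grapes>3</ripe-grapes></Lot>
</Vineyard>
"""

FLAT = """
<Vineyard>
  <Lot id="1"><ripe-grapes>4</ripe-grapes></Lot>
  <Lot id="2"><ripe-grapes>3</ripe-grapes></Lot>
  <Picker locatedOn="1"><name>John</name></Picker>
</Vineyard>
"""


def pickers_on_lot_nested(vineyard, lot_id):
    # Navigate to the Lot, then select its Picker children.
    path = f"Lot[@id='{lot_id}']/Picker"
    return [p.findtext("name") for p in vineyard.findall(path)]


def pickers_on_lot_flat(vineyard, lot_id):
    # Examine every Picker and test its locatedOn reference.
    return [
        p.findtext("name")
        for p in vineyard.findall("Picker")
        if p.get("locatedOn") == lot_id
    ]


print(pickers_on_lot_nested(ET.fromstring(NESTED), "1"))  # ['John']
print(pickers_on_lot_flat(ET.fromstring(FLAT), "1"))      # ['John']
```

Both functions return the same answer; what differs is how much of the document each one has to visit.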
Application 2 - Move a Picker to a Different Lot
Suppose that the application wants to move the Pickers from their current Lot
onto a different Lot. For example, suppose that the application wants to move
Picker John from Lot 1 to Lot 15. Which design is better suited to this
application?
Suitability of the Nested Design: Moving Picker John from Lot 1 to
Lot 15 means restructuring the document: the Picker element must be removed
from Lot 1 and inserted into Lot 15.
Suitability of the Flat Design: Moving Picker John is simply a matter
of changing his locatedOn attribute value from 1 to 15.
Conclusion: The Flat Design is better suited to an application that
needs to move Pickers from their current Lot onto a different Lot.
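A rough sketch of the two moves, again assuming Python's standard xml.etree.ElementTree module (the function names and sample documents are hypothetical), shows where the extra work in the nested design comes from:

```python
import xml.etree.ElementTree as ET

# Trimmed sample documents containing Lot 1 and Lot 15.
NESTED = """
<Vineyard>
  <Lot id="1"><Picker><name>John</name></Picker></Lot>
  <Lot id="15"/>
</Vineyard>
"""

FLAT = """
<Vineyard>
  <Lot id="1"/>
  <Lot id="15"/>
  <Picker locatedOn="1"><name>John</name></Picker>
</Vineyard>
"""


def move_picker_nested(vineyard, name, new_lot_id):
    # Find the target Lot, then extract the Picker element from its
    # current Lot and insert it into the target.
    target = vineyard.find(f"Lot[@id='{new_lot_id}']")
    if target is None:
        raise ValueError(f"no Lot with id {new_lot_id}")
    for lot in vineyard.findall("Lot"):
        for picker in lot.findall("Picker"):
            if picker.findtext("name") == name:
                lot.remove(picker)
                target.append(picker)
                return


def move_picker_flat(vineyard, name, new_lot_id):
    # Only the reference changes; no element is relocated.
    for picker in vineyard.findall("Picker"):
        if picker.findtext("name") == name:
            picker.set("locatedOn", new_lot_id)
            return


nested = ET.fromstring(NESTED)
move_picker_nested(nested, "John", "15")

flat = ET.fromstring(FLAT)
move_picker_flat(flat, "John", "15")
```

In the flat version the tree keeps its shape and only one attribute value changes, which is why the flat design suits this application.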
Lessons Learned
The above example reveals two things:
- How you design your XML can significantly impact its processing by
applications.
- Designing your XML in one fashion may make it well-suited for processing
by one application, but poorly suited for processing by a different
application.
Relationship between an Application and an XML Document
There are three ways to categorize the relationship between an application
and an XML document:
- The application operates directly on the XML document.
- The XML document is transformed into some other format (language objects,
relational database, etc), and the application operates on the data in that
other format.
- The XML document is the application.
In the above Vineyard example we saw two applications. Both applications
operated directly on the XML document. The XML document is akin to a database -
it stores data that is queried and processed by applications.
On the other hand, sometimes an application does not operate directly on the
XML document. Instead, the data in the XML document is transformed into some
other format, and the application then operates on the data in that format. For
example, the data in the XML document is extracted and placed into tables in a
relational database. The application then operates on the data in the database.
Another common scenario is that the data in the XML document is extracted and
placed into data structures in a program. In both of these situations the XML
document is merely serving as a transport syntax (i.e., the XML is simply "bits
on the wire").
Sometimes, the XML document is the application! For example, an XSLT document
is an XML document, and it is also an application.
When the XML is being used just as a Transport Syntax, which is Better - Nested or Flat?
Above we discussed the case where an application operates directly upon the
XML document. We saw that sometimes it is best to design the XML document using
a nested design, and other times it is best to design the XML document using a
flat design. The application was the deciding factor.
Now let's consider the case where the XML document is merely a transport
syntax. Applications don't operate directly on the XML document. Instead, they
operate on the data after it has been transformed into another format. Let's
examine two formats that XML documents are commonly transformed into.
Format 1: The XML Data is stored into a Relational Database
Let's suppose that a database contains these two tables for storing the
Vineyard data:
Lot Table
Lot id | Ripe Grapes | Picker id
...    | ...         | ...
53     | 3           | Pete
...    | ...         | ...

Picker Table
Picker id | Metabolism | Grape Wealth
...       | ...        | ...
Pete      | 1          | 10
...       | ...        | ...
Let's consider the XML documents to determine which design is better-suited.
Note that the XML is merely a (temporary) transport syntax. Also note that the
destiny of the XML data is a relational database (containing the above two
tables).
Here is the XML document that uses the nested design:
<Vineyard>
    <Lot id="1">
        <ripe-grapes>4</ripe-grapes>
        <Picker>
            <name>John</name>
            <metabolism>2</metabolism>
            <grape-wealth>20</grape-wealth>
        </Picker>
    </Lot>
    <Lot id="2">
        <ripe-grapes>3</ripe-grapes>
    </Lot>
    ...
</Vineyard>
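Mapping this XML document into the tables is achievable, but it takes some
effort: each nested Picker must be pulled out of its Lot into a row of the
Picker table, and the Lot row must record which Picker it holds. For an XML
document that has a great deal of nesting the mapping could be quite difficult.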
Here is the XML document that uses the flat design:
<Vineyard>
    <Lot id="1">
        <ripe-grapes>4</ripe-grapes>
    </Lot>
    <Lot id="2">
        <ripe-grapes>3</ripe-grapes>
    </Lot>
    ...
    <Picker locatedOn="1">
        <name>John</name>
        <metabolism>2</metabolism>
        <grape-wealth>20</grape-wealth>
    </Picker>
    ...
</Vineyard>
This XML document can be directly mapped into the tables.
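As a sketch of what such a mapping might look like, the following assumes Python's standard sqlite3 and xml.etree.ElementTree modules. The table and column names are assumptions chosen to mirror the two tables above, the Picker's name is used as the Picker id (as "Pete" is in the tables), and load_flat is a hypothetical helper:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample document modelled on the flat Vineyard example.
FLAT = """
<Vineyard>
  <Lot id="1"><ripe-grapes>4</ripe-grapes></Lot>
  <Lot id="2"><ripe-grapes>3</ripe-grapes></Lot>
  <Picker locatedOn="1">
    <name>John</name>
    <metabolism>2</metabolism>
    <grape-wealth>20</grape-wealth>
  </Picker>
</Vineyard>
"""


def load_flat(vineyard, conn):
    # Assumed schema: one table per top-level element type.
    conn.execute(
        "CREATE TABLE lot (lot_id INTEGER PRIMARY KEY, "
        "ripe_grapes INTEGER, picker_id TEXT)"
    )
    conn.execute(
        "CREATE TABLE picker (picker_id TEXT PRIMARY KEY, "
        "metabolism INTEGER, grape_wealth INTEGER)"
    )
    # Each Lot element becomes one row of the lot table.
    for lot in vineyard.findall("Lot"):
        conn.execute(
            "INSERT INTO lot VALUES (?, ?, NULL)",
            (int(lot.get("id")), int(lot.findtext("ripe-grapes"))),
        )
    # Each Picker element becomes one row of the picker table, and its
    # locatedOn reference fills in the Picker id column of the lot row.
    for picker in vineyard.findall("Picker"):
        name = picker.findtext("name")
        conn.execute(
            "INSERT INTO picker VALUES (?, ?, ?)",
            (name, int(picker.findtext("metabolism")),
             int(picker.findtext("grape-wealth"))),
        )
        conn.execute(
            "UPDATE lot SET picker_id = ? WHERE lot_id = ?",
            (name, int(picker.get("locatedOn"))),
        )


conn = sqlite3.connect(":memory:")
load_flat(ET.fromstring(FLAT), conn)
print(conn.execute("SELECT * FROM lot ORDER BY lot_id").fetchall())
```

Like the Lot table above, this schema records at most one Picker per Lot; a Lot holding several Pickers would need the reference stored on the picker row instead.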
Format 2: The XML Data is Stored into Program Data Structures
Suppose that a program is using this data structure to store the vineyard
data:
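#include <string>

// One Lot, with the Picker that is on it nested inside.
struct Lot {
    int lot_id;
    int ripe_grapes;
    struct Picker {
        std::string name;
        int metabolism;
        int grape_wealth;
    } picker;
};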
The XML document that uses the nested design maps directly to this data
structure. The XML document that uses the flat design would require a bit of
manipulation to map into this data structure.
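To make the comparison concrete, here is a minimal sketch in Python, using dataclasses as a stand-in for the struct above. The class and function names are hypothetical, and a Lot without a Picker is represented with None, which is an assumption the struct does not state:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

NESTED = """
<Vineyard>
  <Lot id="1">
    <ripe-grapes>4</ripe-grapes>
    <Picker>
      <name>John</name><metabolism>2</metabolism><grape-wealth>20</grape-wealth>
    </Picker>
  </Lot>
  <Lot id="2"><ripe-grapes>3</ripe-grapes></Lot>
</Vineyard>
"""

FLAT = """
<Vineyard>
  <Lot id="1"><ripe-grapes>4</ripe-grapes></Lot>
  <Lot id="2"><ripe-grapes>3</ripe-grapes></Lot>
  <Picker locatedOn="1">
    <name>John</name><metabolism>2</metabolism><grape-wealth>20</grape-wealth>
  </Picker>
</Vineyard>
"""


@dataclass
class Picker:
    name: str
    metabolism: int
    grape_wealth: int


@dataclass
class Lot:
    lot_id: int
    ripe_grapes: int
    picker: Optional[Picker]


def _to_picker(elem):
    if elem is None:
        return None
    return Picker(elem.findtext("name"),
                  int(elem.findtext("metabolism")),
                  int(elem.findtext("grape-wealth")))


def _to_lot(lot, picker_elem):
    return Lot(int(lot.get("id")), int(lot.findtext("ripe-grapes")),
               _to_picker(picker_elem))


def lots_from_nested(vineyard):
    # The XML nesting mirrors the structure: read each Lot and its child.
    return [_to_lot(lot, lot.find("Picker")) for lot in vineyard.findall("Lot")]


def lots_from_flat(vineyard):
    # Extra step: index the Pickers by the Lot id they reference.
    by_lot = {p.get("locatedOn"): p for p in vineyard.findall("Picker")}
    return [_to_lot(lot, by_lot.get(lot.get("id")))
            for lot in vineyard.findall("Lot")]


print(lots_from_nested(ET.fromstring(NESTED)) == lots_from_flat(ET.fromstring(FLAT)))
```

Both functions produce the same objects; the flat version simply needs the additional lookup table to re-attach each Picker to its Lot.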
Lessons Learned
Where the XML document is used as a transport syntax, you need to consider the
destination for the data. If the destination is a set of relational database
tables then the flat design is likely to be better-suited. If the destination
is programmatic data structures then either design could be better-suited,
depending on how those data structures are organized.
Should you be Lavish or Spartan with the use of Markup?
There are two philosophies with regard to the use of markup. One philosophy
is to use markup sparingly: if there is not a definite purpose for a tag, then
it shouldn't be used. The other philosophy is to maximize the use of markup, as
markup can make processing of the XML document easier and provides more
organization to the data.
Example of Lavish Markup and Spartan Markup
Here is an XML document that uses lavish markup:
<Book>
    <BookTitle>
        <Title>The XML Bible</Title>
    </BookTitle>
    <Chapters>
        <Chapter>
            <Title>An Eagle's Eye View of XML</Title>
            <Section>
                <Title>What is XML?</Title>
                <Subsection>
                    <Title>XML is a meta-markup language</Title>
                    <Title>XML describes structure and semantics, not formatting</Title>
                </Subsection>
            </Section>
            <Section>
                <Title>Why are Developers Excited About XML?</Title>
                <Subsection>
                    <Title>Design of field-specific markup languages</Title>
                    <Title>Self-describing data</Title>
                    <Title>Interchange of data among applications</Title>
                    <Title>Structured and integrated data</Title>
                </Subsection>
            </Section>
            ...
        </Chapter>
        ...
    </Chapters>
</Book>
Here is the same data with some tags stripped out (i.e., it uses spartan
markup):
<Book>
    <Title>The XML Bible</Title>
    <Chapter>
        <Title>An Eagle's Eye View of XML</Title>
        <Section>
            <Title>What is XML?</Title>
            <Subsection>
                <Title>XML is a meta-markup language</Title>
                <Title>XML describes structure and semantics, not formatting</Title>
            </Subsection>
        </Section>
        <Section>
            <Title>Why are Developers Excited About XML?</Title>
            <Subsection>
                <Title>Design of field-specific markup languages</Title>
                <Title>Self-describing data</Title>
                <Title>Interchange of data among applications</Title>
                <Title>Structured and integrated data</Title>
            </Subsection>
        </Section>
        ...
    </Chapter>
    ...
</Book>
Note that the lavish markup version uses <BookTitle> and
<Chapters>. These tags are considered unnecessary and thus omitted in the
spartan markup version.
Which is better - lavish markup or spartan markup? There is no clear answer.
It should be noted, however, that "it's easier to remove markup than add
markup". So, as the above XML is passed from application to application, an
application can easily remove the <BookTitle> and <Chapters> tags if
it wants to.
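Removing a wrapper tag is mechanical, which is part of why it is the easier direction. As a small sketch, assuming Python's standard xml.etree.ElementTree module and a hypothetical helper named unwrap, an application could strip the two wrappers like this:

```python
import xml.etree.ElementTree as ET

# Trimmed version of the lavish markup example.
LAVISH = """
<Book>
  <BookTitle><Title>The XML Bible</Title></BookTitle>
  <Chapters>
    <Chapter><Title>An Eagle's Eye View of XML</Title></Chapter>
  </Chapters>
</Book>
"""


def unwrap(parent, tag):
    """Replace each direct child named `tag` with that child's own children.

    Hypothetical helper for illustration; not part of any library.
    """
    # Walk backwards so earlier positions are unaffected by the insertions.
    for index, child in reversed(list(enumerate(parent))):
        if child.tag == tag:
            parent.remove(child)
            for offset, grandchild in enumerate(child):
                parent.insert(index + offset, grandchild)


book = ET.fromstring(LAVISH)
unwrap(book, "BookTitle")
unwrap(book, "Chapters")
print([child.tag for child in book])  # ['Title', 'Chapter']
```

Going the other way is harder: to add <Chapters> back, an application has to know which elements belong inside the wrapper, and that knowledge is not in the spartan document itself.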
Should you attempt to "Futureproof" your XML?
Answer: No. The future is unknown and unknowable. It's best to get the
simplest possible XML working that meets today's requirements. Make changes in
the future as required.
Conclusions
What's the best way to design an XML document? Here are some things to
consider:
- Will applications directly operate on the XML document? Is the XML
document intended to serve as a storage medium that is to be queried by
applications?
- Or is the XML document merely a transport syntax? Once it arrives at its
destination the data is extracted and placed into another format, and
applications process the data in that format.
- In the case of applications operating directly on the XML document, you
must consider the types of processing that the application will perform. Some
processing is better suited to XML documents that use a nested design, while
other processing is better suited to XML documents that use a flat design.
- In the case of XML documents that are merely transport syntax, you need to
consider the destiny of the data, that is, you need to consider the format
that the data will be placed into.
Acknowledgements
Many thanks to the following people who participated in the discussions:
- Len Bullard
- Michael Champion
- Joe Chiusano
- Cheryl Connors
- Roger Costello
- Mary Holstege
- Diane Howard
- Peter Hunsberger
- Rick Jelliffe
- Michael Kay
- Ken Laskey
- Anne Thomas Manes
- Ken North
- Dave Pawson
- Michael Rys
- Scott Renner
- Doug Schepers
- Andrezej Jan Tarimina
- Dan Vint
- Nathan Young