xml-dev - [Summary] Better design: "flatter is better" or "nesting is better" ?

To: "XML Developers List" <xml-dev@lists.xml.org>
Subject: [Summary] Better design: "flatter is better" or "nesting is better" ?
From: "Costello, Roger L." <costello@mitre.org>
Date: Mon, 10 Oct 2005 08:02:08 -0400

Hi Folks,

Once again, thank you for your outstanding comments on this topic. Below is my attempt at a summary of our discussions. /Roger

Better Design: "Flatter is Better" or "Nesting is Better" ?

Synonyms

Nested design:
hierarchical design

Flat design:
relational design, id/idref based design

The Nested Design

The nested design is characterized by structuring data using parent-child relationships.

Example: Consider a grape Vineyard comprised of Lots with Pickers scattered about on the Lots. Here is how the Vineyard data might be structured using the nested design:

<Vineyard>
      <Lot id="1"> 
           <ripe-grapes>4</ripe-grapes> 
           <Picker> 
                 <name>John</name>
                 <metabolism>2</metabolism> 
                 <grape-wealth>20</grape-wealth> 
           </Picker> 
     </Lot> 
     <Lot id="2">
           <ripe-grapes>3</ripe-grapes> 
     </Lot>
     ... 
</Vineyard>

Two Lots are shown. Lot 1 has a Picker (John) on it. Note that the Picker data is nested directly within the Lot.

The Flat Design

With the flat design there is minimal usage of parent-child relationships. Instead, data fragment A is related to data fragment B by fragment A identifying itself, and fragment B referencing it. Here is how the Vineyard data might be structured using the flat design:

<Vineyard>
     <Lot id="1"> 
           <ripe-grapes>4</ripe-grapes> 
     </Lot> 
     <Lot id="2"> 
           <ripe-grapes>3</ripe-grapes> 
     </Lot> 
     ...
     <Picker locatedOn="1"> 
           <name>John</name>
           <metabolism>2</metabolism> 
           <grape-wealth>20</grape-wealth> 
     </Picker>
     ... 
</Vineyard>

This version also has Picker John on Lot 1. However, it accomplishes this through the use of an id/idref mechanism: each Lot is uniquely identified, and Picker John references (using the locatedOn attribute) the Lot that he is on.

Processing the Vineyard by an Application

Above we examined two ways of structuring the Vineyard data - using a nested design and using a flat design. Suppose that an application receives a Vineyard XML document. Depending on what the application wants to do with the data, one design may be better suited than the other.

Application 1 - Find all the Pickers on a Lot

Suppose that the application wants to locate all the Pickers that are on a Lot. For example, suppose that the application wants to locate all the Pickers on Lot 1. Which design is better suited to this application?

Suitability of the Nested Design: Locating the Pickers on Lot 1 is simply a matter of navigating to Lot 1 and then selecting the Picker children.

Suitability of the Flat Design: To locate the Pickers on Lot 1 requires examining each Picker and checking to see if its locatedOn attribute references Lot 1.

Conclusion: The Nested Design is better suited to an application that needs to locate Pickers on a Lot.
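
As a rough illustration of the difference, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The function names and the trimmed-down sample documents are my own assumptions for illustration; they are not from the discussion. The nested query is a single path expression, while the flat query has to test every Picker.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample documents (assumed for illustration).
NESTED = """
<Vineyard>
  <Lot id="1">
    <ripe-grapes>4</ripe-grapes>
    <Picker>
      <name>John</name>
      <metabolism>2</metabolism>
      <grape-wealth>20</grape-wealth>
    </Picker>
  </Lot>
  <Lot id="2">
    <ripe-grapes>3</ripe-grapes>
  </Lot>
</Vineyard>
"""

FLAT = """
<Vineyard>
  <Lot id="1"><ripe-grapes>4</ripe-grapes></Lot>
  <Lot id="2"><ripe-grapes>3</ripe-grapes></Lot>
  <Picker locatedOn="1">
    <name>John</name>
    <metabolism>2</metabolism>
    <grape-wealth>20</grape-wealth>
  </Picker>
</Vineyard>
"""


def pickers_on_lot_nested(xml_text, lot_id):
    """Nested design: navigate to the Lot, then select its Picker children."""
    root = ET.fromstring(xml_text)
    pickers = root.findall(f"./Lot[@id='{lot_id}']/Picker")
    return [p.findtext("name") for p in pickers]


def pickers_on_lot_flat(xml_text, lot_id):
    """Flat design: examine every Picker and compare its locatedOn attribute."""
    root = ET.fromstring(xml_text)
    return [
        p.findtext("name")
        for p in root.findall("./Picker")
        if p.get("locatedOn") == lot_id
    ]


nested_names = pickers_on_lot_nested(NESTED, "1")
flat_names = pickers_on_lot_flat(FLAT, "1")
```

Both functions return the same names for the same Lot; what differs is how much of the document each one has to inspect.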

Application 2 - Move a Picker to a Different Lot

Suppose that the application wants to move the Pickers from their current Lot onto a different Lot. For example, suppose that the application wants to move Picker John from Lot 1 to Lot 15. Which design is better suited to this application?

Suitability of the Nested Design: To move Picker John from Lot 1 to Lot 15 involves a considerable amount of work, as the Picker must be extracted from Lot 1 and inserted into Lot 15.

Suitability of the Flat Design: Moving Picker John is simply a matter of changing his locatedOn attribute value from 1 to 15.

Conclusion: The Flat Design is better suited to an application that needs to move Pickers from their current Lot onto a different Lot.
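
The two kinds of move can be sketched as follows, again in Python with xml.etree.ElementTree. The function names, the error handling, and the two-Lot sample documents (Lots 1 and 15) are assumptions made for this sketch. The nested version has to detach an element and re-attach it elsewhere in the tree; the flat version changes one attribute value.

```python
import xml.etree.ElementTree as ET

# Sample documents with Lots 1 and 15 (assumed for illustration).
NESTED = """
<Vineyard>
  <Lot id="1">
    <ripe-grapes>4</ripe-grapes>
    <Picker><name>John</name></Picker>
  </Lot>
  <Lot id="15">
    <ripe-grapes>3</ripe-grapes>
  </Lot>
</Vineyard>
"""

FLAT = """
<Vineyard>
  <Lot id="1"><ripe-grapes>4</ripe-grapes></Lot>
  <Lot id="15"><ripe-grapes>3</ripe-grapes></Lot>
  <Picker locatedOn="1"><name>John</name></Picker>
</Vineyard>
"""


def move_picker_nested(root, name, new_lot_id):
    """Nested design: extract the Picker from its Lot and insert it into another."""
    target = root.find(f"./Lot[@id='{new_lot_id}']")
    if target is None:
        raise ValueError(f"no Lot with id {new_lot_id}")
    for lot in root.findall("./Lot"):
        for picker in lot.findall("./Picker"):
            if picker.findtext("name") == name:
                lot.remove(picker)      # detach from the current Lot
                target.append(picker)   # attach to the new Lot
                return
    raise ValueError(f"no Picker named {name}")


def move_picker_flat(root, name, new_lot_id):
    """Flat design: change the Picker's locatedOn reference."""
    for picker in root.findall("./Picker"):
        if picker.findtext("name") == name:
            picker.set("locatedOn", new_lot_id)
            return
    raise ValueError(f"no Picker named {name}")


nested_root = ET.fromstring(NESTED)
move_picker_nested(nested_root, "John", "15")

flat_root = ET.fromstring(FLAT)
move_picker_flat(flat_root, "John", "15")
```

Note that the flat version in this sketch does not check that Lot 15 exists; an application that cares about dangling references would have to add that check itself.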

Lessons Learned

The above example reveals two things:

How you design your XML can significantly impact its processing by applications.
Designing your XML in one fashion may make it well-suited for processing by one application, but poorly suited for processing by a different application.

Relationship between an Application and an XML Document

There are three ways to categorize the relationship between an application and an XML document:

The application operates directly on the XML document.
The XML document is transformed into some other format (language objects, relational database, etc), and the application operates on the data in that other format.
The XML document is the application.

In the above Vineyard example we saw two applications. Both applications operated directly on the XML document. The XML document is akin to a database - it stores data that is queried and processed by applications.

On the other hand, sometimes an application does not operate directly on the XML document. Instead, the data in the XML document is transformed into some other format, and the application then operates on the data in that format. For example, the data in the XML document is extracted and placed into tables in a relational database. The application then operates on the data in the database. Another common scenario is that the data in the XML document is extracted and placed into data structures in a program. In both of these situations the XML document is merely serving as a transport syntax (i.e., the XML is simply "bits on the wire").

Sometimes, the XML document is the application! For example, an XSLT document is an XML document, and it is also an application.

When the XML is being used just as a Transport Syntax which is Better - Nested or Flat?

Above we discussed the case where an application operates directly upon the XML document. We saw that sometimes it is best to design the XML document using a nested design, and other times it is best to design the XML document using a flat design. The application was the deciding factor.

Now let's consider the case where the XML document is merely a transport syntax. Applications don't operate directly on the XML document. Instead, they operate on the data after it has been transformed into another format. Let's examine two formats that XML documents are commonly transformed into.

Format 1: The XML Data is stored into a Relational Database

Let's suppose that a database contains these two tables for storing the Vineyard data:

Lot Table

Lot id	Ripe Grapes	Picker id
...	...	...
53	3	Pete
...	...	...

Picker Table

Picker id	Metabolism	Grape Wealth
...	...	...
Pete	1	10
...	...	...

Let's consider the XML documents to determine which design is better-suited. Note that the XML is merely a (temporary) transport syntax, and that the destination of the XML data is a relational database (containing the above two tables).

Here is the XML document that uses the nested design:

<Vineyard>
      <Lot id="1"> 
           <ripe-grapes>4</ripe-grapes> 
           <Picker> 
                 <name>John</name>
                 <metabolism>2</metabolism> 
                 <grape-wealth>20</grape-wealth> 
           </Picker> 
     </Lot> 
     <Lot id="2">
           <ripe-grapes>3</ripe-grapes> 
     </Lot>
     ... 
</Vineyard>

Although it is achievable, it will clearly entail some effort to map the XML document into the tables. And for an XML document that has a great deal of nesting the mapping could be quite difficult.

Here is the XML document that uses the flat design:

<Vineyard>
     <Lot id="1"> 
           <ripe-grapes>4</ripe-grapes> 
     </Lot> 
     <Lot id="2"> 
           <ripe-grapes>3</ripe-grapes> 
     </Lot> 
     ...
     <Picker locatedOn="1"> 
           <name>John</name>
           <metabolism>2</metabolism> 
           <grape-wealth>20</grape-wealth> 
     </Picker>
     ... 
</Vineyard>

This XML document can be directly mapped into the tables.
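
To make the mapping concrete, here is a minimal sketch that loads the flat document into an in-memory SQLite database using Python's standard sqlite3 module. The table and column names are my own rendering of the two tables above. The sketch also assumes that the Picker's name serves as the Picker id (as "Pete" does in the sample rows) and that a Lot holds at most one Picker, since the Lot table has a single Picker id column.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Trimmed-down flat document (assumed for illustration).
FLAT = """
<Vineyard>
  <Lot id="1"><ripe-grapes>4</ripe-grapes></Lot>
  <Lot id="2"><ripe-grapes>3</ripe-grapes></Lot>
  <Picker locatedOn="1">
    <name>John</name>
    <metabolism>2</metabolism>
    <grape-wealth>20</grape-wealth>
  </Picker>
</Vineyard>
"""


def load_flat(xml_text):
    """Copy each Lot and Picker element into a row of the matching table."""
    root = ET.fromstring(xml_text)
    db = sqlite3.connect(":memory:")
    # Hypothetical schema mirroring the Lot Table and Picker Table.
    db.execute(
        "CREATE TABLE lot (lot_id INTEGER PRIMARY KEY, "
        "ripe_grapes INTEGER, picker_id TEXT)"
    )
    db.execute(
        "CREATE TABLE picker (picker_id TEXT PRIMARY KEY, "
        "metabolism INTEGER, grape_wealth INTEGER)"
    )
    for lot in root.findall("./Lot"):
        db.execute(
            "INSERT INTO lot (lot_id, ripe_grapes) VALUES (?, ?)",
            (int(lot.get("id")), int(lot.findtext("ripe-grapes"))),
        )
    for picker in root.findall("./Picker"):
        name = picker.findtext("name")
        db.execute(
            "INSERT INTO picker VALUES (?, ?, ?)",
            (
                name,
                int(picker.findtext("metabolism")),
                int(picker.findtext("grape-wealth")),
            ),
        )
        # The locatedOn reference becomes the Picker id column of the Lot row.
        db.execute(
            "UPDATE lot SET picker_id = ? WHERE lot_id = ?",
            (name, int(picker.get("locatedOn"))),
        )
    return db


db = load_flat(FLAT)
lot_rows = db.execute(
    "SELECT lot_id, ripe_grapes, picker_id FROM lot ORDER BY lot_id"
).fetchall()
picker_rows = db.execute(
    "SELECT picker_id, metabolism, grape_wealth FROM picker"
).fetchall()
```

Each element type corresponds to one table and each reference to one column, so the loader needs no knowledge of how deeply anything is nested.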

Format 2: The XML Data is Stored into Program Data Structures

Suppose that a program is using this data structure to store the vineyard data:

#include <string>

struct Lot {
    int lot_id;
    int ripe_grapes;
    struct Picker {
        std::string name;
        int metabolism;
        int grape_wealth;
    } picker;  // the Picker located on this Lot
};

The XML document that uses the nested design maps directly to this data structure. The XML document that uses the flat design would require a bit of manipulation to map into this data structure.
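
The direct mapping from the nested design can be sketched in Python with dataclasses standing in for the data structure above. The class names, the field names, and the choice to allow a Lot without a Picker are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional
import xml.etree.ElementTree as ET

# Trimmed-down nested document (assumed for illustration).
NESTED = """
<Vineyard>
  <Lot id="1">
    <ripe-grapes>4</ripe-grapes>
    <Picker>
      <name>John</name>
      <metabolism>2</metabolism>
      <grape-wealth>20</grape-wealth>
    </Picker>
  </Lot>
  <Lot id="2">
    <ripe-grapes>3</ripe-grapes>
  </Lot>
</Vineyard>
"""


@dataclass
class Picker:
    name: str
    metabolism: int
    grape_wealth: int


@dataclass
class Lot:
    lot_id: int
    ripe_grapes: int
    picker: Optional[Picker] = None  # None when no Picker is on the Lot


def lots_from_nested(xml_text: str) -> List[Lot]:
    """Build one Lot object per Lot element; the element tree mirrors the objects."""
    lots = []
    for lot_el in ET.fromstring(xml_text).findall("./Lot"):
        picker_el = lot_el.find("Picker")
        picker = None
        if picker_el is not None:
            picker = Picker(
                name=picker_el.findtext("name"),
                metabolism=int(picker_el.findtext("metabolism")),
                grape_wealth=int(picker_el.findtext("grape-wealth")),
            )
        lots.append(
            Lot(
                lot_id=int(lot_el.get("id")),
                ripe_grapes=int(lot_el.findtext("ripe-grapes")),
                picker=picker,
            )
        )
    return lots


lots = lots_from_nested(NESTED)
```

With the flat design the same loader would first have to read all the Lots, index them by id, and then attach each Picker by following its locatedOn reference.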

Lessons Learned

Where the XML document is used as a transport syntax you need to consider the destination for the data. If the destination for the data is a set of relational database tables then the flat design is likely to be better-suited. If the destination for the data is programmatic data structures then the better-suited design could be either the nested design or the flat design.

Should you be Lavish or Spartan with the use of Markup?

There are two philosophies with regard to the use of markup. One philosophy is to use markup sparingly: if there is not a definite purpose for a tag, then it shouldn't be used. The other philosophy is to maximize the use of markup, as markup can make processing of the XML document easier and provides more organization to the data.

Example of Lavish Markup and Spartan Markup

Here is an XML document that uses lavish markup:

<Book>
    <BookTitle>
        <Title>The XML Bible</Title>
    </BookTitle>
    <Chapters>
        <Chapter>
            <Title>An Eagle's Eye View of XML</Title>
            <Section>
                <Title>What is XML?</Title>
                <Subsection>
                    <Title>XML is a meta-markup language</Title>
                    <Title>XML describes structure and semantics, not formatting</Title>
                </Subsection>
            </Section>
            <Section>
                <Title>Why are Developers Excited About XML?</Title>
                <Subsection>
                    <Title>Design of field-specific markup languages</Title>
                    <Title>Self-describing data</Title>
                    <Title>Interchange of data among applications</Title>
                    <Title>Structured and integrated data</Title>
                </Subsection>
            </Section>
            ...
        </Chapter>
        ...
    </Chapters>
</Book>

Here is the same data with some tags stripped out (i.e., it uses spartan markup):

<Book>
    <Title>The XML Bible</Title>
    <Chapter>
        <Title>An Eagle's Eye View of XML</Title>
        <Section>
            <Title>What is XML?</Title>
            <Subsection>
                <Title>XML is a meta-markup language</Title>
                <Title>XML describes structure and semantics, not formatting</Title>
            </Subsection>
        </Section>
        <Section>
            <Title>Why are Developers Excited About XML?</Title>
            <Subsection>
                <Title>Design of field-specific markup languages</Title>
                <Title>Self-describing data</Title>
                <Title>Interchange of data among applications</Title>
                <Title>Structured and integrated data</Title>
            </Subsection>
        </Section>
        ...
    </Chapter>
    ...
</Book>

Note that the lavish markup version uses <BookTitle> and <Chapters>. These tags are considered unnecessary and thus omitted in the spartan markup version.

Which is better - lavish markup or spartan markup? There is no clear answer. It should be noted, however, that "it's easier to remove markup than add markup". So, as the above XML is passed from application to application, an application can easily remove the <BookTitle> and <Chapters> tags if it wants to.
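
As an illustration of how little work the removal takes, here is a minimal Python sketch using xml.etree.ElementTree. The unwrap function is a hypothetical helper written for this example, and the sample document is a shortened version of the lavish one above. It replaces a wrapper element with the wrapper's own children.

```python
import xml.etree.ElementTree as ET

# Shortened lavish-markup document (assumed for illustration).
LAVISH = """
<Book>
  <BookTitle>
    <Title>The XML Bible</Title>
  </BookTitle>
  <Chapters>
    <Chapter>
      <Title>An Eagle's Eye View of XML</Title>
    </Chapter>
  </Chapters>
</Book>
"""


def unwrap(parent, tag):
    """Replace each direct child named `tag` with that child's own children."""
    # Walk backwards so earlier indexes stay valid while the list changes.
    for index, child in reversed(list(enumerate(parent))):
        if child.tag == tag:
            parent.remove(child)
            for offset, grandchild in enumerate(child):
                parent.insert(index + offset, grandchild)


book = ET.fromstring(LAVISH)
unwrap(book, "BookTitle")
unwrap(book, "Chapters")
child_tags = [child.tag for child in book]
```

Going the other way is harder: to add the wrappers back, an application has to know which elements belong together, and that information is no longer in the document.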

Should you attempt to "Futureproof" your XML?

Answer: No. The future is unknown and unknowable. It's best to get the simplest possible XML working that meets today's requirements. Make changes in the future as required.

Conclusions

What's the best way to design an XML document? Here are some things to consider:

Will applications directly operate on the XML document? Is the XML document intended to serve as a storage medium that is to be queried by applications?
Or is the XML document merely a transport syntax? Once it arrives at its destination the data is extracted and placed into another format, and applications process the data in that format.
In the case of applications operating directly on the XML document, you must consider the types of processing that the application will perform. Some processing is better-suited with XML documents that use a nested design, while other processing is better-suited with XML documents that use a flat design.
In the case of XML documents that are merely transport syntax, you need to consider the destination of the data, that is, the format that the data will be placed into.

Acknowledgements

Many thanks to the following people who participated in the discussions:

Len Bullard
Michael Champion
Joe Chiusano
Cheryl Connors
Roger Costello
Mary Holstege
Diane Howard
Peter Hunsberger
Rick Jelliffe
Michael Kay
Ken Laskey
Anne Thomas Manes
Ken North
Dave Pawson
Michael Rys
Scott Renner
Doug Schepers
Andrzej Jan Taramina
Dan Vint
Nathan Young
