RE: [xml-dev] 3 XML Design Principles - a rebuttal

To: "'XML Developers List'" <xml-dev@lists.xml.org>
Subject: RE: [xml-dev] 3 XML Design Principles - a rebuttal
From: "Roger L. Costello" <costello@mitre.org>
Date: Sun, 30 Jan 2005 15:07:06 -0500
In-reply-to: <1681903875.20050130110722@systemwire.com>
Thread-index: AcUGvCx99cxgovd9TX6V6sdHsZhexQASpwgQ

Outstanding discussion! Thanks Christian for keeping me honest. Every engineering decision has its pros and cons. I failed to consider that in my original post. So, I have completely rewritten my original post to (hopefully) reflect a more balanced perspective. Have I fairly captured the pros and cons of the 3 issues I try to address? /Roger

3 Things to Consider when Designing XML

Introduction

In this message I will discuss XML design with respect to these 3 things:

1. Element relationships – implicit versus explicit.

2. Element coupling – tight versus loose.

3. Element structuring – nested (hierarchical) versus flat.

Which is Better Design?

Suppose that I have data about a grape vineyard. Shown below are two lots on the vineyard, and a picker on one of the lots. Two ways of designing the data are shown. In this message I will address the issue: which design is better?

Version #1

<Lot id="1">
    <ripe-grapes>4</ripe-grapes>
    <Picker>
        <grape-wealth>20</grape-wealth>
    </Picker>
</Lot>
<Lot id="2">
    <ripe-grapes>3</ripe-grapes>
</Lot>

Version #2

<Lot id="1">
    <ripe-grapes>4</ripe-grapes>
</Lot>
<Lot id="2">
    <ripe-grapes>3</ripe-grapes>
</Lot>
<Picker locatedOn="1">
    <grape-wealth>20</grape-wealth>
</Picker>

Question/Observations

Before proceeding with analysis, let me first ask a preliminary question, and make some observations:

1. Question: How difficult is it for an application to process each version?

Example: Suppose that an application wants to move the Picker to Lot 2. With which version would it be easier to do this?

Example: Suppose that an application wants to find all Pickers on a Lot. With which version would it be easier to do this?

2. Observation: Both versions are composed of two components:

- A Lot component.

- A Picker component.

3. Observation: Both versions are modular. Both represent the Picker as located on Lot 1. They differ in how the information is physically represented. [An equivalent (although fancier) way of stating this observation is: in both versions the information is structured hierarchically, but the lexical representation is different, i.e., same information structure, different lexical representation.]

4. Observation: The two versions differ in these 3 respects:

- Implicit relationship versus explicit relationship.

- Tight coupling versus loose coupling.

- Nested (hierarchical) data versus flat data.

Implicit versus Explicit Relationships

Compare the two versions with respect to how the relationship between the Lot and Picker components is specified:

Version #1

What is the relationship between the Lot and the Picker?

<Lot id="1">
    <Picker>
        …
    </Picker>
</Lot>

You might state that the relationship is:

"The Lot contains the Picker"

Another person might state that the relationship is:

"The Lot has a Picker"

Still another person might state that the relationship is:

"The Picker is a child of Lot"

The relationship between the Lot and the Picker is implicit. Let us consider the advantage and disadvantage of implicit relationships.

Advantage:

There is no need to create some artifact (attribute or element) to specify what the relationship is. The syntax of XML provides a built-in relationship between an element and its children, albeit a somewhat fuzzy relationship.

Disadvantage:

As noted, one person may interpret the relationship in one way, and another person may interpret it in another way. Thus, the relationship is ambiguous and subject to misinterpretation. In any computer application, differing interpretations of things left unspecified are among the most insidious of errors.

Version #2

What is the relationship between the Lot and the Picker?

<Lot id="1">
    …
</Lot>
<Picker locatedOn="1">
    …
</Picker>

Clearly the relationship is:

"The Picker is locatedOn the Lot"

The relationship between the Lot and the Picker is explicit. That is, the relationship (locatedOn) is explicitly specified in the instance document. Let us consider the advantage and disadvantage of explicit relationships.

Advantage:

There is no ambiguity. The relationship between the Lot and Picker components is not subject to misinterpretation.

Disadvantage:

There must be a common understanding between the sender and the receiver on what is being used to represent the relationship, and what its meaning is. For example, it must be understood that the locatedOn attribute is being used to represent the relationship, and there must be a common understanding of the semantics of locatedOn.
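The cost of that shared understanding can be sketched in code. The following is a minimal Python sketch, not a definitive implementation: it assumes that each Lot carries an id attribute, that locatedOn holds the id of a Lot, and that the components sit under a Vineyard root element. The id attributes, the Vineyard element, and the second Picker are assumptions made for illustration. A receiver who knows the convention can check that every pointer actually resolves:

```python
import xml.etree.ElementTree as ET

# Assumed instance document: the <Vineyard> root, the id attributes and the
# second Picker are illustrative; only locatedOn is named in the discussion.
DOC = """
<Vineyard>
  <Lot id="1"><ripe-grapes>4</ripe-grapes></Lot>
  <Lot id="2"><ripe-grapes>3</ripe-grapes></Lot>
  <Picker id="John" locatedOn="1"><grape-wealth>20</grape-wealth></Picker>
  <Picker id="Mary" locatedOn="7"><grape-wealth>5</grape-wealth></Picker>
</Vineyard>
"""


def dangling_pickers(root):
    """Return the ids of Pickers whose locatedOn names no Lot in the document."""
    lot_ids = {lot.get("id") for lot in root.findall("Lot")}
    return [
        picker.get("id")
        for picker in root.findall("Picker")
        if picker.get("locatedOn") not in lot_ids
    ]


root = ET.fromstring(DOC)
# Mary points at Lot 7, which does not exist in this document.
print(dangling_pickers(root))
```

A check like this is only possible because both parties agree on what locatedOn means; nothing in XML itself says that its value must match a Lot.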

Tight Coupling vs Loose Coupling

Compare the two versions with respect to connectedness between the Lot and Picker components:

Version #1

<Lot id="1">
    <Picker>
        …
    </Picker>
</Lot>

The Picker is nested (buried) within the Lot. There is tight coupling between the components. Let us consider the advantage and disadvantage of tight coupling.

Advantage:

Tight coupling between components can make processing by an application easier. For example, finding all Pickers on a Lot is easy since the Pickers are children of the Lot.

Disadvantage:

Tight coupling between components can make processing by an application difficult. For example, moving the Picker from Lot 1 to Lot 2 is expensive in terms of the amount of code needed, memory needed, and processing time needed.
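To make both sides of tight coupling concrete, here is a small Python sketch of the two operations against a nested document. It is an illustration under stated assumptions: the Vineyard root element and the id attributes on Lot and Picker are hypothetical names that I introduce so the fragment parses and the components can be addressed.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested document; <Vineyard> and the id attributes are assumptions.
NESTED = """
<Vineyard>
  <Lot id="1">
    <ripe-grapes>4</ripe-grapes>
    <Picker id="John">
      <grape-wealth>20</grape-wealth>
    </Picker>
  </Lot>
  <Lot id="2">
    <ripe-grapes>3</ripe-grapes>
  </Lot>
</Vineyard>
"""


def pickers_on_lot(root, lot_id):
    """Listing is easy: the Pickers are simply the children of the Lot."""
    lot = root.find(f"Lot[@id='{lot_id}']")
    return [] if lot is None else [p.get("id") for p in lot.findall("Picker")]


def move_picker(root, picker_id, target_lot_id):
    """Moving is structural: detach the whole Picker subtree, then re-attach it."""
    target = root.find(f"Lot[@id='{target_lot_id}']")
    if target is None:
        raise ValueError(f"no Lot with id {target_lot_id!r}")
    for lot in root.findall("Lot"):
        picker = lot.find(f"Picker[@id='{picker_id}']")
        if picker is not None:
            lot.remove(picker)
            target.append(picker)
            return
    raise ValueError(f"no Picker with id {picker_id!r}")


root = ET.fromstring(NESTED)
move_picker(root, "John", "2")
print(pickers_on_lot(root, "1"), pickers_on_lot(root, "2"))
```

Note that the move has to search every Lot for the Picker and then modify two parents, whereas the listing is a single child lookup.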

Version #2

<Lot id="1">
    …
</Lot>
<Picker locatedOn="1">
    …
</Picker>

The Lot and Picker components are physically completely separate. The only connection between them is the pointer from the Picker to the Lot (the locatedOn attribute). There is loose coupling between the components. Let us consider the advantage and disadvantage of loose coupling.

Advantage:

Loose coupling between components can make processing by an application easier. For example, moving the Picker from Lot 1 to Lot 2 is easy - simply change the value of locatedOn.

Disadvantage:

Loose coupling between components can make processing by an application difficult. For example, locating the Pickers on a Lot is more difficult since they are separated.
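The same two operations can be sketched against a flat document. Again this is only a Python sketch under assumptions: the Vineyard root element and the id attributes are hypothetical, and locatedOn is assumed to hold the id of a Lot.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat document; <Vineyard> and the id attributes are assumptions.
FLAT = """
<Vineyard>
  <Lot id="1"><ripe-grapes>4</ripe-grapes></Lot>
  <Lot id="2"><ripe-grapes>3</ripe-grapes></Lot>
  <Picker id="John" locatedOn="1"><grape-wealth>20</grape-wealth></Picker>
</Vineyard>
"""


def move_picker(root, picker_id, target_lot_id):
    """Moving is easy: only the value of locatedOn changes."""
    picker = root.find(f"Picker[@id='{picker_id}']")
    if picker is None:
        raise ValueError(f"no Picker with id {picker_id!r}")
    picker.set("locatedOn", target_lot_id)


def pickers_on_lot(root, lot_id):
    """Listing is harder: every Picker must be inspected to see where it points."""
    return [
        p.get("id")
        for p in root.findall("Picker")
        if p.get("locatedOn") == lot_id
    ]


root = ET.fromstring(FLAT)
move_picker(root, "John", "2")
print(pickers_on_lot(root, "1"), pickers_on_lot(root, "2"))
```

Here the move touches a single attribute and leaves the tree untouched, while the listing scans all Pickers in the document rather than the children of one Lot.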

Nested (hierarchical) Data vs Flat Data

Compare the two versions with respect to the physical placement of the Lot and Picker components:

Version #1

<Lot id="1">
    <Picker>
        …
    </Picker>
</Lot>

The Picker "box" is nested within the Lot "box". That is, the data is physically structured in a nested (hierarchical) fashion. Let us consider the advantage and disadvantage of structuring data in a "box within a box" fashion.

Advantage:

Visually it is easy to see what Pickers are on a Lot.

Disadvantage:

The more nesting there is, the more difficult the data is to understand.

Version #2

<Lot id="1">
    …
</Lot>
<Picker locatedOn="1">
    …
</Picker>

There are two separate "boxes". That is, the data is physically structured in a flat fashion. Let us consider the advantage and disadvantage of structuring data in a flat/linear fashion.

Advantage:

Separate components are easier to understand at a glance.

Disadvantage:

The fact that two components are related is not immediately apparent.

Best Practice

Let’s review each of the 3 things that have been discussed in this message.

1. Relationship between Components – implicit or explicit?

In the best of all worlds relationships between components would be explicit and built into the language. Thus, there would be no ambiguity regarding the relationship between components, and the set of possible relationships would be known to everyone.

Alas, XML provides one built-in relationship and it is a fuzzy one. So, you have two choices:

- Create your own artifact (attribute or element) which specifies the relationship explicitly.

- Use the (fuzzy) relationship that XML provides.

If you create your own relationship then all parties receiving your XML document must have a shared understanding of both (1) what artifact you are using to specify the relationship, and (2) the semantics of that artifact.

If you use the fuzzy relationship that XML provides then you risk misunderstanding.

Best Practice – in mission-critical applications there can be no tolerance for misunderstanding, so use explicit relationships. If you are in a community where you can get shared understanding, then use explicit relationships. If you require relationships other than parent/child, then use explicit relationships. For all other applications implicit relationships may be satisfactory.

2. Coupling between Components – tight or loose?

Your processing requirements will strongly influence whether coupling between components should be tight or loose. In the vineyard example we saw that loosely coupled components enabled an application to easily move a Picker to another Lot by simply changing the value of the locatedOn attribute. On the other hand, if an application wants to list all Pickers on a Lot then nesting the Pickers within a Lot (i.e., tight coupling) makes sense.

Case study: consider an application that models the movement of Pickers on the vineyard. In this example the majority of the processing is spent moving the Pickers. Only at the conclusion of processing Picker movements is there a display of the Pickers on each Lot. In this example, there can be no debate – loose coupling of the Picker and Lot components is essential.
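The case study can be sketched as follows. This is a hedged Python illustration, not the application itself: the Vineyard root element, the id attributes, the second Picker, and the list of moves are all hypothetical. Each move is a single attribute update, and the display at the end is produced by one grouping pass over the Pickers.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical flat document; <Vineyard>, the id attributes and Mary are assumptions.
FLAT = """
<Vineyard>
  <Lot id="1"><ripe-grapes>4</ripe-grapes></Lot>
  <Lot id="2"><ripe-grapes>3</ripe-grapes></Lot>
  <Picker id="John" locatedOn="1"><grape-wealth>20</grape-wealth></Picker>
  <Picker id="Mary" locatedOn="2"><grape-wealth>5</grape-wealth></Picker>
</Vineyard>
"""


def apply_moves(root, moves):
    """Apply (picker id, target lot id) moves; each one is an attribute update."""
    pickers = {p.get("id"): p for p in root.findall("Picker")}
    for picker_id, lot_id in moves:
        pickers[picker_id].set("locatedOn", lot_id)


def pickers_by_lot(root):
    """Group Picker ids by the Lot they point at, in one pass over the Pickers."""
    index = defaultdict(list)
    for picker in root.findall("Picker"):
        index[picker.get("locatedOn")].append(picker.get("id"))
    return dict(index)


root = ET.fromstring(FLAT)
apply_moves(root, [("John", "2"), ("Mary", "1"), ("John", "1")])
print(pickers_by_lot(root))
```

The grouping pass is paid once at the end, which is why the harder lookup matters little when movement dominates the processing.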

3. Nested (hierarchical) Data or Flat Data?

Nested (hierarchical) data is perhaps easier to manage, as everything is located in one tidy chunk. On the other hand, the more deeply the elements are nested, the more difficult it is to understand the data.

Conversely, flat data is perhaps less easy to manage, as the components are scattered about. On the other hand, components in isolation are typically easier to understand.

Follow-Ups:
- Re: [xml-dev] 3 XML Design Principles - a rebuttal
  - From: Frank Manola <fmanola@acm.org>

References:
- Re: [xml-dev] 3 XML Design Principles - a rebuttal
  - From: Christian Nentwich <christian@systemwire.com>