Outstanding discussion! Thanks Christian for keeping me honest. Every engineering decision has its pros and cons. I failed to consider that in my original post. So, I have completely rewritten my original post to (hopefully) reflect a more balanced perspective. Have I fairly captured the pros and cons of the 3 issues I try to address?
/Roger
3 Things to Consider when Designing XML
Introduction
In this message I will
discuss XML design with respect to these 3 things:
1. Element relationships – implicit versus explicit.
2. Element coupling – tight versus loose.
3. Element structuring – nested (hierarchical) versus flat.
Which is Better Design?
Suppose that I have data
about a grape vineyard. Shown below
are two lots on the vineyard, and a picker on one of the lots. Two ways of designing the data are
shown. In this message I will
address the issue: which design is better?
Version #1
<Lot id="1">
  <ripe-grapes>4</ripe-grapes>
  <Picker id="John">
    <metabolism>2</metabolism>
    <grape-wealth>20</grape-wealth>
  </Picker>
</Lot>
<Lot id="2">
  <ripe-grapes>3</ripe-grapes>
</Lot>
Version #2
<Lot id="1">
  <ripe-grapes>4</ripe-grapes>
</Lot>
<Lot id="2">
  <ripe-grapes>3</ripe-grapes>
</Lot>
<Picker id="John" locatedOn="1">
  <metabolism>2</metabolism>
  <grape-wealth>20</grape-wealth>
</Picker>
Question/Observations
Before proceeding with
analysis, let me first ask a preliminary question, and make some
observations:
1. Question: How difficult is it for an application to process each version?
   Example: Suppose that an application wants to move the Picker to Lot 2. With which version would it be easier to do this?
   Example: Suppose that an application wants to find all Pickers on a Lot. With which version would it be easier to do this?
2. Observation: Both versions consist of two components:
   - A Lot component.
   - A Picker component.
3. Observation: Both versions are modular. Both represent the Picker as located on Lot 1. They differ in how the information is physically represented. [An equivalent (although fancier) way of stating this observation is: in both versions the information is structured hierarchically, but the lexical representation is different, i.e., same information structure, different lexical representation.]
4. Observation: The two versions differ in these 3 respects:
   - Implicit relationship versus explicit relationship.
   - Tight coupling versus loose coupling.
   - Nested (hierarchical) data versus flat data.
Implicit versus Explicit Relationships
Compare the two versions with respect to how the relationship between the Lot and Picker components is specified:
Version #1
What is the relationship
between the Lot and the
Picker?
<Lot id="1">
  <Picker id="John">
    …
  </Picker>
</Lot>
You might state that the
relationship is:
"The Lot contains
the Picker"
Another person might state
that the relationship is:
"The Lot has a
Picker"
Still another person might
state that the relationship is:
"The Picker is a child of Lot"
The relationship between the
Lot and the Picker is implicit. Let us consider the advantage and
disadvantage of implicit relationships.
Advantage:
There is no need to create
some artifact (attribute or element) to specify what the relationship is. The syntax of XML provides a built-in
relationship between an element and its children, albeit a somewhat fuzzy
relationship.
Disadvantage:
As noted, one person may interpret the relationship in one way, another person may interpret the relationship in another way. Thus, the relationship is ambiguous and subject to misinterpretation. In any computer application, differing interpretations of things left unspecified are the most insidious of errors.
Version #2
What is the relationship
between the Lot and the
Picker?
<Lot id="1">
  …
</Lot>
<Picker id="John" locatedOn="1">
  …
</Picker>
Clearly the relationship
is:
"The Picker is
locatedOn the Lot"
The relationship between the
Lot and the Picker is explicit. That is, the relationship (locatedOn) is
explicitly specified in the instance document. Let us consider the advantage and
disadvantage of explicit relationships.
Advantage:
There is no ambiguity. The relationship between the Lot and Picker components is not subject to
misinterpretation.
Disadvantage:
There must be a common
understanding between the sender and the receiver on what is being used to
represent the relationship, and what its meaning is. For example, it must be understood that
the locatedOn attribute is being used to represent the relationship, and there
must be a common understanding of the semantics of
locatedOn.
Tight Coupling vs Loose Coupling
Compare the two versions
with respect to connectedness between the Lot
and Picker components:
Version #1
<Lot id="1">
  <Picker id="John">
    …
  </Picker>
</Lot>
The Picker is nested
(buried) within the Lot. There is tight coupling between the
components. Let us consider the
advantage and disadvantage of tight coupling.
Advantage:
Tight coupling between
components can make processing by an application easier. For example, finding all Pickers on a
Lot is easy since the Pickers are children of the Lot.
Disadvantage:
Tight coupling between components can make processing by an application difficult. For example, moving the Picker from Lot 1 to Lot 2 is expensive in terms of the amount of code needed, memory needed, and processing time needed, because the entire Picker subtree must be removed from one Lot and re-inserted under another.
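To make the tight-coupling trade-off concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is only an illustration, not a recommended implementation: the <Vineyard> root element and the function names pickers_on_lot and move_picker are my own assumptions, added so that the Version #1 fragment parses as one well-formed document.

```python
import xml.etree.ElementTree as ET

# Version #1 data, wrapped in a hypothetical <Vineyard> root (an assumption;
# the original fragments have no root element).
NESTED = """
<Vineyard>
  <Lot id="1">
    <ripe-grapes>4</ripe-grapes>
    <Picker id="John">
      <metabolism>2</metabolism>
      <grape-wealth>20</grape-wealth>
    </Picker>
  </Lot>
  <Lot id="2">
    <ripe-grapes>3</ripe-grapes>
  </Lot>
</Vineyard>
"""


def pickers_on_lot(root, lot_id):
    """Easy case: the Pickers on a Lot are simply its Picker children."""
    lot = root.find(f"Lot[@id='{lot_id}']")
    return [picker.get("id") for picker in lot.findall("Picker")]


def move_picker(root, picker_id, new_lot_id):
    """Harder case: search every Lot, detach the subtree, re-attach it."""
    for lot in root.findall("Lot"):
        picker = lot.find(f"Picker[@id='{picker_id}']")
        if picker is not None:
            lot.remove(picker)
            root.find(f"Lot[@id='{new_lot_id}']").append(picker)
            return
    raise KeyError(picker_id)


nested_root = ET.fromstring(NESTED)
before = pickers_on_lot(nested_root, "1")
move_picker(nested_root, "John", "2")
after = pickers_on_lot(nested_root, "2")
```

With an in-memory tree the move is only a few lines, but it still has to search for the Picker's current parent and restructure the tree; a streaming or transformation-based application would likely have to rewrite both Lots.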
Version #2
<Lot id="1">
  …
</Lot>
<Picker id="John" locatedOn="1">
  …
</Picker>
The Lot and Picker components are physically completely
separate. The only connection
between them is the pointer from the Picker to the Lot (the locatedOn attribute). There is loose coupling between the
components. Let us consider the
advantage and disadvantage of loose coupling.
Advantage:
Loose coupling between
components can make processing by an application easier. For example, moving the Picker from Lot
1 to Lot 2 is easy - simply change the value of
locatedOn.
Disadvantage:
Loose coupling between
components can make processing by an application difficult. For example, locating the Pickers on a
Lot is more difficult since they are
separated.
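The same two operations against Version #2 might look like the following Python sketch (standard library xml.etree.ElementTree). Again, the <Vineyard> root element and the function names are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Version #2 data, wrapped in a hypothetical <Vineyard> root (an assumption).
FLAT = """
<Vineyard>
  <Lot id="1">
    <ripe-grapes>4</ripe-grapes>
  </Lot>
  <Lot id="2">
    <ripe-grapes>3</ripe-grapes>
  </Lot>
  <Picker id="John" locatedOn="1">
    <metabolism>2</metabolism>
    <grape-wealth>20</grape-wealth>
  </Picker>
</Vineyard>
"""


def move_picker(root, picker_id, new_lot_id):
    """Easy case: moving a Picker only changes the locatedOn attribute."""
    picker = root.find(f"Picker[@id='{picker_id}']")
    picker.set("locatedOn", new_lot_id)


def pickers_on_lot(root, lot_id):
    """Harder case: every Picker must be checked against the Lot id."""
    matches = root.findall(f"Picker[@locatedOn='{lot_id}']")
    return [picker.get("id") for picker in matches]


flat_root = ET.fromstring(FLAT)
before = pickers_on_lot(flat_root, "1")
move_picker(flat_root, "John", "2")
after = pickers_on_lot(flat_root, "2")
```

Note that nothing in this sketch checks that the new locatedOn value names an existing Lot. With a pointer-style attribute, that check is left to the application or to a schema.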
Nested (hierarchical) Data vs Flat Data
Compare the two versions
with respect to the physical placement of the Lot and Picker components:
Version #1
<Lot id="1">
  <Picker id="John">
    …
  </Picker>
</Lot>
The Picker "box" is nested
within the Lot "box". That is, the data is physically
structured in a nested (hierarchical) fashion. Let us consider the advantage and
disadvantage of structuring data in a "box within a box" fashion.
Advantage:
Visually it is easy to see
what Pickers are on a Lot.
Disadvantage:
The more nesting there is,
the more difficult it is to understand.
Version #2
<Lot id="1">
  …
</Lot>
<Picker id="John" locatedOn="1">
  …
</Picker>
There are two separate
"boxes". That is, the data is
physically structured in a flat fashion. Let us consider the advantage and
disadvantage of structuring data in a flat/linear fashion.
Advantage:
Separate components are
easier to understand at a glance.
Disadvantage:
The fact that two components
are related is not immediately apparent.
Best Practice
Let’s review each of the 3 things that have been discussed in this message.
1. Relationship between Components – implicit or explicit?
In the best of all worlds
relationships between components would be explicit and built into the
language. Thus, there would be no
ambiguity regarding the relationship between components, and the set of possible
relationships would be known to everyone.
Alas, XML provides one
built-in relationship and it is a fuzzy one. So, you have two
choices:
- Create your own artifact (attribute or element) which specifies the relationship explicitly.
- Use the (fuzzy) relationship that XML provides.
If you create your own
relationship then all parties receiving your XML document must have a shared
understanding of both (1) what artifact you are using to specify the
relationship, and (2) the semantics of that artifact.
If you use the fuzzy
relationship that XML provides then you risk
misunderstanding.
Best Practice – in mission-critical applications there can be no tolerance for misunderstanding, so use explicit relationships. If you are in a community where you can get shared understanding, then use explicit relationships. If you require relationships other than parent/child, then use explicit relationships. For all other applications implicit relationships may be satisfactory.
2. Coupling between Components – tight or loose?
Your processing requirements
will strongly influence whether coupling between components should be tight or
loose. In the vineyard example we
saw that loosely coupled components enabled an application to easily move a
Picker to another Lot by simply changing the
value of the locatedOn attribute.
On the other hand, if an application wants to list all Pickers on a Lot
then nesting the Pickers within a Lot (i.e.,
tight coupling) makes sense.
Case study: consider an application
that models the movement of Pickers on the vineyard. In this example the majority of the
processing is spent moving the Pickers.
Only at the conclusion of processing Picker movements is there a display
of the Pickers on each Lot. In this example, there can be no debate
– loose coupling of the Picker and Lot
components is essential.
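The case study can be sketched in Python roughly as follows, under the assumption of a flat (Version #2) document wrapped in a hypothetical <Vineyard> root. The sequence of moves and the function name pickers_by_lot are invented for illustration; a real simulation would have its own movement rules.

```python
import xml.etree.ElementTree as ET

# Flat (Version #2) data in a hypothetical <Vineyard> root (an assumption).
FLAT = """
<Vineyard>
  <Lot id="1"><ripe-grapes>4</ripe-grapes></Lot>
  <Lot id="2"><ripe-grapes>3</ripe-grapes></Lot>
  <Picker id="John" locatedOn="1">
    <metabolism>2</metabolism>
    <grape-wealth>20</grape-wealth>
  </Picker>
</Vineyard>
"""


def pickers_by_lot(root):
    """Group Picker ids by Lot id in one pass, for the final display."""
    by_lot = {lot.get("id"): [] for lot in root.findall("Lot")}
    for picker in root.findall("Picker"):
        by_lot[picker.get("locatedOn")].append(picker.get("id"))
    return by_lot


root = ET.fromstring(FLAT)
john = root.find("Picker[@id='John']")

# The frequent operation: each move is a single attribute update.
# This sequence of moves is made up for the example.
for lot_id in ["2", "1", "2"]:
    john.set("locatedOn", lot_id)

# The rare operation: build the per-Lot listing once, at the end.
report = pickers_by_lot(root)
```

The design choice here is to keep the frequent operation (moving) cheap and pay the grouping cost only once, when the display is produced.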
3. Nested (hierarchical) Data or Flat Data?
Nested (hierarchical) data is perhaps easier to manage, as everything is located in one tidy chunk. On the other hand, the more deeply the elements are nested, the more difficult it is to understand the data.
Conversely, flat data is
perhaps less easy to manage, as the components are scattered about. On the other hand, components in
isolation are typically easier to
understand.