xml-dev - Re: [xml-dev] 3 possible approaches for representing concepts


To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] 3 possible approaches for representing concepts
From: "Thomas B. Passin" <tpassin@comcast.net>
Date: Thu, 03 Apr 2003 01:20:57 -0500
References: <3E8B6043.70E3E32F@bah.com>

[Chiusano Joseph]

> I would like to please solicit some quick feedback if possible regarding
> an approach to using elements and/or attributes to represent concepts in
> an XML document. I am having a "healthy debate" with a "colleague" on
> how much "meaning" should be placed into an element name, and how much
> (if any) should be "filled out" by attributes.
>
> Below I've identified 3 approaches for representing a concept called
> (pipes separate the "subconcepts"):
>
> CurrentYear|Budget|Final|Estimated|Amount
>
> A "related" element in the same XML document might be called (note only
> the first subconcept has been changed):
>
> PriorYear|Budget|Final|Estimated|Amount
>

Joe, I do not see enough information in your description to really get down
to it.  That is because I do not know what other uses you may have for some
of these concepts.  How separable do they have to be?  How many similar
structures will there be, having only minor variations (like prior year vs
current year)?

If you are asking about attributes vs elements, I think that is not of much
importance in itself.  You can tell that by asking whether you could
transform the one approach into the other (e.g., by using a stylesheet).  If
you can, they are basically isomorphic and who cares (religious opinions
aside!)? If you cannot, there is nothing to choose between, since you have
to go with the form that works for your data.
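
To make the "can you transform one into the other" test concrete, here is a
minimal Python sketch of it (a stylesheet would do the same job).  The
<budget> sample, the <value> wrapper element, and both function names are
made up for illustration; the point is only that the round trip loses
nothing, which is what isomorphic means here.

```python
import xml.etree.ElementTree as ET


def attrs_to_children(elem):
    """Return a new element whose children carry elem's attribute values."""
    out = ET.Element(elem.tag)
    for name, value in elem.attrib.items():
        ET.SubElement(out, name).text = value
    # The text content moves into a hypothetical <value> child.
    ET.SubElement(out, "value").text = elem.text
    return out


def children_to_attrs(elem):
    """Inverse of attrs_to_children: fold the children back into attributes."""
    out = ET.Element(elem.tag)
    for child in elem:
        if child.tag == "value":
            out.text = child.text
        else:
            out.set(child.tag, child.text)
    return out


# Hypothetical sample element, not taken from the original post.
original = ET.fromstring("<budget year='current' status='final'>2999</budget>")
as_elements = attrs_to_children(original)
round_trip = children_to_attrs(as_elements)

print(ET.tostring(as_elements, encoding="unicode"))
print(ET.tostring(round_trip, encoding="unicode"))
```

If the round trip gives back what you started with, the two forms carry the
same information and the choice between them is a matter of taste.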

In relational database practice, naming data elements properly can be of
major importance, because most of the semantics tend to be conveyed by the
names.  It sounds like that is the case here too.

What is more important than elements vs attributes is the reusability of
processing, and the ability to know when two structures are essentially the
same kind of thing.  With this in mind, your example # 3 (when the typos are
fixed, of course) is probably closer to the mark, but maybe not there yet
either.

I think that the parent element names should be the same wherever
possible, and you should use attributes for differentiating.  The reason is
that I find that doing special cases on element names is the most clumsy and
least pleasant way to process XML data - and I have some applications where
I do that.

So I would rather see this:

<yearlyBudget type='current-year' status='final'>2999</yearlyBudget>
<yearlyBudget type='current-year' status='estimated'>2999</yearlyBudget>
<yearlyBudget type='2002' status='estimated'>2999</yearlyBudget>

than this:

<finalCurrentYearlyBudget>2999</finalCurrentYearlyBudget>
<estimatedCurrentYearlyBudget>2999</estimatedCurrentYearlyBudget>
<estimatedPreviousYearlyBudget>2999</estimatedPreviousYearlyBudget>

In the first case, it is easy to do the same processing for each element and
stick it into the right place, as determined by the attribute value.  In the
second case, you have to know that the same processing needs to be done on
all three, AND you have to know the right place to put the results.  That
requires more design, more processing, and may lead to more maintenance
problems later on.
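
Here is a rough Python sketch of the difference in processing.  The
<budgets> wrapper element, the function names, and the NAME_TO_SLOT table
(including the 'previous-year' label) are assumptions added for
illustration; the child elements are the ones shown above.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_STYLE = """
<budgets>
  <yearlyBudget type='current-year' status='final'>2999</yearlyBudget>
  <yearlyBudget type='current-year' status='estimated'>2999</yearlyBudget>
  <yearlyBudget type='2002' status='estimated'>2999</yearlyBudget>
</budgets>
"""

NAME_STYLE = """
<budgets>
  <finalCurrentYearlyBudget>2999</finalCurrentYearlyBudget>
  <estimatedCurrentYearlyBudget>2999</estimatedCurrentYearlyBudget>
  <estimatedPreviousYearlyBudget>2999</estimatedPreviousYearlyBudget>
</budgets>
"""


def collect_by_attributes(xml_text):
    """One rule covers every variation; the attribute values pick the slot."""
    result = {}
    for elem in ET.fromstring(xml_text).iter("yearlyBudget"):
        result[(elem.get("type"), elem.get("status"))] = int(elem.text)
    return result


# With the meaning packed into the names, the code needs a hand-maintained
# table saying which names are the same kind of thing and where each goes.
NAME_TO_SLOT = {
    "finalCurrentYearlyBudget": ("current-year", "final"),
    "estimatedCurrentYearlyBudget": ("current-year", "estimated"),
    "estimatedPreviousYearlyBudget": ("previous-year", "estimated"),
}


def collect_by_names(xml_text):
    result = {}
    for elem in ET.fromstring(xml_text):
        slot = NAME_TO_SLOT.get(elem.tag)
        if slot is not None:  # names missing from the table are skipped
            result[slot] = int(elem.text)
    return result


print(collect_by_attributes(ATTRIBUTE_STYLE))
print(collect_by_names(NAME_STYLE))
```

A new variation in the attribute form needs no code change, while a new
element name does nothing until somebody remembers to extend the table.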

However, if I were only ever going to have that one type of budget, I would
prefer the first form.  But I bet that you will have several types of budget
and would want to reuse the structure and the processing.  My first example
is more reusable in that sense.

Also, if all your element structures are going to be flat with single data
elements, so that you basically have a simple relational table, it makes
much less difference.  The principles I mention above are more significant
when the XML structure is more complex, with a number of nested children, so
that the processing is more complex.

However, if you know there will only be a smallish number of variations and
that they will be relatively stable, using different element names would not
be too bad since it would be reasonably easy to look for them all.

Bottom line - prefer to use values rather than names to denote variations.
Choose names based on

1) The desired semantics and data model.
2) The expected number of similar variations where you would want to do the
same processing for the variations - including the amount of changes and
extensions that you are pretty sure will be coming.

Do not spend time choosing among essentially isomorphic alternatives, like
elements vs attributes.  Since you cannot have repeated attributes in an
element, a need for repeated data elements would be a non-isomorphic
situation.
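
A quick way to see the repeated-attribute point, sketched in Python with the
standard library parser.  The <budget>, <note>, and <amount> names and the
helper function are invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The same attribute twice on one element: not well-formed XML.
repeated_attribute = "<budget note='first' note='second'>2999</budget>"

# The same child element twice: perfectly fine.
repeated_children = (
    "<budget><note>first</note><note>second</note>"
    "<amount>2999</amount></budget>"
)

print(is_well_formed(repeated_attribute))   # False
print(is_well_formed(repeated_children))    # True

notes = [n.text for n in ET.fromstring(repeated_children).findall("note")]
print(notes)
```

So once a piece of data can occur more than once, the attribute form is out
and there is no longer a choice to argue about.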

Cheers,

Tom P

References:
- 3 possible approaches for representing concepts
  - From: "Chiusano Joseph" <chiusano_joseph@bah.com>
