OASIS Mailing List Archives

Re: [xml-dev] XML Database Decision Tree?



Here's another approach to the question of "why native XML database
systems?".  This is in the form of rhetoric, which I acknowledge in
advance overstates the case for the sake of trying to make the point
in an effective way, but I think the core point has some validity.

Suppose you are undertaking to write software that communicates with
other parties by sending and receiving XML documents.  When an XML
document arrives and you need to store it for a while, is it really
always best to dismantle the XML contents into third-normal form, a
collection of rows in several tables that are related by foreign keys
and so on, and then to re-assemble XML by running queries with various
JOINs over those tables?
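
To make the round trip concrete, here is a minimal sketch in Python using
only the standard library (sqlite3 and xml.etree.ElementTree).  Everything
in it is an assumption made for illustration: the <order>/<item> document,
the table and column names, and the helper functions are all hypothetical,
and the "documents" table is only a stand-in for keeping the document
whole, not a native XML database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical incoming document; element and attribute names are invented.
ORDER_XML = (
    '<order id="1" customer="Acme">'
    '<item sku="A1" qty="2"/><item sku="B7" qty="1"/>'
    '</order>'
)


def create_schema(conn):
    """Two related tables for the shredded form, one table for whole documents."""
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
        CREATE TABLE items (
            order_id INTEGER NOT NULL REFERENCES orders(id),
            pos      INTEGER NOT NULL,   -- preserves document order
            sku      TEXT NOT NULL,
            qty      INTEGER NOT NULL,
            PRIMARY KEY (order_id, pos)
        );
        CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    """)


def shred(conn, xml_text):
    """Dismantle one <order> document into rows linked by a foreign key."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)",
                 (order_id, root.get("customer")))
    for pos, item in enumerate(root.findall("item")):
        conn.execute(
            "INSERT INTO items (order_id, pos, sku, qty) VALUES (?, ?, ?, ?)",
            (order_id, pos, item.get("sku"), int(item.get("qty"))))


def reassemble(conn, order_id):
    """Rebuild the XML text by joining the tables back together."""
    rows = conn.execute(
        "SELECT o.customer, i.sku, i.qty "
        "FROM orders o LEFT JOIN items i ON i.order_id = o.id "
        "WHERE o.id = ? ORDER BY i.pos", (order_id,)).fetchall()
    if not rows:
        raise KeyError(order_id)
    root = ET.Element("order", {"id": str(order_id), "customer": rows[0][0]})
    for _, sku, qty in rows:
        if sku is not None:  # an order with no items yields one all-NULL row
            ET.SubElement(root, "item", {"sku": sku, "qty": str(qty)})
    return ET.tostring(root, encoding="unicode")


def store_whole(conn, doc_id, xml_text):
    """The 'park the car' alternative: keep the document as it arrived."""
    conn.execute("INSERT INTO documents (id, body) VALUES (?, ?)",
                 (doc_id, xml_text))


def load_whole(conn, doc_id):
    row = conn.execute("SELECT body FROM documents WHERE id = ?",
                       (doc_id,)).fetchone()
    if row is None:
        raise KeyError(doc_id)
    return row[0]
```

Even for this toy document, the shredded path needs a schema, a position
column to remember element order, and a JOIN to get the XML back, while
the whole-document path returns exactly the text that was stored.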

Suppose you want to park your car in a garage while you're not using
it.  Should you dismantle your car into bolts, ball bearings, shafts,
cylinders, and so on, putting all the bolts into nice containers
clearly marked "bolts", all the ball bearings into nice jars labelled
"ball bearings", and so on?  When you want to take the car out for a
drive, you just re-assemble it out of all those pieces by joining them
together.

Some RDBMS advocates might say, yes, you really should do that.  After
all, a "car" is just one view of all those component parts.  Not
every application might want to view the components that way.  There
might be other applications that want to use those same bolts and ball
bearings in order to make a vacuum cleaner.  Or perhaps rather than
driving the car, you might decide that what you want to do with the
car is answer questions like "how many of its component parts weigh
more than 20 grams?"  By representing everything in terms of its most
basic parts, you retain the greatest cleanliness and flexibility to
do all these "ad hoc" things that you might want to do someday.

In other words: the relational model was, to some extent, a reaction
to certain problems that often came up among businesses that modelled
their core data using e.g. CODASYL models.  They'd find that their
data was assembled one way, but later new and unanticipated
applications would arise, and they'd find that they needed the data
assembled a different way, along a different axis, and when that
happened a lot of difficult programming was needed.  Relational
databases were introduced to minimize that kind of problem.

The relational model was a particular solution introduced to deal with
a particular problem that arose in a particular set of circumstances
that just happen to come up very often in many businesses.  The
relational model, like all technologies, addresses certain needs and
certain problems; like any tool, it is the right tool for certain jobs
and not for all jobs.  Not every situation in which data is stored is
characterized by those needs and problems!

In the analogy, a car garage simply is not called upon to produce
vacuum cleaners.  If that were one of its requirements, then it might
really store cars by dismantling them.  But in the real world, garage
owners know with a pretty good degree of certainty that producing
vacuum cleaners just isn't one of their engineering requirements.

And of course there are circumstances and use cases where it really is
appropriate to deal with incoming XML documents by storing the
information as rows in a third-normal form relational database.
Without a doubt, the features being added to RDBMSs to help them
"shred" (IBM's word, not mine, nothing pejorative intended!)  XML into
relations are going to be just the ticket for some people and the
right solution for some problems.  There's no one right way; it all
depends on the greater context of what problem you're really trying to
solve.

-- Dan