
 



RE: [xml-dev] XML Database Decision Tree?





> -----Original Message-----
> From: Nicolas LEHUEN [mailto:nicolas.lehuen@ubicco.com]
> Sent: Wednesday, October 24, 2001 9:10 AM
> To: 'Champion, Mike'; 'xml-dev@lists.xml.org'
> Subject: RE: [xml-dev] XML Database Decision Tree?
> 
>
> 
> So I believe there is a whole set of problems that will 
> benefit from XML databases (which are I believe based on the hierarchical 
> database model*, maybe Mike can confirm/infirm). The storage, indexation
> and querying of a set of document-oriented data is a good example. 
> 
> But XML databases isn't or (won't) be a revolution, blasting all other
> storage models. We could even say that the XML database model 
> is just a come back of the hierarchical model that was supposedly 
> "killed" by the relational model back in the 80s. 

First, NO ONE that I know of has said that XML DBMS will be a "revolution"
that will blast all other storage models.  The strongest assertion I know of
is that there is a large class of problems that ordinary humans and
businesses can address more expediently with XML DBMS than with RDBMS.  I
think the debate is mostly over what that class of problems is and how big
it is.  

I'm not sure if XML DBMS are "based on the hierarchical database model" in
any formal sense, but there is clearly some overlap and inspiration. I
distinctly remember a flash of neuronal activity when I first heard of
Tamino a few years ago: "Oh yeah, Adabas handles hierarchical data rather
than forcing it to be normalized, XML is hierarchical, I bet they have some
deep insight on how to build an efficient XML DBMS!"  When I joined Software
AG some months later, I learned that it's not quite that simple code-wise,
but obviously there is much of Adabas' intellectual heritage in Tamino.

Your [Nicolas] previous post was in the thread inspired by Fabian Pascal's
rants against XML in his column in searchdatabase.techtarget.com and his
various comments on the intelligence of several of us in his dbdebunk.com
site.  I posted a summary of that thread to the "DBA Water Cooler" forum at
searchdatabase.techtarget.com but got no substantive response (other than
additional reminders about my lack of intellectual acuity, of course!).  I
think there's a very interesting question here: Codd *demolished* the
CODASYL data model as a respectable intellectual activity, and clearly
showed the formal superiority of the relational model.  Nevertheless, as the
statistics I posted earlier in this thread indicate, hierarchical and other
"pre-relational" DBMS keep chugging along, running the world economy in the
back offices of the Fortune 1000.  Finally, XML DBMS  are finding a niche
that some analysts (FWIW) are projecting to be in the billions of dollars in
a couple of years.  If the hierarchical model is dead, why won't it stay
quietly buried? 

One answer is "my code works, my business runs, it ain't broke so I don't
care if some professor sneers at it."  OK, but that raises the question of
why.  If the relational model is universally superior, why has no one come
along and done it better, running the old fogies out of business?

I've come up with two reasons, FWIW.  One is that humans can intuitively
envision and reason about hierarchical relationships more easily than they
can perform logical "joins" between diverse facts.  Thus, thinking about
inter-related bits of information as "documents" (and using the analogy with
paper) is just a whole lot easier than doing normalization.  Those who
disagree and say that people intuitively love tables (witness the explosion
of the spreadsheet metaphor in the last 20 years) miss the point that the
formal power of the relational model comes not from thinking in tables but
from relations (no duplicate rows!), unique keys, foreign key constraints,
joins, and all sorts of other stuff that is utterly foreign to all but the
RDBMS cognoscenti.  Anyway, I'll assert that the formal superiority of the
relational model as an elegant way to model information doesn't overcome our
evolved-in bias for thinking hierarchically (see Herbert Simon's
"Architecture of Complexity" essay that I keep referencing).  When the going
gets tough, we stick with what evolution has taught us well rather than
using what Codd has tried to beat into our heads.

Second, this also applies to computers in that a) programs are written by
humans and b) we evolved to think hierarchically because it's a great
optimization heuristic that works for computers too.  For example, I very
briefly experienced the .bomb "revolution" first hand, and had occasion to
see the data on book, CD, software, etc. catalogs that a company called Muze
supplies to the online stores.  It is BEAUTIFULLY normalized; the raw data
for something that would be one page in Amazon.com is shipped in about 20-25
tables, everything done completely by the book.  I innocently asked, "So,
Amazon is doing a 20-way join on a multi-terabyte database every time I bring
up a page?  Wow, I didn't know that RDBMS technology had come that far."
The Boss (who had served time in the Real World) basically replied "MO-RON!
They probably denormalize it to hell, and pre-build views that represent the
way people actually look at the data."  [For the record, I have NO IDEA how
Amazon works so well, I'm just relaying folklore]. In any event, I get the
strong impression that real world RDBMS-based applications are doing lots of
essentially hierarchical data processing; XML DBMS simply treat this as an
advertised feature rather than an optimization secret. 
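
To make the folklore concrete, here is a minimal sketch of the "denormalize
and pre-build" idea using Python's standard sqlite3 module.  It is NOT a
description of how Amazon or Muze actually work.  The tables (product,
contributor, product_contributor, product_page) and the helper functions
(build_page, materialize_page, fetch_page) are all hypothetical names made
up for illustration, and a real catalog would have far more than three
normalized tables.

```python
import json
import sqlite3

# Hypothetical, tiny stand-in for a normalized catalog (the real one
# described above ships in 20-25 tables; three are enough for a sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE contributor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product_contributor (
        product_id     INTEGER NOT NULL REFERENCES product(id),
        contributor_id INTEGER NOT NULL REFERENCES contributor(id),
        role           TEXT NOT NULL,
        PRIMARY KEY (product_id, contributor_id, role)
    );
    -- Pre-built, denormalized "page" per product, stored as one document.
    CREATE TABLE product_page (product_id INTEGER PRIMARY KEY, doc TEXT NOT NULL);
""")
conn.execute("INSERT INTO product VALUES (1, 'Example Book')")
conn.executemany("INSERT INTO contributor VALUES (?, ?)",
                 [(1, "A. Author"), (2, "E. Editor")])
conn.executemany("INSERT INTO product_contributor VALUES (?, ?, ?)",
                 [(1, 1, "author"), (1, 2, "editor")])


def build_page(conn, product_id):
    """Assemble the hierarchical view by joining the normalized tables."""
    (title,) = conn.execute(
        "SELECT title FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    rows = conn.execute(
        """SELECT c.name, pc.role
           FROM product_contributor pc
           JOIN contributor c ON c.id = pc.contributor_id
           WHERE pc.product_id = ?
           ORDER BY c.id""",
        (product_id,),
    ).fetchall()
    return {"title": title,
            "contributors": [{"name": n, "role": r} for n, r in rows]}


def materialize_page(conn, product_id):
    """Do the joins once, ahead of time, and store the result as a document."""
    doc = json.dumps(build_page(conn, product_id))
    conn.execute("INSERT OR REPLACE INTO product_page VALUES (?, ?)",
                 (product_id, doc))


def fetch_page(conn, product_id):
    """Serve a page view with a single key lookup and no joins."""
    (doc,) = conn.execute(
        "SELECT doc FROM product_page WHERE product_id = ?", (product_id,)
    ).fetchone()
    return json.loads(doc)


materialize_page(conn, 1)
page = fetch_page(conn, 1)
```

The stored document is exactly the hierarchy a page view wants, so the joins
are paid once at build time rather than on every request.  The price is that
materialize_page has to be re-run whenever the underlying rows change.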

So, in principle the relational model presents the user with a clean,
logical view of arbitrary complexity, and delegates the details of mapping
that clean view to ugly reality to the programmers at Oracle/IBM/Microsoft
and the DBAs at Amazon/etc. There are plenty of cases -- especially where
information is used in different views by multiple applications, or where no
one can tell in advance what relationships will be of interest to end users
-- where this works very well.  There are plenty of other cases --
especially for "documents" where the whole point of the authoring process
was to pre-define interesting relationships -- where it is horribly
inefficient.  And there are all sorts of cases in the middle that neither
RDBMS nor XML DBMS know how to handle well, and for which we need better
conceptual and software tools than we have today.