RE: [xml-dev] XML Database Decision Tree?





> -----Original Message-----
> From: Ronald Bourret [mailto:rpbourret@rpbourret.com]
> Sent: Friday, October 19, 2001 7:37 PM
> To: xml-dev@lists.xml.org
> Subject: Re: [xml-dev] XML Database Decision Tree?
> 


First, perhaps I misunderstood, or perhaps we disagree on
what "structured data" means.  You (Ron) imply that it means
something like "easily normalizable," in which case Kevin Williams'
assertion is pretty much a tautology.  I assumed it meant
"well-understood, predictable structure," irrespective of
how easy it would be to model for RDBMS storage.  A bill of
materials is "structured data" as far as I understood the
term, but is not a poster child for the "RDBMS everywhere"
campaign.  Anyway, the DocBook sneer was a cheap shot ... I
admit it! ... but an article that starts "This column takes a
look at so-called native XML databases" by a guy whose 1000-page
book called "Professional XML Databases" never (AFAIK from the
index) even mentions the possibility of a "native" XML DBMS
doesn't exactly inspire open-mindedness on my part <grin>.

> > Speaking generally, the native XML DBMS tend to be more
> > flexible in handling data with no schema, or with evolving
> > schema, than solutions based on normalizing XML into RDBMS tables.
> 
> Could you explain this further? The flexibility is easy to understand,
> but I'm having trouble seeing what it buys me. If I store my data 17
> different ways, I'm going to have one heck of a time querying it.

OK.  The various native XML DBMSs, by diverse means, allow
well-formed XML to be stored and efficiently queried without
demanding rigid adherence to a schema.  Some do this by never
defining an XML schema at all, or by storing well-formed XML that
doesn't match a known schema in a "miscellaneous" collection; some by
allowing the schema to evolve and re-indexing on the fly; some by
allowing open content models at any point in the schema; and some by a
combination of these.  There is a bit of a tradeoff here: I would guess
(I am not up-to-date on exactly how it works in the current Tamino, and
obviously am not privy to the details of the others) that it is
generally more efficient to search on structures that are rigidly
defined.  BUT given that a native XML DBMS may use some sort of inverted
list/associative memory/binary relation internal data model for the XML
itself, not just for the indexes, predictability is probably not as big
an advantage as one might think.
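
To make the "inverted list" idea a little more concrete, here is a toy
sketch in Python.  It is only an illustration of the general technique,
not a description of how Tamino or any other product works internally;
the names index_document and lookup, and the sample documents, are made
up for the example.  Each (element path, text value) pair points at the
set of documents containing it, so a document with extra elements simply
adds extra keys and no schema has to be declared up front.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_document(index, doc_id, xml_text):
    """Record every (element path, text value) pair of one well-formed document."""
    root = ET.fromstring(xml_text)

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            index[(here, text)].add(doc_id)
        for child in elem:
            walk(child, here)

    walk(root, "")


def lookup(index, path, value):
    """Return the sorted ids of documents holding `value` at `path`."""
    return sorted(index.get((path, value), set()))


# Hypothetical sample data: the second document has an element the first lacks.
index = defaultdict(set)
index_document(index, 1, "<order><id>7</id><customer>Acme</customer></order>")
index_document(
    index,
    2,
    "<order><id>8</id><customer>Acme</customer><priority>high</priority></order>",
)

print(lookup(index, "/order/customer", "Acme"))  # [1, 2]
print(lookup(index, "/order/priority", "high"))  # [2]
```

Note that the lookup costs the same whether or not every document has a
priority element, which is the sense in which predictability buys less
than one might expect.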

So, the example of a collection of XML instances whose schema has
evolved through 17 different iterations would not be a problem at all in
some of the native XML DBMSs and would not be a problem in Tamino in most
cases; in the typical case all 17 subsets would be stored together in
the "miscellaneous" collection, where they can be queried with reasonable,
if not optimal, efficiency.  The only time you'd have to query 17 different
ways is if the top-level tag changed with every iteration of the schema,
and that is a worst-case scenario involving a completely clueless set of
developers and DBAs.
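
A small sketch of that point, using Python's standard ElementTree module
rather than any real XML DBMS: the collection, the part_numbers helper,
and the element names are all invented for illustration.  Three
"iterations" of a schema share the top-level tag, so one path expression
covers them; the fourth document renames the root and needs its own query.

```python
import xml.etree.ElementTree as ET

# Hypothetical "miscellaneous" collection: three schema iterations that keep
# the <part> root, plus one document whose top-level tag was changed.
collection = [
    "<part><number>A1</number></part>",
    "<part><number>B2</number><supplier>Acme</supplier></part>",
    "<part rev='3'><number>C3</number><supplier>Acme</supplier>"
    "<notes><note>fragile</note></notes></part>",
    "<component><number>D4</number></component>",
]


def part_numbers(docs, root_tag="part"):
    """Collect <number> values from every document whose root is `root_tag`."""
    found = []
    for text in docs:
        root = ET.fromstring(text)
        if root.tag == root_tag:
            found.extend(n.text for n in root.findall("number"))
    return found


print(part_numbers(collection))               # ['A1', 'B2', 'C3']
print(part_numbers(collection, "component"))  # ['D4']
```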

My larger point (as a couple of people have picked up on) was that most DB
developers do not control their own destiny with respect to schemas -- they
store the data someone tells them to store, and if the schema changes, they
deal with it.  Native XML DBMS developers will spend a lot fewer nights and
weekends "dealing with it" than those trying to store XML with an evolving
schema in an RDBMS.

> 
> Furthermore, I thought one of the functions of the DBA was to act as a
> mediator (roadblock?) between different groups who want to access the
> same data and who both want to modify the same schema, 
> thereby breaking each others' applications. Doing an end run around
> this may yield short term gains, but will be disastrous in the long run.

DBAs generally don't have as much to do in the native XML DBMS world as
they do in the RDBMS world.  Perhaps that's partly due to the immaturity of
the products; they don't have as many bells and whistles and knobs and
levers ...yet.  It's probably due more to the fact that -- for better or
worse, and admittedly this has a significant downside -- there is less "data
independence" between the logical and physical view of the data in an XML
DBMS. As Ron says, the function of a DBA in the RDBMS world is to be the
janitor that keeps the ivory towers of the various logical views clean and
tidy. In native XML DBMS engines, the logical-physical mapping tends to be
hard-coded rather than something that the DBA can tend to. <aside>Sorry, but
I just *love* the "janitor in the ivory tower" metaphor; I stole it from
Richard Condon's THE MANCHURIAN CANDIDATE when I was but a wee nerd.</aside>

I don't think this lack of flexibility in the logical-physical mapping is
"disastrous." Native XML DBMSs are *not* general-purpose information
repositories; they are optimized for one common flavor of information
package: hierarchical documents.  As I think we all agree, if you *do* have
to present XML views of data that is used by non-XML apps, it is probably
better to use a tool that presents an XML view of an underlying RDBMS, but
this will cost you the going rate for a team of top-notch DBAs. 

p.s. I agree with Ron's point in a later message that one node in the
decision tree should consider whether the data is used by other, non-XML
applications.  That's what I had in mind when I wrote "If you expose data
to both existing RDBMS applications and XML applications, you're probably
better off leaving it in an RDBMS; XML simply has no good notion of
'referential integrity' and that could bite you hard."
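
For anyone who wants to see the "bite" in miniature, here is a hedged
Python sketch.  The document layout, the customer attribute convention,
and the dangling_references helper are all assumptions made up for the
example.  A plain well-formedness parse accepts an order that points at a
customer that doesn't exist; nothing plays the role of a foreign-key
constraint unless the application checks it itself.  (DTD ID/IDREF
validation helps only within a single document, and only with a
validating parser.)

```python
import xml.etree.ElementTree as ET

# Hypothetical data: order o2 refers to customer c9, which is not present.
doc = """<db>
  <customer id="c1"/>
  <order id="o1" customer="c1"/>
  <order id="o2" customer="c9"/>
</db>"""


def dangling_references(xml_text):
    """Return ids of <order> elements whose customer attribute matches no <customer>."""
    root = ET.fromstring(xml_text)  # parses without complaint: the XML is well-formed
    known = {c.get("id") for c in root.iter("customer")}
    return [o.get("id") for o in root.iter("order") if o.get("customer") not in known]


print(dangling_references(doc))  # ['o2']
```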

<marketing hat="on">Tamino is evolving to be more than an XML DBMS engine,
it is more like what IDC calls a "virtual database server" that can present
"native" XML representations/manipulations of diverse data in an XML store,
RDBMSs, transactional databases such as Adabas, middleware interfaces to
enterprise applications, etc.  So I have no axe to grind here about the
intrinsic value of one physical storage mechanism vs another.</marketing>