OASIS Mailing List Archives

 



Re: [xml-dev] XML Database Decision Tree?



"Champion, Mike" wrote:

> First, perhaps I misunderstood, or perhaps we disagree on
> what "structured data" means.  You (Ron) imply that it means
> something like "easily normalizable," in which case Kevin Williams'
> assertion is pretty much a tautology.

Agreed. Hence my point that the native XML database vendors already
largely agree to most of the points he's making.

>  I assumed it meant
> "well-understood, predictable structure", irrespective of
> how easy it would be to model for RDBMS storage.  A bill of
> materials is "structured data" as far as I understood the
> term, but is not a poster child for the "RDBMS everywhere"
> campaign.

Also agreed. Lots of structured data fits RDBMSs better than bills of
material, so it's a no-brainer to use RDBMSs for that data just like
it's a no-brainer to use native XML databases for documents. It's the
bills-of-material and similar structured data cases that make the line
between where to use RDBMSs and native XML databases fuzzier than one
might initially think.

>  Anyway, the DocBook sneer was a cheap shot ... I
> admit it! ... but an article that starts "This column takes a
> look at so-called native XML databases"  by a guy whose 1000
> page book called "Professional XML Databases" never (AFAIK from the index)
> even mentions the possibility of a "native" XML DBMS doesn't exactly
> inspire open-mindedness on my part <grin>.

Fair enough. My initial reply to the DocBook remark was going to be,
"He's still wrong, but the reply is 'semi-structured data', not
DocBook," until I noticed he stated at the close of the article that
semi-structured data wasn't being considered either.

It's too bad that the writing in some cases added an unnecessary bias to
an article that otherwise made some good points.

> > Could you explain this further? The flexibility is easy to understand,
> > but I'm having trouble seeing what it buys me. If I store my data 17
> > different ways, I'm going to have one heck of a time querying it.

[explanation of why native XML databases are flexible snipped]

> So, the example of a collection of XML instances whose schema has
> evolved through 17 different iterations would not be a problem at all in
> some of the native XML DBMSs, would not be a problem in Tamino in most
> cases, and in the typical case all 17 subsets would be stored together in
> the "miscellaneous" collection where they can be queried with reasonable, if
> not optimal, efficiency. The only time you'd have to query 17 different ways
> is if the top-level tag changed with every iteration of the schema; so this
> is a worst case scenario involving a completely clueless set of developers
> and DBAs.

This is the part where I said you'd have to beat me over the head a bit :)

If you keep changing the schema in really nasty ways, such as changing
element type names, combining or splitting existing elements, or
inserting new elements into the middle of the hierarchy, no amount of
flexibility is going to make your queries any easier. They will have to
cope with a multitude of incompatible changes.

I think that that means there are only two allowable changes:

1) Adding new elements and attributes, with elements added to the end of
existing content models

2) Deleting optional elements and attributes.

(Is this correct? Or are there more changes that you can make in a way
that doesn't break existing queries?)

Since these are equivalent to adding/deleting nullable columns, what is
the technical advantage afforded by the native XML database? I just
don't see it.
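
To make the comparison concrete, here is a minimal sketch using only
Python's standard library. The `part` documents, the `part_name` helper,
and the `part` table are hypothetical and are not taken from any product
discussed in this thread. It runs one fixed "existing query" against
three versions of a document (the original, one with an element appended,
and one with an element renamed), then does the relational analogue of
the compatible change by adding a nullable column.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Three hypothetical versions of the same "part" document.
V1 = "<part><name>bolt</name><price>0.10</price></part>"
# Compatible change: a new element appended to the end of the content model.
V2 = "<part><name>bolt</name><price>0.10</price><weight>5</weight></part>"
# Incompatible change: an element type was renamed (name -> title).
V3 = "<part><title>bolt</title><price>0.10</price></part>"


def part_name(doc):
    """The 'existing query': text of the name child, or None if absent."""
    return ET.fromstring(doc).findtext("name")


xml_results = [part_name(d) for d in (V1, V2, V3)]

# Relational analogue: adding a nullable column leaves the old SELECT intact.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (name TEXT, price REAL)")
conn.execute("INSERT INTO part VALUES ('bolt', 0.10)")
conn.execute("ALTER TABLE part ADD COLUMN weight REAL")  # NULL for old rows
sql_results = conn.execute("SELECT name FROM part").fetchall()
conn.close()

print(xml_results)  # the renamed version yields None for the old query
print(sql_results)  # the old SELECT still returns the original row
```

In this sketch the appended element is invisible to the old query in both
models, while the rename breaks the query no matter how flexibly the
document was stored.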

> My larger point (as a couple of people have picked up on) was that most DB
> developers do not control their own destiny with respect to schemas -- they
> store the data someone tells them to store, and if the schema changes, they
> deal with it.  Native XML DBMS developers will spend a lot fewer nights and
> weekends "dealing with it" than those trying to store evolving XML schema in
> an RDBMS.

I'm also having a hard time buying this argument. All you're saying is
that there is no bureaucracy (yet) in place to control the contents of
the native XML database. This is similar to what happened with
spreadsheets when they first came out -- people got control of their own
data and the DBAs lost out.

If this represents a true political change on the part of corporations,
fair and good. Native XML databases can take credit as the agents of
change, although it's not clear that their technology had as much to do
with it as their newness, which made it possible to slip in under the
radar.

But what I'm not convinced of is that, should native XML databases
succeed, the bureaucracy controlling what goes into the database won't
try to take them over. Granted, the schema-less nature of native XML
databases will make that job harder, but it won't stop them from trying.

> I don't think this lack of flexibility in the logical-physical mapping is
> "disastrous."

That wasn't the disastrous I was referring to. The "disastrous" I meant
was when one group starts storing data in a way that is very
difficult for another group to use. Somebody has to mediate this
process, whether it's DBAs, developers, or your Aunt Sarah's Ouija
board. Failing to do so will cause problems down the road. Doing so will
reduce the flexibility of native XML databases, although for political
reasons, not technical reasons.

> <marketing hat="on">Tamino is evolving to be more than an XML DBMS engine,
> it is more like what IDC calls a "virtual database server" that can present
> "native" XML representations/manipulations of diverse data in an XML store,
> RDBMSs, transactional databases such as Adabas, middleware interfaces to
> enterprise applications, etc.  So I have no axe to grind here about the
> intrinsic value of one physical storage mechanism vs another.</marketing>

I think this is a very good point. Native XML databases look like a good
way to integrate data from a variety of backends, and I think the
winning native XML databases of the future will do this transparently
and bi-directionally. (Some are already doing it today.) Relational
databases may have problems in this area, since the result of
integrating data from a variety of sources is likely to be
semi-structured data, not structured data.

-- Ron