RE: [xml-dev] storing XML files
- From: Chris Parkerson <email@example.com>
- To: 'Ronald Bourret' <firstname.lastname@example.org>, email@example.com
- Date: Tue, 09 Oct 2001 09:52:30 -0400
As you can imagine, I have some comments ;->
About data set sizes: the number and size of the documents DOES matter.
Our internal tests against Oracle 9i, IBM DB2, and SQL Server used
standard OAG Purchase Order documents of fixed sizes with relatively
simple schemas: around 10 KB, with separate tests at 100 KB and 1 MB.
They show repeatedly that as the number and the size of the documents
increase, the performance of queries and of applying/dropping indexes
degrades exponentially. The more complex the schema, the faster the
degradation, but it still degrades. We used consultants who came to us
through our merger with C-Bridge and who are big relational heads, so
we didn't cheat... we tried to make the RDBMS systems do well with
their own tools and they just didn't work. Of
course, we cannot publish the exact test results since we are not an
independent party and Oracle et al's lawyers would come after us if we
About Oracle's XSQL utility: it does not do much at all. It does not
even perform the mapping for you. YOU have to design the appropriate
relational schema for your XML as Oracle does not provide any tools to
assist you in generating a relational schema (a painful and
time-intensive process if the XML schema is not childishly simple). It can
only map into an already defined relational table structure. If your
XML does not match exactly, it resorts to storing it as a BLOB. Now,
Oracle is the only database I'm aware of that even attempts to do this
canonical mapping through any type of tool. However, it's still painful
(we tried and our customers who are long-time Oracle customers tried).
Oracle consultants will not even use this method and will instead resort
to the way Oracle SAYS you should manage XML in Oracle: use the 9i
XMLType, which stores as a BLOB but at least gives you XPath access and
XPath indexing (albeit quite inefficient, non-scalable, and slow
according to our internal tests).
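To make the two storage styles concrete, here is a minimal sketch in Python using the standard-library sqlite3 and xml.etree.ElementTree modules. It is not Oracle's XSQL or XMLType; the purchase order document, the table names (po, po_line, po_doc), and the helper functions are all hypothetical and only illustrate the difference between shredding a document into a hand-designed schema and storing it whole and parsing it at query time.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical purchase order; NOT a real OAG document.
PO_XML = """<PurchaseOrder id="PO-1">
  <Buyer>Acme</Buyer>
  <Line sku="A100" qty="2"/>
  <Line sku="B200" qty="5"/>
</PurchaseOrder>"""


def make_db():
    """Create an in-memory database with both storage layouts (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE po      (id TEXT PRIMARY KEY, buyer TEXT);
        CREATE TABLE po_line (po_id TEXT REFERENCES po(id), sku TEXT, qty INTEGER);
        CREATE TABLE po_doc  (id TEXT PRIMARY KEY, body TEXT);
    """)
    return conn


def shred(conn, xml_text):
    """Style 1: map the document into a relational schema designed by hand."""
    root = ET.fromstring(xml_text)
    conn.execute("INSERT INTO po (id, buyer) VALUES (?, ?)",
                 (root.get("id"), root.findtext("Buyer")))
    conn.executemany(
        "INSERT INTO po_line (po_id, sku, qty) VALUES (?, ?, ?)",
        [(root.get("id"), line.get("sku"), int(line.get("qty")))
         for line in root.findall("Line")])


def store_whole(conn, xml_text):
    """Style 2: keep the whole document as one large text value."""
    root = ET.fromstring(xml_text)
    conn.execute("INSERT INTO po_doc (id, body) VALUES (?, ?)",
                 (root.get("id"), xml_text))


def skus_from_rows(conn, po_id):
    """Query the shredded form with plain SQL."""
    rows = conn.execute(
        "SELECT sku FROM po_line WHERE po_id = ? ORDER BY sku", (po_id,))
    return [row[0] for row in rows]


def skus_from_doc(conn, po_id):
    """Query the whole-document form: fetch the text, parse it, apply a path."""
    (body,) = conn.execute(
        "SELECT body FROM po_doc WHERE id = ?", (po_id,)).fetchone()
    return sorted(line.get("sku") for line in ET.fromstring(body).findall("./Line"))
```

In this sketch the shredded form needs a schema written up front for each document type, while the whole-document form has to re-parse the text on every query unless the database adds its own path index.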
About API and knowledge portability: Not all of the RDBMS systems
support XPath, but most of the XML-enabled middleware and native XML DBs
I've seen do. Also, the tools, utilities, methodologies, and extensions
to SQL are all different between the RDBMS vendors. If you want to make
XML work in an RDBMS system (unless you're using a system like B-Bop,
which thankfully shields you from this mess), you have to use the
vendor-provided tools, which gives you vendor lock-in. Vendor lock-in is
not so severe, in my experience, in the native XML DB world. RDBMS systems
were just not designed for managing XML data and it shows.
About Tamino's query support: install Tamino 184.108.40.206. Their query
language is a hodgepodge of some XPath and proprietary syntax that ends
up not being XPath and that they unfittingly call "XQuery" (and it's not
XQuery as the community knows it either). It's a fact, they know it,
and I do know they are also attempting to remedy it. Admittedly my
statements were inflammatory and inappropriate, but just install the
product. It's not XPath, no matter what they say... but darn it,
they've got good marketing ;-> Oracle 5 and 6 were crap, too... but Mr.
Ellison is still a very rich man ;->
About hard evidence: Yes, we have hard data on all of our tests. We
have hard data from competitive benchmarks customers have asked us to
perform as well as internal competitive benchmarks. We use real
customer applications to create our benchmarking tests and suites
against our current and prior releases. Of course, we cannot release
this data to you unless you require it to close a deal on our software
From: Ronald Bourret [mailto:firstname.lastname@example.org]
Sent: Tuesday, October 09, 2001 3:15 AM
Cc: 'Albena Georgieva'
Subject: Re: [xml-dev] storing XML files
Chris Parkerson wrote:
> ... and being able to
> do fast queries across large document sets is NOT a requirement (i.e.
> you've got 1000+ stock quote documents, but you do not need to query
> over them as an aggregate), then the XML capabilities of the RDBMS
> vendors should be sufficient.
I don't understand this statement. What does the number of documents
matter? If each stock quote document holds a single quote -- that is, a
single row of data -- then a relational query should be very fast. As
Soumitra Sengupta pointed out in his reply, I think the issues are
nesting depth and how semi-structured the data is, not size.
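As a rough illustration of the one-quote-per-row case, here is a small sketch using Python's standard-library sqlite3 module. The quote table, the index name, and the generated symbols are hypothetical, and it says nothing about how any commercial RDBMS performs; it only shows that flat, single-row data maps onto an indexed lookup or a simple aggregate.

```python
import sqlite3


def build_quotes(n=1000):
    """Load n hypothetical stock quotes, one row per quote document."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quote (symbol TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO quote (symbol, price) VALUES (?, ?)",
        [(f"SYM{i:04d}", 10.0 + i) for i in range(n)])
    # An ordinary index on the lookup column is all a flat row needs.
    conn.execute("CREATE INDEX quote_symbol ON quote (symbol)")
    return conn


def price_of(conn, symbol):
    """Single-document lookup: one indexed row fetch."""
    row = conn.execute(
        "SELECT price FROM quote WHERE symbol = ?", (symbol,)).fetchone()
    return None if row is None else row[0]


def average_price(conn):
    """Query over the documents as an aggregate."""
    return conn.execute("SELECT AVG(price) FROM quote").fetchone()[0]
```

Deeply nested or semi-structured documents would not reduce to one row like this, which is where the mapping question becomes harder.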
> 9i's XML support is really limited with their new vaunted "XMLType"
> column data type being nothing more than a convenience wrapper around
> the CLOB type: XML still gets stored as a blob.
This is only part of Oracle 9i's support. They have two other ways to
map XML to the database. The first uses SQL3 object views to perform the
obvious object-relational mapping between XML documents and the
database. The second is the Internet File System (iFS), which uses
> Keep in mind, however,
> that each RDBMS vendors approach to handling XML is going to be
> proprietary: you will have little to no code [portability]
Let's take a closer look at this. An XML/database application usually
consists of a number of parts:
1) APIs. These are proprietary in RDBMSs, but in native XML
databases as well. That is, there is currently no way to write an XML
application that is portable with respect to database access. This is
because there is no widely supported standard API. (There is a good
start in this direction -- the XML:DB API -- but it is not yet widely
enough implemented by database vendors to make truly portable
applications a reality.)
In fact, the only way currently to write an XML application that is
portable across databases is to use object-relational middleware against
a relational database. This is because a number of middleware vendors
use ODBC, OLE DB, or JDBC for database access. So while you will be
locked into a single vendor's API, the application will be portable
2) Query language. This is also not standardized across databases. While
the most popular query language is probably XPath (usually with
extensions for multi-document queries), numerous other query languages
-- all of them proprietary -- are supported as well. This is true of
XML-enabled relational databases as well as native XML databases.
There is good reason for this, as XPath is not rich enough to perform
many of the queries needed by users and XQuery is not yet finished. I
suspect that when XQuery is done, you will see many implementations of
3) Update language. Where these are supported, they are all
non-standard. This is because there is no standard XML update language
in existence. (Again, there is an attempt to standardize this with the
XUpdate language, but there are not enough implementations to make it a
4) DOM, SAX, XSLT, namespaces, etc. These are standard across all XML
database products that I have seen -- native XML and XML-enabled.
I therefore think it's fair to say that you'll have roughly the same
amount of code portability with XML-enabled relational databases as
you'll have with native XML databases.
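To show what the portable layer in point 4 looks like, here is a minimal sketch in Python that reads the same document through DOM and through SAX, using only the standard-library xml.dom.minidom and xml.sax modules. The sample document and the function names are hypothetical. How the document gets out of the database is the product-specific part and is deliberately left out.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical document, as it might be returned by any XML database product.
DOC = ("<quotes>"
       "<quote symbol='ABC'>12.5</quote>"
       "<quote symbol='XYZ'>7.25</quote>"
       "</quotes>")


def symbols_via_dom(text):
    """Tree-based access through the standard DOM interfaces."""
    dom = xml.dom.minidom.parseString(text)
    return [el.getAttribute("symbol")
            for el in dom.getElementsByTagName("quote")]


class SymbolHandler(xml.sax.ContentHandler):
    """Event-based access through the standard SAX callbacks."""

    def __init__(self):
        super().__init__()
        self.symbols = []

    def startElement(self, name, attrs):
        if name == "quote":
            self.symbols.append(attrs.getValue("symbol"))


def symbols_via_sax(text):
    handler = SymbolHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.symbols
```

Code at this level should carry over between products unchanged; the calls that fetch, query, or update documents are where the per-product differences listed in points 1 through 3 show up.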
> or knowledge portability.
Again, this is about the same for native XML databases and XML-enabled
relational databases. Both types of databases are based on fairly
consistent models (object-relational mappings for XML-enabled relational
databases and XML document structure for native XML databases) but the
actual implementations are different for every product.
In short, moving code from one XML-enabled relational database to
another means learning new mapping syntax, a new API, and possibly a new
query language. Moving code from one native XML database to another
means learning a new API and possibly a new query language.
> A major advantage of native XML DB systems (save for Software AG's
> Tamino which still uses a proprietary query language and schema
> among other non-standard things) is their adherence to standards.
> Queries are XPath, transformations are XSLT, granular access to
> data is via the standard DOM API, validation can be against DTD or W3C
> XML Schema, etc.
This is a rather inflammatory statement, as it seems to imply Tamino and
XML-enabled relational databases do not adhere to standards. With the
exception of query languages, most XML database products that I have
seen (native XML, XML-enabled databases, middleware) *do* adhere to
standards, Tamino included. (With respect to the non-standard schema
language in Tamino, Tamino was released *long* before XML Schemas were
completed, so it hardly seems fair to complain. A year from now, yes.
> You gain a bit more in terms of code portability as
> well as knowledge portability between different vendors.
I think the operative term is "a bit more".
> You also gain
> a database system inherently designed to deal with the extensibility
> XML data
> and capable of removing the scalability, performance, ease of
> data update, and cross-document limitations of the RDBMS approach.
Actually, this depends a lot on the application. In some cases, this is
definitely true. In others, it is not.
> The reason most XML applications are converting to native
> XML databases is that they've tried to make their applications work
> with RDBMS systems and they failed
Is there any hard data to back this up? Numbers of customers, number of
transactions per day, etc.? While there are certainly applications that
can be built with native XML databases that can't be built with
XML-enabled relational databases, the reverse is also true, so I have a hard
time with the word "most".