Re: [xml-dev] XML Database Decision Tree?
- From: Soumitra Sengupta <soumitra@b-bop.com>
- To: "Champion, Mike" <Mike.Champion@SoftwareAG-USA.com>
- Date: Fri, 19 Oct 2001 11:29:06 -0700
Mike has given me the courage to take a bite as well. Thanks Mike.
I had read this article a while back on alphaWorks. Here are my two bits, if
anyone cares.
Let me start with the objective of the article as stated by Kevin Williams:
"My objective in this column is to see when it makes sense to store structured
data in these specialized databases."
Then go to the conclusion:
"While those databases are ideally suited to the management of document-oriented
information (unstructured or semi-structured data), they don't make sense for
use for structured data. If you need to make structured information accessible
as XML, you're better off taking advantage of the XML support provided by
relational-database vendors instead."
I agree with Kevin 100% if the purpose is to make existing structured
information accessible as XML. Dan Suciu of the University of Washington calls
this "XML Publishing" and defines it as "the problem of transforming existing,
relational data into XML." His paper, "On Database Theory and XML"
(www.cs.washington.edu/homes/suciu), is a good way to get an understanding of
the problem for those who care to read it. He goes on to say that "this is the
same as defining an XML view over the relational data, but this view is
considerably more complex than relational views," and then shows the problems
and issues with this approach. Another paper that I found useful in this regard
is the one from Josephine Cheng and Jane Xu of IBM Silicon Valley, titled "IBM
DB2 XML Extender - An end-to-end solution for storing and retrieving XML
documents." They do not go into all the theoretical issues but essentially
describe the approach taken in DB2. I find this statement of theirs telling:
"Also, if retrieval of an entire document is desired, performance is faster
because there is no need to compose the XML document from DB2 data." This
essentially reinforces Dan's conclusion about creating an XML view over
relational data. From a purely implementation standpoint, the view is built
using either JOIN or UNION. I believe Microsoft uses the second, and there are
performance advantages to this.
So this leads me to conclude that even simple publishing of XML data from
standard relational databases is not a trivial problem, and we should be aware
of the performance and other issues (the queries that create the view perform
more joins than necessary and are typically not minimal).
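To make the JOIN-versus-UNION distinction concrete, here is a minimal sketch in Python with SQLite. The person/phone tables, the sample rows, and the function names are all assumptions made up for illustration; this is not how DB2 or Microsoft actually implement it, only the two query shapes side by side.

```python
import sqlite3
import xml.etree.ElementTree as ET

def make_db():
    # Hypothetical two-table schema: one person has zero or more phones.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE phone (person_id INTEGER, number TEXT);
        INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO phone VALUES (1, '555-0100'), (1, '555-0101');
    """)
    return db

def publish_with_join(db):
    # JOIN shape: the parent's columns are repeated on every child row,
    # so the name comes back once per phone number.
    root = ET.Element("people")
    current_id, person = None, None
    rows = db.execute("""
        SELECT p.id, p.name, ph.number
        FROM person p LEFT JOIN phone ph ON ph.person_id = p.id
        ORDER BY p.id, ph.number
    """)
    for pid, name, number in rows:
        if pid != current_id:
            person = ET.SubElement(root, "person")
            ET.SubElement(person, "name").text = name
            current_id = pid
        if number is not None:
            ET.SubElement(person, "phone").text = number
    return ET.tostring(root, encoding="unicode")

def publish_with_union(db):
    # UNION shape: one row per XML node, tagged with a kind column
    # (0 = person, 1 = phone) and sorted into document order.
    root = ET.Element("people")
    people = {}
    rows = db.execute("""
        SELECT id, 0, name FROM person
        UNION ALL
        SELECT person_id, 1, number FROM phone
        ORDER BY 1, 2, 3
    """)
    for pid, kind, value in rows:
        if kind == 0:
            people[pid] = ET.SubElement(root, "person")
            ET.SubElement(people[pid], "name").text = value
        else:
            ET.SubElement(people[pid], "phone").text = value
    return ET.tostring(root, encoding="unicode")

db = make_db()
print(publish_with_join(db))
print(publish_with_union(db))
```

Both functions produce the same document here. The difference is in what the database ships back: the JOIN repeats the parent columns for every child, and with several independent child collections those repeats multiply, whereas the UNION grows by one row per node.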
Now let us examine the issue of storing XML documents in an RDBMS as implemented
by the RDBMS vendors.
There are two approaches that I have seen (at least in Oracle 9i).
1. Store the XML document in its entirety and use text indexing using the
Context engine
This approach has the drawbacks that Kevin correctly points out in his article.
The performance is also questionable. Also questionable is the ability to
search across a collection of documents for an Author Firstname and Lastname.
2. Use the DTD or Schema to derive a relational schema
This is the approach that most RDBMS vendors want us to use (again, at least
from what I know of Oracle and DB2). As Dan points out in his paper, one table
is created for "each element type that can occur in a collection position." But
if the content model changes slightly (the example Dan gives is the content
model of person changing from (name, phone) to (name, phone*)), the underlying
relational schema has to change. There are a bunch of other issues; for a full
listing, look up Dan's paper.
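The person example can be sketched in a few lines of Python. The mapping rule below (a child that occurs once becomes a column, a child that may repeat gets its own table) is a deliberately simplified assumption, and schema_for is a hypothetical helper, not the actual mapping used by Oracle or DB2 or the exact algorithm in Dan's paper.

```python
import sqlite3

def schema_for(content_model):
    """Derive CREATE TABLE statements for a hypothetical `person` element.

    content_model maps each child element name to True if it may repeat
    (e.g. phone*) and False if it occurs exactly once.
    """
    single = [c for c, repeats in content_model.items() if not repeats]
    repeated = [c for c, repeats in content_model.items() if repeats]
    cols = ", ".join(["id INTEGER PRIMARY KEY"] + [f"{c} TEXT" for c in single])
    stmts = [f"CREATE TABLE person ({cols})"]
    for child in repeated:
        # A repeating child sits in a collection position: it needs its own table.
        stmts.append(
            f"CREATE TABLE {child} "
            f"(person_id INTEGER REFERENCES person(id), value TEXT)"
        )
    return stmts

v1 = schema_for({"name": False, "phone": False})  # person: (name, phone)
v2 = schema_for({"name": False, "phone": True})   # person: (name, phone*)

for label, stmts in (("(name, phone)", v1), ("(name, phone*)", v2)):
    db = sqlite3.connect(":memory:")
    for stmt in stmts:
        db.execute(stmt)  # confirm the derived DDL is valid SQL
    print(label, "->", stmts)
```

Adding a single `*` to the content model drops a column from person and introduces a new table, so existing rows have to be migrated and every query touching phone has to be rewritten.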
The problem we have with "Native XML" databases, as I see it, is that the
theoretical models and research are way behind practical implementations and
use. The RDBMS is based on solid theoretical work done by E.F. Codd. I have seen
signs of the academic community picking this problem up and creating the
foundation on which XML databases will be built. We at B-Bop are working with
one such group, and I would encourage all the academic lurkers in this group to
start looking at this problem more closely. Till the theoretical foundations are
put in place, we will keep on arguing the merits and demerits of all the
approaches, and users will have to use their judgement in picking one solution
over another.
Having said that, I would like Mr. Williams to look at all the issues a bit more
closely before making blanket statements like "While those databases are
ideally suited to the management of document-oriented information (unstructured
or semi-structured data), they don't make sense for use for structured data."
But I also know that we will have to live with this level of insight for some
time to come.
Regards,
Soumitra
"Champion, Mike" wrote:
>
> > Native XML databases; a bad idea for data?
> > http://www-106.ibm.com/developerworks/library/x-xdnat.html?n-x-10181
> >
>
> OK, I'll bite.
>
> 1) Storage: "decomposing the XML document to persist it to a relational
> database is not all that difficult" -- I have a one-word answer: "DocBook?".
> <sneer>
> "Do you really want your information's structure to vary?" Uhh, no ... but
> do I usually get a vote? Simplicity of storage is probably the biggest
> advantage of a native XML DBMS: it is easy to take a chunk of XML and store it
> in any of the native XML DBMSs I've researched; it takes considerable
> analysis and programming in the XML-enabled RDBMSs. This makes native XML
> DBMSs valuable as reliable caching/logging tools for simple XML messages,
> not to mention making it possible to do useful things with complex
> documents.
>
> 2) Retrieval: "some of the native XML platforms require that the entire
> document be returned from the database" Uhh, which ones? Not the ones
> supporting XPath, which is pretty much all of them, AFAIK.
> "many relational database vendors are currently implementing thin XML
> serializer wrappers that enable them to generate XML documents on demand
> from relational data." Right, but I thought we were talking about XML data?
>
> 3) Searching: "searching will return only a set of XML documents; the calling
> program must then take further action on those documents, if necessary."
> True. SQL has a far richer set of data manipulation operators than XPath
> (XQuery addresses this to some extent), and is (loosely) based on a general
> theory of data, and XML is much more ad hoc.
> "these features are indispensable when working with traditional documents,
> where context plays a large role in meaning, but far less important when
> working with structured data" Absolutely; if you don't have relatively
> complex document-like XML, the *searching* features of a native XML DBMS
> don't buy you much over storing simple XML in an RDBMS. But if you do have
> "interesting" XML schema, e.g. with recursive elements, simple XPath queries
> can address problems that seriously challenge SQL experts.
>
> 4) Aggregation: "pulling information together and rolling it up (into sums,
> averages, and so on) is quite difficult." He's right. Guilty as charged,
> although he didn't emphasize "joins" as much as I would have. Again, XQuery
> will help, but what we really need is a better theoretical understanding of
> how hierarchical XML data relates to the relational model, sort of an
> "InfoSet Algebra" if you will. XQuery is at least addressing this problem of
> formalizing XML operators, and even if they fail it will be dissertation
> fodder for a new generation of CS theorists :~)
--
Soumitra Sengupta, Ph.D.
Co-Founder and C.T.O.
B-Bop Associates Inc.
Phone: 650-340-2700
Fax : 650-340-2701
Email: soumitra@b-bop.com
http://www.b-bop.com