RE: [xml-dev] XML Database Decision Tree?

From: Michael Rys <mrys@microsoft.com>
To: Soumitra Sengupta <soumitra@b-bop.com>,"Champion, Mike" <Mike.Champion@SoftwareAG-USA.com>
Date: Fri, 19 Oct 2001 12:30:34 -0700

I would just like to point out that SQL Server 2000 takes a different
approach from the one outlined below: it lets you map XML to and from
relational data simply by mapping an XML Schema onto the existing
relational schema.

There are tools to make this very simple. It also guards the relational
schema against arbitrary XML schema changes and vice versa, since only
the mapping needs to be adapted.
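A minimal sketch of the mapping idea in Python, assuming a hypothetical `MAPPING` dictionary in place of SQL Server's actual annotated-schema syntax (which this does not reproduce). The `customers` table, its columns, and the `Customer` element are illustrative. The same mapping drives both directions, so a change on either side is absorbed by editing the dictionary alone:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping (NOT SQL Server's annotated-schema syntax):
# XML element/attribute names on one side, table/column names on the other.
# If either schema changes, only this dictionary has to be adapted.
MAPPING = {
    "element": "Customer",   # XML element name
    "table": "customers",    # relational table name
    "fields": {              # XML attribute -> relational column
        "id": "cust_id",
        "name": "cust_name",
    },
}


def shred(conn, xml_text, mapping):
    """XML -> rows: insert one row per mapped element."""
    cols = list(mapping["fields"].values())
    sql = (f"INSERT INTO {mapping['table']} ({', '.join(cols)}) "
           f"VALUES ({', '.join('?' for _ in cols)})")
    for el in ET.fromstring(xml_text).iter(mapping["element"]):
        conn.execute(sql, [el.get(attr) for attr in mapping["fields"]])


def publish(conn, mapping, root_tag="Customers"):
    """Rows -> XML: emit one mapped element per row."""
    cols = list(mapping["fields"].values())
    root = ET.Element(root_tag)
    query = f"SELECT {', '.join(cols)} FROM {mapping['table']} ORDER BY {cols[0]}"
    for row in conn.execute(query):
        el = ET.SubElement(root, mapping["element"])
        for attr, value in zip(mapping["fields"], row):
            el.set(attr, str(value))
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id TEXT, cust_name TEXT)")
shred(conn, '<Customers><Customer id="1" name="Ada"/>'
            '<Customer id="2" name="Ben"/></Customers>', MAPPING)
print(ET.tostring(publish(conn, MAPPING), encoding="unicode"))
```

Renaming the `cust_name` column, for example, would mean changing one value in `MAPPING`; the XML vocabulary seen by clients stays the same.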

Please check out my papers at XML 2000 and ICDE 2001, or the SQL XML site
at http://www.microsoft.com/sql/techinfo/xml/default.asp

Best regards
Michael

> -----Original Message-----
> From: Soumitra Sengupta [mailto:soumitra@b-bop.com] 
> Sent: Friday, October 19, 2001 11:29 AM
> To: Champion, Mike
> Cc: xml-dev@lists.xml.org
> Subject: Re: [xml-dev] XML Database Decision Tree?
> 
> 
> Mike has given me the courage to take a bite as well.  Thanks, Mike.
> 
> I had read this article on alphaWorks a while back.  Here are my two
> bits, if anyone cares.
> 
> Let me start with the objective of the article as stated by Kevin
> Williams:
> 
> "My objective in this column is to see when it makes sense to store
> structured data in these specialized databases."
> 
> Then go to the conclusion:
> 
> "While those databases are ideally suited to the management of
> document-oriented information (unstructured or semi-structured data),
> they don't make sense for use for structured data. If you need to make
> structured information accessible as XML, you're better off taking
> advantage of the XML support provided by relational-database vendors
> instead."
> 
> I agree with Kevin 100% if the purpose is to make existing structured
> information accessible as XML.  Dan Suciu of the University of
> Washington calls this "XML Publishing" and defines it as "the problem of
> transforming existing, relational data into XML."  His paper is a good
> way to understand the problem; for those who care to read it, the title
> is "On Database Theory and XML" by Dan Suciu, University of Washington
> (www.cs.washington.edu/homes/suciu).  He goes on to say that "this is
> the same as defining an XML view over the relational data, but this view
> is considerably more complex than relational views," and then shows the
> problems and issues with this approach.
> 
> Another paper that I found useful in this regard is the one from
> Josephine Cheng and Jane Xu of IBM Silicon Valley titled "IBM DB2 XML
> Extender - An end-to-end solution for storing and retrieving XML
> documents."  They do not go into all the theoretical issues but
> essentially describe the approach taken in DB2.  I find one of their
> statements interesting: "Also, if retrieval of an entire document is
> desired, performance is faster because there is no need to compose the
> XML document from DB2 data."  This essentially reinforces Dan's
> conclusion about creating an XML view over relational data.  From a
> purely implementation standpoint, this is achieved either by using JOIN
> or by using UNION.  I believe Microsoft is using the second, and there
> are performance advantages to this.
> 
> So this leads me to conclude that even simple publishing of XML data
> from standard relational databases is not a trivial problem, and we
> should be aware of the performance and other issues (the queries that
> create the view perform more joins than necessary and typically are not
> minimal).
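To make the JOIN-versus-UNION distinction concrete, here is a minimal sketch of the "sorted outer union" style of publishing: a single UNION ALL query returns parent and child rows in one document-ordered stream, and a simple loop tags them into nested XML. The `person` and `phone` tables and the tagging loop are hypothetical; this illustrates the general technique, not Microsoft's implementation:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational data: a person may have several phones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (pid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE phone  (pid INTEGER REFERENCES person(pid), number TEXT);
INSERT INTO person VALUES (1, 'Ada'), (2, 'Ben');
INSERT INTO phone  VALUES (1, '555-0100'), (1, '555-0101'), (2, '555-0199');
""")

# Sorted outer union: parent rows (tag 1) and child rows (tag 2) share one
# column layout padded with NULLs, ordered so every parent precedes its children.
OUTER_UNION = """
SELECT 1 AS tag, pid AS pid, name AS name, NULL AS number FROM person
UNION ALL
SELECT 2, pid, NULL, number FROM phone
ORDER BY pid, tag, number
"""


def publish_outer_union(conn):
    """Tag the single sorted row stream into nested XML in one pass."""
    root = ET.Element("people")
    current = None
    for tag, pid, name, number in conn.execute(OUTER_UNION):
        if tag == 1:   # parent row: open a new <person>
            current = ET.SubElement(root, "person", id=str(pid))
            ET.SubElement(current, "name").text = name
        else:          # child row: nest under the most recent <person>
            ET.SubElement(current, "phone").text = number
    return root


print(ET.tostring(publish_outer_union(conn), encoding="unicode"))
```

A JOIN-based view (`person JOIN phone`) would instead repeat each person's columns once per phone, and the tagger would have to detect when the parent changes; the outer union sends each parent once. The sketch does not measure which is faster on any given engine.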
> 
> Now let us examine the issue of storing XML documents in an RDBMS, as
> implemented by the RDBMS vendors.
> 
> There are two approaches that I have seen (at least in Oracle 9i).
> 
> 1. Store the XML document in its entirety and use text indexing via the
> Context engine.
> 
> This approach has the drawbacks that Kevin correctly points out in his
> article.  The performance is also questionable, as is the ability to
> search across a collection of documents for an author by Firstname and
> Lastname.
> 
> 2. Use the DTD or Schema to derive a relational schema.
> 
> This is the approach that most RDBMS vendors want us to use (again, at
> least from what I know of Oracle and DB2).  As Dan points out in his
> paper, one table is created for "each element type that can occur in a
> collection position."  But if the content model changes slightly (the
> example Dan gives is a person changing from (name, phone) to
> (name, phone*)), the underlying relational schema has to change.  There
> are a bunch of other issues; for a full listing, look up Dan's paper.
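A toy version of the shredding rule quoted above shows why going from (name, phone) to (name, phone*) is a schema change. The content-model encoding (`'1'` for a single child, `'*'` for a repeating one) and the `derive_tables` helper are hypothetical simplifications of the rule, not Dan's algorithm or any vendor's mapper:

```python
def derive_tables(element, content_model):
    """Toy shredding rule: content_model maps each child element to '1'
    (occurs once) or '*' (repeats, i.e. sits in a collection position)."""
    tables = {}
    columns = [f"{element}_id INTEGER PRIMARY KEY"]
    for child, occurs in content_model.items():
        if occurs == "*":
            # Collection position: the child gets its own table with a foreign key.
            tables[child] = (f"CREATE TABLE {child} ("
                             f"{element}_id INTEGER REFERENCES {element}({element}_id), "
                             f"value TEXT)")
        else:
            # Single-valued child: it becomes a column of the parent's table.
            columns.append(f"{child} TEXT")
    tables[element] = f"CREATE TABLE {element} ({', '.join(columns)})"
    return tables


before = derive_tables("person", {"name": "1", "phone": "1"})   # (name, phone)
after = derive_tables("person", {"name": "1", "phone": "*"})    # (name, phone*)

for table in sorted(before.keys() | after.keys()):
    print(f"{table}:\n  before: {before.get(table)}\n  after:  {after.get(table)}")
```

The single-valued `phone` column disappears from `person` and a new `phone` table appears, so existing data has to be migrated even though the XML change is one character.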
> 
> The problem we have with "Native XML" databases, as I see it, is that
> the theoretical models and research are way behind practical
> implementations and use.  RDBMS technology is based on solid theoretical
> work done by E.F. Codd.  I have seen signs of the academic community
> picking this problem up and creating the foundation on which XML
> databases will be built.  We at B-Bop are working with one such group,
> and I would encourage all the academic lurkers in this group to start
> looking at this problem more closely.  Until the theoretical foundations
> are put in place, we will keep on arguing the merits and demerits of all
> the approaches, and users will have to use their judgement in picking
> one solution over another.
> 
> Having said that, I would like Mr. Williams to look at all the issues a
> bit more closely before making blanket statements like "While those
> databases are ideally suited to the management of document-oriented
> information (unstructured or semi-structured data), they don't make
> sense for use for structured data."  But I also know that we will have
> to live with this level of insight for some time to come.
> 
> Regards,
> 
> Soumitra
> 
> "Champion, Mike" wrote:
> 
> >
> > > Native XML databases; a bad idea for data?
> > > http://www-106.ibm.com/developerworks/library/x-xdnat.html?n-x-10181
> > >
> >
> > OK, I'll bite.
> >
> > 1) Storage: "decomposing the XML document to persist it to a relational
> > database is not all that difficult" -- I have a one-word answer:
> > "DocBook?".  <sneer>
> > "Do you really want your information's structure to vary?"  Uhh, no ...
> > but do I usually get a vote?  Simplicity of storage is probably the
> > biggest advantage of a native XML DBMS: it is easy to take a chunk of
> > XML and store it in any of the native XML DBMSs I've researched; it
> > takes considerable analysis and programming in the XML-enabled RDBMSs.
> > This makes native XML DBMSs valuable as reliable caching/logging tools
> > for simple XML messages, not to mention making it possible to do useful
> > things with complex documents.
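To illustrate the storage-simplicity point, here is a minimal sketch of a message log that keeps each XML message intact, with SQLite standing in for a native XML store and ElementTree's limited XPath subset standing in for its query engine. The table, function, and element names are hypothetical. The two messages have different structures, and nothing had to be designed up front to store them:

```python
import sqlite3
import xml.etree.ElementTree as ET

# SQLite as a stand-in store: one row per message, body kept as raw XML text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message_log (id INTEGER PRIMARY KEY, body TEXT)")


def log_message(conn, xml_text):
    """Store a message unchanged; parsing only rejects malformed XML."""
    ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    conn.execute("INSERT INTO message_log (body) VALUES (?)", (xml_text,))


def query_log(conn, path):
    """Run an ElementTree path expression over every logged message."""
    results = []
    for (body,) in conn.execute("SELECT body FROM message_log ORDER BY id"):
        results.extend(el.text for el in ET.fromstring(body).findall(path))
    return results


log_message(conn, "<order><item sku='A1'>Widget</item></order>")
log_message(conn, "<invoice><line><item sku='B2'>Gadget</item></line></invoice>")
print(query_log(conn, ".//item"))
```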
> >
> > 2) Retrieval: "some of the native XML platforms require that the entire
> > document be returned from the database"  Uhh, which ones?  Not the ones
> > supporting XPath, which is pretty much all of them, AFAIK.
> > "many relational database vendors are currently implementing thin XML
> > serializer wrappers that enable them to generate XML documents on
> > demand from relational data."  Right, but I thought we were talking
> > about XML data?
> >
> > 3) Searching: "searching will return only a set of XML documents; the
> > calling program must then take further action on those documents, if
> > necessary."  True.  SQL has a far richer set of data manipulation
> > operators than XPath (XQuery addresses this to some extent), and is
> > (loosely) based on a general theory of data, whereas XML is much more
> > ad hoc.
> > "these features are indispensable when working with traditional
> > documents, where context plays a large role in meaning, but far less
> > important when working with structured data"  Absolutely; if you don't
> > have relatively complex document-like XML, the *searching* features of
> > a native XML DBMS don't buy you much over storing simple XML in an
> > RDBMS.  But if you do have "interesting" XML schemas, e.g. with
> > recursive elements, simple XPath queries can address problems that
> > seriously challenge SQL experts.
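To make the recursive-element point concrete, here is a sketch over a hypothetical document with nested `section` elements. The question "which sections sit anywhere under section 1?" is a single descendant-axis path query; once the same data is shredded into a `section(id, parent_id, title)` adjacency table, it needs a recursive query, shown here with SQLite's `WITH RECURSIVE`. All names are illustrative:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document with arbitrarily nested (recursive) <section> elements.
DOC = """<doc>
  <section><title>1</title>
    <section><title>1.1</title>
      <section><title>1.1.1</title></section>
    </section>
    <section><title>1.2</title></section>
  </section>
</doc>"""
root = ET.fromstring(DOC)

# Path query: every section nested at any depth under section "1".
xpath_titles = [t.text for t in root.findall(".//section[title='1']//section/title")]

# Relational version: shred into an adjacency list (one row per section).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE section (id INTEGER PRIMARY KEY, parent_id INTEGER, title TEXT)")


def shred(elem, parent_id=None):
    for sec in elem.findall("section"):
        cur = conn.execute("INSERT INTO section (parent_id, title) VALUES (?, ?)",
                           (parent_id, sec.findtext("title")))
        shred(sec, cur.lastrowid)  # recurse: children point at this row's id


shred(root)

# The same question needs a recursive query over the parent_id links.
sql_titles = [row[0] for row in conn.execute("""
    WITH RECURSIVE sub(id, title) AS (
        SELECT id, title FROM section
         WHERE parent_id = (SELECT id FROM section WHERE title = '1')
        UNION ALL
        SELECT s.id, s.title FROM section AS s JOIN sub ON s.parent_id = sub.id
    )
    SELECT title FROM sub ORDER BY id""")]

print(xpath_titles)  # ['1.1', '1.1.1', '1.2']
print(sql_titles)    # ['1.1', '1.1.1', '1.2']
```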
> >
> > 4) Aggregation: "pulling information together and rolling it up (into
> > sums, averages, and so on) is quite difficult."  He's right.  Guilty as
> > charged, although he didn't emphasize "joins" as much as I would have.
> > Again, XQuery will help, but what we really need is a better
> > theoretical understanding of how hierarchical XML data relates to the
> > relational model, sort of an "InfoSet Algebra" if you will.  XQuery is
> > at least addressing this problem of formalizing XML operators, and even
> > if they fail it will be dissertation fodder for a new generation of CS
> > theorists :~)
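The aggregation gap can be sketched as follows, assuming a made-up `orders` document. XPath 1.0 does define `sum()` and `count()`, but it has no grouping, and Python's ElementTree path subset offers neither, so the per-region roll-up is written by hand in host code; once the same values sit in a table, one `GROUP BY` does it. Names and data are illustrative:

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document of orders, each tagged with a region.
DOC = """<orders>
  <order region="east"><amount>10</amount></order>
  <order region="west"><amount>5</amount></order>
  <order region="east"><amount>20</amount></order>
</orders>"""
root = ET.fromstring(DOC)

# XML side: path queries select the values; grouping and averaging are hand-written.
totals, counts = defaultdict(float), defaultdict(int)
for order in root.iterfind("order"):
    region = order.get("region")
    totals[region] += float(order.findtext("amount"))
    counts[region] += 1
xml_rollup = {r: (totals[r], totals[r] / counts[r]) for r in sorted(totals)}

# Relational side: the same roll-up is one declarative query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(o.get("region"), float(o.findtext("amount"))) for o in root.iterfind("order")])
sql_rollup = {r: (s, a) for r, s, a in conn.execute(
    "SELECT region, SUM(amount), AVG(amount) FROM orders GROUP BY region ORDER BY region")}

print(xml_rollup)
print(sql_rollup)
```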
> >
> > -----------------------------------------------------------------
> > The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> > initiative of OASIS <http://www.oasis-open.org>
> >
> > The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> > To subscribe or unsubscribe from this elist use the subscription
> > manager: <http://lists.xml.org/ob/adm.pl>
> 
> --
> Soumitra Sengupta, Ph.D.
> Co-Founder and C.T.O.
> B-Bop Associates Inc.
> Phone: 650-340-2700
> Fax  : 650-340-2701
> Email: soumitra@b-bop.com
> http://www.b-bop.com
> 