xml-dev - Re: [xml-dev] XML-enabled databases, XQuery APIs

Subject: Re: [xml-dev] XML-enabled databases, XQuery APIs
From: Ronald Bourret <rpbourret@rpbourret.com>
Date: Fri, 15 Apr 2005 09:05:09 -0700
Cc: xml-dev@lists.xml.org
In-reply-to: <425F9008.4030304@metalab.unc.edu>
References: <003001c54133$ae25d660$1601a8c0@DURANTE> <425F9008.4030304@metalab.unc.edu>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)

Elliotte Harold wrote:

> Ken North wrote:
> 
>> When you store a document in a text column, it's searchable using SQL 
>> queries.
>> Those queries use the indexing capabilities of software such as Oracle
>> interMedia Text Cartridge, DB2 Text Extender and Microsoft Search 
>> service.
> 
> Yes, but suppose I want to find all occurrences of the string "Now is 
> the time for all good men to come to the aid of their country." XML 
> unaware full text search would not normally find "<strong>Now</strong> 
> is the time for all good men to come to the aid of their country."
> 
> More importantly, if I have XML data then I normally want to use the 
> markup to delimit and index the searches. For instance, I might want to 
> find all prescriptions of Xanax with a dosage of 5mg, not all 
> occurrences of the word "Xanax" somewhere near the text "5mg". If we're 
> just storing everything in a CLOB, we might as well be using flat files. 
> The markup has meaning. It should be used to inform and optimize the 
> queries.

Agreed.

The full-text search engines in some of the big databases (Oracle, DB2, 
not sure about SQL Server) have this capability to some extent. That is, 
they are XML-aware: they can tell text from markup, and they can 
restrict a text search to a particular part of the markup.
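
To make the distinction concrete, here is a minimal Python sketch using only the standard library. It is not how any of the products above work internally; the sample document, the element names, and the helper functions are all made up for illustration. It contrasts a markup-unaware substring search over the raw CLOB text with a search over text content only, a search restricted to one element, and a structured query that uses the markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, as it might sit in a CLOB column.
DOC = (
    "<record>"
    "<note><strong>Now</strong> is the time for all good men "
    "to come to the aid of their country.</note>"
    "<prescription><drug>Xanax</drug><dosage>5mg</dosage></prescription>"
    "<comment>Asked about Xanax; once took 5mg of something else.</comment>"
    "</record>"
)

PHRASE = "Now is the time for all good men to come to the aid of their country."


def raw_search(clob, phrase):
    """Markup-unaware search: tags are treated as ordinary characters."""
    return phrase in clob


def text_search(clob, phrase):
    """Search the text content only, ignoring the tags."""
    root = ET.fromstring(clob)
    return phrase in "".join(root.itertext())


def search_in_element(clob, path, phrase):
    """Restrict the text search to the elements matched by `path`."""
    root = ET.fromstring(clob)
    return [el for el in root.iterfind(path)
            if phrase in "".join(el.itertext())]


def prescriptions(clob, drug, dosage):
    """Structured query: use the markup instead of word proximity."""
    root = ET.fromstring(clob)
    return [p for p in root.iter("prescription")
            if p.findtext("drug") == drug and p.findtext("dosage") == dosage]
```

With this sample, raw_search(DOC, PHRASE) misses the phrase because of the <strong> tags, while text_search(DOC, PHRASE) finds it. The word "Xanax" occurs twice in the raw text, but prescriptions(DOC, "Xanax", "5mg") matches only the actual prescription element.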

This is probably sufficient for many applications, but insufficient for 
others. For example, DB2's Net Search Extender requires you to specify a 
document model (essentially an XML schema) when setting up the index for 
a CLOB column. This would make it impossible to store documents 
conforming to multiple different versions of the same schema in that 
column, since different models would be required for each schema version.

CLOB storage has other problems as well, which is why I highlighted it 
in my comments to Edd. For example, you can't do node-level updates. Not 
a problem when updating a 10K document, but definitely a problem when 
you want to change the value of a single node in a 100MB document and 
you have to reinsert, reparse, and reindex it. And CLOB storage 
generally doesn't support XPath or XQuery. (Actually, Oracle's XML CLOB 
storage does support XPath, but good luck if the XPath expression can't 
be evaluated against the indexes.)
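
The node-level update problem can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the document is held as one string, the function name and sample document are hypothetical, and the path uses ElementTree's limited XPath subset rather than full XPath. The point is that the whole document is parsed and serialized again even though a single node changes; a real database would also have to rewrite and reindex the stored value.

```python
import xml.etree.ElementTree as ET


def update_node_in_clob(clob, path, new_value):
    """Change the text of one node in a document stored as a single string."""
    root = ET.fromstring(clob)      # the entire document is reparsed
    node = root.find(path)          # ElementTree path, a small XPath subset
    if node is None:
        raise KeyError(path)
    node.text = new_value           # the only real change
    # The entire document is serialized again to replace the stored value.
    return ET.tostring(root, encoding="unicode")


# Hypothetical stored document; imagine the <items> part being 100MB.
clob = ("<order><id>42</id><status>open</status><items>"
        + "<item/>" * 3
        + "</items></order>")
updated = update_node_in_clob(clob, "status", "shipped")
```

The work done here grows with the size of the document, not with the size of the change, which is why this is harmless at 10K and painful at 100MB.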

So while CLOB storage is adequate for some applications, it's inadequate 
for others, which is why the big relational databases have added, or are 
adding, native XML storage.

-- Ron

References:
- XML-enabled databases, XQuery APIs
  - From: "Ken North" <kennorth@sbcglobal.net>
- Re: [xml-dev] XML-enabled databases, XQuery APIs
  - From: Elliotte Harold <elharo@metalab.unc.edu>
