xml-dev - Re: [xml-dev] Indexing solution for native XML database

To: <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] Indexing solution for native XML database
From: "Ken North" <kennorth@sbcglobal.net>
Date: Thu, 1 Dec 2005 14:16:13 -0800
References: <002e01c5f659$fa57afa0$0115a8c0@Elektonika.local>

Michael Kay wrote:
> Relational databases are a very useful tool, there are some jobs they are
> very good at (mainly the kind of jobs that people used punched cards for 50
> years ago).

They do handle numbers and characters quite well, but you're describing the
strengths of vintage-1980s SQL technology. A lot has changed.

Many of the SQL platforms have evolved to a universal database model that
supports queries over rich types such as video, images, HTML, XML and so on.
Some use a plug-in model similar to how web browsers use plug-ins for Flash or
Adobe PDF.

Using an SQL database for rich types has implications for query optimization.
Here's an example I've probably beaten to death.

You want to write a web app or service that uses tabular data, XML, maps and
geo-spatial data:

"My GPS coordinates are x, my customer profile is y and my presentation format
is [1024x768|320x240|104x208].
Find the nearest location where I can buy a widget for less than 500 Euros. Give
me a top 10 list with directions, a map and a consumer review."

One approach is to use specialized servers -- one for a native XML database, one
for maps, etc. -- each with its own data model, indexing technology and access
methods. Writing a query optimizer for that solution requires an understanding
of queries across distributed data stores that use different data models,
indexing and access methods.

That solution requires integration middleware that processes query results
before delivering an answer to the client. It will require plenty of network
roundtrips to get statistics, data and metadata.
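
As a rough sketch of that middleware pattern (not any real product), the Python
below stands in three specialized servers with in-memory objects and counts
every call as one network roundtrip. The class RemoteStore, the function
nearest_affordable, the store ids and all of the sample data are made up for
illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class RemoteStore:
    """Hypothetical stand-in for a specialized server.

    Every call is counted as one network roundtrip.
    """
    name: str
    rows: dict
    round_trips: int = 0

    def fetch(self, key):
        self.round_trips += 1
        return self.rows.get(key)

    def scan(self):
        self.round_trips += 1
        return list(self.rows.items())


# Made-up sample data: one "server" per kind of data.
geo = RemoteStore("geo", {"s1": (0.0, 1.0), "s2": (3.0, 4.0), "s3": (0.5, 0.5)})
prices = RemoteStore("prices", {"s1": 450, "s2": 300, "s3": 620})
reviews = RemoteStore("reviews", {"s1": "<review>Good</review>",
                                  "s2": "<review>OK</review>"})


def nearest_affordable(origin, max_price, limit=10):
    """The middleware does the join, the filter and the sort itself."""
    candidates = []
    for store_id, (x, y) in geo.scan():           # one roundtrip
        price = prices.fetch(store_id)            # one roundtrip per store
        if price is not None and price < max_price:
            review = reviews.fetch(store_id)      # one per qualifying store
            dist = hypot(x - origin[0], y - origin[1])
            candidates.append((dist, store_id, price, review))
    candidates.sort()
    return candidates[:limit]


result = nearest_affordable((0.0, 0.0), 500)
total_round_trips = geo.round_trips + prices.round_trips + reviews.round_trips
```

Note that the order of access (locations first, then prices, then reviews) is
hard-coded in the middleware. Nothing in this sketch can decide that filtering
on price first would mean fewer calls, and the number of roundtrips grows with
the number of candidate stores.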

Contrast that with having all of the data managed by a single DBMS (SQL/XML).
Because the query optimizer understands the disparate types and their associated
access methods and indexing techniques, we don't have to develop an optimizer
for a query against an XML database, an image database, geo-spatial data and so
on. We can rely on the query optimizer to determine in what order to retrieve
data instead of having to build that logic into our application or middleware.
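
A minimal sketch of the single-DBMS idea, with heavy caveats: it uses Python's
built-in sqlite3 module purely as a stand-in, and SQLite is not an SQL/XML
universal database. The schema, the sample rows and the user-defined function
distance() are all assumptions made for illustration. The function is
registered with the engine, loosely analogous to a plug-in type extension.

```python
import sqlite3
from math import hypot

conn = sqlite3.connect(":memory:")

# Register a hypothetical distance() function with the engine.
conn.create_function(
    "distance", 4, lambda x1, y1, x2, y2: hypot(x1 - x2, y1 - y2)
)

conn.executescript("""
CREATE TABLE store  (id TEXT PRIMARY KEY, x REAL, y REAL);
CREATE TABLE offer  (store_id TEXT REFERENCES store(id), price_eur REAL);
CREATE TABLE review (store_id TEXT REFERENCES store(id), body_xml TEXT);
CREATE INDEX offer_price ON offer(price_eur);
""")

# Made-up sample data.
conn.executemany("INSERT INTO store VALUES (?, ?, ?)",
                 [("s1", 0.0, 1.0), ("s2", 3.0, 4.0), ("s3", 0.5, 0.5)])
conn.executemany("INSERT INTO offer VALUES (?, ?)",
                 [("s1", 450), ("s2", 300), ("s3", 620)])
conn.executemany("INSERT INTO review VALUES (?, ?)",
                 [("s1", "<review>Good</review>"), ("s2", "<review>OK</review>")])

# One declarative query; the engine picks the join order and the indexes.
query = """
SELECT s.id, o.price_eur, r.body_xml,
       distance(s.x, s.y, ?, ?) AS dist
FROM store s
JOIN offer o ON o.store_id = s.id
LEFT JOIN review r ON r.store_id = s.id
WHERE o.price_eur < ?
ORDER BY dist
LIMIT 10
"""
rows = conn.execute(query, (0.0, 0.0, 500)).fetchall()
```

The application states what it wants in one statement and leaves the order of
retrieval to the engine's planner, which works from local storage rather than
from calls to other servers.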

To optimize the query and prepare its access plan, the query optimizer does disk
seeks or cache reads, not RPCs over TCP/IP connections. Retrieving the data when
executing the plan is also a matter of doing disk seeks or reading cache memory,
instead of round-tripping messages across a network wire.

The performance gains you expect from using specialized data stores are negated
if your applications have to process several types of data, each with its own
specialized server or engine.

Follow-Ups:
- RE: [xml-dev] Indexing solution for native XML database
  - From: "Michael Kay" <mike@saxonica.com>