OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Fwd: Re: [xml-dev] storing XML files]

Ronald Bourret wrote:
Soumitra Sengupta wrote:

This is a valid way of solving XML storage, indexing and retrieval problem if
you have:

a. you have pretty well defined DTD or schema for all the incoming documents and
they are unlikely to change. You do not expect to add new DTD or schemas in the
future often.
b. the hierarchical relationship expressed in these documents are not very rich
or are not too nested.

Agreed. However, I would caution that "not very rich" and "not too
nested" are not well-defined terms. That is, what is acceptable
performance in one application is not acceptable in another.
I should have been more precise but one person's "rich" is another person's "flat".  I guess semi-structured
would have been a more appropriate definition.


Furthermore, native XML databases are not guaranteed to outperform
XML-enabled relational databases in all cases. As a general rule:

1) Native XML databases should outperform XML-enabled relational
databases on queries along the XML hierarchy, such as XPath queries.


2) It's anybody's guess which will perform better on queries not along
the hierarchy, such as a query that inverts an XML document.
Theoretically, it seems that if the queried fields are indexed in both
types of database and similarly clever people implement the query and
storage engines, performance should be comparable. However, I don't know
enough about storage and query engines to know if this is true.

3) Native XML databases could encounter serious problems querying
unindexed data. This is because they have to search many more nodes to
find the data. That is, first they have to determine if a node is the
correct node ("Are you a foo element?"), then they have to look at the
data for the node ("Is your value 'bar'?"). Relational databases only
have to look at a given set of column values.
Xfinity already solves this problem for any well formed XML document.


If on the other hand you do not have control on the structure of the incoming
documents, they vary a lot and they are very rich, you should look at using a
native XML database.

Agreed, with the caveat that "very rich" is an ill-defined term.

There is a slight possibility that the customer moves to Oracle, so my
solution shouldn't be database provider dependent.

I recommend that you look at our Xfinity Server 2.0
(http://www.b-bop.com/products_xfinity_server.htm) as it will allow you the
benefits of a native XML database but protect your investment in RDMS technology
like MS-SQL server or Oracle. It meets your requirement of "database provider"

Note that Xfinity provides database independence because it is built on
top of a relational database. This allows you to switch the underlying
database. This is not true of all native XML databases, many of which
have proprietary database engines.

Is it smart to develop a generalized format into which all those data feeds
will be transformed (I hope an XSLT is a good tool to do that)? Or, I should
deal with them individually?
Depends on whether the structure is pretty static, how many different structure
are you dealing with, are they likely to change a lot, are they very rich? Also
how much time and money do you want to spend on development and do you want to
protect your existing investment?

Personally, I don't think this is a good idea. The types of data you
mentioned (stock quotes, weather report, events information, daily news,
horoscope, etc.) are all radically different from each other. It makes
little sense to convert them to a common XML schema.
In most cases that would be true.  But if they want to search on kewords and the issue date for
each of these documents and nothing more, this would work just fine.


Can I apply XSLT on my XML data feeds and directly produce SQL statements?
I have not seen anything that does this.  Anyone with an answer to this?

I haven't either. However, there are a number of middleware products
that will store the data for you. (These create and execute the SQL
statements internally.) See the middleware section of the product list
mentioned by Chris.

Has someone already developed an XML2SQL transformation tool?

Yes, although they actually transfer the data instead of generating SQL
statements. Again, see the middleware section of the product list, as
well as the XML-enabled databases section.

Just a flat
table solution or I can also capture relationship information?

Many of the middleware solutions, as well as all of the XML-enabled
relational databases, use an object-relational mapping that captures
relationship information.

We have solved this problem in Xfinity.  All the rich relationships are captured
and stored.

I think it is important to understand a fundamental difference between
what a native XML database built on top of a relational database (like
Xfinity) and an XML-enabled relational database (like SQL Server). For
example, consider the following document:

<SalesOrder SONumber="12345">
<CustomerNumber="543" />
<Item ItemNumber="1">
<PartNumber="123" />
<Item ItemNumber="2">
<PartNumber="456" />

In an XML-enabled relational database, this will be mapped to two
tables: Orders and Items. In a native XML database built on a relational
database, this will be mapped to tables such as Elements, Attributes,
and PCDATA. (In
Couldn't have said it better.


Is it the right thing to save those XMLs in to a relational database or
shall I explore the native XML databases?
Depends on incoming XML and the needs of your application.


Is there an existing DTD for SQL or that makes no sense to you?

These exist, but I don't think anybody actually uses them. This is
usually handled internally by the products that transfer data between
XML documents and relational databases.
I believe Oracle and Datachannel do publish something called XPages.


My system should be flexible enough to accept new kinds of data feeds ...

This I think is the key requirement that you have.  Unless there are other
requirements that I do not understand or know of, you really should look at
native XML databases.

Agreed. This is not something that XML-enabled relational databases do
well, so it's worth your time to look at native XML databases.

-- Ron

The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this elist use the subscription
manager: <http://lists.xml.org/ob/adm.pl>