OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Fwd: Re: [xml-dev] storing XML files]



Soumitra Sengupta wrote:

> This is a valid way of solving XML storage, indexing and retrieval problem if
> you have:
> 
> a. you have pretty well defined DTD or schema for all the incoming documents and
> they are unlikely to change.  You do not expect to add new DTD or schemas in the
> future often.
> b. the hierarchical relationship expressed in these documents are not very rich
> or are not too nested.

Agreed. However, I would caution that "not very rich" and "not too
nested" are not well-defined terms. That is, what is acceptable
performance in one application is not acceptable in another.

Furthermore, native XML databases are not guaranteed to outperform
XML-enabled relational databases in all cases. As a general rule:

1) Native XML databases should outperform XML-enabled relational
databases on queries along the XML hierarchy, such as XPath queries.

2) It's anybody's guess which will perform better on queries not along
the hierarchy, such as a query that inverts an XML document.
Theoretically, it seems that if the queried fields are indexed in both
types of database and similarly clever people implement the query and
storage engines, performance should be comparable. However, I don't know
enough about storage and query engines to know if this is true.

3) Native XML databases could encounter serious problems querying
unindexed data. This is because they have to search many more nodes to
find the data. That is, first they have to determine if a node is the
correct node ("Are you a foo element?"), then they have to look at the
data for the node ("Is your value 'bar'?"). Relational databases only
have to look at a given set of column values.

> If on the other hand you do not have control on the structure of the incoming
> documents, they vary a lot and they are very rich, you should look at using a
> native XML database.

Agreed, with the caveat that "very rich" is an ill-defined term.

> > There is a slight possibility that the customer moves to Oracle, so my
> > solution shouldn't be database provider dependent.
> >
> 
> I recommend that you look at our Xfinity Server 2.0
> (http://www.b-bop.com/products_xfinity_server.htm) as it will allow you the
> benefits of a native XML database but protect your investment in RDMS technology
> like MS-SQL server or Oracle.  It meets your requirement of "database provider"
> independent.

Note that Xfinity provides database independence because it is built on
top of a relational database. This allows you to switch the underlying
database. This is not true of all native XML databases, many of which
have proprietary database engines.

> > Is it smart to develop a generalized format into which all those data feeds
> > will be transformed (I hope an XSLT is a good tool to do that)? Or, I should
> > deal with them individually?
> 
> Depends on whether the structure is pretty static, how many different structure
> are you dealing with, are they likely to change a lot, are they very rich?  Also
> how much time and money do you want to spend on development and do you want to
> protect your existing investment?

Personally, I don't think this is a good idea. The types of data you
mentioned (stock quotes, weather report, events information, daily news,
horoscope, etc.) are all radically different from each other. It makes
little sense to convert them to a common XML schema.

> > Can I apply XSLT on my XML data feeds and directly produce SQL statements?
> 
> I have not seen anything that does this.  Anyone with an answer to this?

I haven't either. However, there are a number of middleware products
that will store the data for you. (These create and execute the SQL
statements internally.) See the middleware section of the product list
mentioned by Chris.

> > Has someone already developed an XML2SQL transformation tool?

Yes, although they actually transfer the data instead of generating SQL
statements. Again, see the middleware section of the product list, as
well as the XML-enabled databases section.

> > Just a flat
> > table solution or I can also capture relationship information?

Many of the middleware solutions, as well as all of the XML-enabled
relational databases, use an object-relational mapping that captures
relationship information.

> We have solved this problem in Xfinity.  All the rich relationships are captured
> and stored.

I think it is important to understand a fundamental difference between
what a native XML database built on top of a relational database (like
Xfinity) and an XML-enabled relational database (like SQL Server). For
example, consider the following document:

   <SalesOrder SONumber="12345">
      <CustomerNumber="543" />
      <OrderDate>981215</OrderDate>
      <Item ItemNumber="1">
         <PartNumber="123" />
         <Quantity>10</Quantity>
      </Item>
      <Item ItemNumber="2">
         <PartNumber="456" />
         <Quantity>5</Quantity>
      </Item>
   </SalesOrder>

In an XML-enabled relational database, this will be mapped to two
tables: Orders and Items. In a native XML database built on a relational
database, this will be mapped to tables such as Elements, Attributes,
and PCDATA. (In fact, the database structure is probably more complex
for performance reasons, but this gives you the idea.)

Each structure is appropriate for different needs. The main advantage of
an XML-enabled relational database is that the data can be used by
legacy (non-XML) applications. The main drawback is that you have to map
the data at design time. This doesn't matter if you're working with a
fixed set of schemas (as was mentioned above), but is clearly a problem
if you want to store arbitrary XML.

The main advantages of native XML databases are that you can store
arbitrary XML documents at runtime and that they handle the richness of
XML documents well. (Relational databases do not handle semi-structured
data well.) The main drawback is that non-XML applications cannot access
the data. This obviously doesn't matter if you don't have any such
applications.

> > Is it the right thing to save those XMLs in to a relational database or
> > shall I explore the native XML databases?
> 
> Depends on incoming XML and the needs of your application.

Agreed.

> > Is there an existing DTD for SQL or that makes no sense to you?

These exist, but I don't think anybody actually uses them. This is
usually handled internally by the products that transfer data between
XML documents and relational databases.

> > My system should be flexible enough to accept new kinds of data feeds ...
> >
> 
> This I think is the key requirement that you have.  Unless there are other
> requirements that I do not understand or know of, you really should look at
> native XML databases.

Agreed. This is not something that XML-enabled relational databases do
well, so it's worth your time to look at native XML databases.

-- Ron