RE: storing xml files into database

Michael:

Thank you for the correction regarding the UpdateGram support. I look forward to seeing how it compares to ours (and congratulate Microsoft on being serious about XML compared to Oracle and IBM DB2 ;->).

The customers we are running into have multiple problems. Sure, they have data that needs to be integrated into their relational model, and we have the tools to help do that. They also need to integrate with SAP, i2, Ariba, Essbase or Hyperion for analysis, old IBM mainframes, etc. They tried EAI, but EAI was too loosely coupled and they needed a record of the data flowing back and forth. Our DB tends to be used as a middle-tier XML data “hub” that pulls data in from many different sources and transforms it out to many different destinations.

None of this data is “Shakespeare”. You ought to see some of the data models we’ve run across in insurance, financial services, telco, retail, and other sectors that are using XML to solve this aggregation/integration problem. You’ve got combinations of structured, semi-structured, and unstructured data from both internal and partner systems that have to be combined into a common data model.

The first thing they tried was to normalize it (or even leave it slightly de-normalized) and put it into a relational database. They tried and they failed. It was too painful, and the system could not keep up with the volume of data they were seeing. These are Global 2000 companies that see this not as a special exception, but as the rule.

Our philosophy is this: if you can make the data fit a relational schema, the schema of the incoming XML does not evolve or change very often, and you’ve made it work and are comfortable with it, then apart from our caching and XSLT capabilities we do not bring much value. If you can make it fit relational through mapping and you’re happy with the result, stick with it. But what we’ve found with our customers and prospects (which are numerous) is that they tried and couldn’t make it work.

From rumors I’ve heard, Microsoft has realized some of these limitations and is working on devising ways in SQL Server to make most of these arguments invalid. I think this is great because it validates the usefulness of XML as a common data model to solve these real business problems.

Cheers,

Chris

---------------------------------------

Chris Parkerson

Product Manager

eXcelon Corporation

Burlington, MA

(781) 674-5393

http://www.exceloncorp.com

---------------------------------------

-----Original Message-----
From: Michael Rys [mailto:mrys@microsoft.com]
Sent: Monday, September 10, 2001 2:22 PM
To: chrisp; gharesh@vsnl.com; Chuck White
Cc: Xml-Dev
Subject: RE: storing xml files into database

Dear Chris. While I don't dispute that there are situations where a native XML store is better than the mapping, let me address some of your points below.

I think it is important to note that XML-enabled relational databases have some specific scenarios in mind (integrating relational data/schemata with XML in a bi-directional way) that do not work that well for storing "document-centric" XML. Thus, use the tools that fit the task at hand.

Best regards

Michael

-----Original Message-----
From: Chris Parkerson [mailto:chrisp@exceloncorp.com]
Sent: Monday, September 10, 2001 6:18 AM
To: gharesh@vsnl.com; 'Chuck White'
Cc: 'Xml-Dev'
Subject: RE: storing xml files into database

Haresh:

[Michael Rys] <snip/>

A couple of questions that will quickly let you know that an RDBMS is not going to work:

1) Are you dealing with XML data whose schema is not necessarily consistent from document to document, or whose schema changes often? The key advantage of using XML as a data model is extensibility: you lose that advantage when you store it in an RDBMS.

2) Do you need to update this data often? No current RDBMS implementation (except for a beta utility available from Microsoft for SQL Server) allows incremental document updates. You must replace entire documents when you need to make a change, as all of the RDBMS systems store the data as BLOBs. Most of our competitors in the XML DB world also do not allow incremental updates. This is very important if the data needs to change fairly often.

[Michael Rys] The update facility of SQL Server is released and fully supported (and not beta). It is shipped as a web release via msdn.microsoft.com/xml and /sqlserver.
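
To make the contrast in point 2 concrete, here is a minimal sketch of the two update styles being discussed: rewriting a whole document stored as a BLOB versus updating a single value that has been mapped to a column. It uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules purely as stand-ins. It is not the SQL Server update facility and not eXcelon's API, and the table names (`docs`, `orders`), the sample document, and both function names are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; not taken from any real system.
DOC = "<order id='1'><status>open</status><item sku='A1' qty='2'/></order>"


def replace_whole_document(conn, doc_id, new_status):
    """BLOB-style storage: fetch, parse, modify, re-serialize, rewrite everything."""
    (xml_text,) = conn.execute(
        "SELECT body FROM docs WHERE id = ?", (doc_id,)
    ).fetchone()
    root = ET.fromstring(xml_text)
    root.find("status").text = new_status
    conn.execute(
        "UPDATE docs SET body = ? WHERE id = ?",
        (ET.tostring(root, encoding="unicode"), doc_id),
    )


def update_mapped_column(conn, doc_id, new_status):
    """Mapped storage: <status> lives in its own column, so only that value changes."""
    conn.execute("UPDATE orders SET status = ? WHERE id = ?", (new_status, doc_id))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO docs VALUES (1, ?)", (DOC,))
conn.execute("INSERT INTO orders VALUES (1, 'open')")

replace_whole_document(conn, 1, "shipped")
update_mapped_column(conn, 1, "shipped")
```

In the first function the cost of a one-field change grows with the size of the document; in the second it does not, but only because someone has already defined the mapping from `<status>` to a column.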

3) Are performance and scalability important to you? We’ve tried both Oracle and DB2 with over 100,000 documents, which is a relatively small number. Oracle fails the benchmark, and DB2 runs very slowly. Why? They have to parse the data on the fly to do anything with it: query, index, anything. eXcelon XIS stores the data in parsed form, so we eliminate this overhead, PLUS our caching technology is distributed and works like an in-memory database to give you really fast access.

4) Do you have data that comes from business partners or from systems where you really do not have much control over the schema? Does that data need to be integrated into your common XML data model? If so, keeping that all coordinated in an RDBMS is very difficult. This is the problem that has afflicted most of our customers and a primary reason why they failed in using an RDBMS.

If you need to integrate with your existing relational data model, then the story of course looks different.

5) How many nodes wide and how many nodes deep is your XML data? If you decide to go the mapping route (which is the only way to avoid having it stored as a BLOB), the mappings will fail or become immensely complex as the XML data gets wider and deeper.

[Michael Rys] This is a fairly vague statement. You need to do the mapping once, and there are tools available (at least for the SQL Server annotated schema approach) that provide you with a simple-to-use visual interface. Can you provide more information on why and when mappings fail or become immensely complex in the context of data-centric XML documents (and not Shakespeare)?
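
One way to make the "wider and deeper" question concrete is to count how many tables a naive mapping would need for a given document. The sketch below is only an illustration under a stated assumption: every element that has child elements becomes its own table, and leaf elements become columns of their parent's table. That rule, the function name `tables_for_naive_mapping`, and the sample insurance-style document are all invented here; they do not describe the SQL Server annotated schema approach or any vendor's mapping tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-centric sample document, invented for illustration.
SAMPLE = """
<policy>
  <holder>
    <name>A. Example</name>
    <address><city>Burlington</city><state>MA</state></address>
  </holder>
  <coverage>
    <vehicle><vin>X1</vin><driver><name>B. Example</name></driver></vehicle>
    <vehicle><vin>X2</vin></vehicle>
  </coverage>
</policy>
"""


def tables_for_naive_mapping(xml_text):
    """Return the sorted element paths that would each need a table.

    Assumption: an element with child elements maps to a table;
    leaf elements map to columns of the parent's table.
    """
    root = ET.fromstring(xml_text.strip())
    tables = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        if len(elem):  # len() of an Element is its number of child elements
            tables.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return sorted(tables)


for path in tables_for_naive_mapping(SAMPLE):
    print(path)
```

Under this assumption each additional level of nesting adds at least one table and one parent/child join. Whether that amounts to "immensely complex" or to a one-time mapping exercise is exactly the point in dispute above.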