xml-dev - RE: Mechanics of mapping relational data into XML

RE: Mechanics of mapping relational data into XML
[ Lists Home | Date Index | Thread Index ]
From: Michael Rys <mrys@microsoft.com>
To: 'Ronald Bourret' <rpbourret@hotmail.com>, xml-dev@lists.xml.org
Date: Wed, 12 Jul 2000 23:03:17 -0700
Ronald makes some good observations and distinctions.

Please note that SQLServer 2000 (and the SQLXML tech preview) infer nesting
information from the SQL query based on the join/Cartesian product in the
FOR XML auto mode mentioned below). One limitation of this (and the approach
mentioned in the WebTechnique article) is that you only get single type
hierarchies (i.e. all children are of the same type, e.g. Order contains
Customer) and there is no way to identify sibling instance data of a
different type solely based on the resulting rowset of a simple join query.

Also the mapping schema approach of SQLServer provides the "model-driven"
approach by annotating an XDR schema and translating an XPath expression
against that schema into the corresponding SQL expression.

Ronald hints also at another categorization of mappings of relational data
to XML:

- Canonical mappings such as 

<tables>
 <table>
  <row>
   <column>...</column>
   ...
  </row>
  ...
 </table>
 ...
</tables>

which are generated by tools such as Oracle's XML query templates, ADO's XML
persist format.
This format is efficiently generated but needs a lot of (potentially)
inefficient postprocessing to generate the application's XML structure.

- Heuristic mappings such as the FOR XML auto mode, or the WebTechnique
article. A semantic XML structure is inferred from the underlying datamodel.
Closer to the goal if the database model corresponds to the application's
model.

- Mapping descriptions such as FOR XML explicit, mapping schemata, or IBM's
DADs. They generate the desired XML structure with some mainly relational
limitations w.r.t. to pivoting relational data into meta-information such as
the issue mentioned below w.r.t. transforming values into XML element names.

The over-/non-normalized issue identified by Ronald is an issue in the first
two approaches and would probably be best solved via an intermediate,
normalized view. The mapping approaches (such as SQLServer's mapping schema)
should not have a problem with data being combined from over-/non-normalized
tables, assuming that some identification for the relationships is
available.

As to platform independence: If you link another database system via
SQLServer linked server facility, all XML features are available for these
databases. I am sure that Oracle's and IBM's XML feature can be used for
other database systems in one way or the other. What you can do is making
the database engine (or in some cases only the XML part) part of a
middle-tier component. In any case, the issue is that there is not (yet) a
standardized way to tell a relational database on how to generate XML. 

Best regards
Michael Rys

> -----Original Message-----
> From: Ronald Bourret [mailto:rpbourret@hotmail.com]
> Sent: Wednesday, July 12, 2000 1:53 PM
> To: xml-dev@lists.xml.org
> Subject: RE: Mechanics of mapping relational data into XML
> 
> 
> Joshua Allen wrote:
> 
> >IBM have an XML Extender for DB2 (free download on alphaworks or
> >developerworks),
> >and MS SQL Server allows selecting to XML using a specified schema or
> >automatic
> >(e.g. SELECT * FROM authors FOR XML AUTO).  I am sure other 
> vendors can do
> >this now, too.  Besides persisting the recordset object to 
> XML (as in ADO
> >mentioned
> >below) and querying to get XML back, there are plenty of 
> other ways to get
> >XML
> >from a relational database.  I can think of 5 or 10 at least..
> 
> At least. Because Jerry only wants to get data from the 
> database to XML (and 
> not the other direction), he has a lot of choices. Most of the 
> "template-driven" middleware listed in "XML Database 
> Products" [1] can do 
> this, although to get the "grouping" ("nesting" in XML terms) 
> he wants, he 
> will probably have to code the joins himself. 
> (Template-driven middleware 
> generally processes an XML document with embedded SQL 
> statements, which are 
> replaced in the output document by their results.)
> 
> The alternative to template-driven software is "model-driven" 
> software, in 
> which the software is based on modeling the XML document in 
> some manner and 
> then mapping that model to the database. The advantage of 
> model-driven 
> software is that it can also transfer data from the XML 
> document to the 
> database. It also often saves the user from having to write the SQL 
> statements to do the joins as well.
> 
> (Transferring data from the XML document to the database is 
> generally not 
> supported by template-driven software because of the 
> complexity involved -- 
> a template allows the user to place data from the database 
> almost anywhere 
> in the resulting XML document, and reversing this action is 
> non-trivial, to 
> say the least.)
> 
> Note that some model-driven software models the XML document 
> as a set of 
> tables:
> 
>   <tables>
>     <table>
>       <row>
>         <column>...</column>
>         etc.
>       </row>
>       etc.
>     </table>
>     etc.
>   <tables>
> 
> Clearly, this is not what Jerry needs, since it does not 
> perform the nesting 
> he needs. Instead, he needs what I call a "tree-of-objects" 
> model, in which 
> the XML document is modelled as data-specific objects. In 
> Jerry's case, 
> these objects would be:
> 
>      Product
>    /    |     \
> Comp.  Comp.  Comp. etc.
> 
> These are then mapped to the database with an 
> object-relational mapping.
> 
> Note that two things about Jerry's relational schema (see 
> below) are likely 
> to pose problems for model-driven software, including ADO(?). 
> First, the 
> data for the Product object is non-normal: it comes from two 
> different 
> tables. In a traditional object-relational mapping, a single 
> object has all 
> its property values in a single table. It is not clear to me 
> that any of the 
> model-driven products on the market will be able to do this 
> in the XML => 
> database direction and many (most?) won't be able to do this 
> in the database 
> => XML direction.
> 
> The second problem is that table z contains columns called 
> component_attribute and component_value. The problem here is 
> that the value 
> of the component_attribute column is being used to name 
> elements/attributes 
> in the XML document -- again something that model-driven 
> software probably 
> can't do but (some) template-driven software can do.
> 
> Jerry's database schema
> =======================
> Table x: one row for each product
>   product-id  |  product_name  |  product_manufacturer
> 
> Table y: one row for each product
>   product-id  |  product_type  |  product_class
> 
> Table z: multiple rows for each product
>   product-id  |  product_component |  component_attribute | 
> component_value
> 
> Desired XML
> ===========
>   <product
>         product_id="456"
>         product_name="widget2"
>         product_manufacturer="manufacturerY"
>         product_type="type2"
>         product_class="classY">
>      <product_components>
>         <cover material=plastic color="blue" weight_oz="4"/>
>         <base material=metal color="gray" weight_oz="7" />
>      </product_components>
>   </product>
> 
> >In general, it is still quite fast to let the relational
> >engine do all of your work, then convert to XML after the fact.
> 
> Agreed. It's also a lot easier from the perspective of how 
> much work you 
> have to do.
> 
> >Another thing to note about the technique Kar Yan mentioned
> >below is that you can be fairly database-agnostic.  All of the
> >major databases out there seem to have different syntax for
> >returning XML directly, but ADO always returns a recordset, no
> >matter what database you use.  If you convert from ADO to
> >recordset, you can be programming-language agnostic and
> >database agnostic.  (If you need your code to run on platforms
> >other than Windows and can use pure Java, I am sure that
> >Java recordsets now have [or coming soon] ability to persist
> >to XML generically).
> 
> Note that database-independence is true for virtually all of 
> the middleware 
> listed in the product list -- these are almost all based on 
> JDBC, ODBC, or 
> OLE DB. Only the database-specific products (such as the DB2 
> XML Extender or 
> SQL Server) don't have this capability.
> 
> -- Ron Bourret
> 
> [1] 
> http://www.informatik.tu-darmstadt.de/DVS1/staff/bourret/xml/X
MLDatabaseProds.htm
________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com
Prev by Date: RE: SAX2 Lexical Handler Suggestions
Next by Date: Re: Mechanics of mapping relational data into XML
Previous by thread: Re: Mechanics of mapping relational data into XML
Next by thread: RE: Mechanics of mapping relational data into XML
Index(es):
- Date
- Thread