Re: [xml-dev] Shredding XML


From: Johannes Lichtenberger <Johannes.Lichtenberger@uni-konstanz.de>
To: Fraser Goffin <goffinf@googlemail.com>
Date: Sun, 01 Nov 2009 13:51:37 +0100

On Sat, 2009-10-31 at 01:05 +0000, Fraser Goffin wrote:
> Thanks for the great comments thus far from everyone.
> 
> Several people have mentioned using BLOB or CLOB and indeed this is
> something we have done in the recent past. However, one of the key
> issues is that at least some of the applications that will access the
> data are either not XML capable and/or the programmers using them are
> not really that familiar. Whilst it's possible to process XML data
> natively in Cobol, most of the time this is not the approach that's
> taken, and resource constraints and project deadlines often militate
> towards existing skills, technologies and practices.
> 
> So I'm really interested in experience of shredding moderately complex
> XML content models into relational tables (for example structures that
> might produce 30-50 even possibly more tables when decomposed). And
> also some arguments for and against that approach (I would like to be
> able to make a compelling case for moving towards treating XML as a
> first-class type system rather than one which just provides a format
> for data exchange).
> 
> One of the suggestions from one of our solution designers was to
> 'flatten' the XML structure and represent relationships using
> keys/ids, that is, make the XML more like the database. Personally I
> like the contextual relationships implicit in the hierarchical content
> model and am not really keen to navigate around the document using ID
> values as opposed to simply walking the tree ... but maybe other
> people's experience could provide some use cases where that approach
> has merit?

You should definitely have a look at the XPath Accelerator scheme,
originally developed by Torsten Grust (and which can be enhanced further),
the Staircase Join, and various tree-based enhancements for rewriting XQuery
queries into (very) efficient SQL queries.
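
To illustrate why this encoding maps so well onto SQL, here is a minimal
sketch of the pre/post idea behind the XPath Accelerator. Each node gets its
rank in a preorder and a postorder traversal, and the four major XPath axes
then become simple range conditions on those two numbers. The Node type,
the function names and the tiny tree a(b(c), d) are made up for
illustration; the real scheme stores further columns (parent, node kind and
so on).

```python
from typing import NamedTuple


class Node(NamedTuple):
    pre: int    # rank in preorder (document order)
    post: int   # rank in postorder
    tag: str


# Hand-numbered encoding of the tree <a><b><c/></b><d/></a>.
NODES = [
    Node(0, 3, "a"),
    Node(1, 1, "b"),
    Node(2, 0, "c"),
    Node(3, 2, "d"),
]


def descendants(ctx, nodes):
    # Descendants start after ctx and finish before it.
    return [n for n in nodes if n.pre > ctx.pre and n.post < ctx.post]


def ancestors(ctx, nodes):
    # Ancestors start before ctx and finish after it.
    return [n for n in nodes if n.pre < ctx.pre and n.post > ctx.post]


def following(ctx, nodes):
    # Following nodes start and finish after ctx.
    return [n for n in nodes if n.pre > ctx.pre and n.post > ctx.post]


def preceding(ctx, nodes):
    # Preceding nodes start and finish before ctx.
    return [n for n in nodes if n.pre < ctx.pre and n.post < ctx.post]


if __name__ == "__main__":
    b = NODES[1]
    print([n.tag for n in descendants(b, NODES)])
    print([n.tag for n in following(b, NODES)])
```

Each predicate is a pair of comparisons, which is exactly what ends up in the
WHERE clause once the nodes live in a relational table.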

> I am mainly interested in the process of LOADING XML data to a
> database rather than extracting (at least for the purposes of this
> discussion). So another key issue (excuse the pun) is that I will be
> processing the XML data and at various points constructing SQL INSERT
> statements including gathering together all of the [primary] key
> values that identify each entity and their [foreign] key
> relationship(s). Not all the data to support those relationships is
> inherent in the source XML data, so I also need to think about
> generating key values either from the database or as part of the XML
> processing. Of course, the use of stored procedures is another aspect in
> terms of positioning of the business/transformation logic.

You can parse XML (possibly very large XML files) with, for instance, SAX
and generate the appropriate relational encoding. With the XPath
Accelerator scheme you preserve the tree relationships, document order
and so on, _and_ the encoding has inherent knowledge of each XPath axis,
so one can rewrite XQuery statements in SQL. With tree knowledge you can
also avoid duplicates efficiently, skip certain nodes, etc., and with the
Staircase Join you have a very efficient join operator.
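
As a rough sketch of the loading side, the following uses Python's xml.sax
to assign pre/post ranks in a single streaming pass and inserts the rows
into an SQLite table. The pre rank doubles as a generated primary key and
the parent column as the foreign key, so no key values need to exist in the
source XML. The table name node, the column layout, the helper names and the
sample document are assumptions for illustration; only elements are handled
here (no text or attributes), and the descendant query is a plain range
join, not a staircase join.

```python
import sqlite3
import xml.sax


class PrePostHandler(xml.sax.ContentHandler):
    """Assign pre/post ranks to elements during one SAX pass."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows: (pre, post, parent, level, tag)
        self._stack = []   # currently open elements: (pre, tag)
        self._pre = 0
        self._post = 0

    def startElement(self, name, attrs):
        self._stack.append((self._pre, name))
        self._pre += 1

    def endElement(self, name):
        pre, tag = self._stack.pop()
        parent = self._stack[-1][0] if self._stack else None
        self.rows.append((pre, self._post, parent, len(self._stack), tag))
        self._post += 1


def shred(xml_text):
    """Return the element rows of xml_text, ordered by pre rank."""
    handler = PrePostHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return sorted(handler.rows)


def load(rows):
    """Insert the shredded rows into an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE node ("
        " pre INTEGER PRIMARY KEY,"
        " post INTEGER NOT NULL,"
        " parent INTEGER REFERENCES node(pre),"
        " level INTEGER NOT NULL,"
        " tag TEXT NOT NULL)"
    )
    db.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", rows)
    return db


# descendant axis: nodes that start after and finish before the context node
DESCENDANTS_SQL = """
SELECT d.tag
FROM node AS c
JOIN node AS d ON d.pre > c.pre AND d.post < c.post
WHERE c.tag = ?
ORDER BY d.pre
"""


def descendant_tags(db, tag):
    return [row[0] for row in db.execute(DESCENDANTS_SQL, (tag,))]


if __name__ == "__main__":
    doc = "<policy><holder><name/></holder><cover/></policy>"
    db = load(shred(doc))
    print(descendant_tags(db, "policy"))
```

Because the handler only keeps the stack of open elements plus the finished
rows, the rows could just as well be flushed to the database in batches for
very large documents.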

HTH,
Johannes
