Re: [xml-dev] Shredding XML

From: Fraser Goffin <goffinf@googlemail.com>
To: xml-dev@lists.xml.org
Date: Sat, 31 Oct 2009 01:05:21 +0000
Thanks for the great comments thus far from everyone.

Several people have mentioned using BLOB or CLOB and indeed this is
something we have done in the recent past. However, one of the key
issues is that at least some of the applications that will access the
data are not XML capable and/or the programmers using them are
not really that familiar with XML. Whilst it's possible to process XML
data natively in Cobol, most of the time this is not the approach that's
taken, and resource constraints and project deadlines often militate
in favour of existing skills, technologies and practices.

So I'm really interested in experience of shredding moderately complex
XML content models into relational tables (for example structures that
might produce 30-50 even possibly more tables when decomposed). And
also some arguments for and against that approach (I would like to be
able to make a compelling case for moving towards treating XML as a
first class type system rather than one which just provides a format
for data exchange).

One of the suggestions from one of our solution designers was to
'flatten' the XML structure and represent relationships using
keys/ids, that is, make the XML more like the database. Personally I
like the contextual relationships implicit in the hierarchical content
model and am not really keen to navigate around the document using ID
values as opposed to simply walking the tree ... but maybe other
people's experience could provide some use cases where that approach
has merit?
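
To make the contrast concrete, here is a minimal sketch of the same data in
both shapes. The element and attribute names (policy, holder, address,
holderRef, addressRef) are hypothetical and not taken from our real messages.
It only illustrates how navigation differs: a path walk for the nested form,
an id lookup table for the flattened form.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested document: relationships are implied by containment.
NESTED = '<policy><holder name="Ann"><address city="Leeds"/></holder></policy>'

# Hypothetical flattened document: relationships are expressed by id/ref pairs.
FLAT = """<data>
  <policy id="p1" holderRef="h1"/>
  <holder id="h1" name="Ann" addressRef="a1"/>
  <address id="a1" city="Leeds"/>
</data>"""

# Nested: simply walk the tree from the root.
nested_city = ET.fromstring(NESTED).find("holder/address").get("city")

# Flattened: build an id index first, then follow the references.
flat_root = ET.fromstring(FLAT)
by_id = {el.get("id"): el for el in flat_root}
policy = flat_root.find("policy")
holder = by_id[policy.get("holderRef")]
flat_city = by_id[holder.get("addressRef")].get("city")

print(nested_city, flat_city)
```

The flattened form maps almost one-to-one onto rows and foreign keys, which
is presumably its appeal for a load process, at the cost of an extra lookup
step for every relationship on the XML side.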

I am mainly interested in the process of LOADING XML data to a
database rather than extracting (at least for the purposes of this
discussion). So another key issue (excuse the pun) is that I will be
processing the XML data and at various points constructing SQL INSERT
statements including gathering together all of the [primary] key
values that identify each entity and their [foreign] key
relationship(s). Not all the data to support those relationships is
inherent in the source XML data, so I also need to think about
generating key values either from the database or as part of the XML
processing. Of course the use of stored procedures is another aspect in
terms of positioning of the business/transformation logic.
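
As a rough illustration of the loading step, the sketch below shreds a tiny
document into two tables. Everything in it is an assumption for the sake of
example: the sample message, the table and column names, and the use of
SQLite as a stand-in for the real database. The parent row is inserted first
so that the key generated by the database can be reused as the foreign key
of the child rows, and document order supplies the second half of the child
key because it is not present in the XML itself.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample message; element and attribute names are illustrative.
SAMPLE = """
<order ref="A100">
  <customer>Acme Ltd</customer>
  <line sku="X1" qty="2"/>
  <line sku="Y7" qty="5"/>
</order>
"""

def shred(conn, xml_text):
    """Insert the parent first, then the children, carrying the generated
    primary key down as the children's foreign key."""
    root = ET.fromstring(xml_text)
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO orders (ref, customer) VALUES (?, ?)",
        (root.get("ref"), root.findtext("customer")),
    )
    order_id = cur.lastrowid  # surrogate key generated by the database
    # Position within the parent is derived during XML processing.
    for pos, line in enumerate(root.findall("line"), start=1):
        cur.execute(
            "INSERT INTO order_lines (order_id, pos, sku, qty) "
            "VALUES (?, ?, ?, ?)",
            (order_id, pos, line.get("sku"), int(line.get("qty"))),
        )
    conn.commit()
    return order_id

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        ref TEXT NOT NULL,
        customer TEXT
    );
    CREATE TABLE order_lines (
        order_id INTEGER NOT NULL REFERENCES orders(id),
        pos INTEGER NOT NULL,
        sku TEXT NOT NULL,
        qty INTEGER NOT NULL,
        PRIMARY KEY (order_id, pos)
    );
""")
order_id = shred(conn, SAMPLE)
rows = conn.execute(
    "SELECT order_id, pos, sku, qty FROM order_lines ORDER BY pos"
).fetchall()
print(order_id, rows)
```

With 30-50 tables the same pattern repeats at every level of the hierarchy,
which is where the choice between database-generated keys and keys assigned
during XML processing starts to matter.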

I want to also consider differences in the type systems that might be
problematic. Some people have already mentioned structured vs.
unstructured data as well as the volatility of the XML content model
(or at least the benefit of potentially less disruptive change control
of XML vs. a database schema).

Please keep the comments coming. I have read some of the articles that
were provided, many going back to 2001 (guess this is not a new subject
:-) ).

Regards

Fraser.




2009/10/30 Jim Tivy <jimt@bluestream.com>:
> Choice - BLOB: Use a CLOB or BLOB column for the entire XML document. MySQL
> has a maximum of 1 or 2 GIG there. Then process in memory using XSLT,
> XQuery or what have you (Saxon 9). Extract index values to other columns as
> necessary to make selective loading faster.
>
> Choice - Package: Buy something that does this slice and dice for you.
> Perhaps Progress/DataDirect has something.
>
> Choice - XML column: Create a column of XML type in DB2 or Oracle and do an
> XQuery on that column.
>
> Choice - get MS SQL Server. I think their first approach at supporting XML
> was to slice and dice - may still be that way. From what I can tell
> Microsoft's approach was clumsy for a lot of uses.
>
> Choice - Native XML database.
>
> You will have to decide which of these to do given your requirements.
>
> -----Original Message-----
> From: Michael Sokolov [mailto:sokolov@ifactory.com]
> Sent: Thursday, October 29, 2009 7:42 PM
> To: 'Fraser Goffin'; xml-dev@lists.xml.org
> Subject: RE: [xml-dev] Shredding XML
>
> I spent a little while evaluating DB-2 and Oracle XQuery implementations -
> didn't go so far as to implement a full-blown system though, I guess because
> nobody was holding a gun to my head.
>
> The whole automated shredding approach strikes me as totally unworkable for
> data with any complexity (think about David Lee's 80 table joins), and
> unnecessary for simple data, where you might as well map by hand. One
> complication is that a schema is absolutely required, and if the schema
> changes, you need to re-run the entire table generation process. When I
> checked, it was looking like it could be quite complex to retain data in
> such a case: there didn't seem to be the ability to generate incremental
> schema change operations, so probably it would be necessary to migrate data
> from an old set of tables to a new set (with the same names!).
>
> Then I considered the approach of storing an XML blob or two attached to a
> metadata record. I could tell it would have been possible to implement,
> and we probably could have gotten it working with some reasonable
> computational efficiency in the end system. However the programming
> environment was looking very hostile: there are uncomfortable lexical issues
> that arise when embedding XQuery in SQL or vice versa, and the idea of
> passing values back and forth between the two different type systems was
> making me uneasy. I also found that the full text support (which for me is
> absolutely critical) in DB-2 was lacking - when I checked they were in the
> midst of a transition from an older, imperfect but functioning system to a
> newer, but less functional one; the situation with full text is probably
> better in Oracle; I didn't dig deep enough to find out details.
>
> It does sound as if your data may be more record-oriented than mine, which
> is almost always documents written in English or some natural language, with
> tagging to make it at least somewhat machine-friendly. So, as Michael Kay
> said, if it's *already* record-oriented data that has just been wrapped up
> in angle brackets, you might not run into these problems.
>
> -Mike
>
>
>> -----Original Message-----
>> From: Fraser Goffin [mailto:goffinf@googlemail.com]
>> Sent: Thursday, October 29, 2009 5:20 PM
>> To: xml-dev@lists.xml.org
>> Subject: [xml-dev] Shredding XML
>>
>> This list has been unusually quiet of late so I thought it
>> might be an opportune moment to ask for opinions on the
>> subject of decomposing XML into relational databases, often
>> referred to as 'shredding'.
>>
>> My particular interest is related to some work I'm currently
>> engaged in. The basics are we receive XML messages from an
>> external trading partner and process those messages,
>> enriching and routing to a number of internal subscriber
>> applications. One of these applications is MI and the deal
>> here is that they want the data to be put into a relational
>> database so that they can create a number of interfaces
>> 'files' which are sent to still more applications.
>>
>> Whilst I would like to consider a pure XML database or even
>> use some of the XML features that are increasingly prevalent
>> in mainstream DB vendor products, clearly putting data into a
>> 'staging' database is one thing, but the capabilities and
>> competences of the applications and application programmers
>> who want to retrieve it is a key factor. So, for the
>> immediate term I might be stuck (if that's fair - probably
>> not) with relational.
>>
>> So to better inform myself and maybe help the debate along
>> internally, I am interested in anyone else's experience, good
>> and bad, of shredding XML data, pitfalls, things to be aware
>> of, good approaches, when to really not do it. All thoughts
>> are welcome.
>>
>> I find it interesting that some of the 'big boys' are at least
>> giving the appearance of providing first-class support for
>> XML both in terms of storage options and manipulation
>> capability. IBM for example has pureXML. I haven't used these
>> enough to know if they're just a thin veneer or whether they
>> have real substance and depth, so again your experiences are welcome.
>>
>> Regards
>>
>> Fraser,
>>
>> ______________________________________________________________
>> _________
>>
>> XML-DEV is a publicly archived, unmoderated list hosted by
>> OASIS to support XML implementation and development. To
>> minimize spam in the archives, you must subscribe before posting.
>>
>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>> subscribe: xml-dev-subscribe@lists.xml.org List archive:
>> http://lists.xml.org/archives/xml-dev/
>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>>
>>