Re: [xml-dev] Shredding XML

From: Robert Koberg <rob@koberg.com>
To: Fraser Goffin <goffinf@googlemail.com>
Date: Fri, 30 Oct 2009 18:12:17 -0700
Hi,

Are you in the java world?

If so, check out JAXB, and specifically HyperJAXB:

https://hyperjaxb.dev.java.net/

JAXB allows you to take an XML document and load it into POJOs.
HyperJAXB connects those POJOs to the DB.

So, depending on the complexity of your app, all of your shredding
could happen behind the scenes.
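
To make the two steps concrete, here is a rough sketch of the idea. It
is written in Python with only the standard library, so it is not JAXB
or HyperJAXB code, just the same shape: bind the XML to plain objects,
then persist the objects. The sample message, the Order/Item classes,
the table layout and the function names are all made up for
illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical sample message; the element names are invented for illustration.
SAMPLE = """
<order ref="A-100">
  <customer>Acme Ltd</customer>
  <item sku="X1" qty="2"/>
  <item sku="Y7" qty="5"/>
</order>
"""


@dataclass
class Item:
    sku: str
    qty: int


@dataclass
class Order:
    ref: str
    customer: str
    items: list = field(default_factory=list)


def unmarshal(xml_text):
    """Bind the XML tree to plain objects (the step JAXB handles in Java)."""
    root = ET.fromstring(xml_text.strip())
    order = Order(ref=root.get("ref"), customer=root.findtext("customer"))
    for el in root.findall("item"):
        order.items.append(Item(sku=el.get("sku"), qty=int(el.get("qty"))))
    return order


def make_db():
    """Create an in-memory database with one table per object type."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, ref TEXT, customer TEXT);
        CREATE TABLE items (
            id INTEGER PRIMARY KEY,
            order_id INTEGER REFERENCES orders(id),
            sku TEXT,
            qty INTEGER
        );
    """)
    return conn


def persist(conn, order):
    """Write the object graph to the tables (the step HyperJAXB is meant to hide)."""
    cur = conn.execute(
        "INSERT INTO orders (ref, customer) VALUES (?, ?)",
        (order.ref, order.customer),
    )
    order_id = cur.lastrowid  # surrogate key generated by the database
    conn.executemany(
        "INSERT INTO items (order_id, sku, qty) VALUES (?, ?, ?)",
        [(order_id, item.sku, item.qty) for item in order.items],
    )
    return order_id
```

Calling persist(make_db(), unmarshal(SAMPLE)) writes one orders row and
two items rows. The parent key comes from the database and is copied
into each child row, so the foreign keys that are not present in the
XML get filled in during the load.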

best,
-Rob



On Oct 30, 2009, at 6:05 PM, Fraser Goffin wrote:

> Thanks for the great comments thus far from everyone.
>
> Several people have mentioned using BLOB or CLOB and indeed this is
> something we have done in the recent past. However, one of the key
> issues is that at least some of the applications that will access the
> data are not XML capable and/or the programmers using them are
> not really that familiar with XML. Whilst it's possible to process XML
> data natively in Cobol, most of the time this is not the approach that's
> taken, and resource constraints and project deadlines often militate
> in favour of existing skills, technologies and practices.
>
> So I'm really interested in experience of shredding moderately complex
> XML content models into relational tables (for example structures that
> might produce 30-50 even possibly more tables when decomposed). And
> also some arguments for and against that approach (I would like to be
> able to make a compelling case for moving towards treating XML as a
> first class type system rather than one which just provides a format
> for data exchange).
>
> One of the suggestions from one of our solution designers was to
> 'flatten' the XML structure and represent relationships using
> keys/ids, that is, make the XML more like the database. Personally I
> like the contextual relationships implicit in the hierarchical content
> model and am not really keen to navigate around the document using ID
> values as opposed to simply walking the tree ... but maybe other
> people's experience could provide some use cases where that approach
> has merit?
>
> I am mainly interested in the process of LOADING XML data to a
> database rather than extracting (at least for the purposes of this
> discussion). So another key issue (excuse the pun) is that I will be
> processing the XML data and at various points constructing SQL INSERT
> statements including gathering together all of the [primary] key
> values that identify each entity and their [foreign] key
> relationship(s). Not all the data to support those relationships is
> inherent in the source XML data, so I also need to think about
> generating key values either from the database or as part of the XML
> processing. Of course use of stored procedures is another aspect in
> terms of positioning of the business/transformation logic.
>
> I want to also consider differences in the type systems that might be
> problematic. Some people have already mentioned structured vs.
> unstructured data as well as the volatility of the XML content model
> (or at least the benefit of potentially less disruptive change control
> of XML vs. a database schema).
>
> Please keep the comments coming. I have read some of the articles that
> were provided, many going back to 2001 (guess this is not a new subject
> :-) ).
>
> Regards
>
> Fraser.
>
>
>
>
> 2009/10/30 Jim Tivy <jimt@bluestream.com>:
>> Choice - BLOB: Use a CLOB or BLOB column for the entire XML document.
>> MySQL has a maximum of 1 or 2 GB there.  Then process in memory using
>> XSLT, XQuery or what have you (Saxon 9).  Extract index values to other
>> columns as necessary to make selective loading faster.
>>
>> Choice - Package: Buy something that does this slice and dice for you.
>> Perhaps Progress/DataDirect has something.
>>
>> Choice - XML column: Create a column of XML type in DB2 or Oracle and
>> do an XQuery on that column.
>>
>> Choice - get MS SQL Server.  I think their first approach at supporting
>> XML was to slice and dice - may still be that way.  From what I can tell
>> Microsoft's approach was clumsy for a lot of uses.
>>
>> Choice - Native XML database.
>>
>> You will have to decide which of these to do given your requirements.
>>
>> -----Original Message-----
>> From: Michael Sokolov [mailto:sokolov@ifactory.com]
>> Sent: Thursday, October 29, 2009 7:42 PM
>> To: 'Fraser Goffin'; xml-dev@lists.xml.org
>> Subject: RE: [xml-dev] Shredding XML
>>
>> I spent a little while evaluating DB2 and Oracle XQuery implementations -
>> didn't go so far as to implement a full-blown system though, I guess
>> because nobody was holding a gun to my head.
>>
>> The whole automated shredding approach strikes me as totally unworkable
>> for data with any complexity (think about David Lee's 80 table joins),
>> and unnecessary for simple data, where you might as well map by hand.
>> One complication is that a schema is absolutely required, and if the
>> schema changes, you need to re-run the entire table generation process.
>> When I checked, it was looking like it could be quite complex to retain
>> data in such a case: there didn't seem to be the ability to generate
>> incremental schema change operations, so probably it would be necessary
>> to migrate data from an old set of tables to a new set (with the same
>> names!).
>>
>> Then I considered the approach of storing an XML blob or two attached to
>> a metadata record.  I could tell it would have been possible to
>> implement, and we probably could have gotten it working with some
>> reasonable computational efficiency in the end system.  However the
>> programming environment was looking very hostile: there are
>> uncomfortable lexical issues that arise when embedding XQuery in SQL or
>> vice versa, and the idea of passing values back and forth between the
>> two different type systems was making me uneasy.  I also found that the
>> full text support (which for me is absolutely critical) in DB2 was
>> lacking - when I checked they were in the midst of a transition from an
>> older, imperfect but functioning system to a newer, but less functional
>> one; the situation with full text is probably better in Oracle; I
>> didn't dig deep enough to find out details.
>>
>> It does sound as if your data may be more record-oriented than mine,
>> which is almost always documents written in English or some natural
>> language, with tagging to make it at least somewhat machine-friendly.
>> So, as Michael Kay said, if it's *already* record-oriented data that has
>> just been wrapped up in angle brackets, you might not run into these
>> problems.
>>
>> -Mike
>>
>>
>>> -----Original Message-----
>>> From: Fraser Goffin [mailto:goffinf@googlemail.com]
>>> Sent: Thursday, October 29, 2009 5:20 PM
>>> To: xml-dev@lists.xml.org
>>> Subject: [xml-dev] Shredding XML
>>>
>>> This list has been unusually quiet of late so I thought it
>>> might be an opportune moment to ask for opinions on the
>>> subject of decomposing XML into relational databases, often
>>> referred to as 'shredding'.
>>>
>>> My particular interest is related to some work I'm currently
>>> engaged in. The basics are we receive XML messages from an
>>> external trading partner and process those messages,
>>> enriching and routing to a number of internal subscriber
>>> applications. One of these applications is MI and the deal
>>> here is that they want the data to be put into a relational
>>> database so that they can create a number of interface
>>> 'files' which are sent to still more applications.
>>>
>>> Whilst I would like to consider a pure XML database or even
>>> use some of the XML features that are increasingly prevalent
>>> in mainstream DB vendor products, clearly putting data into a
>>> 'staging' database is one thing, but the capabilities and
>>> competences of the applications and application programmers
>>> who want to retrieve it are a key factor. So, for the
>>> immediate term I might be stuck (if that's fair - probably
>>> not) with relational.
>>>
>>> So to better inform myself and maybe help the debate along
>>> internally, I am interested in anyone else's experience, good
>>> and bad, of shredding XML data: pitfalls, things to be aware
>>> of, good approaches, when to really not do it. All thoughts
>>> are welcome.
>>>
>>> I find it interesting that some of the 'big boys' are at least
>>> giving the appearance of providing first-class support for
>>> XML both in terms of storage options and manipulation
>>> capability. IBM for example has pureXML. I haven't used these
>>> enough to know if they're just a thin veneer or whether they
>>> have real substance and depth, so again your experiences are
>>> welcome.
>>>
>>> Regards
>>>
>>> Fraser,
>>>
>>> _______________________________________________________________________
>>>
>>> XML-DEV is a publicly archived, unmoderated list hosted by
>>> OASIS to support XML implementation and development. To
>>> minimize spam in the archives, you must subscribe before posting.
>>>
>>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>>> subscribe: xml-dev-subscribe@lists.xml.org
>>> List archive: http://lists.xml.org/archives/xml-dev/
>>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>>>
>>>
>>
>>
>
>