Re: [xml-dev] Shredding XML

Hi Jim,

that's interesting ... which should be the 'driving' schema, XML or DB?

I guess I've been somewhat tiptoeing around this one.

I should admit my bias if it's not already apparent. I work mainly in
the SOA integration space and since XML is the primary exchange format
and XML schema does a reasonable job as the type system, I favour
processing XML as .. well XML ... whilst I understand the argument
around leveraging existing technologies and skillset .. often-times
this is little more than protectionism and continually [de]composing
from XML to objects then to CopyLibs and then to relational just seems
unnecessary a lot of the time (sorry - soap box over).  But of course
the whole world isn't XML and just like most other large organisations
the vast majority of our processing capability and data isn't and
probably never will be ... I have no issue with that.

On the one hand it is the end product that drives the design (even if
that design has a relatively short shelf-life ... but hey, we all do
agile right). In that case it is definitely the Database schema that
prevails from the pure delivery point of view, since this is the
desired source for the staging area from which to produce interface
files for upstream applications. At present there appears to be no
possibility of revisiting that choice. At the same time, I don't want
to 'paint myself into a corner' or promote this as an exemplar for all
future approaches (unless it turns out that way :-)

My unease is around the brittleness of the database schema in the face
of change, but I suppose that situation is almost inevitable since I
can't crystal-ball what changes might be coming along next week and
it's probably folly to try. XML changes that dynamic, but not
completely.
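
To make the 'database as driving schema' option concrete, here is a
minimal Python sketch of a loader that asks the database which columns
exist and keeps only the XML elements that match, reporting the rest
instead of failing. The `policy` table, its columns and the sample
message are hypothetical names chosen for illustration, and SQLite
stands in for whatever RDBMS is actually used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical staging table; names are illustrative only.
DDL = "CREATE TABLE policy (policy_no TEXT PRIMARY KEY, holder TEXT, premium TEXT)"


def load_policy(conn, xml_text):
    """Insert one <policy> message, keeping only elements the table knows.

    Returns the names of child elements that had no matching column.
    """
    # The database schema drives: ask it which columns exist.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(policy)")]
    root = ET.fromstring(xml_text)
    values, unmapped = {}, []
    for child in root:
        if child.tag in columns:
            values[child.tag] = child.text
        else:
            unmapped.append(child.tag)  # new element: dropped for now
    if values:
        names = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        conn.execute(
            f"INSERT INTO policy ({names}) VALUES ({marks})",
            list(values.values()),
        )
    return unmapped


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
message = (
    "<policy><policy_no>P1</policy_no><holder>Smith</holder>"
    "<broker>ACME</broker></policy>"
)
dropped = load_policy(conn, message)
```

Here `broker` has no column, so it is reported back rather than
stored; whether it later earns a column stays a DBA decision, which is
exactly where the brittleness shows up.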

I have been having this internal debate: if I concede I'm going to
have a relational database, should its design be derived from the XML
schema, or should the XML schema change to accommodate the database?
Indeed, one of the Solution Designers on this has already indicated a
desire to 'flatten' the XML schema (although I have to say I disagree
that it is necessary). I have some degree of opportunity to change the
XML schema (although messages are received from external sources,
within reason I can transform them into any 'shape' I like so long as
that's a loss-less exchange). The database is green field so it can be
any shape, but clearly some designs are going to lend themselves
better than others to XML mapping, I would have thought?
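
As a rough illustration of one shape that maps fairly directly, the
Python sketch below shreds a repeating child element into a child
table, generating UUID surrogate keys during the tree-walk and issuing
plain INSERTs (no stored procedures). The `order`/`item` vocabulary,
the table names and the use of SQLite are all assumptions made for the
example, not anything taken from the real messages.

```python
import sqlite3
import uuid
import xml.etree.ElementTree as ET

# Hypothetical parent/child staging tables.
SCHEMA = """
CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT);
CREATE TABLE order_items (
    id       TEXT PRIMARY KEY,
    order_id TEXT NOT NULL REFERENCES orders(id),
    position INTEGER NOT NULL,   -- keeps the XML sibling order
    sku      TEXT,
    qty      INTEGER
);
"""


def shred_order(conn, xml_text):
    """Walk one <order> document; INSERT a parent row and one row per <item>."""
    order = ET.fromstring(xml_text)
    order_id = str(uuid.uuid4())  # surrogate key generated during the walk
    conn.execute(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        (order_id, order.findtext("customer")),
    )
    for position, item in enumerate(order.findall("item"), start=1):
        conn.execute(
            "INSERT INTO order_items (id, order_id, position, sku, qty) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), order_id, position,
             item.get("sku"), int(item.findtext("qty"))),
        )
    return order_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
doc = (
    "<order><customer>Smith</customer>"
    "<item sku='A1'><qty>2</qty></item>"
    "<item sku='B7'><qty>1</qty></item></order>"
)
order_id = shred_order(conn, doc)
items = conn.execute(
    "SELECT position, sku, qty FROM order_items "
    "WHERE order_id = ? ORDER BY position",
    (order_id,),
).fetchall()
```

The `position` column is there because sibling order is implicit in
XML but has to be stored explicitly in a table. Generated keys like
these do not help with relating a later message to rows already
loaded; that still needs a natural key carried in the data.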

Surely there are some structures in XML that don't map
straightforwardly. Ted Neward called this the 'last mile' (a familiar
term to us all I'm sure), where the illusion of a high fidelity
solution draws us in, and indeed 80%+ appears to go quite well, but
those last few % hold a disproportionate cost and increasing complexity
(but you don't realise that until late on, at which point some are
going to object to a rethink). I want to know where that 'last mile'
lives so I can try and avoid it!
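
One candidate for such a structure is mixed content, where text and
child elements are interleaved. The Python sketch below is only an
illustration (the `note`/`b` element names are hypothetical): a
one-column-per-element mapping silently loses the trailing text,
whereas keeping it means storing ordered node rows, which is a very
different table design.

```python
import xml.etree.ElementTree as ET


def naive_columns(elem):
    """One column per element name; value is the element's leading text."""
    row = {elem.tag: elem.text}
    for child in elem:
        row[child.tag] = child.text
    return row


def ordered_nodes(elem):
    """Emit (sequence, kind, name, value) rows that keep document order."""
    rows = []

    def walk(e):
        rows.append((len(rows), "start", e.tag, None))
        if e.text:
            rows.append((len(rows), "text", None, e.text))
        for child in e:
            walk(child)
            if child.tail:  # text that follows the child element
                rows.append((len(rows), "text", None, child.tail))
        rows.append((len(rows), "end", e.tag, None))

    walk(elem)
    return rows


note = ET.fromstring("<note>Call <b>Jim</b> before noon</note>")
flat = naive_columns(note)
nodes = ordered_nodes(note)
full_text = "".join(value for _, kind, _, value in nodes if kind == "text")
```

With the naive mapping the words " before noon" have nowhere to go;
the ordered rows keep them, at the cost of a generic node table that
SQL programmers are unlikely to enjoy querying.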

Fraser.

2009/11/2 Jim Tivy <jimt@bluestream.com>:
> Fraser
>
> I am not entirely hearing firm commitment that you plan to establish an RDB
> schema and make it the driving schema.  In other words, what this would mean
> is that data elements cannot be put into the RDB unless they exist in the
> RDB schema.  For example, if some new data elements show up in some external
> XML to be imported then the DBA decides whether to allow them into the
> appropriate RDB column or not, or drop them for the time being.
>
> Another option (from the infinite number) would be to let the XML schema
> generate the RDB schema and the mapping code.  For your application
> programmers using SQL on the RDB this would likely lead to gagging and
> hacking and an "out of body experience".  This is not something I would
> recommend and if this is what you want then get a database that supports
> XQuery and retrain your developers.
>
> But I think you have to choose between these two - the first being what it
> sounds like you want - then work backwards from that decision.
>
> Jim
>
> -----Original Message-----
> From: Fraser Goffin [mailto:goffinf@googlemail.com]
> Sent: Monday, November 02, 2009 12:22 PM
> To: xml-dev@lists.xml.org
> Subject: Re: [xml-dev] Shredding XML
>
> Yes Jim, that is spot on.
>
> Whilst there has been much discussion thus far on the technologies and
> techniques of getting data out of the database (and that has been
> interesting), the programming for doing so is 'bread and butter' to
> our mainframe Cobol and Sapiens guys, so that's not really my problem.
>
> Mine is the task of getting the data from a fairly complex XML content
> model into an appropriately factored relational database. The design
> of that database is 'green field' but (and thanks to many on this
> thread who have posted related papers) this may not be as easy as it
> might at first appear, what with impedance mismatches here there and
> everywhere ;-)
>
> It's also the case that the XML data doesn't contain enough data
> inherently to represent primary or foreign key values for all of the
> relationships that are likely to arise. In some cases I MAY be
> permitted to generate them myself (say using a UUID) as I 'walk' the
> XML, in other cases I MAY be required to get the database to provide
> the value(s), not sure yet. The latter may increase the complexity
> somewhat (sidenote: our DBAs don't allow stored procs (don't ask), so
> I'm going to be doing whole bunches of INSERTs as part of the
> tree-walk I suspect).
>
> I'm really interested in the gotchas and best practices. Some have
> already been mentioned like the fact that the XML schema may define
> optional items and unrestricted length facets and such like. Others
> I've seen in reading talk about the mis-match of identity approaches
> (although this was talking primarily about OO/Relational mapping but
> the idea is similar I suspect). This could be important, since some
> messages received may 'relate' to others already loaded and, given
> what I said about not having all of the data in the XML to form all of
> the keys, this might be a significant problem.
>
> It is my intention to look into other options (we have recently
> acquired DB2 v9 which includes pureXML) but as is so often the case,
> the immediate project delivery pressures won't allow it. The PM is
> very nervous about using any new tech, perhaps justifiably, but my
> sense of unease is more to do with the perhaps misplaced assumption
> that 'tried and tested' tech like relational databases will always
> provide a workable solution, imho sometimes they actually represent
> the most significant constraint.
>
> So yes, back to the actual problem. How to come up with a database
> design that provides the capability of staging the shredded XML in a
> reasonably efficient manner and enables it to be loaded from XML
> instances received, again efficiently (ideally without 100's of tables
> and joins to negotiate). As far as efficiency of storage, well that
> MAY be a concern although perhaps not a huge one so long as the Db
> doesn't bloat up too much if normalisation is preferred over extra
> tables.
>
> Please add your thoughts and suggestions and experiences as you are
> able. Nothing is too trivial (or rude) to mention (i.e. if you want to
> say don't do this if you want to keep your sanity, that's ok).
>
> regards
>
> Fraser.
>
> I'm
>
>
> 2009/11/1 Jim Tivy <jimt@bluestream.com>:
>> Interesting post, but I am not sure that "now is the time to talk of many
>> things".
>>
>> Let me try to focus:
>>
>> Proper software execution comes from the choice of appropriate
>> actions/technologies to match the driving requirements.  But more
>> importantly, the greatest Wisdom, is to frame the driving requirements
>> correctly before "going off half cocked" or doing something that is
>> unnecessary and unwarranted.
>>
>> So let's start by framing the requirements again:
>>
>> Fraser Goffin wrote:
>>
>> "
>> The basics are we receive XML messages from an external trading partner
>> and process those messages, enriching and routing to a number of internal
>> subscriber applications. One of these applications is MI and the deal here
>> is that they want the data to be put into a relational database so that
>> they can create a number of interface 'files' which are sent to still
>> more applications.
>> "
>>
>> OR
>>
>> "
>> I am mainly interested in the process of LOADING XML data to a database
>> rather than extracting (at least for the purposes of this discussion).
>> "
>>
>> It is possible that the "mother persistent application datamodel" is
>> contained in the relational database in all its normalized glory.  If so,
>> then, "processing the messages" is simply a "data import" operation.  So
>> the question is, how do I get XML X* to tables T*.  It would strike me
>> that lots of people are doing this.  Are there common techniques and
>> technologies for doing this import?
>>
>> Fraser, is that a proper framing of the question/requirements?
>>
>> Jim
>>
>>
>> -----Original Message-----
>> From: Petite Abeille [mailto:petite.abeille@gmail.com]
>> Sent: Sunday, November 01, 2009 9:56 AM
>> To: xml-dev@lists.xml.org
>> Subject: Re: [xml-dev] Shredding XML
>>
>>
>> On Oct 29, 2009, at 10:20 PM, Fraser Goffin wrote:
>>
>>> opinions on the subject of decomposing XML into relational databases
>>
>> Outside of the most trivial case, this is a major PITA of the same
>> epic proportion as the object-relational one:
>>
>> http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
>>
>> Good luck.
>>
>>
>>
>> _______________________________________________________________________
>>
>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>> to support XML implementation and development. To minimize
>> spam in the archives, you must subscribe before posting.
>>
>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>> subscribe: xml-dev-subscribe@lists.xml.org
>> List archive: http://lists.xml.org/archives/xml-dev/
>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

