RE: [xml-dev] Shredding XML

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]
From: "Jim Tivy" <jimt@bluestream.com>
To: "'Fraser Goffin'" <goffinf@googlemail.com>,<xml-dev@lists.xml.org>
Date: Wed, 4 Nov 2009 09:58:36 -0800
Fraser

Often XML coming in reflects one hierarchy of the data - for example invoice
header and details - for example, where each invoice is a customer
transaction at a CoffeeHouse Chain.  These incoming hierarchies can have
redundant information - for example the address of a customer as stated in
each invoice.  If you need to report on this data in a different hiearchy,
first you need to define an appropriate RDB application data model and then
import carefully into it to remove redudancy, normalize the data into
respective entities (tables) and generally protect and enforce the quality
of data in the database.  Then you can acurately report on the data where
the output of the report reflects a different hierarchy than the incoming
Xml data -  say, total product sales of product latte in January by City by
Store. 

In this way, the reports you need out will carry requirements for your RDB
application data model.

Not sure what you mean by "interface files"?

Jim

-----Original Message-----
From: Fraser Goffin [mailto:goffinf@googlemail.com] 
Sent: Wednesday, November 04, 2009 8:20 AM
To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Shredding XML

I don't know the full detail of the interface files that will be
produced from this store, but its probably safe to assume that these
will need to model the data with multiple hierarchies and groupings.

Whats driving this line of thought ?

Fraser.

2009/11/3 Jim Tivy <jimt@bluestream.com>:
> do they need to produce reports that present the information in
> multiple hierarchial ways? �It might be invoices within a day or Lattes
sold
> within City within Province.
>
> -----Original Message-----
> From: Fraser Goffin [mailto:goffinf@googlemail.com]
> Sent: Tuesday, November 03, 2009 1:41 AM
> To: xml-dev@lists.xml.org
> Subject: Re: [xml-dev] Shredding XML
>
>> Do your applications that use the relational database need to update
>
> Thats a difficult one. As far as producing up-stream interfaces, no,
> they are created just by selecting from the database.
>
> However, as I mentioned (lost in the detail) earlier, messages
> received will have a relationship to existing records. I'm not at all
> convinced that update will be possible because not all of the keys
> will be available in the message received. Some form of 'fuzzy'
> matching may be possible in some case, for example, if I receive a new
> address then the 1st line + PostCode + the relationship of this
> address to something else may be sufficient.
>
> It might be the case that we have to consider time based snap-shots,
> i.e. the 'state' of a collection of data at a point in time. Not sure
> yet, its one of those areas of difficulty that probably doesn't change
> much whether I use relational of native XML stores.
>
> At present the primary purpose of the database is for retreival, i.e.
> its NOT an operational store, its for MI, so optimiastions for read
> rather than update are appropriate.
>
>> do they need to produce reports that present the information in multiple
> ways
>
> Yes.
>
> Fraser.
>
> 2009/11/3 Jim Tivy <jimt@bluestream.com>:
>> Do your applications that use the relational database need to update and
> do
>> they need to produce reports that present the information in multiple
ways
>> hierarchial ways? �This may suggest whether your database needs to be
>> normalized (approaching 3rd Normal form)?
>>
>> -----Original Message-----
>> From: Fraser Goffin [mailto:goffinf@googlemail.com]
>> Sent: Monday, November 02, 2009 3:31 PM
>> To: xml-dev@lists.xml.org
>> Subject: Re: [xml-dev] Shredding XML
>>
>> Hi Jim,
>>
>> thats interesting ... which should be the 'driving' schema, XML or Db ?
>>
>> I guess I've been somewhat tiptoe'ing around this one.
>>
>> I should admit my bias if its not already apparent. I work mainly in
>> the SOA integration space and since XML is the primary exchange format
>> and XML schema does a reasonable job as the type system, I favour
>> processing XML as .. well XML ... whilst I understand the argument
>> around leveraging existing technologies and skillset .. often-times
>> this is little more than protectionism and continually [de]composing
>> from XML to objects then to CopyLibs and then to relational just seems
>> unnecessary a lot of the time (sorry - soap box over). �But of course
>> the whole world isn't XML and just like most other large organisations
>> the vast majority of our processing capability and data isn't and
>> probably never will be ... I have no issue with that.
>>
>> On the one hand it is the end product that drives the design (even if
>> that design has a relatively short shelf-life ... but hey, we all do
>> agile right). In that case it is definately the Database schema that
>> prevails from the pure delivery point of view, since this is the
>> desired source for the staging area from which to produce interface
>> files for upstream applications. At present there appears to be no
>> possiblility of revisiting that choice. At the same time, I don't want
>> to 'paint myself into a corner' or promote this as an exemplar for all
>> future approaches (unless it turns out that way :-)
>>
>> My unease is around the brittleness of the database schema in the face
>> of change, but I suppose that situation is almost inevitable since I
>> can't crystal-ball what changes might be coming along next week and
>> its probably folly to try. XML changes that dynamic, but not
>> completely.
>>
>> I have been having this internal debate about, .. if I concede I'm
>> going to have a relational database then should its design be derived
>> from the XML schema or should the XML schema change to accomodate the
>> database, indeed one of the Solution Designers on this has already
>> indicated a desire to 'flatten' the XML schema (although I have to say
>> I disagree that it is necessary). I have some degree of opportunity to
>> change the XML schema (although messages
>> are received from external sources, within reason, I can transform
>> them into any 'shape' I like so long as thats a loss-less exchange).
>> The database is green field so it can be any shape, but clearly some
>> designs are going to lend themsleves better than others to XML mapping
>> I would have thought ?
>>
>> Surely there are some structures in XML that don't map
>> straight-forwardly. Ted Neward called this the 'last mile' (a familiar
>> term to us all I'm sure), where the illusion of a high fidelity
>> solution draws us in, and indeed 80%+ appears to go quite well, but
>> that last few % hold a disproportiate cost and increasing complexity
>> (but you don't realise that until late on at which point some are
>> going to object to a rethink). I want to know where that 'last mile'
>> lives so I can try and avoid it !
>>
>> Fraser.
>>
>> 2009/11/2 Jim Tivy <jimt@bluestream.com>:
>>> Fraser
>>>
>>> I am not entirely hearing firm commitment that you plan to establish an
>> RDB
>>> schema and make it the driving schema. �In other words, what this would
>> mean
>>> is that data elements cannot be put into the RDB unless they exist in
the
>>> RDB schema. �For example, if some new data elements show up in some
>> external
>>> XML to be imported then the DBA decides whether to allow them into the
>>> appropriate RDB column or not, or drop them for the time being.
>>>
>>> Another option (from the infinite number) would be to let the XML schema
>>> generate the RDB schema and the mapping code. �For your application
>>> programmers using SQL on the RDB this would likely lead to gagging and
>>> hacking and an "out of body experience" �This is not something I would
>>> recommend and if this is what you want then get a database that supports
>>> XQuery and retrain your developers.
>>>
>>> But I think you have to choose between these two - the first being what
> it
>>> sounds like you want - then work backwards from that decision.
>>>
>>> Jim
>>>
>>> -----Original Message-----
>>> From: Fraser Goffin [mailto:goffinf@googlemail.com]
>>> Sent: Monday, November 02, 2009 12:22 PM
>>> To: xml-dev@lists.xml.org
>>> Subject: Re: [xml-dev] Shredding XML
>>>
>>> Yes Jim, that is spot on.
>>>
>>> Whilst there has been much discussion thus far on the technolgies and
>>> techniques of getting data out of the database (and that has been
>>> interesting), the programming for doing so are 'bread and butter' to
>>> our mainframe Cobol and Sapiens guys, so thats not really my problem.
>>>
>>> Mine is the task of getting the data from a fairly complex XML content
>>> model into an appropriately factored relational database. The design
>>> of that database is 'green field' but (and thanks to many on this
>>> thread who have posted related papers) this may not be as easy at it
>>> might at first appear, what with impedence mismatches here there and
>>> everywhere ;-)
>>>
>>> Its also the case that the XML data doesn't contain enough data
>>> inherently to represent primary or foreign key values for all of the
>>> relationships that are likely to arise. In some cases I MAY be
>>> permitted to generate them myself (say using a UUID) as I 'walk' the
>>> XML, in other cases I MAY be required to get the database to provide
>>> the value(s), not sure yet. The later may increase the complexity
>>> somewhat (sidenote: our DBAs don't allow stored procs (don't ask) �so
>>> I'm going to be doing whole bunches of INSERTs as part of the
>>> tree-walk I suspect)
>>>
>>> I'm really interested in the gotchas and best practices. Some have
>>> already been mentioned like the fact that the XML schema may define
>>> optional items and unrestricted length facets and such like. Others
>>> I've seen in reading talk about the mis-match of identity approaches
>>> (although this was talking primarily about OO/Relational mapping but
>>> the idea is similar I suspect). This could be important, since some
>>> messages received may 'relate' to others already loaded and, given
>>> what I said about not having all of the data in the XML to form all of
>>> the keys, this might be a significant problem.
>>>
>>> It is my intention to look into other options (we have recently
>>> acquired DB2 v9 which includes pureXML) but as is so often the case,
>>> the immediate project delivery pressures won't allow it. The PM is
>>> very nervous about using any new tech, perhaps justifiably, but my
>>> sense of unease is more to do with the perhaps misplaced assumption
>>> that 'tried and tested' tech like relational databases will always
>>> provide a workable solution, imho sometimes they actually represent
>>> the most significant constraint.
>>>
>>> So yes, back to the actual problem. How to come up with a database
>>> design that provides the capability of staging the shredded XML in a
>>> reasonable efficient manner and enables it to be loaded from XML
>>> instances received, again efficiently (ideally without 100's of tables
>>> and joins to negotiate). As far as efficiency of storage, well that
>>> MAY be a concern although perhaps not a huge one so long as the Db
>>> doesn't bloat up too much if normalisation is preferred over extra
>>> tables.
>>>
>>> Please add your thoughts and suggestions and experiences as you are
>>> able. Nothing is too trivial (or rude) to mention (i.e. if you want to
>>> say don't do this if you want to keep your sanity, thats ok).
>>>
>>> regards
>>>
>>> Fraser.
>>>
>>> I'm
>>>
>>>
>>> 2009/11/1 Jim Tivy <jimt@bluestream.com>:
>>>> Interesting post, but I am not sure that "now is the time to talk of
> many
>>>> things".
>>>>
>>>> Let me try to focus:
>>>>
>>>> Proper software execution comes from the choice of appropriate
>>>> actions/technologies to match the driving requirements. �But more
>>>> importantly, the greatest Wisdom, is to frame the driving requirements
>>>> correctly before "going off half cocked" or doing something that is
>>>> unnecessary and unwarranted.
>>>>
>>>> So lets start by framing the requirements again:
>>>>
>>>> Fraser Gofin wrote:
>>>>
>>>> "
>>>> The basics are we receive XML messages from an external trading partner
>>> and
>>>> process those messages, enriching and routing to a number of internal
>>>> subscriber applications. One of these applications is MI and the deal
>> here
>>>> is that they want the data to been put into a relational database so
> that
>>>> they can create a number of interfaces 'files' which are sent to still
>>> more
>>>> applications.
>>>> "
>>>>
>>>> OR
>>>>
>>>> "
>>>> I am mainly interested in the process of LOADING XML data to a database
>>>> rather than extracting (at least for the purposes of this discussion).
>>>> "
>>>>
>>>> It is possible that the "mother persistent application datamodel" is
>>>> contained in the relational database in all its normalized glory. �If
> so,
>>>> then, "processing the messages" is simply a "data import" operation.
�So
>>> the
>>>> question is, how do I get XML X* to tables T*. �It would strike me that
>>> lots
>>>> of people are doing this. �Are there common techniques and technologies
>>> for
>>>> doing this import?
>>>>
>>>> Fraser, is that a proper framing of the question/requirements?
>>>>
>>>> Jim
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Petite Abeille [mailto:petite.abeille@gmail.com]
>>>> Sent: Sunday, November 01, 2009 9:56 AM
>>>> To: xml-dev@lists.xml.org
>>>> Subject: Re: [xml-dev] Shredding XML
>>>>
>>>>
>>>> On Oct 29, 2009, at 10:20 PM, Fraser Goffin wrote:
>>>>
>>>>> opinions on the subject of decomposing XML into relational databases
>>>>
>>>> Outside of the most trivial case, this is a major PITA of the same
>>>> epic proportion as the object-relational one:
>>>>
>>>>
>>
http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
>>>>
>>>> Good luck.
>>>>
>>>>
>>>>
>>>> _______________________________________________________________________
>>>>
>>>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>>>> to support XML implementation and development. To minimize
>>>> spam in the archives, you must subscribe before posting.
>>>>
>>>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>>>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>>>> subscribe: xml-dev-subscribe@lists.xml.org
>>>> List archive: http://lists.xml.org/archives/xml-dev/
>>>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________________________________
>>>>
>>>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>>>> to support XML implementation and development. To minimize
>>>> spam in the archives, you must subscribe before posting.
>>>>
>>>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>>>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>>>> subscribe: xml-dev-subscribe@lists.xml.org
>>>> List archive: http://lists.xml.org/archives/xml-dev/
>>>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>>>>
>>>>
>>>
>>> _______________________________________________________________________
>>>
>>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>>> to support XML implementation and development. To minimize
>>> spam in the archives, you must subscribe before posting.
>>>
>>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>>> subscribe: xml-dev-subscribe@lists.xml.org
>>> List archive: http://lists.xml.org/archives/xml-dev/
>>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>>>
>>>
>>>
>>>
>>
>> _______________________________________________________________________
>>
>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>> to support XML implementation and development. To minimize
>> spam in the archives, you must subscribe before posting.
>>
>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>> subscribe: xml-dev-subscribe@lists.xml.org
>> List archive: http://lists.xml.org/archives/xml-dev/
>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>>
>>
>>
>>
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
>
>
>

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
Follow-Ups:
- Re: [xml-dev] Shredding XML
  - From: Fraser Goffin <goffinf@googlemail.com>
References:
- Re: [xml-dev] Shredding XML
  - From: Petite Abeille <petite.abeille@gmail.com>
- RE: [xml-dev] Shredding XML
  - From: "Jim Tivy" <jimt@bluestream.com>
- Re: [xml-dev] Shredding XML
  - From: Fraser Goffin <goffinf@googlemail.com>
- RE: [xml-dev] Shredding XML
  - From: "Jim Tivy" <jimt@bluestream.com>
- Re: [xml-dev] Shredding XML
  - From: Fraser Goffin <goffinf@googlemail.com>
- RE: [xml-dev] Shredding XML
  - From: "Jim Tivy" <jimt@bluestream.com>
- Re: [xml-dev] Shredding XML
  - From: Fraser Goffin <goffinf@googlemail.com>
- RE: [xml-dev] Shredding XML
  - From: "Jim Tivy" <jimt@bluestream.com>
- Re: [xml-dev] Shredding XML
  - From: Fraser Goffin <goffinf@googlemail.com>
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]