   RE: [xml-dev] Designing XML to Support Information Evolution

  • To: "Rick Marshall" <rjm@zenucom.com>
  • Subject: RE: [xml-dev] Designing XML to Support Information Evolution
  • From: "Hunsberger, Peter" <Peter.Hunsberger@STJUDE.ORG>
  • Date: Mon, 24 May 2004 11:05:21 -0500
  • Cc: "Michael Champion" <mc@xegesis.org>,"xml-dev DEV'" <xml-dev@lists.xml.org>

Rick Marshall <rjm@zenucom.com> writes:

> i found one way to fix the performance problem is with associative
> structures. these are heavily indexed tables and associative lists to
> work out navigation issues. and then it's very fast - much faster than
> existing techniques. i worked out how to do it with relational databases
> and now i'm building code for xml. but normalisation is important to
> make this work.
> the "secret" is being able to traverse lists very quickly

Could you go into a little more detail about what you're doing?  List
traversal is the one thing that relational databases do very well...  I
also don't find a lot of problems with list traversal in XSLT.  However,
for building a hierarchical view of data (for presentation purposes) I
find that gluing together lists doesn't perform; you really have to
stick to a hierarchical representation of your data from end to end.
The trick for doing this with databases takes a little work, but it is
not extremely difficult.  I've talked about it here and on the cocoon-dev
list a couple of times and can do so again if people want.

I get the feeling that your work is all pure data manipulation and no
hierarchical presentation so I don't think comparing the two approaches
(hierarchical trees vs. associative lists) is meaningful, but perhaps
I'm missing something?
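
For readers following the thread, here is a minimal Python sketch of the
"associative" side of that comparison: index every element by its id once,
then follow <associate> references through the index. It uses the flat
home/person layout quoted further down in this message. The <data> wrapper,
the <street> and <name> children, and the function names build_index and
resolve are assumptions made for illustration; this is not Rick's actual
implementation.

```python
import xml.etree.ElementTree as ET

# Flat layout from the quoted message, wrapped in a hypothetical <data> root.
# <street> and <name> are placeholder children added for illustration.
FLAT = """
<data>
  <home id="456"><street>1 Example St</street></home>
  <person id="123"><associate type="home">456</associate><name>Pat</name></person>
</data>
"""

def build_index(root):
    """Map each id attribute value to its element in a single pass."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}

def resolve(element, index):
    """Follow an element's <associate> children; return targets keyed by type."""
    return {a.get("type"): index[a.text.strip()]
            for a in element.findall("associate")}

root = ET.fromstring(FLAT)
index = build_index(root)
person = index["123"]
home = resolve(person, index)["home"]
print(home.tag, home.get("id"), home.findtext("street"))
```

Each lookup is a dictionary access, so following a single association is
cheap; the cost discussed in this thread shows up when a whole presentation
hierarchy has to be assembled out of many such lookups.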

> rick
> Hunsberger, Peter wrote:
> >Rick Marshall <rjm@zenucom.com> writes:
> > 
> >  
> >
> >>hierarchies fail, and this is my struggle with xml at the moment, when
> >>they have to support multiple hierarchies simultaneously. and they
> >>largely fail because of a) the update problem, and b) the new hierarchy
> >>problem. reverse bill of materials is a case in point.
> >>
> >>having said that xml works really well where neither of these are an
> >>issue - documents where the "semantics" don't change only the contents;
> >>and as i said before moving transactions between systems.
> >>
> >>even relational systems have problems because the semantics is embedded
> >>in the sql select statements. most so called post relational systems
> >>(not really sure that's a legitimate term, even though it's used a lot)
> >>basically embed semantics back into the structure.
> >>
> >>things like owl and to a lesser extent name spaces try to express the
> >>semantics as a meta model. imho a far superior approach. i just don't
> >>like naming relationships - prefer to acknowledge they exist and what it
> >>takes to define them, but not necessarily name them.
> >>
> >>now to xml and the cinderella id tag. the same effect as the
> >>hierarchical xml could be achieved by allowing a name/value pairing to
> >>store the structure as attributes in the xml tag and they should be
> >>treated as elements as well.
> >>
> >>the id tag is the required unique key, while special associate elements
> >>store structure. this has the advantage of flattening the xml and
> >>allowing the parsers to create structure on the fly to suit
> >>the translators.
> >>
> >><home id="456"><home_elements/></home>
> >><person id="123"><associate
> >>type="home">456</associate><other_elements/></person>
> >>
> >>which would be approximately
> >>
> >><home id="456">
> >>    <home_elements/>
> >>    </home>
> >><person id="123">
> >>    <home>456</home>
> >>    <other_elements/>
> >>    </person>
> >>
> >>
> >>early days, but something like this would be much better for data
> >>modelling. perhaps we can have post-xml?  ;)
> >>
> >>    
> >>
> >
> >Interesting, this is essentially the structure I was comparing to a
> >structured hierarchy in the "Parallel tree traversal" thread.  Turns
> >out that once I fixed up all my XSLT bugs and cleaned up the code, the
> >version that used the structured hierarchy runs about an order of
> >magnitude faster than the version that attempts to stitch the hierarchy
> >together from flat data using id/idref.
> >
> >I need a little more testing on the insert/update side, but I expect
> >I'm going to proceed with a version of our code that can spit out
> >multiple hierarchies cutting across our relationship lattice on demand
> >instead of trying to glue this together on the XML side.  More XML
> >output (redundant trees), but at least in our case normalization costs
> >too much in terms of performance and the extra space consumption can be
> >handled: the redundant data is generated only as needed from a
> >normalized database and not persisted anywhere.  It chews up app server
> >memory, but we're talking at most maybe 100 MB (if every model gets
> >cached, which in our case will happen over time).  A GB of memory is
> >cheap enough that once more, throwing hardware at an XML problem trumps
> >trying to spend too much time optimizing it.
> >
> >More and more, I'm seeing that XML application optimization comes down
> >to explicitly exploiting the known algorithms for fast tree traversal
> >and generation and not trying to re-invent normalization from within
> >XSLT (or Java transforms for that matter)...
> >
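
To make the "multiple hierarchies on demand from normalized data" idea
concrete, here is a rough Python sketch. It is not the code described in the
quoted message; the USES rows, the part names, and the functions adjacency
and build_tree are all hypothetical. It takes one normalized relation and
emits either a bill-of-materials tree or the reverse (where-used) tree as
nested XML, so a consumer never has to stitch anything together by id.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical normalized relation: (assembly, component) rows,
# roughly what a relational table might hold.
USES = [("bike", "wheel"), ("bike", "frame"),
        ("wheel", "spoke"), ("cart", "wheel")]

def adjacency(rows, reverse=False):
    """Build a parent -> children map; reverse=True flips the direction."""
    adj = defaultdict(list)
    for parent, child in rows:
        if reverse:
            parent, child = child, parent
        adj[parent].append(child)
    return adj

def build_tree(name, adj):
    """Recursively emit a nested <part> tree; assumes the relation is acyclic."""
    node = ET.Element("part", name=name)
    for child in adj.get(name, []):
        node.append(build_tree(child, adj))
    return node

# Two hierarchies generated from the same rows, neither one persisted.
bom = build_tree("bike", adjacency(USES))
where_used = build_tree("spoke", adjacency(USES, reverse=True))
print(ET.tostring(bom, encoding="unicode"))
print(ET.tostring(where_used, encoding="unicode"))
```

Shared parts such as "wheel" are repeated in every tree that contains them,
which is the redundant output mentioned above: more XML, in exchange for
trees that can be traversed directly.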


Copyright 2001 XML.org. This site is hosted by OASIS