- To: "Rick Marshall" <rjm@zenucom.com>
- Subject: RE: [xml-dev] Designing XML to Support Information Evolution
- From: "Hunsberger, Peter" <Peter.Hunsberger@STJUDE.ORG>
- Date: Mon, 24 May 2004 11:05:21 -0500
- Cc: "Michael Champion" <mc@xegesis.org>,"xml-dev DEV'" <xml-dev@lists.xml.org>
- Thread-index: AcQ/h9pHjesiIWL3SbuXWNRDMGUAZACICTAg
- Thread-topic: [xml-dev] Designing XML to Support Information Evolution
Rick Marshall <rjm@zenucom.com> writes:
>
> i found one way to fix the performance problem is with associative
> structures. these are heavily indexed tables and associative lists to
> work out navigation issues. and then it's very fast - much faster than
> existing techniques. i worked out how to do it with relational databases
> and now i'm building code for xml. but normalisation is important to
> make this work.
>
> the "secret" is being able to traverse lists very quickly
Could you go into a little more detail about what you're doing? List
traversal is the one thing that relational databases do very well... I
also don't find a lot of problems with list traversal in XSLT. However,
for building a hierarchical view of data (for presentation purposes) I
find that gluing together lists doesn't perform; you really have to
stick to a hierarchical representation of your data from end to end.
The trick for doing this with databases takes a little work, but is not
extremely difficult. I've talked about it here and on the cocoon-dev list
a couple of times and can do so again if people want.
I get the feeling that your work is all pure data manipulation and no
hierarchical presentation so I don't think comparing the two approaches
(hierarchical trees vs. associative lists) is meaningful, but perhaps
I'm missing something?
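To make the comparison concrete, here is a rough sketch of the "stitching"
side: the flat id/associate layout from your earlier message (quoted below)
turned into a nested tree via an id index. It is Python rather than XSLT
purely for brevity, and it is only an illustration under assumptions: the
<data> wrapper element and the stitch() name are mine, not part of your
proposal or of our code. In XSLT the same index is roughly what xsl:key
and key() provide.

```python
import copy
import xml.etree.ElementTree as ET

# Flat layout from the quoted message, wrapped in a hypothetical <data>
# root element so that it parses as a single document.
FLAT = """
<data>
  <home id="456"><home_elements/></home>
  <person id="123"><associate type="home">456</associate><other_elements/></person>
</data>
"""

def stitch(flat_xml):
    """Replace each <associate> with a copy of the element it refers to."""
    root = ET.fromstring(flat_xml)
    # One pass to build the id index, so each reference is resolved with
    # a dictionary lookup instead of a scan over the whole document.
    by_id = {el.get("id"): el for el in root if el.get("id") is not None}
    for el in list(root):
        for pos, child in enumerate(list(el)):
            if child.tag != "associate":
                continue
            target = by_id.get((child.text or "").strip())
            if target is None or target.tag != child.get("type"):
                continue  # dangling or mismatched reference: leave as-is
            # Swap the reference for a copy of its target, same position.
            el.remove(child)
            el.insert(pos, copy.deepcopy(target))
    return root

tree = stitch(FLAT)
print(ET.tostring(tree.find("person"), encoding="unicode"))
```

Every hierarchy built this way pays for the index plus a copy per
reference, which is the cost I am comparing against emitting the nested
form directly.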
>
> rick
>
> Hunsberger, Peter wrote:
>
> >Rick Marshall <rjm@zenucom.com> writes:
> >
> >
> >
> >>hierarchies fail, and this is my struggle with xml at the moment, when
> >>they have to support multiple hierarchies simultaneously. and they
> >>largely fail because of a) the update problem, and b) the new hierarchy
> >>problem. reverse bill of materials is a case in point.
> >>
> >>having said that xml works really well where neither of these are an
> >>issue - documents where the "semantics" don't change only the contents;
> >>and as i said before moving transactions between systems.
> >>
> >>even relational systems have problems because the semantics is embedded
> >>in the sql select statements. most so called post relational systems
> >>(not really sure that's a legitimate term, even though it's used a lot)
> >>basically embed semantics back into the structure.
> >>
> >>things like owl and to a lesser extent name spaces try to express the
> >>semantics as a meta model. imho a far superior approach. i just don't
> >>like naming relationships - prefer to acknowledge they exist and what it
> >>takes to define them, but not necessarily name them.
> >>
> >>now to xml and the cinderella id tag. the same effect as the
> >>hierarchical xml could be achieved by allowing a name/value pairing to
> >>store the structure as attributes in the xml tag and they should be
> >>treated as elements as well.
> >>
> >>the id tag is the required unique key, while special associate elements
> >>store structure. this has the advantage of flattening the xml and
> >>allowing the parsers to create structure on the fly to suit
> >>the translators.
> >>
> >><home id="456"><home_elements/></home>
> >><person id="123"><associate type="home">456</associate><other_elements/></person>
> >>
> >>which would be approximately
> >>
> >><home id="456">
> >>  <home_elements/>
> >></home>
> >><person id="123">
> >>  <home>456</home>
> >>  <other_elements/>
> >></person>
> >>
> >>
> >>early days, but something like this would be much better for data
> >>modelling. perhaps we can have post-xml? ;)
> >>
> >>
> >>
> >
> >Interesting, this is essentially the structure I was comparing to a
> >structured hierarchy in the "Parallel tree traversal" thread. Turns
> >out that once I fixed up all my XSLT bugs and cleaned up the code,
> >the version that used the structured hierarchy runs about an order of
> >magnitude faster than the version that attempts to stitch the hierarchy
> >together from flat data using id/idref.
> >
> >I need a little more testing on the insert/update side, but I expect
> >I'm going to proceed with a version of our code that can spit out
> >multiple hierarchies cutting across our relationship lattice on demand
> >instead of trying to glue this together on the XML side. More XML
> >output (redundant trees), but at least in our case normalization costs
> >too much in terms of performance and the extra space consumption can be
> >handled: the redundant data is generated only as needed from a
> >normalized database and not persisted anywhere. It chews up app server
> >memory, but we're talking at most maybe 100 MB (if every model gets
> >cached, which in our case will happen over time). A GB of memory is
> >cheap enough that once more, throwing hardware at an XML problem trumps
> >trying to spend too much time optimizing it.
> >
> >More and more, I'm seeing that XML application optimization comes down
> >to explicitly exploiting the known algorithms for fast tree traversal
> >and generation and not trying to re-invent normalization from within
> >XSLT (or Java transforms for that matter)...
> >
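As a footnote to the quoted point about spitting out multiple hierarchies
on demand, here is a minimal sketch of the idea using the bill of
materials / reverse bill of materials case mentioned above. It is not our
actual code: the ROWS data, the <part> element name and build_tree() are
all made up for illustration. The assumption is that the normalized store
hands back plain (assembly, component) rows and that each tree is
generated as needed and never persisted.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical normalized "uses" relation: (assembly, component) rows,
# as they might come back from a single SELECT.
ROWS = [("bike", "wheel"), ("bike", "frame"),
        ("wheel", "spoke"), ("cart", "wheel")]

def build_tree(rows, root_name, reverse=False):
    """Emit one hierarchy as nested XML.

    reverse=False gives the bill of materials (what a part is made of);
    reverse=True gives the reverse view (where a part is used).
    """
    children = defaultdict(list)
    for parent, child in rows:
        if reverse:
            parent, child = child, parent
        children[parent].append(child)

    def emit(name, seen):
        node = ET.Element("part", {"name": name})
        if name in seen:
            return node  # guard against cycles in the relation
        for child in children.get(name, []):
            node.append(emit(child, seen | {name}))
        return node

    return emit(root_name, frozenset())

bom = build_tree(ROWS, "bike")
where_used = build_tree(ROWS, "spoke", reverse=True)
print(ET.tostring(bom, encoding="unicode"))
print(ET.tostring(where_used, encoding="unicode"))
```

Shared parts such as "wheel" are repeated in every tree that needs them,
which is the redundant output traded for not stitching on the XML side.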