xml-dev - Re: [xml-dev] Designing XML to Support Information Evolution


To: "Hunsberger, Peter" <Peter.Hunsberger@STJUDE.ORG>
Subject: Re: [xml-dev] Designing XML to Support Information Evolution
From: Rick Marshall <rjm@zenucom.com>
Date: Wed, 26 May 2004 09:26:07 +1000
Cc: Michael Champion <mc@xegesis.org>, "xml-dev DEV'" <xml-dev@lists.xml.org>
In-reply-to: <1E0CC447E59C974CA5C7160D2A2854EC097E19@SJMEMXMB04.stjude.sjcrh.local>
Organization: Zenucom Pty Ltd
References: <1E0CC447E59C974CA5C7160D2A2854EC097E19@SJMEMXMB04.stjude.sjcrh.local>
User-agent: Mozilla Thunderbird 0.6 (X11/20040502)

Hunsberger, Peter wrote:

>Rick Marshall <rjm@zenucom.com> writes:
>
>>i found one way to fix the performance problem is with associative
>>structures. these are heavily indexed tables and associative lists to
>>work out navigation issues. and then it's very fast - much faster than
>>existing techniques. i worked out how to do it with relational databases
>>and now i'm building code for xml. but normalisation is important to
>>make this work.
>>
>>the "secret" is being able to traverse lists very quickly
>
>Could you go into a little more detail about what you're doing?  List
>traversal is the one thing that relational databases do very well...  I
>also don't find a lot of problems with list traversal in XSLT.  However,
>for building a hierarchical view of data (for presentation purposes) I
>find that gluing together lists doesn't perform; you really have to
>stick to a hierarchical representation of your data from end to end.
>The trick for doing this with databases is a little work, but not
>extremely difficult. I've talked about it here and on the cocoon-dev
>lists a couple of times and can do so again if people want.
>
>I get the feeling that your work is all pure data manipulation and no
>hierarchical presentation so I don't think comparing the two approaches
>(hierarchical trees vs. associative lists) is meaningful, but perhaps
>I'm missing something?

i hope my other posting clears up some details. but here goes on a 
couple of things:

i'm not sure what lists have to do with relational databases as such; 
they are part of how we manipulate the sets. a relational database 
should be about sets, and all the operators assume that. even the "sort" 
is not an index indication, but a presentation indication.

list processing is an internal issue. some meta description (data 
dictionary) describes how the data works. internally this is built into 
lists of things - lists of tables, lists of attributes, associations, 
calculations, etc. these are expressed as lists of lists that can then 
be traversed quickly to do things (as opposed to precompiled and to a 
large extent static programs). the associations are part of the data 
model, instantiated only when being used.
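
a minimal sketch of the idea above, in python for brevity. it assumes a hypothetical data dictionary held as a list of tables, each carrying its own lists of attributes and associations. the names `Table`, `Association` and `follow`, and the sample rows, are illustrative only and are not taken from the actual system. an association is only resolved at the moment something asks for it.

```python
from dataclasses import dataclass, field


@dataclass
class Association:
    # hypothetical: an attribute in one table that holds the key of another table
    source_attr: str
    target_table: str


@dataclass
class Table:
    name: str
    attributes: list[str]
    associations: list[Association] = field(default_factory=list)
    rows: list[dict] = field(default_factory=list)
    _index: dict = field(default_factory=dict)  # key value -> row, built lazily

    def lookup(self, key):
        # build the index the first time it is needed, then reuse it
        # (a real system would also have to refresh it when rows change)
        if not self._index:
            self._index = {row["id"]: row for row in self.rows}
        return self._index.get(key)


def follow(dictionary, table_name, row, target):
    """Resolve one association for one row, only when it is asked for."""
    table = dictionary[table_name]
    for assoc in table.associations:  # traverse the table's list of associations
        if assoc.target_table == target:
            return dictionary[target].lookup(row[assoc.source_attr])
    return None


# the data dictionary: a list of tables, each holding lists of things
home = Table("home", ["id", "street"],
             rows=[{"id": 456, "street": "1 example st"}])
person = Table("person", ["id", "name", "home"],
               associations=[Association("home", "home")],
               rows=[{"id": 123, "name": "pat", "home": 456}])
dictionary = {t.name: t for t in [home, person]}

resolved = follow(dictionary, "person", person.rows[0], "home")
print(resolved)
```

nothing here is precompiled: adding a new association is a matter of appending to a list in the dictionary, and the traversal code stays the same.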

ps list manipulation can be very quick in c (if you use macros, not 
functions) as you are only playing with the dreaded pointers.

rick



>>rick
>>
>>Hunsberger, Peter wrote:
>>
>>>Rick Marshall <rjm@zenucom.com> writes:
>>>
>>>>hierarchies fail, and this is my struggle with xml at the moment, when
>>>>they have to support multiple hierarchies simultaneously. and they
>>>>largely fail because of a) the update problem, and b) the new hierarchy
>>>>problem. reverse bill of materials is a case in point.
>>>>
>>>>having said that xml works really well where neither of these are an
>>>>issue - documents where the "semantics" don't change only the contents;
>>>>and as i said before moving transactions between systems.
>>>>
>>>>even relational systems have problems because the semantics is embedded
>>>>in the sql select statements. most so called post relational systems
>>>>(not really sure that's a legitimate term, even though it's used a lot)
>>>>basically embed semantics back into the structure.
>>>>
>>>>things like owl and to a lesser extent name spaces try to express the
>>>>semantics as a meta model. imho a far superior approach. i just don't
>>>>like naming relationships - prefer to acknowledge they exist and what it
>>>>takes to define them, but not necessarily name them.
>>>>
>>>>now to xml and the cinderella id tag. the same effect as the
>>>>hierarchical xml could be achieved by allowing a name/value pairing to
>>>>store the structure as attributes in the xml tag and they should be
>>>>treated as elements as well.
>>>>
>>>>the id tag is the required unique key, while special associate elements
>>>>store structure. this has the advantage of flattening the xml and
>>>>allowing the parsers to create structure on the fly to suit
>>>>the translators.
>>>>
>>>><home id="456"><home_elements/></home>
>>>><person id="123"><associate type="home">456</associate><other_elements/></person>
>>>>
>>>>which would be approximately
>>>>
>>>><home id="456">
>>>>   <home_elements/>
>>>>   </home>
>>>><person id="123">
>>>>   <home>456</home>
>>>>   <other_elements/>
>>>>   </person>
>>>>
>>>>early days, but something like this would be much better for data
>>>>modelling. perhaps we can have post-xml?  ;)
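
one way to picture "create structure on the fly" from the flat form quoted above, as a rough python sketch using the standard `xml.etree.ElementTree` module. the `<data>` wrapper element and the function names `build_index` and `expand` are assumptions added for illustration; the two records are the ones from the quoted message. this is not the parser behaviour being proposed, just one possible reading of it.

```python
import xml.etree.ElementTree as ET

# the flat example from the message, wrapped in a hypothetical <data> root
FLAT = """
<data>
  <home id="456"><home_elements/></home>
  <person id="123"><associate type="home">456</associate><other_elements/></person>
</data>
"""


def build_index(root):
    """Map every id attribute value to its element in one pass over the tree."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def expand(element, index):
    """Copy element, replacing each resolvable <associate> with a copy of its target."""
    out = ET.Element(element.tag, element.attrib)
    out.text = element.text
    for child in element:
        target = None
        if child.tag == "associate":
            target = index.get((child.text or "").strip())
        # unresolved associates are copied through unchanged;
        # this sketch assumes associations never form a cycle
        out.append(expand(target if target is not None else child, index))
    return out


root = ET.fromstring(FLAT)
index = build_index(root)
person_tree = expand(index["123"], index)
print(ET.tostring(person_tree, encoding="unicode"))
```

the stored document stays flat and each record appears once; a hierarchy rooted at `person` (or, with a reverse index, at `home`) is only materialised when a translator asks for it.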
>>>
>>>Interesting, this is essentially the structure I was comparing to a
>>>structured hierarchy in the "Parallel tree traversal" thread.  Turns
>>>out that once I fixed up all my XSLT bugs and cleaned up the code that
>>>the version that used the structured hierarchy runs about an order of
>>>magnitude faster than the version that attempts to stitch the hierarchy
>>>together from flat data using id/idref.
>>>
>>>I need a little more testing on the insert/update side, but I expect
>>>I'm going to proceed with a version of our code that can spit out
>>>multiple hierarchies cutting across our relationship lattice on demand
>>>instead of trying to glue this together on the XML side.  More XML
>>>output (redundant trees), but at least in our case normalization costs
>>>too much in terms of performance and the extra space consumption can be
>>>handled: the redundant data is generated only as needed from a
>>>normalized database and not persisted anywhere.  It chews up app server
>>>memory, but we're talking at most maybe 100 MB (if every model gets
>>>cached, which in our case will happen over time).  A GB of memory is
>>>cheap enough that once more, throwing hardware at an XML problem trumps
>>>trying to spend too much time optimizing it.
>>>
>>>More and more, I'm seeing that XML application optimization comes down
>>>to explicitly exploiting the known algorithms for fast tree traversal
>>>and generation and not trying to re-invent normalization from within
>>>XSLT (or Java transforms for that matter)...

