xml-dev - RE: [xml-dev] Question for updating existing XML file

To: <danny@dannyayers.com>,"Bullard, Claude L (Len)" <len.bullard@intergraph.com>
Subject: RE: [xml-dev] Question for updating existing XML file
From: "Hunsberger, Peter" <Peter.Hunsberger@STJUDE.ORG>
Date: Thu, 29 Jul 2004 10:20:04 -0500
Cc: "Joe English" <jenglish@flightlab.com>, <xml-dev@lists.xml.org>

Danny Ayers <danny666@virgilio.it> writes:
> 
> Bullard, Claude L (Len) wrote:
> 
> >Peter Hunsberger wrote:
> >
> >>A more general comment/question: it recently occurred to me that it is
> >>likely possible to model any XML Schema as a relational schema (proof
> >>of this theorem is left as an exercise for the reader  ;-)? Don't know
> >>what that gets you, but as I've said at least the tools abound...
> >>    
> 
> Yep. You could do this arbitrarily, by crunching down the schema to its
> constituent entities and relationships, and building lots of fairly
> trivial tables to manage them. Perhaps an easier way, for which the
> tools are already available...wait for it...would be to model the schema
> in RDF (e.g. using the infoset vocab [1]), so your
> inter-element/attribute relationships are expressed as properties, then
> store the result in a triple-oriented relational store - even just a
> single table of subject, property, object.

Very interesting: I've been sketching out a new version of our systems
architecture and basically came to the conclusion that the underlying
metadata should be triples in a relational store (our system is all
metadata driven).  I'm going a little further than just that: multiple
forms of graph dependencies are also reflected in the metadata, but at
least at one level it's an RDF implementation on a relational database.
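
To make the "single table of subject, property, object" idea concrete,
here is a minimal sketch using Python's built-in sqlite3 module.  The
table name, the property names (hasChild, hasAttribute) and the sample
schema fragment are assumptions made up for illustration; they are not
taken from the infoset vocabulary or from our actual system.

```python
import sqlite3


def build_triple_store(triples):
    """Load (subject, property, object) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triple ("
        " subject TEXT NOT NULL,"
        " property TEXT NOT NULL,"
        " object TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", triples)
    return conn


def objects_of(conn, subject, prop):
    """Return every object linked to `subject` by `prop`, sorted by name."""
    rows = conn.execute(
        "SELECT object FROM triple"
        " WHERE subject = ? AND property = ? ORDER BY object",
        (subject, prop),
    )
    return [row[0] for row in rows]


# Hypothetical schema fragment: an <order> element with one attribute
# and two child elements, one of which has its own attribute.
SCHEMA_TRIPLES = [
    ("order", "hasAttribute", "id"),
    ("order", "hasChild", "customer"),
    ("order", "hasChild", "item"),
    ("item", "hasAttribute", "sku"),
]

conn = build_triple_store(SCHEMA_TRIPLES)
print(objects_of(conn, "order", "hasChild"))
```

Every schema relationship becomes one row, so adding a new kind of
relationship means adding rows with a new property value rather than
altering the table.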

<snip>ugly schema discussion</snip>

> As I'm fresh from some pleasantly satisfying RDF-in-RDBMS play, I've no
> qualms about suggesting a core model of triples (to provide the most
> granular relationships) mapped to SQL VIEWs of the business domain,
> though preferably spread over several tables to sensibly manage
> resources and literals. Optimisations can be made schema-specific, e.g.
> tables where you've got a group of elements as peers. (Whether you'd
> actually want to do this with such doc-oriented vocabs is another
> matter, might be a fun approach for journalists wishing to shuffle and
> republish their stories).
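
For readers following along, the "triples mapped to SQL VIEWs" suggestion
quoted above might look roughly like the sketch below.  It uses Python's
built-in sqlite3 module; the `story` view, the `title` and `author`
properties and the sample rows are all hypothetical, and a real store
would probably split resources and literals across tables as Danny
suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE triple (subject TEXT, property TEXT, object TEXT);

INSERT INTO triple VALUES
  ('story1', 'title',  'Schema as triples'),
  ('story1', 'author', 'A. Writer'),
  ('story2', 'title',  'Views over graphs'),
  ('story2', 'author', 'B. Editor');

-- Business-domain view: one row per story, assembled by a self-join
-- that picks out the 'title' and 'author' triples for each subject.
CREATE VIEW story AS
SELECT t.subject AS id, t.object AS title, a.object AS author
FROM triple AS t
JOIN triple AS a ON a.subject = t.subject
WHERE t.property = 'title' AND a.property = 'author';
""")


def stories(conn):
    """Read the domain-level view as if it were an ordinary table."""
    return conn.execute(
        "SELECT id, title, author FROM story ORDER BY id"
    ).fetchall()


for row in stories(conn):
    print(row)
```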

No SQL views here: XML templates and XSLT rule bases specify the
business relationships.  At a lower level we do reflect a sort of
generic optimization; just a couple of days ago I wrote:

"Where specific performance requirements must be met the key is to
access areas of the hierarchy through supplemental tables that
essentially index the required relationship as needed."

To explain a tad more: if you've built the logic to manage a
hierarchical model, then reuse it across all hierarchies and add
domain-specific access as needed.  If done on a relational database
this is normalization over structure (essentially an iteration over 5th
normal form); it's a little early to tell if it scales, but in some
simple models with our existing data, queries against millions of
elements and attributes have response times in the 10s of microseconds.
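
One way to picture a supplemental table that "indexes the required
relationship" is a precomputed ancestor/descendant table next to the
generic hierarchy table.  The sketch below is only an illustration of
that general technique under my own assumptions: the table names (node,
ancestry), the sample document hierarchy and the use of SQLite are all
hypothetical and are not a description of our actual implementation.

```python
import sqlite3


def build_hierarchy(nodes):
    """Store (id, parent) rows, then derive a supplemental ancestry table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE node (id TEXT PRIMARY KEY, parent TEXT);
    CREATE TABLE ancestry (
        ancestor TEXT, descendant TEXT, depth INTEGER,
        PRIMARY KEY (ancestor, descendant)
    );
    """)
    conn.executemany("INSERT INTO node VALUES (?, ?)", nodes)
    # Walk the generic parent links once and materialise every
    # ancestor/descendant pair, so later lookups need no recursion.
    conn.execute("""
    WITH RECURSIVE walk(ancestor, descendant, depth) AS (
        SELECT parent, id, 1 FROM node WHERE parent IS NOT NULL
        UNION ALL
        SELECT w.ancestor, n.id, w.depth + 1
        FROM walk AS w JOIN node AS n ON n.parent = w.descendant
    )
    INSERT INTO ancestry SELECT ancestor, descendant, depth FROM walk
    """)
    return conn


def descendants(conn, ancestor):
    """Single indexed lookup against the supplemental table."""
    rows = conn.execute(
        "SELECT descendant FROM ancestry"
        " WHERE ancestor = ? ORDER BY depth, descendant",
        (ancestor,),
    )
    return [row[0] for row in rows]


# Hypothetical hierarchy: doc > section > (para, title)
NODES = [
    ("doc", None),
    ("section", "doc"),
    ("para", "section"),
    ("title", "section"),
]

conn = build_hierarchy(NODES)
print(descendants(conn, "doc"))
```

The generic node table stays the single source of truth; the ancestry
table is derived data that trades extra storage and rebuild cost for
cheaper reads on the one relationship that needs to be fast.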

> Having said that, rather than mapping over the XML structure and
> bringing in possibly irrelevant structural artifacts, you could simply
> refactor your XML to follow one of the varieties of RDF/XML syntax.
> Either way, your application (in the general case) can look at
> structures closer to the domain model than either DOM/XPath hierarchies
> or the SQL version of the relational model - thanks to the graph.
> Depending on the app, it might well be possible to leverage a lot more
> of the relational set/logic capability, a la  Datalog etc.

Yes, seems we've come to the same conclusions. 

I think several people on the list have reached the conclusion that
there is a set of common best practices for relational, OO and XML that,
if explicitly codified, could lead to a sort of generic approach to
leveraging the three domains for application development.  It's sort of
the same rationale that leads to things like XQuery, but at a higher
level.

> What fragmenting the XML schema down to such a level of granularity does
> get you is the potential for interop, it's easier to match simpler
> structures. This may be at the cost of performance in the first
> instance, but at least it can be done, and optimizations could follow.
> 'Course this is something not altogether lost on the RDF'ers, especially
> if you throw in a bit of ontological magic to manage the vocabularies.
> 

That's something I have yet to explore in depth. I suspect we're
reinventing the wheel with our relationship traversal logic.  It appears
to be just a specialized version of ontological relationship traversal.
Don't know what kind of performance the more general case RDF/OWL tools
might have, but we're currently using a lot of XSLT to attack this
problem so I can't claim we have any kind of optimally performing
solution at the moment...
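
By "relationship traversal" I mean something like the sketch below:
follow one kind of relationship transitively from a starting node.  This
is a plain breadth-first walk over triples in Python, written only to
show the shape of the problem; the function name, the subClassOf
property and the sample data are hypothetical, and it is neither our
XSLT logic nor what an RDF/OWL reasoner actually does internally.

```python
from collections import deque


def reachable(triples, start, prop):
    """Return nodes reachable from `start` via `prop`, in visit order."""
    # Index only the edges for the property being followed.
    edges = {}
    for subject, p, obj in triples:
        if p == prop:
            edges.setdefault(subject, []).append(obj)

    seen = {start}  # guards against cycles in the graph
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical triples; the hasPart edge must be ignored when
# following subClassOf.
TRIPLES = [
    ("Poodle", "subClassOf", "Dog"),
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("Dog", "hasPart", "Tail"),
]

print(reachable(TRIPLES, "Poodle", "subClassOf"))
```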
