Sounds like an excellent paper topic for one of the extreme
conferences or a paid-for article at xml.com.
I've crunched docs down into relational entities, and yes,
it isn't impossible, but it isn't much fun. I'd rather do
a straightforward relational design and map out to the document
type if I can, keeping the mapping in the SQL and code.
Joins don't scare me. :-)
Oh... you want this to be a real time dynamic system and not
a batch system?
From: Danny Ayers [mailto:firstname.lastname@example.org]
>Peter Hunsberger wrote:
>>A more general comment/question: it recently occurred to me that it is
>>likely possible to model any XML Schema as a relational schema (proof of
>>this theorem is left as an exercise for the reader ;-)? Don't know what
>>that gets you, but as I've said at least the tools abound...
Yep. You could do this arbitrarily, by crunching down the schema to its
constituent entities and relationships, and building lots of fairly
trivial tables to manage them. Perhaps an easier way, for which the
tools are already available...wait for it...would be to model the schema
in RDF (e.g. using the infoset vocab), so your
inter-element/attribute relationships are expressed as properties, then
store the result in a triple-oriented relational store - even just a
single table of subject, property, object.
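
To make the single-table idea concrete, here is a minimal sketch in
Python with SQLite. It is only an illustration under stated assumptions:
the property names ("localName", "parent", "attr:...", "text") and the
generated node ids are hypothetical stand-ins, not terms from the
infoset vocabulary, and the walk ignores child order and any text that
follows a child element, so mixed content is not preserved.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_to_triples(xml_text):
    """Flatten an XML document into (subject, property, object) rows.

    Property names here are hypothetical short labels; a real mapping
    would use URIs from an RDF vocabulary.
    """
    root = ET.fromstring(xml_text)
    triples = []
    counter = 0

    def walk(elem, parent_id):
        nonlocal counter
        counter += 1
        node_id = f"node{counter}"  # made-up identifier for this element
        triples.append((node_id, "localName", elem.tag))
        if parent_id is not None:
            triples.append((node_id, "parent", parent_id))
        for name, value in elem.attrib.items():
            triples.append((node_id, f"attr:{name}", value))
        text = (elem.text or "").strip()
        if text:
            triples.append((node_id, "text", text))
        for child in elem:
            walk(child, node_id)

    walk(root, None)
    return triples


def store(triples):
    """Load the triples into one three-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


doc = '<div id="a"><div id="b"><p>hello</p></div></div>'
triples = xml_to_triples(doc)
conn = store(triples)
```

Every element, attribute and parent/child link ends up as a row in the
same table, so nothing about the store depends on the particular schema.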
>It often gets you a really bad relational schema.
Yep. See above. But not necessarily, and bad schemas aren't exactly a
novelty using other approaches.
>The relational model has a difficult time with things
>like recursive elements (think of nested DIVs in HTML).
>Element types with lots of optional attributes and repeatable
>subelements get hairy when translated to the relational model.
>Mixed content is problematic too.
>Another exercise for the reader: try modeling something
>simple like HTML 2.0 (or for the really adventurous, something
>more complex like DocBook) as a relational database.
>It's probably doable, but I doubt you'd really want
>to work with any database that was structured that way.
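
For anyone who wants to try the recursion point, a rough sketch of the
nested-DIV case follows. It assumes a database that supports recursive
queries (SQLite's WITH RECURSIVE is used here), and the "element" table
and its columns are invented for the illustration. It covers only the
nesting, not optional attributes or mixed content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical adjacency-list table: each element points at its parent.
conn.execute(
    "CREATE TABLE element ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES element(id),"
    " tag TEXT)"
)
conn.executemany(
    "INSERT INTO element VALUES (?, ?, ?)",
    [
        (1, None, "body"),
        (2, 1, "div"),
        (3, 2, "div"),  # a div nested inside a div
        (4, 3, "p"),
        (5, 1, "p"),
    ],
)


def descendants(conn, root_id):
    """Return (id, tag, depth) for root_id and everything nested in it."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(id, tag, depth) AS (
            SELECT id, tag, 0 FROM element WHERE id = ?
            UNION ALL
            SELECT e.id, e.tag, sub.depth + 1
            FROM element e JOIN sub ON e.parent_id = sub.id
        )
        SELECT id, tag, depth FROM sub ORDER BY id
        """,
        (root_id,),
    )
    return rows.fetchall()


subtree = descendants(conn, 2)
```

The self-referencing key handles arbitrary nesting depth, though every
subtree fetch becomes a recursive query rather than a plain join.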
As I'm fresh from some pleasantly satisfying RDF-in-RDBMS play, I've no
qualms about suggesting a core model of triples (to provide the most
granular relationships) mapped to SQL VIEWs of the business domain,
though preferably spread over several tables to sensibly manage
resources and literals. Optimisations can be made schema-specific, e.g.
tables where you've got a group of elements as peers. (Whether you'd
actually want to do this with such doc-oriented vocabs is another
matter, might be a fun approach for journalists wishing to shuffle and
republish their stories).
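
As a sketch of what "triples mapped to SQL VIEWs" might look like, the
snippet below builds a small triple table and a view over it. It is an
assumption-laden illustration: the "story" view, the property names and
the sample rows are made up, and everything sits in one table rather
than being spread over separate resource and literal tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        # Hypothetical sample data for a journalist's stories.
        ("story1", "type", "Story"),
        ("story1", "title", "Schema wars"),
        ("story1", "author", "A. Writer"),
        ("story2", "type", "Story"),
        ("story2", "title", "Triples at large"),
    ],
)

# The view pivots per-subject properties into columns, so application
# code can query "story" as if it were an ordinary domain table.
conn.execute(
    """
    CREATE VIEW story AS
    SELECT t.subject AS id,
           ti.object AS title,
           au.object AS author
    FROM triples t
    LEFT JOIN triples ti
           ON ti.subject = t.subject AND ti.property = 'title'
    LEFT JOIN triples au
           ON au.subject = t.subject AND au.property = 'author'
    WHERE t.property = 'type' AND t.object = 'Story'
    """
)

rows = conn.execute("SELECT id, title, author FROM story ORDER BY id").fetchall()
```

The LEFT JOINs mean a missing property (story2 has no author here)
shows up as NULL instead of dropping the row, which is roughly how
optional attributes would surface in such a view.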
Having said that, rather than mapping over the XML structure and
bringing in possibly irrelevant structural artifacts, you could simply
refactor your XML to follow one of the varieties of RDF/XML syntax.
Either way, your application (in the general case) can look at
structures closer to the domain model than either DOM/XPath hierarchies
or the SQL version of the relational model - thanks to the graph.
Depending on the app, it might well be possible to leverage a lot more
of the relational set/logic capability, a la Datalog etc.
What fragmenting the XML schema down to such a level of granularity does
get you is the potential for interop: it's easier to match simpler
structures. This may be at the cost of performance in the first
instance, but at least it can be done, and optimizations could follow.
'Course this is something not altogether lost on the RDF'ers, especially
if you throw in a bit of ontological magic to manage the vocabularies.