OASIS Mailing List Archives

   Re: [xml-dev] Tree v. Table - A relational XML object model...?

On Thu, 27 Feb 2003 Matthew.Bennett@facs.gov.au wrote:

> I recognized in CS 101 that a binary tree is just a table, skewed 45
> degrees, a non-binary tree a sparse table, skewed 45 degrees. They are
> abstractions of the *same* concept! Hence, it ought to be just as valid to
> speak of XML tables, as XML trees.

I'm no mathematician, but I do store XML in tables. I use a simple schema
where each row contains one node, and the columns

   location, depth, type, uri, name, content

are defined as follows: location is a number giving the node's position
in document order; depth is its distance from the root node; type is an
integer constant that maps to a node type (element, attribute, text,
comment, PI); uri is the node's namespace URI; name is its local name;
and content is the node's value or text content. You'd need to add a
prefix or qualified-name column if you wanted to retain namespace prefixes.

So the document

   <doc>
     <div>stuff</div>
     <div xmlns="http://my.stuff.com">more stuff</div>
   </doc>

is stored as

   1, 1, ELEMENT, "", doc, null
   2, 2, ELEMENT, "", div, null
   3, 3, PCDATA, null, null, stuff
   4, 2, ELEMENT, "http://my.stuff.com", div, null
   5, 3, PCDATA, null, null, more stuff 
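
A sketch of how such rows might be produced with Python's standard xml.sax parser, namespace processing switched on. Assumptions beyond the post: the type codes (1 for element, 3 for text, following DOM), the hypothetical names `Shredder` and `shred`, and dropping whitespace-only text so the output matches the rows above. Attributes, comments and PIs are not handled here.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

ELEMENT, TEXT = 1, 3  # assumed type codes (DOM nodeType values)


class Shredder(ContentHandler):
    """Collects (location, depth, type, uri, name, content) rows from SAX events."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.depth = 0    # depth of the innermost open element
        self._text = []   # pending character data

    def _add(self, depth, node_type, uri, name, content):
        # location is simply the 1-based position in document order
        self.rows.append((len(self.rows) + 1, depth, node_type, uri, name, content))

    def _flush_text(self):
        # SAX may deliver one text node in several characters() calls,
        # so buffer them and emit a single row when markup interrupts.
        text = "".join(self._text)
        self._text = []
        if text.strip():  # assumption: whitespace-only text is discarded
            self._add(self.depth + 1, TEXT, None, None, text)

    def startElementNS(self, name, qname, attrs):
        self._flush_text()
        self.depth += 1
        uri, local = name  # uri is None when the element has no namespace
        self._add(self.depth, ELEMENT, uri or "", local, None)

    def endElementNS(self, name, qname):
        self._flush_text()
        self.depth -= 1

    def characters(self, content):
        self._text.append(content)


def shred(xml_text):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = Shredder()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.rows


DOC = """<doc>
  <div>stuff</div>
  <div xmlns="http://my.stuff.com">more stuff</div>
</doc>"""

if __name__ == "__main__":
    for row in shred(DOC):
        print(row)
```

Run on the sample document, it yields the same five rows as the listing, with the type codes as integers.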

In practice, I have found that many operations are slow over such a simple
schema, so I add an auxiliary column, parent, which contains the rowid of
each node's parent node, and an auxiliary table which maps each element's
row id to the row ids of its children. This makes for quick extraction
of document fragments from the database.
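
One way the auxiliary parent column and children table might be derived and used, sketched with SQLite from Python. Here location doubles as the row id (an INTEGER PRIMARY KEY is SQLite's rowid); the table names `node` and `children`, the helpers `load` and `fragment`, and the use of a recursive common table expression for extraction are choices made for this sketch, not something the post describes.

```python
import sqlite3

ELEMENT, TEXT = 1, 3  # assumed type codes (DOM nodeType values)

ROWS = [  # the example document from above
    (1, 1, ELEMENT, "", "doc", None),
    (2, 2, ELEMENT, "", "div", None),
    (3, 3, TEXT, None, None, "stuff"),
    (4, 2, ELEMENT, "http://my.stuff.com", "div", None),
    (5, 3, TEXT, None, None, "more stuff"),
]


def load(conn, rows):
    """Store rows in document order, deriving parent and the children table."""
    conn.executescript("""
        CREATE TABLE node (
            location INTEGER PRIMARY KEY,  -- alias for SQLite's rowid
            depth INTEGER NOT NULL, type INTEGER NOT NULL,
            uri TEXT, name TEXT, content TEXT,
            parent INTEGER                 -- auxiliary: parent's location
        );
        CREATE TABLE children (            -- auxiliary: parent -> child map
            parent INTEGER NOT NULL,
            child INTEGER NOT NULL,
            PRIMARY KEY (parent, child)
        );
    """)
    ancestors = []  # ancestors[i] is the open node at depth i + 1
    for location, depth, node_type, uri, name, content in rows:
        del ancestors[depth - 1:]  # drop nodes that are not ancestors of this one
        parent = ancestors[-1] if ancestors else None
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?, ?, ?, ?)",
                     (location, depth, node_type, uri, name, content, parent))
        if parent is not None:
            conn.execute("INSERT INTO children VALUES (?, ?)", (parent, location))
        ancestors.append(location)


def fragment(conn, root):
    """Return the subtree rooted at `root`, in document order."""
    return conn.execute("""
        WITH RECURSIVE sub(location) AS (
            SELECT ?
            UNION ALL
            SELECT c.child FROM children AS c JOIN sub ON c.parent = sub.location
        )
        SELECT n.location, n.depth, n.type, n.uri, n.name, n.content
        FROM node AS n JOIN sub ON n.location = sub.location
        ORDER BY n.location
    """, (root,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, ROWS)
    print(fragment(conn, 4))
```

Because rows arrive in document order, each node's parent is just the nearest preceding node one level shallower, so a single pass with a stack fills both auxiliary structures.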

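The nice thing about such a simple-minded model is that it is easy to
build a "parser" that reads an RDBMS cursor and fires off SAX events. Just
add the missing endElement events, which the table does not store: an
open element ends as soon as a row at the same or a shallower depth
appears, or when the cursor is exhausted.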
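
A minimal sketch of such a cursor-driven "parser", assuming rows arrive ordered by location and use the type codes 1 (element) and 3 (text) from the DOM numbering. Any iterable of row tuples stands in for the RDBMS cursor; `replay` is a hypothetical name. The end-element events are synthesized from the depth column, and a default-namespace declaration is emitted whenever an element's namespace URI differs from the one in scope. Feeding the events to the standard library's XMLGenerator turns them back into markup.

```python
import io
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesNSImpl

ELEMENT, TEXT = 1, 3  # assumed type codes (DOM nodeType values)


def replay(rows, handler: ContentHandler) -> None:
    """Fire SAX events for rows ordered by location, adding endElement events."""
    open_elements = []  # (depth, name, declared_ns) for each unclosed element
    default_ns = [""]   # default namespace in scope; "" means none

    def close_from(depth):
        # An element ends once a row at its depth or shallower appears.
        while open_elements and open_elements[-1][0] >= depth:
            _, name, declared = open_elements.pop()
            handler.endElementNS(name, None)
            if declared:
                handler.endPrefixMapping(None)
                default_ns.pop()

    handler.startDocument()
    for location, depth, node_type, uri, local, content in rows:
        close_from(depth)
        if node_type == ELEMENT:
            declared = uri != default_ns[-1]
            if declared:  # declare (or undeclare) the default namespace
                handler.startPrefixMapping(None, uri)
                default_ns.append(uri)
            name = (uri or None, local)
            handler.startElementNS(name, None, AttributesNSImpl({}, {}))
            open_elements.append((depth, name, declared))
        elif node_type == TEXT:
            handler.characters(content)
    close_from(1)  # close whatever is still open when the cursor runs out
    handler.endDocument()


ROWS = [
    (1, 1, ELEMENT, "", "doc", None),
    (2, 2, ELEMENT, "", "div", None),
    (3, 3, TEXT, None, None, "stuff"),
    (4, 2, ELEMENT, "http://my.stuff.com", "div", None),
    (5, 3, TEXT, None, None, "more stuff"),
]

if __name__ == "__main__":
    out = io.StringIO()
    replay(ROWS, XMLGenerator(out, encoding="utf-8"))
    print(out.getvalue())
```

On the example rows this prints the document back, minus the whitespace between elements that the rows never stored.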

I've given no thought to how one might handle entities, CDATA marked
sections, DOCTYPE declarations, DOCTYPE subset, and probably a dozen other
things.

// Gregory Murphy <Gregory.Murphy@sun.com>

Copyright 2001 XML.org. This site is hosted by OASIS