   Re: XML tools and big documents

  • From: "Ingo Macherius" <macherius@darmstadt.gmd.de>
  • To: xml-dev@ic.ac.uk
  • Date: Thu, 3 Sep 1998 02:46:49 +0200

David Megginson <david@megginson.com> wrote at 1 Sep 98, 16:57:
> I do not need to build a tree for the whole document; instead [...] 
> dump it into my SQL database [...]. 
=> Put it into an RDBMS

> [...] it makes more sense to build the specialised object tree
> directly from the event stream rather than building a DOM tree
=> Put it into an OODBMS

"Michael Kay" <M.H.Kay@eng.icl.co.uk> wrote at Wed, 2 Sep 1998 10:31:41 +0100:
> [...] storing the Java serialization of DOM-like models on disk [...]
> takes a lot longer than reparsing original XML
=> Put it in a file and reparse

So when it gets big, use a database? Did I get this wrong, and XML 
was never meant to be a storage paradigm?

Anyway, I can confirm Michael's results.
We implemented an experimental database storage for SGML with jjc's 
SP and Informix's IUS. It generalizes something similar to David's 
second suggestion. Object-aggregation is done by marking the content 
of specified element types (e.g. <act> in a Shakespeare play) to be 
stored unparsed. When it comes to queries it is reparsed on the fly. 
Kind of automatic object generation.
Queries turned out to become slow as the granularity gets finer. 
Most navigations trigger child/sibling lookups, which trigger object 
ID table lookups. That's at least one SQL statement fired for every 
DOM navigation call. Caching helps, but doesn't really solve the problem. 
Trees in an RDBMS are no fun. Michael writes they are no fun in an OODB 
either. IMHO the good timings of in-memory DOM implementations result 
from the fact that looking up children is a cheap operation. In 
current DB systems it's not cheap at all.
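
To make the granularity trade-off concrete, here is a minimal sketch in 
Python. It is not the SP/IUS implementation described above: sqlite3 and 
xml.etree.ElementTree stand in for the database and the parser, and the 
table layout and the names CountingDB, store, count_elements and measure 
are hypothetical, chosen only for illustration. It stores one row per 
element, keeps the content of chosen element types unparsed, and counts 
how many SQL statements a full traversal fires.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Tiny stand-in document: 9 elements in total.
DOC = """<play><title>Demo</title>
<act><scene><line>a</line><line>b</line></scene></act>
<act><scene><line>c</line></scene></act></play>"""


class CountingDB:
    """Hypothetical wrapper: an in-memory SQLite table plus a query counter."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.statements = 0
        self.conn.execute(
            "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
            "pos INTEGER, tag TEXT, raw TEXT)"
        )

    def query(self, sql, args=()):
        # Every navigation step goes through here, so it can be counted.
        self.statements += 1
        return self.conn.execute(sql, args).fetchall()


def store(db, elem, unparsed_tags, parent=None, pos=0):
    """Store one row per element; subtrees of unparsed_tags are kept as text."""
    raw = ET.tostring(elem, encoding="unicode") if elem.tag in unparsed_tags else None
    cur = db.conn.execute(
        "INSERT INTO node (parent, pos, tag, raw) VALUES (?, ?, ?, ?)",
        (parent, pos, elem.tag, raw),
    )
    node_id = cur.lastrowid
    if raw is None:
        # Fine-grained storage: descend and store each child as its own row.
        for i, child in enumerate(elem):
            store(db, child, unparsed_tags, node_id, i)


def count_elements(db, node_id, raw):
    """Walk the stored tree and return the number of elements seen."""
    if raw is not None:
        # Coarse object: reparse on the fly, then navigate in memory.
        return sum(1 for _ in ET.fromstring(raw).iter())
    # Fine-grained node: one SQL statement per child lookup.
    rows = db.query(
        "SELECT id, raw FROM node WHERE parent = ? ORDER BY pos", (node_id,)
    )
    return 1 + sum(count_elements(db, cid, craw) for cid, craw in rows)


def measure(unparsed_tags):
    """Return (elements visited, SQL statements fired) for one traversal."""
    db = CountingDB()
    store(db, ET.fromstring(DOC), unparsed_tags)
    (root_id, raw), = db.query("SELECT id, raw FROM node WHERE parent IS NULL")
    db.statements = 0  # count only the traversal itself
    total = count_elements(db, root_id, raw)
    return total, db.statements


if __name__ == "__main__":
    print("acts stored unparsed:", measure({"act"}))
    print("everything parsed:   ", measure(set()))
```

Under these assumptions both runs visit the same 9 elements, but storing 
<act> unparsed needs 2 child-lookup statements while the fully parsed 
layout needs 9, one per element visited. The sketch ignores the cost of 
reparsing and of caching, which a real measurement would have to include.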

Is anybody aware of literature on efficient addressing in trees? 
This should help both in-memory DOMs and DBs.

A bit disillusioned,
	++im

--
Ingo Macherius//Dolivostrasse 15//D-64293 Darmstadt//+49-6151-869-882
GMD-IPSI German National Research Center for Information Technology
mailto:macherius@gmd.de http://www.darmstadt.gmd.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Zappa)

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)





 


Copyright 2001 XML.org. This site is hosted by OASIS