   Re: Storage of XML documents & Learning from history

  • From: Lars Marius Garshol <larsga@garshol.priv.no>
  • To: xml-dev@xml.org
  • Date: 15 May 2000 14:26:39 +0200

* Dylan Walsh
| People are figuring this out now, but it seems likely that this is
| something that SGMLers hit years ago. Those who do not study history
| are destined to repeat it. What are the lessons that have been
| learned?

SGMLers did indeed hit this years ago, but they did so in the specific
field of document management. The results were pretty much as
described by Michael Kay, with one exception: several companies claim
to have been relatively successful with fine-grained storage in RDBMSs.
And indeed they seem to be no worse off than those who chose OODBMSs.
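To make "fine-grained storage" concrete, here is a minimal sketch of one
way it could look: one row per element in a relational table. The table
layout (node, parent, pos, tag, text, tail) and the function names are
my own assumptions for illustration, not a description of what any of
those companies actually built. It uses Python's sqlite3 and
ElementTree as stand-ins, and it ignores attributes to stay short.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(conn, xml_text):
    """Store each element of the document as one row; return the root's row id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent INTEGER, pos INTEGER, "
        "tag TEXT, text TEXT, tail TEXT)"
    )

    def insert(el, parent, pos):
        cur = conn.execute(
            "INSERT INTO node (parent, pos, tag, text, tail) VALUES (?, ?, ?, ?, ?)",
            (parent, pos, el.tag, el.text, el.tail),
        )
        node_id = cur.lastrowid
        # Children keep their document order through the pos column.
        for i, child in enumerate(el):
            insert(child, node_id, i)
        return node_id

    return insert(ET.fromstring(xml_text), None, 0)

def rebuild(conn, node_id):
    """Reassemble the element tree rooted at node_id from its rows."""
    tag, text, tail = conn.execute(
        "SELECT tag, text, tail FROM node WHERE id = ?", (node_id,)
    ).fetchone()
    el = ET.Element(tag)
    el.text, el.tail = text, tail
    rows = conn.execute(
        "SELECT id FROM node WHERE parent = ? ORDER BY pos", (node_id,)
    ).fetchall()
    for (child_id,) in rows:
        el.append(rebuild(conn, child_id))
    return el

conn = sqlite3.connect(":memory:")
source = "<doc><title>T</title><para>Some <em>mixed</em> text.</para></doc>"
root_id = shred(conn, source)
roundtrip = ET.tostring(rebuild(conn, root_id), encoding="unicode")
```

The price of this layout is visible in rebuild(): getting a whole
document back takes one query per element, which is the usual argument
against fine-grained storage for documents.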

The specific solutions to the other problems depend very much on your
type of data, and as I said, SGMLers did this with documents[1]. We
implemented searching as a hybrid solution: RDBMS-based searches over
the document metadata, and an SGML-aware text search engine for
searches in the document content. This has worked fairly well,
although one could still wish for improvements.
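A rough sketch of what such a hybrid search could look like follows.
Everything in it is an assumption made for illustration: the sample
documents, the meta table, and the function names are hypothetical, and
a naive element-aware substring match written with ElementTree stands
in for a real SGML-aware search engine.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample documents, keyed by document id.
DOCS = {
    1: "<doc><title>Storage</title><para>Blobs in a relational database.</para></doc>",
    2: "<doc><title>Search</title><para>Storage of indexes.</para></doc>",
    3: "<doc><title>Storage notes</title><para>Nothing relevant here.</para></doc>",
}

def build_metadata_db():
    """Metadata lives in ordinary relational tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE meta (doc_id INTEGER PRIMARY KEY, author TEXT, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO meta VALUES (?, ?, ?)",
        [(1, "kay", 2000), (2, "kay", 1999), (3, "walsh", 2000)],
    )
    return conn

def content_match(xml_text, element, word):
    """Structure-aware content search: look for word only inside the given element."""
    root = ET.fromstring(xml_text)
    word = word.lower()
    return any(word in "".join(el.itertext()).lower() for el in root.iter(element))

def hybrid_search(conn, author, element, word):
    """Narrow by metadata in SQL first, then search the content of the survivors."""
    rows = conn.execute(
        "SELECT doc_id FROM meta WHERE author = ? ORDER BY doc_id", (author,)
    )
    return [doc_id for (doc_id,) in rows if content_match(DOCS[doc_id], element, word)]

conn = build_metadata_db()
in_titles = hybrid_search(conn, "kay", "title", "storage")
in_paras = hybrid_search(conn, "kay", "para", "storage")
```

The point of the split is that each half does what it is good at: the
database answers the metadata part, and the markup-aware part can tell a
hit in a title from a hit in a paragraph.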

The central question seems to me to be what kind of data is involved
and what kind of representation of that data is the most natural to
use. I.e.: is XML just a convenient serialization syntax, or are these
data fundamentally tied to the XML data model somehow? Documents would
be tied to the data model[3], whereas most other kinds of data would
not be.

In the cases where XML really is the fundamental data model used,
storage as XML makes a lot of sense, and in those cases blob-based
storage or native XML databases might both be sensible choices.

Tamino is indeed an interesting choice for this kind of application,
although what it offers beyond the blob solution seems to me mainly to
be that one doesn't have to implement check-in/check-out on top of the
database and also that the searching is already in place.
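For comparison, this is roughly what one would have to write on top of
the database in the blob solution. It is only a sketch under my own
assumptions: the BlobStore class, the docs table and the locked_by
column are hypothetical names, sqlite3 stands in for the RDBMS, and it
says nothing about how Tamino does this internally.

```python
import sqlite3

class BlobStore:
    """Whole documents stored as blobs, with a simple exclusive lock per document."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE docs ("
            "name TEXT PRIMARY KEY, content TEXT NOT NULL, locked_by TEXT)"
        )

    def add(self, name, content):
        with self.conn:
            self.conn.execute("INSERT INTO docs VALUES (?, ?, NULL)", (name, content))

    def check_out(self, name, user):
        """Lock the document for user and return its content."""
        with self.conn:
            # The UPDATE only matches when nobody holds the lock.
            cur = self.conn.execute(
                "UPDATE docs SET locked_by = ? WHERE name = ? AND locked_by IS NULL",
                (user, name),
            )
        if cur.rowcount != 1:
            raise RuntimeError(f"{name} is missing or already checked out")
        row = self.conn.execute(
            "SELECT content FROM docs WHERE name = ?", (name,)
        ).fetchone()
        return row[0]

    def check_in(self, name, user, content):
        """Store the new content and release the lock; only the lock holder may do so."""
        with self.conn:
            cur = self.conn.execute(
                "UPDATE docs SET content = ?, locked_by = NULL "
                "WHERE name = ? AND locked_by = ?",
                (content, name, user),
            )
        if cur.rowcount != 1:
            raise RuntimeError(f"{name} is not checked out by {user}")

store = BlobStore()
store.add("report.xml", "<doc><para>draft</para></doc>")
text = store.check_out("report.xml", "alice")
store.check_in("report.xml", "alice", text.replace("draft", "final"))
```

Searching inside the content would still have to be added separately,
which is the other half of what a native XML database gives you for
free.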

For cases where XML does not have to be the underlying data model it
is harder to say something general about what would be the right
solutions, and in any case, this is not something the SGML community
has a lot of experience with (as far as I know, anyway). Lore[2] might
be a very interesting tool for these kinds of solutions, perhaps more
so for this kind of data than for documents.

--Lars M.

[1] Documents as in: pieces of text intended to be read by humans,
    containing running text divided into paragraphs and other document

[2] <URL: http://www-db.Stanford.edu/lore/ >

[3] Probably not so much by necessity, but more because that has
    traditionally been the way it has been done. I've done some simple
    experiments with modelling document structure in EXPRESS and it
    seems as though that could work very well. Of course, the tools
    are missing, but in theory it should work.
