On Mon, 2003-08-25 at 14:18, firstname.lastname@example.org wrote:
> For sure, people are trying to do terabyte systems that integrate
> normalized data and legacy document data stores (such as the journal
> "Nature"), primarily in XML, that is to say by converting out of an RDBMS
> like Oracle and into pure XML docs without a dbms back end. This is the
> source of my concern. They need some guidance, and will certainly receive
> it, in one form (gentle comments in forums like this one) or another (when
> their systems fail in production, or worse, never get past failed prototype
Do you notice that this is primarily a TEXT!!!! system? Sure, the
metadata will go into Oracle nicely, but if what you care about are the
articles, it sure sounds plausible to me that one would use a storage
system optimized for the articles, and then shoehorn in the metadata,
rather than use an RDBMS and shoehorn in the corpus.
Oh, and for what it's worth, I have plenty of anecdotal evidence that
bibliographic metadata was a very hard problem for the relational model.
It may have been solved in the last few years, but it was still a
storage-eating performance killer ten years after the relational model
became the 'best practice' for 'widgets in warehouses.'
> Still, I remain open minded, and if someone can offer proofs that support:
> - XML as a best practice (in any regard)
Publishing/documents. I agree, Nature should maintain their subscriber
list in Oracle. That's not where the articles belong.