OASIS Mailing List Archives



Re: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?

On Wed, 2015-03-11 at 12:11 +0000, Ihe Onwuka wrote:
> [...] an additional shortcoming I found with eXist - the insistence on
> a strictly hierarchical collection structure. This forces me into a
> choice I do not want to have to make for a movie repository - whether
> a rom/com goes in the romantic collection or the comedy collection -
> and of course genre is not the only facet worthy of modelling
> collections on.

For http://www.fromoldbooks.org/Search/ I modeled this with keywords
associated with each item; for more rigid categories, have a list of
categories, each with an ID, and point to a list of categories from the
item, e.g. using IDREFS if you're using validation.

Then I only use one database collection.
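A minimal sketch of that single-collection approach, assuming a hypothetical document layout: the `catalog`, `category`, `item` and `keyword` elements and the `categories` attribute are illustrative names, not the actual fromoldbooks.org schema. A rom/com simply points at both categories, so no choice of collection is forced:

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: categories are declared once with an ID, and each item
# lists the categories it belongs to as whitespace-separated tokens (the way
# an IDREFS attribute is written). Keywords are free-form child elements.
CATALOG = """
<catalog>
  <category id="romance">Romance</category>
  <category id="comedy">Comedy</category>
  <category id="drama">Drama</category>
  <item id="m1" categories="romance comedy">
    <title>A Rom-Com</title>
    <keyword>wedding</keyword>
  </item>
  <item id="m2" categories="drama">
    <title>A Drama</title>
    <keyword>courtroom</keyword>
  </item>
</catalog>
"""

def items_in_category(root, category_id):
    """Titles of items whose category list contains category_id."""
    return [
        item.findtext("title")
        for item in root.iter("item")
        if category_id in item.get("categories", "").split()
    ]

def items_with_keyword(root, keyword):
    """Titles of items tagged with the given free-form keyword."""
    return [
        item.findtext("title")
        for item in root.iter("item")
        if keyword in (k.text for k in item.findall("keyword"))
    ]

root = ET.fromstring(CATALOG)
print(items_in_category(root, "romance"))     # ['A Rom-Com']
print(items_in_category(root, "comedy"))      # ['A Rom-Com']
print(items_with_keyword(root, "courtroom"))  # ['A Drama']
```

In XQuery (for example in eXist) a comparable filter would be something like `//item[tokenize(@categories, '\s+') = 'romance']`, again assuming this hypothetical layout.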

To some extent this is a sort of "XML normal form" (although not in
the sense of Henry's paper, I think), in that similar principles apply
as for relational data: information should not be repeated, it should
be broken down so you can process it at the right level, and everything
should be uniquely addressable (yay XPath).
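Two of those principles can be checked mechanically. The sketch below, again assuming a hypothetical layout with illustrative element and attribute names, approximates what a validating parser enforces for ID/IDREFS attributes: each ID is declared only once, and every reference resolves to a declared ID. The `by_id` dictionary is loosely analogous to looking elements up with XPath's `id()` function:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document: item m2 refers to a category that was never declared.
DOC = """
<catalog>
  <category id="romance">Romance</category>
  <category id="comedy">Comedy</category>
  <item id="m1" categories="romance comedy"><title>A Rom-Com</title></item>
  <item id="m2" categories="comedy horror"><title>Broken Link</title></item>
</catalog>
"""

def check_ids(root, ref_attr="categories"):
    """Return (duplicate IDs, dangling references) for a parsed document.

    Every id attribute should be unique, and every whitespace-separated
    token in ref_attr should name an existing id.
    """
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    dangling = sorted(
        (el.get("id"), ref)
        for el in root.iter()
        for ref in el.get(ref_attr, "").split()
        if ref not in counts
    )
    return duplicates, dangling

root = ET.fromstring(DOC)
# Every element with an id is directly addressable.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
print(by_id["m1"].findtext("title"))  # A Rom-Com
print(check_ids(root))                # ([], [('m2', 'horror')])
```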

> I find
> Peter made a very interesting assertion:
> > 
> > The analytics should run directly on the data,
> > not on some extract.
> > 
> less persuasive.

Same here. People have been assembling views of data for a long time - 
sometimes for performance, sometimes to facilitate later queries, 
sometimes to augment some parts and reduce others.

Having said this, it's always a pain when you want to do a search and 
the data has been removed, like searching for "to be or not to be" in 
full text systems using stopwords [1].
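To illustrate the stopword problem, here is a sketch of an inverted index that discards stopwords at indexing time. The stopword list and function names are hypothetical, and real full-text engines differ in whether and how they drop stopwords. Because every word of the phrase is a stopword, the query has nothing left to match:

```python
import re

# A small, hypothetical stopword list; real systems ship much longer ones.
STOPWORDS = {"a", "an", "and", "be", "is", "not", "of", "or", "the", "to"}

def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(docs):
    """Inverted index: term -> set of doc ids, with stopwords discarded."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokens(text):
            if term not in STOPWORDS:
                index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query):
    """AND together the query terms; stopwords are dropped here too."""
    terms = [t for t in tokens(query) if t not in STOPWORDS]
    if not terms:
        return set()  # nothing left to match on
    result = set(index.get(terms[0], set()))
    for t in terms[1:]:
        result &= index.get(t, set())
    return result

docs = {
    "hamlet": "To be, or not to be, that is the question",
    "other": "The question of the day",
}
index = build_index(docs)
print(sorted(search(index, "question")))    # ['hamlet', 'other']
print(search(index, "to be or not to be"))  # set()
```

Indexing the stopwords as well (ideally with word positions, so phrase queries work) avoids this, at the cost of a larger index.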


[1] I've also used systems that took "to be or not to be" as a rather 
useless Boolean search containing an "or" operator...



Copyright 1993-2007 XML.org. This site is hosted by OASIS