Re: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?
- From: "Liam R. E. Quin" <liam@w3.org>
- To: ihe.onwuka@gmail.com
- Date: Wed, 11 Mar 2015 14:52:07 -0400
On Wed, 2015-03-11 at 12:11 +0000, Ihe Onwuka wrote:
> [...] an additional shortcoming I found with eXist - the insistence
> on a strictly hierarchical collection structure. This forces me into
> a choice I do not want to have to make for a movie repository -
> whether a rom/com goes in the romantic collection or the comedy
> collection - and of course genre is not the only facet worthy of
> modelling collections on.
For http://www.fromoldbooks.org/Search/ I modeled this with keywords
associated with each item; for more rigid categories, have a list of
categories each with an ID, and point to a list of categories from the
item, e.g. using IDREFS if you're using validation.
Then I only use one database collection.
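
As a rough illustration of the category-list approach described above, here
is a minimal sketch in Python. The element names, attribute names, IDs and
movie titles are all hypothetical (they are not taken from fromoldbooks.org
or from eXist); the point is only that one item can reference several
categories through a whitespace-separated, IDREFS-style attribute, so no
choice between "romantic" and "comedy" collections is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: element and attribute names are illustrative only.
DOC = """
<repository>
  <categories>
    <category id="c-romance">Romance</category>
    <category id="c-comedy">Comedy</category>
  </categories>
  <movies>
    <movie id="m1" categories="c-romance c-comedy">
      <title>Example Rom-Com</title>
    </movie>
    <movie id="m2" categories="c-comedy">
      <title>Example Comedy</title>
    </movie>
  </movies>
</repository>
"""


def titles_in_category(root, category_id):
    """Return titles of movies whose IDREFS-style attribute lists category_id."""
    return [
        movie.findtext("title")
        for movie in root.iter("movie")
        # An IDREFS value is a whitespace-separated list of IDs.
        if category_id in movie.get("categories", "").split()
    ]


root = ET.fromstring(DOC)
print(titles_in_category(root, "c-romance"))
print(titles_in_category(root, "c-comedy"))
```

The same movie shows up under both categories while being stored once, in a
single collection. In an XML database the lookup would more likely be an
XPath or XQuery expression over the same attribute.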
To some extent this is a sort of "XML normal form" (although not in
the sense of Henry's paper, I think), in that similar principles apply
as for relational data: information should not be repeated, it should
be broken down so you can process it at the right level, and
everything should be uniquely addressable (yay XPath).
>
> I find
>
> Peter made a very interesting assertion:
> >
> > The analytics should run directly on the data,
> > not on some extract.
> >
> less persuasive.
Same here. People have been assembling views of data for a long time -
sometimes for performance, sometimes to facilitate later queries,
sometimes to augment some parts and reduce others.
Having said this, it's always a pain when you want to do a search and
the data has been removed, like searching for "to be or not to be" in
full text systems using stopwords [1].
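
The stopword problem can be sketched in a few lines of Python. The stopword
list and the tokenizer below are assumptions made up for illustration; real
full-text systems ship their own lists and analysis rules.

```python
# Hypothetical stopword list; real systems use their own.
STOPWORDS = {"to", "be", "or", "not", "the", "a", "of"}


def index_terms(text):
    """Lowercase, split on whitespace, and drop stopwords before indexing."""
    return [word for word in text.lower().split() if word not in STOPWORDS]


# Every word of the phrase is a stopword, so nothing is left to search on.
print(index_terms("To be or not to be"))
```

With a list like this, the indexed form of the phrase is empty, so the query
cannot match even a document that contains it verbatim.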
Liam
[1] I've also used systems that took "to be or not to be" as a rather
useless Boolean search containing an "or" operator...