OASIS Mailing List Archives


Re: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?

Well, I am very grateful for Peter's comments, and intrigued and enthused enough to investigate the architectures he describes for my current project. Part of the reason relates to an additional shortcoming I found with eXist: the insistence on a strictly hierarchical collection structure. This forces me into a choice I do not want to have to make for a movie repository, namely whether a rom-com goes in the romantic collection or the comedy collection. And of course genre is not the only facet worth modelling collections on. How well the product would support a very large number of collections, with each piece of data able to belong to several, is a concern that I wouldn't have in a graph-based architecture.
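To make the rom-com problem concrete, here is a minimal sketch of faceted membership, where one item can sit in any number of collections at once. It is plain Python and not eXist's API or any particular graph store; the `FacetIndex` class, the facet names, and the movie ids are all hypothetical.

```python
from collections import defaultdict


class FacetIndex:
    """Maps (facet, value) pairs to item ids; an item may carry many pairs."""

    def __init__(self):
        # (facet, value) -> set of item ids; hypothetical in-memory stand-in
        # for whatever edges or tags a real store would keep.
        self._members = defaultdict(set)

    def tag(self, item_id, facet, value):
        """Add the item to the collection named by (facet, value)."""
        self._members[(facet, value)].add(item_id)

    def members(self, facet, value):
        """Return a copy of the ids in one collection (empty if unknown)."""
        return set(self._members.get((facet, value), set()))


index = FacetIndex()
index.tag("movie-1", "genre", "romance")
index.tag("movie-1", "genre", "comedy")   # same movie, second collection
index.tag("movie-1", "decade", "1990s")   # genre is not the only facet
index.tag("movie-2", "genre", "comedy")

# A rom-com is just the intersection of two memberships.
rom_coms = index.members("genre", "romance") & index.members("genre", "comedy")
```

Nothing here forces a single parent collection on a movie, which is the property I am after. Whether a given product does this efficiently at very large scale is a separate question that the sketch does not answer.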

I find the following less persuasive:

> Peter made a very interesting assertion:
>
> The analytics should run directly on the data, not on some extract.

First up, it depends on where you get your data from. If it originated as HTML, then marshalling it into anything but XML usually entails the (almost certainly) premature imposition of some sort of schema.

Secondly, this somewhat overlooks the significant data management effort latent in most Big Data projects. At the very least, that amount of data will usually have to go through a significant cleansing process. A very vocal section of the analytics community seems to think this is yet another thing they can do with an R library. Rarely is a dissenting voice heard, yet the number one lament of the very same people is the time and effort spent dealing with unclean data.

We (software development people) acquire a data management capability (Oracle etc.) and build BI and/or analytics tools on top of that. They choose their analytics capability and then find they have to bolt a data management function on top of it. Well, I think they've got it arse about face. Doing data management with your analytics tool is just as big a sin as (if not a bigger one than) doing analytics with your data management tool.


Copyright 1993-2007 XML.org. This site is hosted by OASIS