OASIS Mailing List Archives

Re: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?

Yes, unless the XML needs to be forwarded to some other endpoint, I can't really see why it would need to stay as XML.  Roger says his data has shallow structure, so Cassandra alone may suffice.  Pure relational seems expensive for the volumes he has, though as normalized data the real volumes may be significantly smaller. If you need to keep some sense of the original structure and, in particular, if the document has deep structure, then a graph database is the way to go. I mentioned Titan, you mention Neo4j. Personally, I have always wanted to link Cassandra with Titan directly in a low-level hybrid model, but now that Aurelius has been acquired by DataStax that may be unnecessary; they are hinting they will do that.  That, IMO, could prove to be the ultimate in flexible, scalable databases, which is why I lean towards Titan these days....

In any case, the real question is what's the real end use for all this data?  Roger says it gets pulled into SAS and SPSS, which suggests some form of further analytics.  Again, that suggests to me that pulling this into something like Titan and using Faunus on Hadoop to get the end analytics directly might be the best overall solution to the entire problem.

Peter Hunsberger

On Sun, Mar 8, 2015 at 2:37 PM, Simon St.Laurent <simonstl@simonstl.com> wrote:

On 03/07/2015 09:05 PM, Liam R. E. Quin wrote:
On Sat, 2015-03-07 at 17:03 -0500, Simon St.Laurent wrote:
On 03/05/2015 11:20 AM, Peter Hunsberger wrote:
Without knowing all the details, given relatively unchanging I'd
likely end up parsing them into a database.

Yep - this is the answer I keep hearing from people, with the
exception of people who have mixed content and tend to think of it
all as documents.

Peter didn't say relational database... although he did mention Hadoop
I think.

There isn't a single right answer.

I agree.  I'm not hearing so much about relational databases per se as I am about shredding approaches, with data stored in anything from MySQL to Cassandra to Neo4j to custom systems that sound both terrifying and unnecessary.

Somehow mixed content leads people to think of chunks as coherent pieces, and some of the shredding cases I've discussed with people keep large pieces of mixed content, typically (X)HTML, in the same shredded bit rather than tearing deeper into it.

So for Roger I'd say that I think the direction I see is more hybrid
stores (from Virtuoso to MarkLogic..) and more variation being
acceptable as people come to recognize that different needs are best
served with different technologies.

Yep.  I suspect the underlying reality is that people are doing so many things with so many different kinds of parts that solutions are evolving in multiple directions.

Which is pretty cool, actually!  Though it creates vast confusion for organizations that just want an answer yesterday.



XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php