XML.org: OASIS Mailing List Archives

 


Re: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?

Without knowing all the details, given that the files are relatively unchanging, I'd likely end up parsing them into a database. Given that they are relatively flat, that might be Cassandra, though the query language might not be sufficient for your needs. Overall, the whole Hadoop-related tool chain might be a better fit for the larger problem. Alternatively, something like Titan might be even better, but it would likely require some tooling to get up and sucking in the data. However, depending on what the SAS and SPSS portions of this are used for, it could potentially yield "answers" directly.

Peter Hunsberger
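
[Editorial sketch] The "parse them into a database" idea above might look roughly like the following. This is a minimal illustration only, not what anyone in the thread actually ran: SQLite stands in for whichever store is chosen (Cassandra, a Hadoop-based store, and so on), and the `record` element name, the `fields` table, and the `load_flat_xml` helper are all hypothetical. It assumes mostly flat files and streams them so that a 50MB document never has to be held in memory as a whole tree.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_flat_xml(source, conn, record_tag="record"):
    """Stream flat records from `source` into a SQLite table.

    Assumes (hypothetically) that each <record> holds only leaf children;
    every child becomes one (record_id, name, value) row.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fields "
        "(record_id INTEGER, name TEXT, value TEXT)"
    )
    record_id = 0
    # iterparse yields each element once its closing tag has been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        record_id += 1
        conn.executemany(
            "INSERT INTO fields VALUES (?, ?, ?)",
            [(record_id, child.tag, child.text) for child in elem],
        )
        elem.clear()  # drop this record's children once they are stored
    conn.commit()
    return record_id


# Tiny in-memory example; real input would be the files on disk.
sample = io.BytesIO(
    b"<records>"
    b"<record><id>1</id><score>0.7</score></record>"
    b"<record><id>2</id><score>0.9</score></record>"
    b"</records>"
)
conn = sqlite3.connect(":memory:")
count = load_flat_xml(sample, conn)
rows = conn.execute(
    "SELECT record_id, value FROM fields "
    "WHERE name = 'score' ORDER BY record_id"
).fetchall()
```

Once the data sits in a table like this, a query result can be exported as delimited text for SAS or SPSS instead of re-reading the XML on every query.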

On Thu, Mar 5, 2015 at 9:50 AM, Costello, Roger L. <costello@mitre.org> wrote:

Here are some other factors that might enter into your recommendation:

 

> Are these files volatile or static?

The XML files are relatively static - a few are updated for errors but most stay the same.

> Are there requirements for further processing or consuming them as XML
> elsewhere or are they just a query source?

The XML files are just a query source. The results of the queries on the XML documents are used as input to SAS and SPSS analytics.

> What type of queries, with what frequency?

We want multiple people to query multiple times a day. Right now the query frequency is low because the queries take days to run.

 

What’s your recommendation?

 

/Roger

From: Costello, Roger L.
Sent: Thursday, March 05, 2015 10:37 AM
To: xml-dev@lists.xml.org
Subject: RE: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?

 

> I smell a rat here... where did you get them from.

 

Hmm, no rat here. Those are actual, real numbers. We deal with big data … real big data.

 

/Roger

 

From: Ihe Onwuka [mailto:ihe.onwuka@gmail.com]
Sent: Thursday, March 05, 2015 10:33 AM
To: Costello, Roger L.
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?

 

I smell a rat here... where did you get them from.

 

On Thu, Mar 5, 2015 at 3:26 PM, Costello, Roger L. <costello@mitre.org> wrote:

As always Michael, you are very wise. And I certainly appreciate your graciousness.

I have some figures regarding size and complexity:

        There are 50 million XML files, each 50MB in size.
        The files are mostly flat (not deeply nested).
        We need to perform queries across the 50 million files.

What's your recommendation for storing and querying this huge amount of XML files?

/Roger
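
[Editorial sketch] For a sense of scale, the figures quoted above can be multiplied out. The file count and file size come from the message. Reading "MB" as decimal megabytes and the 1 GB/s scan rate are assumptions made here purely for illustration, and `full_scan_days` is a hypothetical helper, not a measurement of any real system.

```python
FILE_COUNT = 50_000_000           # from the message: 50 million files
FILE_SIZE_BYTES = 50 * 10**6      # from the message: 50MB each (decimal MB assumed)

total_bytes = FILE_COUNT * FILE_SIZE_BYTES
total_petabytes = total_bytes / 10**15   # 2.5 PB


def full_scan_days(total_bytes, bytes_per_second):
    """Days needed to read every byte once at a sustained rate."""
    return total_bytes / bytes_per_second / 86_400


# Assumed aggregate read rate of 1 GB/s (illustrative only).
days_at_1_gb_per_s = full_scan_days(total_bytes, 10**9)   # about 29 days
```

Under these assumptions the corpus is about 2.5 petabytes, and a query that has to read every file would take on the order of weeks at that rate, so any practical design has to avoid full scans of the raw XML.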



_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.


Copyright 1993-2007 XML.org. This site is hosted by OASIS