Re: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?
be Cassandra, though the query language might not be sufficient for your
needs. Overall, the whole Hadoop-related tool chain might be a better fit
for the larger problem. Alternatively, something like Titan might be even
better but would likely require some tooling to get up and sucking in
the data. However, depending on what the SAS and SPSS portions of this
are used for, it could potentially yield "answers" directly.
Peter Hunsberger
On Thu, Mar 5, 2015 at 9:50 AM, Costello, Roger L. <costello@mitre.org> wrote:
Here are some other factors that might enter into your
recommendation:

> Are these files volatile or static?

The XML files are relatively static - a few are updated for errors
but most stay the same.

> Are there requirements for further processing or consuming them
> as XML elsewhere or are they just a query source?

The XML files are just a query source. The results of the queries on
the XML documents are used as input to SAS and SPSS analytics.

> What type of queries, with what frequency?

We want multiple people to query multiple times a day. Right now the
query frequency is low because the queries take days to run.

What’s your recommendation?

/Roger
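As a rough illustration of the workflow described above, where the XML files are only a query source and the results feed SAS and SPSS, the sketch below streams one XML file and writes selected fields to CSV, a format both tools can import. The element name `record`, the field names, and the function name are hypothetical; the thread does not describe the actual schema or tooling.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_source, record_tag, fields, out):
    """Stream records out of one XML file into CSV rows.

    xml_source: filename or binary file object holding the XML.
    record_tag: name of the repeating record element (hypothetical).
    fields:     child element names to extract from each record.
    out:        text file object that receives the CSV.
    Returns the number of records written.
    """
    writer = csv.writer(out)
    writer.writerow(fields)
    count = 0
    # iterparse yields each element as its end tag is read, so the whole
    # document never has to be built in memory at once.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == record_tag:
            # Missing fields become empty cells.
            writer.writerow([elem.findtext(f, default="") for f in fields])
            count += 1
            # Discard the record's children and text to keep memory use low.
            elem.clear()
    return count


if __name__ == "__main__":
    sample = (b"<records>"
              b"<record><id>1</id><score>3.5</score></record>"
              b"<record><id>2</id></record>"
              b"</records>")
    buf = io.StringIO()
    written = xml_records_to_csv(io.BytesIO(sample), "record", ["id", "score"], buf)
    print(written, "records")
    print(buf.getvalue())
```

Because each file is handled independently, a step like this could in principle be run in parallel across many files, which is one reason a Hadoop-style tool chain comes up in this thread.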
From: Costello, Roger L.
Sent: Thursday, March 05, 2015 10:37 AM
To: xml-dev@lists.xml.org
Subject: RE: [xml-dev] What is the general direction you are
seeing these days to store and query lots of large complex XML?

> I smell a rat here... where did you get them from.

Hmm, no rat here. Those are actual, real numbers. We deal with big
data … real big data.

/Roger

From: Ihe Onwuka [mailto:ihe.onwuka@gmail.com]
Sent: Thursday, March 05, 2015 10:33 AM
To: Costello, Roger L.
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] What is the general direction you are
seeing these days to store and query lots of large complex XML?

I smell a rat here... where did you get them from.

On Thu, Mar 5, 2015 at 3:26 PM, Costello, Roger L.
<costello@mitre.org> wrote:
As always, Michael, you are very wise. And I certainly appreciate
your graciousness.
I have some figures regarding size and complexity:
There are 50 million XML files, each 50 MB in size.
The files are mostly flat (not deeply nested).
We need to perform queries across the 50 million files.
What's your recommendation for storing and querying this huge
number of XML files?
/Roger
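To put these figures in perspective, here is a back-of-the-envelope sketch of the total data volume and of how long one full pass over it might take. It assumes decimal units (1 MB = 10^6 bytes); the node counts and per-node read throughput are illustrative assumptions, not measurements from this thread.

```python
# Figures stated in the message above.
FILE_COUNT = 50_000_000          # 50 million XML files
FILE_SIZE_BYTES = 50 * 10**6     # 50 MB each, decimal megabytes assumed

total_bytes = FILE_COUNT * FILE_SIZE_BYTES
total_pb = total_bytes / 10**15  # decimal petabytes


def full_scan_days(total_bytes, nodes, mb_per_sec_per_node):
    """Days needed to read every byte once.

    nodes and mb_per_sec_per_node are hypothetical inputs; parsing cost,
    network transfer and scheduling overhead are ignored.
    """
    seconds = total_bytes / (nodes * mb_per_sec_per_node * 10**6)
    return seconds / 86_400


if __name__ == "__main__":
    print(f"Total volume: {total_pb} PB")  # 2.5 PB
    for nodes in (1, 100, 1000):
        days = full_scan_days(total_bytes, nodes, 100)
        print(f"{nodes:>5} node(s) at 100 MB/s each: {days:,.2f} days")
```

Under these assumptions a single reader at 100 MB/s would need roughly 289 days for one pass, and 100 such readers just under 3 days, which fits the observation that queries currently take days and suggests that any workable approach has to either spread the scan across many machines or avoid full scans through indexing.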
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.
[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines:
http://www.oasis-open.org/maillists/guidelines.php