Re: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?

On 03/05/2015 11:20 AM, Peter Hunsberger wrote:
Without knowing all the details, given relatively unchanging files I'd likely
end up parsing them into a database.
Yep - this is the answer I keep hearing from people, with the exception of people who have mixed content and tend to think of it all as documents. Even deeply nested but regular structures often find themselves parsed into databases and reassembled when necessary.

Thanks,
Simon
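As a rough illustration of the "parse into a database" approach discussed above, here is a minimal Python sketch that streams flat XML records into SQLite. The `<record>` element, its `id` attribute, the `name` and `value` children, and the `records` table are all hypothetical; the thread does not describe the actual schema, and SQLite only stands in for whatever database would really be used at this scale.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_records(xml_source, conn, record_tag="record"):
    """Stream record elements from xml_source into a `records` table.

    The element and column names are assumptions made for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id TEXT, name TEXT, value REAL)"
    )
    count = 0
    # iterparse yields each element once its end tag has been read, so the
    # whole file never has to be held in memory as one tree.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == record_tag:
            conn.execute(
                "INSERT INTO records VALUES (?, ?, ?)",
                (
                    elem.get("id"),
                    elem.findtext("name"),
                    float(elem.findtext("value", "0")),
                ),
            )
            count += 1
            # Drop the element's children and text to release memory; the
            # emptied element itself stays attached to the root.
            elem.clear()
    conn.commit()
    return count


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<records>"
        b"<record id='a'><name>x</name><value>1.5</value></record>"
        b"<record id='b'><name>y</name><value>2.5</value></record>"
        b"</records>"
    )
    conn = sqlite3.connect(":memory:")
    print(load_records(sample, conn), "records loaded")
    print(conn.execute("SELECT SUM(value) FROM records").fetchone()[0])
```

Once the data is in tables, the queries that feed SAS and SPSS become ordinary SQL rather than repeated passes over the XML files.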


Given relatively flat files, that might
be Cassandra, though the query language might not be sufficient for your
needs. Overall, the whole Hadoop-related tool chain might be a better fit
for the larger problem. Alternatively, something like Titan might be even
better but would likely require some tooling to get up and sucking in
the data. However, depending on what the SAS and SPSS portions of this
are used for, it could potentially yield "answers" directly.

Peter Hunsberger

On Thu, Mar 5, 2015 at 9:50 AM, Costello, Roger L. <costello@mitre.org> wrote:

    Here are some other factors that might enter into your
    recommendation:

    > Are these files volatile or static?

    The XML files are relatively static - a few are updated for errors
    but most stay the same.

    > Are there requirements for further processing or consuming them
    > as XML elsewhere or are they just a query source?

    The XML files are just a query source. The results of the queries on
    the XML documents are used as input to SAS and SPSS analytics.

    > What type of queries, with what frequency?

    We want multiple people to query multiple times a day. Right now the
    query frequency is low because the queries take days to run.

    What’s your recommendation?

    /Roger

    From: Costello, Roger L.
    Sent: Thursday, March 05, 2015 10:37 AM
    To: xml-dev@lists.xml.org
    Subject: RE: [xml-dev] What is the general direction you are
    seeing these days to store and query lots of large complex XML?

    > I smell a rat here... where did you get them from.

    Hmm, no rat here. Those are actual, real numbers. We deal with big
    data … real big data.

    /Roger

    From: Ihe Onwuka [mailto:ihe.onwuka@gmail.com]
    Sent: Thursday, March 05, 2015 10:33 AM
    To: Costello, Roger L.
    Cc: xml-dev@lists.xml.org
    Subject: Re: [xml-dev] What is the general direction you are
    seeing these days to store and query lots of large complex XML?

    I smell a rat here... where did you get them from.

    On Thu, Mar 5, 2015 at 3:26 PM, Costello, Roger L.
    <costello@mitre.org> wrote:

        As always Michael, you are very wise. And I certainly appreciate
        your graciousness.

        I have some figures regarding size and complexity:

                 There are 50 million XML files, each 50MB in size.
                 The files are mostly flat (not deeply nested).
                 We need to perform queries across the 50 million files.

        What's your recommendation for storing and querying this huge
        amount of XML files?

        /Roger
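To put the figures quoted above in perspective, the arithmetic can be sketched in a few lines of Python. The file count and file size come from the message; the read throughput and the number of parallel readers are purely illustrative assumptions, not measurements from the poster's system.

```python
FILES = 50_000_000   # from the message: 50 million XML files
MB_PER_FILE = 50     # from the message: 50 MB each

# Using decimal units (1 MB = 10**6 bytes, 1 PB = 10**15 bytes).
total_bytes = FILES * MB_PER_FILE * 10**6
total_pb = total_bytes / 10**15


def full_scan_days(num_bytes, bytes_per_second, readers=1):
    """Days needed to read every byte once at the given aggregate rate.

    bytes_per_second and readers are assumed values for illustration.
    """
    seconds = num_bytes / (bytes_per_second * readers)
    return seconds / 86_400


if __name__ == "__main__":
    print(f"Total volume: {total_pb} PB")
    # Assumption: one reader sustaining 1 GB/s.
    print(f"Single reader: {full_scan_days(total_bytes, 10**9):.1f} days")
    # Assumption: 100 readers working in parallel at 1 GB/s each.
    print(f"100 readers: {full_scan_days(total_bytes, 10**9, 100):.2f} days")
```

The stated figures come to 2.5 PB. Under the assumed 1 GB/s, a single sequential pass takes roughly 29 days, which is consistent with queries that "take days to run" and suggests why the replies lean toward parsing once into an indexed or distributed store.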



        _______________________________________________________________________

        XML-DEV is a publicly archived, unmoderated list hosted by OASIS
        to support XML implementation and development. To minimize
        spam in the archives, you must subscribe before posting.

        List archive: http://lists.xml.org/archives/xml-dev/
        List Guidelines: http://www.oasis-open.org/maillists/guidelines.php






Copyright 1993-2007 XML.org. This site is hosted by OASIS