Re: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?

Hi Roger,

Interesting data set. Querying 2.4PB (2400TB) of data is going to take a 
long time regardless of how you do it.
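
To put a rough number on "a long time", here is a back-of-envelope 
sketch. The 2.4PB and 50 million document figures come from this thread; 
the 200 MB/s per-node read rate and the node counts are assumptions 
chosen for illustration, not measurements.

```python
# Back-of-envelope estimate of how long one full pass over the data takes.
TOTAL_BYTES = 2.4e15      # 2.4 PB (decimal), from the thread
DOC_COUNT = 50_000_000    # 50 million documents, from the thread
MB_PER_SEC = 200          # assumed sustained read rate per node


def scan_hours(total_bytes, nodes, mb_per_sec_per_node):
    """Hours to read every byte once, assuming perfect parallelism."""
    bytes_per_sec = nodes * mb_per_sec_per_node * 1e6
    return total_bytes / bytes_per_sec / 3600


avg_doc_mb = TOTAL_BYTES / DOC_COUNT / 1e6
print(f"average document size: {avg_doc_mb:.0f} MB")

for nodes in (1, 100, 1000):  # hypothetical cluster sizes
    hours = scan_hours(TOTAL_BYTES, nodes, MB_PER_SEC)
    print(f"{nodes:>5} nodes: {hours:,.1f} hours")
```

Parsing XML is usually slower than raw reads, so real figures would 
likely be worse than this estimate.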

I'm assuming that duplicating that much data is off the table, but if it 
isn't, you would do best to optimise the storage mechanism to suit the 
query/retrieval access patterns. MarkLogic is marketed as petabyte-scale, 
but XML may not be the best representation of the data, so it is worth 
considering the alternatives.

Distributed storage and query are probably the way forward, and xml-dev 
may not be the place for that discussion. If you haven't already, I'd 
look into Hadoop MapReduce, Presto, etc. A high-level starter discussion 
of querying a 300PB data warehouse is here: 
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
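
As a minimal sketch of the map/reduce idea applied to XML, the snippet 
below runs the same path query over each shard of documents and then 
merges the partial results. The sample documents, the `status` element 
and the function names are all hypothetical, and it uses the limited 
XPath subset in Python's standard library rather than Hadoop, Presto or 
a full XPath/XQuery engine.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from functools import reduce

# Hypothetical sample documents standing in for files held on different
# nodes. The two shards use slightly different (but similar) vocabularies.
SHARDS = [
    ["<report><status>open</status></report>",
     "<report><status>closed</status></report>"],
    ["<record><status>open</status></record>"],
]


def map_shard(docs, path=".//status"):
    """Map step: evaluate one path over every document in a shard."""
    counts = Counter()
    for text in docs:
        root = ET.fromstring(text)
        for node in root.findall(path):
            counts[node.text] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge the partial counts from two shards."""
    return left + right


# Run serially here; on a cluster each map_shard call would run on the
# node that holds that shard, and only the small Counters would move.
partials = map(map_shard, SHARDS)
totals = reduce(reduce_counts, partials, Counter())
print(dict(totals))
```

The point of the split is that the query travels to the data and only 
the aggregated results cross the network.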


// Gareth Oakes
// Chief Architect, GPSL
// www.gpsl.co




On 6/03/2015 07:48, "Costello, Roger L." <costello@mitre.org> wrote:

>Hi Peter,
>
>> Does every query always query every document?
>
>The queries are across many or all of the 50 million XML documents.
>
>> Does every document use the same schema? 
>
>No, there are 40 XML Schemas. 
>
>> If not, how widely varied are they?
>
>The XML Schemas are quite similar.
>
>Any recommendations Peter? Anyone?
>
>/Roger
>
>-----Original Message-----
>From: Peter Flynn [mailto:peter@silmaril.ie] 
>Sent: Thursday, March 05, 2015 3:40 PM
>To: xml-dev@lists.xml.org
>Subject: Re: [xml-dev] What is the general direction you are seeing these 
>days to store and query lots of large complex XML?
>
>On 03/05/2015 04:49 PM, Costello, Roger L. wrote:
>>> Do you need full text querying? XPath?
>> 
>> The queries are done using XPath and XQuery.
>> 
>>> Do you know or care what the document vocabularies are?
>> 
>> The XML elements and attributes are very well known. The structure of 
>>the XML is well known.
>
>Does every query always query every document? Or are there known subsets
>that are used for certain types of queries?
>
>Does every document use the same schema? If not, how widely varied are 
>they?
>
>///Peter
>
>
>_______________________________________________________________________
>
>XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>to support XML implementation and development. To minimize
>spam in the archives, you must subscribe before posting.
>
>[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>subscribe: xml-dev-subscribe@lists.xml.org
>List archive: http://lists.xml.org/archives/xml-dev/
>List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

