Re: [xml-dev] What is the general direction you are seeing these days to store and query lots of large complex XML?

From: Gareth Oakes <goakes@gpsl.co>
To: "Costello, Roger L." <costello@mitre.org>, "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Fri, 6 Mar 2015 00:05:51 +0000

Hi Roger,

Interesting data set. Querying 2.4PB (2400TB) of data is going to take a 
long time regardless of how you do it.
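
As a rough back-of-envelope sketch of why (the per-node read rate and the
node counts below are assumptions picked for illustration, not measurements;
the 50 million document count is the figure quoted further down the thread):

```python
# Back-of-envelope scan-time estimate. Decimal units throughout.
TOTAL_BYTES = 2400 * 10**12   # 2.4PB (2400TB)
DOC_COUNT = 50_000_000        # document count quoted in the thread

# Assumption: each node sustains 1 GB/s of sequential reads.
ASSUMED_NODE_RATE = 10**9


def average_doc_bytes(total_bytes, doc_count):
    """Mean document size in bytes."""
    return total_bytes / doc_count


def full_scan_seconds(total_bytes, bytes_per_sec_per_node, nodes):
    """Time to read everything once, assuming perfectly even parallelism."""
    return total_bytes / (bytes_per_sec_per_node * nodes)


avg_mb = average_doc_bytes(TOTAL_BYTES, DOC_COUNT) / 10**6
one_node_days = full_scan_seconds(TOTAL_BYTES, ASSUMED_NODE_RATE, 1) / 86_400
thousand_node_minutes = full_scan_seconds(TOTAL_BYTES, ASSUMED_NODE_RATE, 1000) / 60

print(f"average document size: {avg_mb:.0f} MB")
print(f"1 node:     {one_node_days:.1f} days per full scan")
print(f"1000 nodes: {thousand_node_minutes:.0f} minutes per full scan")
```

This counts read time only; parsing XML and evaluating XPath/XQuery would
add to it, which is why indexes or a storage layout matched to the queries
matter so much at this size.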

I'm assuming that duplicating that much data is off the table, but if not
you would do best to optimise the storage mechanism to suit the
query/retrieval access patterns. MarkLogic is marketed as petabyte-scale,
but XML may not be the best representation of the data, so it is worth
considering the alternatives.

Distributed storage and query are probably the way forward, and xml-dev
may not be the place for that discussion. If you haven't already, I'd be
looking into Hadoop MapReduce, Presto, etc. A high-level starting point on
querying a 300PB data warehouse is here:
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920

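To make the map/reduce idea concrete, here is a minimal single-process
sketch in Python. It is not Hadoop or Presto code: the sample documents,
the `item` element and the `kind` attribute are all hypothetical, and a
real deployment would run the map step on the nodes that hold the data.

```python
from collections import Counter
from functools import reduce
import xml.etree.ElementTree as ET

# Hypothetical sample documents standing in for files on distributed storage.
# They deliberately use slightly different structures, like similar schemas.
DOCS = [
    "<report><item kind='a'/><item kind='b'/></report>",
    "<report><item kind='a'/></report>",
    "<summary><section><item kind='b'/></section></summary>",
]


def map_doc(xml_text):
    """Map step: parse one document and count <item> elements per 'kind'."""
    root = ET.fromstring(xml_text)
    # ".//item" uses ElementTree's limited XPath subset (all descendants).
    return Counter(el.get("kind") for el in root.findall(".//item"))


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def run_query(docs):
    """Apply the map step to every document, then fold the partial counts."""
    return reduce(reduce_counts, map(map_doc, docs), Counter())


result = run_query(DOCS)
print(dict(result))
```

Because each document is processed independently and the merge is
associative, the map step can be spread across as many workers as there
are partitions of the data.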

// Gareth Oakes
// Chief Architect, GPSL
// www.gpsl.co




On 6/03/2015 07:48, "Costello, Roger L." <costello@mitre.org> wrote:

>Hi Peter,
>
>> Does every query always query every document?
>
>The queries are across many or all of the 50 million XML documents.
>
>> Does every document use the same schema? 
>
>No, there are 40 XML Schemas. 
>
>> If not, how widely varied are they?
>
>The XML Schemas are quite similar.
>
>Any recommendations Peter? Anyone?
>
>/Roger
>
>-----Original Message-----
>From: Peter Flynn [mailto:peter@silmaril.ie] 
>Sent: Thursday, March 05, 2015 3:40 PM
>To: xml-dev@lists.xml.org
>Subject: Re: [xml-dev] What is the general direction you are seeing these 
>days to store and query lots of large complex XML?
>
>On 03/05/2015 04:49 PM, Costello, Roger L. wrote:
>>> Do you need full text querying? XPath?
>> 
>> The queries are done using XPath and XQuery.
>> 
>>> Do you know or care what the document vocabularies are?
>> 
>> The XML elements and attributes are very well known. The structure of 
>>the XML is well known.
>
>Does every query always query every document? Or are there known subsets
>that are used for certain types of queries?
>
>Does every document use the same schema? If not, how widely varied are 
>they?
>
>///Peter
>
>
>_______________________________________________________________________
>
>XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>to support XML implementation and development. To minimize
>spam in the archives, you must subscribe before posting.
>
>[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>subscribe: xml-dev-subscribe@lists.xml.org
>List archive: http://lists.xml.org/archives/xml-dev/
>List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
