OASIS Mailing List Archives



Re: [xml-dev] XML to graph

That won't be a problem for something like Neo or Titan: load up the data as anonymous nodes, then do a second pass to build the relationships based on the fields you wish to correlate. Individual nodes don't have to share the same fields to be correlated. The issues will be size related. If you have billions of entries to load, use Titan; otherwise Neo will have a faster implementation path and will likely scale to tens of millions. At those sizes things will be slow (hours or days) until the relationships are built, after which pulling out related data in under a second should be possible. The learning curve will be languages like Gremlin or Cypher, though you could also write Java plugins for Neo if need be.
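To make the two-pass idea concrete, here is a minimal in-memory sketch in Python. It is not Neo or Titan code: the function names, the record layout, and the two extra "sources" are assumptions made up for illustration. Pass 1 stores every record as an anonymous node; pass 2 links nodes that share a value in any of the chosen fields, skipping nodes that lack a field.

```python
from collections import defaultdict
from itertools import combinations


def load_nodes(records):
    """Pass 1: store each record as an anonymous node under an internal id."""
    return {node_id: dict(record) for node_id, record in enumerate(records)}


def build_relationships(nodes, fields):
    """Pass 2: link nodes that share a value in any of the chosen fields.

    A node that lacks a field is simply skipped for that field.
    """
    edges = set()
    for field in fields:
        by_value = defaultdict(list)
        for node_id, props in nodes.items():
            for value in props.get(field, []):
                by_value[value].append(node_id)
        for value, ids in by_value.items():
            for a, b in combinations(sorted(ids), 2):
                edges.add((a, field, value, b))
    return edges


# Hypothetical records: only the first reuses values from the posted snippet;
# the other two stand in for data arriving from other sources.
records = [
    {"title": ["20000 lieues sous les mers", "Under the Seas"],
     "person": ["Méliès, Georges"]},
    {"title": ["Under the Seas"], "review": ["placeholder review text"]},
    {"person": ["Méliès, Georges"]},
]

nodes = load_nodes(records)
edges = build_relationships(nodes, ["title", "person"])
for edge in sorted(edges):
    print(edge)
```

Linking every pair that shares a value is quadratic for very common values; it may be cheaper to hang all matching nodes off one shared value node instead, but the pairwise version keeps the sketch short.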

Peter Hunsberger

On Wed, Jul 1, 2015 at 12:04 PM, Ihe Onwuka <ihe.onwuka@gmail.com> wrote:
You will note that the data doesn't have a unique id. Title certainly isn't unique, if you consider how many movies there have been called Batman or Treasure Island.

Now I may encounter data about this movie from another source that covers different facets, for example its box office takings or movie reviews.

So it's a classic semantic web application. I want to amalgamate disparate data about the same fact in one entity. As I said, I have a transformation that does this, but it doesn't scale very well because I have to search the entire movie base to find the best match. To overcome this I have to adopt a MapReduce-ish approach to solve the problem.

The thinking is that a graph representation would eliminate that problem, because a graph gives me a persistent data structure already indexed for retrieval along several different axes, whereas indexes constructed in the XSLT transformation for the same purpose are ephemeral and have to be rebuilt every time the transformation runs.
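As a rough illustration of what "indexed along several axes" buys you, the Python sketch below keeps one inverted index per axis and answers a lookup by intersecting the matching id sets, so no lookup scans the whole movie base. The class name, the `normalise` helper, and the ids `m1` to `m3` are assumptions for illustration only, and the structure lives in memory here; a graph database would keep the equivalent on disk between runs.

```python
from collections import defaultdict


def normalise(text):
    """Crude key normalisation (an assumption; real matching would be fuzzier)."""
    return " ".join(text.lower().split())


class AxisIndex:
    """One inverted index per axis (title, year, person, tag, ...)."""

    def __init__(self):
        self._axes = defaultdict(lambda: defaultdict(set))

    def add(self, movie_id, axis, value):
        self._axes[axis][normalise(value)].add(movie_id)

    def candidates(self, **criteria):
        """Return the ids that match every axis=value pair given."""
        result = None
        for axis, value in criteria.items():
            ids = self._axes[axis].get(normalise(value), set())
            result = set(ids) if result is None else result & ids
        return result or set()


index = AxisIndex()
# m1 reuses values from the posted snippet; m2 and m3 are two films that
# share the title "Batman" and differ only by year.
index.add("m1", "title", "Under the Seas ")
index.add("m1", "year", "1907")
index.add("m1", "person", "Méliès, Georges")
index.add("m2", "title", "Batman")
index.add("m2", "year", "1966")
index.add("m3", "title", "Batman")
index.add("m3", "year", "1989")

print(index.candidates(title="batman"))               # both Batman entries
print(index.candidates(title="Batman", year="1989"))  # narrowed by a second axis
```

A title alone is ambiguous, as with Batman, but each extra axis shrinks the candidate set before any expensive best-match scoring has to run.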

On Wed, Jul 1, 2015 at 12:46 PM, Peter Hunsberger <peter.hunsberger@gmail.com> wrote:
It should be pretty straightforward to import that into Neo4j or Titan. Neo might be simplest, in particular via conversion of the data into JSON. However, Titan might give you other capabilities, such as Hadoop-type processing either for import or for subsequent analytics. Without knowing more about the business requirements, I can't really give you much more than that...
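For the JSON route, a conversion along these lines might be a starting point. It uses only the Python standard library; the output layout (`people`, `alt_titles`, `tags`) is an assumed shape for illustration, not a format that Neo4j or Titan prescribes, and the embedded snippet is a shortened copy of the posted one with a closing tag added.

```python
import json
import xml.etree.ElementTree as ET

# Shortened copy of the posted snippet, closed so that it parses.
SNIPPET = """<movie title="20000 lieues sous les mers">
<person name="Méliès, Georges"/>
<title title="20,000 Leagues Under the Sea " year="1907"/>
<title title="Under the Seas " year="1907"/>
<person name="Méliès, Georges"/>
<tag name="adventure"/>
<tag name="submarine"/>
</movie>"""


def movie_to_dict(movie):
    """Flatten one <movie> element into a JSON-friendly dict."""
    people = []
    for person in movie.findall("person"):
        name = person.get("name")
        if name not in people:  # the source repeats the same person
            people.append(name)
    return {
        "title": movie.get("title"),
        "people": people,
        "alt_titles": [
            {"title": t.get("title").strip(), "year": int(t.get("year"))}
            for t in movie.findall("title")
        ],
        "tags": [t.get("name") for t in movie.findall("tag")],
    }


doc = movie_to_dict(ET.fromstring(SNIPPET))
print(json.dumps(doc, ensure_ascii=False, indent=2))
```

Stripping the trailing spaces from the titles and collapsing the repeated person are choices made here; whether duplicates carry meaning in the real data (distinct roles, say) is not something the snippet reveals.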

Peter Hunsberger

On Wed, Jul 1, 2015 at 11:32 AM, Ihe Onwuka <ihe.onwuka@gmail.com> wrote:
I would like to convert the XML snippet below to a multi-relational graph representation.
One way is to transform it to RDF and load it into a triple store. Another, which I am less familiar with, is to transform it to GraphML and then import that into some graph database tool.

The graphical representation is desirable for processing rather than visualization reasons. Chiefly, I have a matching algorithm implemented in XSLT which works fine but doesn't scale well, a problem that I think can be solved with a graphical representation.

I am keen to hear from my elders and betters on the subject.

<movie title="20000 lieues sous les mers">
<person name="Méliès, Georges"/>
<title title="20,000 Leagues Under the Sea " year="1907"/>
<title title="Amid the Workings of the Deep " year="1907"/>
<title title="Deux cent mille lieues sous les mers " year="1907"/>
<title title="Le cauchemar d'un pêcheur " year="1907"/>
<title title="Under the Seas " year="1907"/>
<person name="Méliès, Georges"/>
<tag name="adventure"/>
<tag name="fantasy"/>
<tag name="sci-fi"/>
<tag name="short"/>
<tag name="based-on-novel"/>
<tag name="dream"/>
<tag name="fish"/>
<tag name="number-in-title"/>
<tag name="submarine"/>
<tag name="undersea-monster"/>
<tag name="underwater"/>
<person name="Méliès, Georges"/>
</movie>
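For the GraphML route, a sketch like the following turns one movie into a small star-shaped graph: a node for the movie, a node for each distinct person, title, and tag, and one edge from the movie to each. It uses only the Python standard library. The key names (`kind`, `label`, `year`, `rel`) and the `HAS_...` relationship labels are assumptions chosen for illustration, and the embedded snippet is a shortened copy of the one above with a closing tag added.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Shortened copy of the posted snippet, closed so that it parses.
SNIPPET = """<movie title="20000 lieues sous les mers">
<person name="Méliès, Georges"/>
<title title="20,000 Leagues Under the Sea " year="1907"/>
<title title="Under the Seas " year="1907"/>
<person name="Méliès, Georges"/>
<tag name="adventure"/>
<tag name="submarine"/>
</movie>"""


def q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{tag}"


def movie_to_graphml(movie):
    """Serialise one <movie> element as a GraphML document (a string)."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(q("graphml"))
    for key_id, scope in (("kind", "node"), ("label", "node"),
                          ("year", "node"), ("rel", "edge")):
        ET.SubElement(root, q("key"), {"id": key_id, "for": scope,
                                       "attr.name": key_id, "attr.type": "string"})
    graph = ET.SubElement(root, q("graph"), {"id": "movies", "edgedefault": "directed"})

    node_ids = {}

    def node(kind, label, year=None):
        # Reuse the node if this (kind, label) pair has been seen before.
        if (kind, label) not in node_ids:
            node_ids[(kind, label)] = f"n{len(node_ids)}"
            element = ET.SubElement(graph, q("node"), {"id": node_ids[(kind, label)]})
            ET.SubElement(element, q("data"), {"key": "kind"}).text = kind
            ET.SubElement(element, q("data"), {"key": "label"}).text = label
            if year is not None:
                ET.SubElement(element, q("data"), {"key": "year"}).text = year
        return node_ids[(kind, label)]

    movie_id = node("movie", movie.get("title"))
    linked = set()
    for child in movie:
        label = (child.get("name") or child.get("title") or "").strip()
        child_id = node(child.tag, label, child.get("year"))
        if child_id in linked:  # repeated <person> entries yield one edge
            continue
        linked.add(child_id)
        edge = ET.SubElement(graph, q("edge"), {"source": movie_id, "target": child_id})
        ET.SubElement(edge, q("data"), {"key": "rel"}).text = "HAS_" + child.tag.upper()
    return ET.tostring(root, encoding="unicode")


graphml_text = movie_to_graphml(ET.fromstring(SNIPPET))
print(graphml_text)
```

Node ids here are local to one movie. To merge many movies into a single graph, the `node_ids` table would need to be shared across them so that the same person or tag maps to the same node, which is where the cross-movie relationships come from.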

