[ANN] Rumble 1.1 -- switched to DataFrames, and 2x faster

Dear all,

I am happy to announce the release of Rumble 1.1 beta, the JSONiq engine that queries heterogeneous and nested JSON data on top of Apache Spark.

Until version 1.0, FLWOR expressions were mapped to Spark RDDs.

Last semester, however, Can remapped FLWOR expressions to DataFrames in a student project, while fully preserving support for heterogeneous data. The result is Rumble 1.1, which brings a notable performance improvement: grouping and sorting are twice as fast.

From the user's perspective, nothing changes -- except the speed.

Here are a few example use cases showing that the JSONiq syntax (95% inherited from XQuery) is as compact as SQL, yet deals seamlessly with heterogeneous and nested data:

1. How many persons in my dataset?

count(json-file("persons.json"))

2. What are all the cities they come from?

distinct-values(json-file("persons.json").addresses[].city)

3. How many persons in each country?

for $i in json-file("persons.json")
group by $c := $i.country
return {
  "Country" : $c,
  "Number" : count($i)
}
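
For readers who would like to sanity-check what the three queries above compute, here is a rough plain-Python sketch of the same logic over a tiny in-memory sample. The sample records and their layout (a top-level "country" field, and "addresses" as an array of objects with a "city" field) are assumptions inferred from the queries, not real data, and the helper names are made up for illustration. This is not how Rumble evaluates anything; it only mirrors the expected results on one machine, including the way missing or oddly shaped "addresses" values are silently skipped.

```python
from collections import Counter

# Hypothetical sample standing in for the contents of persons.json.
persons = [
    {"name": "Ada", "country": "CH", "addresses": [{"city": "Zurich"}, {"city": "Bern"}]},
    {"name": "Bo", "country": "FR", "addresses": [{"city": "Paris"}]},
    {"name": "Cy", "country": "CH"},                      # no addresses field
    {"name": "Di", "country": "FR", "addresses": "n/a"},  # addresses is not an array
    {"name": "Ed", "country": "CH", "addresses": [{"zip": "8001"}, {"city": "Zurich"}]},
]


def cities(people):
    """Mirror query 2: collect distinct cities, skipping values of the wrong shape."""
    found = set()
    for person in people:
        addresses = person.get("addresses")
        if not isinstance(addresses, list):
            continue  # a missing or non-array value contributes nothing
        for address in addresses:
            if isinstance(address, dict) and "city" in address:
                found.add(address["city"])
    return found


def persons_per_country(people):
    """Mirror query 3: one object per country with the number of persons in it."""
    counts = Counter(person["country"] for person in people)
    # Sorted only to make the output deterministic; the query itself has no order by.
    return [{"Country": c, "Number": n} for c, n in sorted(counts.items())]


total = len(persons)                        # query 1
distinct_cities = cities(persons)           # query 2
per_country = persons_per_country(persons)  # query 3

print(total)
print(sorted(distinct_cities))
print(per_country)
```

The explicit shape checks in cities() are the part that the JSONiq navigation syntax handles implicitly, which is where most of the difference in length comes from.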

If you want to try it out (no cluster is needed, since it also spreads computation across your local cores), you can download it for free (open source) here:

http://rumbledb.org/

Thanks and kind regards,
Ghislain




Copyright 1993-2007 XML.org. This site is hosted by OASIS