[ANN] Rumble 1.1 -- switched to DataFrames, and 2x faster

From: "Ghislain Fourny" <gfourny@inf.ethz.ch>
To: "talk@x-query.com" <talk@x-query.com>, "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Thu, 8 Aug 2019 12:44:07 +0000

Dear all,

I am happy to announce the release of Rumble 1.1 beta, the JSONiq engine that queries heterogeneous and nested JSON data on top of Apache Spark.

Until version 1.0, FLWOR expressions were mapped to Spark RDDs.

But in a student project last semester, Can managed to remap FLWOR expressions to DataFrames while keeping support for heterogeneous data intact. The result is Rumble 1.1, with a notable performance improvement: grouping and sorting are twice as fast.

From the user's perspective, nothing changes -- except the speed.

Here are a few example queries that show how the JSONiq syntax (95% inherited from XQuery) is as compact as SQL, yet deals seamlessly with heterogeneous and nested data:

1. How many persons in my dataset?

count(json-file("persons.json"))

2. What are all the cities they come from?

distinct-values(json-file("persons.json").addresses[].city)

3. How many persons in each country?

for $i in json-file("persons.json")
group by $c := $i.country
return {
  "Country" : $c,
  "Number" : count($i)
}
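
For readers who do not know JSONiq yet, here is a rough plain-Python sketch of what the three queries above compute. It is not how Rumble evaluates them. The sample records are made up, the file is assumed to be in JSON Lines form (one object per line), and the skipping of missing or non-array "addresses" values is an approximation of JSONiq's navigation semantics rather than a precise restatement of them.

```python
import json
from collections import Counter

# Hypothetical stand-in for persons.json: one JSON object per line.
# Field names follow the queries above; all values are invented.
SAMPLE = """\
{"name": "Ada", "country": "CH", "addresses": [{"city": "Zurich"}, {"city": "Bern"}]}
{"name": "Bob", "country": "DE", "addresses": [{"city": "Berlin"}]}
{"name": "Cy", "country": "CH"}
{"name": "Di", "country": "DE", "addresses": "unknown"}
{"name": "Ed", "country": "CH", "addresses": [{"city": "Zurich"}, {"street": "Main St"}]}
"""

persons = [json.loads(line) for line in SAMPLE.splitlines() if line.strip()]

# 1. count(json-file("persons.json"))
person_count = len(persons)

# 2. distinct-values(json-file("persons.json").addresses[].city)
# Records without "addresses", with a non-array value, or with array
# members lacking "city" contribute nothing (assumed behaviour).
cities = []
for person in persons:
    addresses = person.get("addresses")
    if not isinstance(addresses, list):
        continue
    for address in addresses:
        if isinstance(address, dict) and "city" in address:
            if address["city"] not in cities:
                cities.append(address["city"])

# 3. for ... group by $c := $i.country return { "Country": $c, "Number": count($i) }
per_country = Counter(person["country"] for person in persons)
result = [{"Country": c, "Number": n} for c, n in per_country.items()]

print(person_count)
print(cities)
print(result)
```

The explicit isinstance checks in step 2 are the part that the one-line JSONiq path expression takes care of for you.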

If you want to try it out (no cluster is needed, since it also spreads computation across your local cores), you can download it for free (open source) here:

http://rumbledb.org/

Thanks and kind regards,
Ghislain
