Re: [xml-dev] JSON - The Fat Free Alternative

On Tue, Sep 30, 2014 at 8:54 PM, James Fuller <james.fuller.2007@gmail.com> wrote:

I agree with Michael's assertion and would add that sometimes subtleties (like the truth) get lost in the headlong rush forward.

Part of the problem is using opaque, non-specific terms (like 'fat') to describe something ... 'fat' really could mean anything or nothing in terms of computing.

And this is where I bring out David Lee's excellent Balisage paper:

http://www.balisage.net/Proceedings/vol10/html/Lee01/BalisageVol10-Lee01.html

What is clear is that there is a cost to modelling something as a fully fledged document versus as data, and when confronted with a choice, developers choosing json believe they are mitigating their risk (esp. true if your toolchain and execution environment of choice marshal json objects around).

What is interesting to watch is the rarer occasion when someone has chosen json to represent document data ... things can get complicated quickly indeed. Even then, transforming to a higher-fidelity format may not be as cost-effective as everyone thinks.

What is fun to watch is the recent upsurge in the use of Web components; watching developers have fun again with markup is great, which says more about the 'semantics' of using markup than anything.

I think the performance argument has always been a red herring ... important for some, but not very relevant for most.

Jim Fuller
On Tue, Sep 30, 2014 at 9:34 PM, Michael Kay <mike@saxonica.com> wrote:
I think the "fat" referred to by the phrase "fat-free alternative to XML" has very little to do with data size, it has much more to do with complexity: the fact that the JSON specification fits on one sheet of paper, whereas XML extends to thousands of pages if you include the whole stack.

Michael Kay
Saxonica
mike@saxonica.com
+44 (0) 118 946 5893
On 30 Sep 2014, at 19:25, Ihe Onwuka <ihe.onwuka@gmail.com> wrote:
So the story goes something like this.
I get into one of these "JSON is better/slimmer/faster" versus "oh no it isn't" arguments, and we are getting entrenched in our respective positions when I realise that I actually have data that can test the slimmer argument, data created not for the purpose of a benchmark but as a solution to a problem I had.
I am creating a movie data mashup and need to integrate movie data from a JSON repository with some XML movie data. The mashup is an XSLT transformation, so the JSON has to be converted. The problem was that, après download and conversion, the XML file was too big for the XSLT processor and I was getting heap space errors.
Here is a snippet of JSON data for one movie.
{
  "result": [
    {
      "initial_release_date": "2006-11-30",
      "rottentomatoes_id": [],
      "key": [{
        "namespace": "/authority/imdb/title",
        "value": "tt0259822"
      }],
      "name": ".45",
      "type": "/film/film",
      "starring": [
        {
          "actor": [{
            "/common/topic/alias": [
              "Milla",
              "Milica Natasha Jovovich",
              "Milica Jovović",
              "Milla Yovovich",
              "Reigning Queen of Kick-Butt",
              "Milica Nataša Jovović"
            ],
            "name": "Milla Jovovich"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Angus McFadyen",
              "Angus MacFadyen"
            ],
            "name": "Angus Macfadyen"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Aisha N. Tyler"
            ],
            "name": "Aisha Tyler"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Stephen Dorff Jr.",
              "Brad Matlock"
            ],
            "name": "Stephen Dorff"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [],
            "name": "Sarah Strange"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Vincent LaResca",
              "Vinnie the kid"
            ],
            "name": "Vincent Laresca"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Dawn Greenhall",
              "Hazel Dawn Greenhalgh"
            ],
            "name": "Dawn Greenhalgh"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Nola Auguston"
            ],
            "name": "Nola Augustson"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Katherine Mary Craven Hawtrey",
              "Kay Hartrey",
              "Kay Hawtry",
              "Katherine Hawtrey"
            ],
            "name": "Kay Hawtrey"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [],
            "name": "Shawn Campbell"
          }]
        }
      ],
      "mid": "/m/0c2l1s",
      "directed_by": [{
        "/common/topic/alias": [],
        "name": "Gary Lennon"
      }]
    }
Following a naive JSON-to-XML conversion by yours truly, I produced this:

<result>
<item>
<key>
<item>
<value>tt0259822</value>
<namespace>/authority/imdb/title</namespace>
</item>
</key>
<type>/film/film</type>
<name>.45</name>
<starring>
<item>
<actor>
<item>
<alias>
<item>Milla</item>
<item>Milica Natasha Jovovich</item>
<item>Milica Jovović</item>
<item>Milla Yovovich</item>
<item>Reigning Queen of Kick-Butt</item>
<item>Milica Nataša Jovović</item>
</alias>
<name>Milla Jovovich</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Angus McFadyen</item>
<item>Angus MacFadyen</item>
</alias>
<name>Angus Macfadyen</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Aisha N. Tyler</item>
</alias>
<name>Aisha Tyler</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Stephen Dorff Jr.</item>
<item>Brad Matlock</item>
</alias>
<name>Stephen Dorff</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias/>
<name>Sarah Strange</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Vincent LaResca</item>
<item>Vinnie the kid</item>
</alias>
<name>Vincent Laresca</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Dawn Greenhall</item>
</alias>
<name>Dawn Greenhalgh</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Nola Auguston</item>
</alias>
<name>Nola Augustson</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias>
<item>Katherine Mary Craven Hawtrey</item>
<item>Kay Hartrey</item>
<item>Kay Hawtry</item>
<item>Katherine Hawtrey</item>
</alias>
<name>Kay Hawtrey</name>
</item>
</actor>
</item>
<item>
<actor>
<item>
<alias/>
<name>Shawn Campbell</name>
</item>
</actor>
</item>
</starring>
<directed_by>
<item>
<alias/>
<name>Gary Lennon</name>
</item>
</directed_by>
<initial_release_date>2006-11-30</initial_release_date>
<alias/>
<mid>/m/0c2l1s</mid>
<rottentomatoes_id/>
</item>
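For anyone wanting to reproduce the bloat, a naive converter of this kind can be sketched in a few lines of Python with the standard json and xml.etree.ElementTree modules. This is not my actual conversion code; the rules are assumptions read off the output above: object members become elements named after their keys, array members become generic <item> elements, and Freebase property paths like "/common/topic/alias" keep only their last segment. The wrapper element name `root` is hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def tag_for(key: str) -> str:
    # Assumption: Freebase property paths such as "/common/topic/alias" are not
    # valid XML element names, so only the last path segment is kept.
    return key.rsplit("/", 1)[-1]


def build(parent: ET.Element, value) -> None:
    """Append the XML form of a parsed JSON value to `parent`."""
    if isinstance(value, dict):
        # Each object member becomes a child element named after its key.
        for key, child in value.items():
            build(ET.SubElement(parent, tag_for(key)), child)
    elif isinstance(value, list):
        # Each array member becomes a generic <item> child.
        for child in value:
            build(ET.SubElement(parent, "item"), child)
    elif isinstance(value, bool):
        parent.text = "true" if value else "false"
    elif value is not None:
        # Strings and numbers become text content; null leaves the element empty.
        parent.text = str(value)


def json_to_xml(text: str, root_tag: str = "root") -> str:
    """Naively convert a JSON document to an XML string under `root_tag`."""
    root = ET.Element(root_tag)
    build(root, json.loads(text))
    return ET.tostring(root, encoding="unicode")
```

Every array costs an extra <item> wrapper on top of the named element, which is where much of the growth comes from. Unlike my output, this sketch keeps the JSON key order and wraps everything in an extra root element, and it does not guard against keys that are not legal XML names.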

There were a hundred movies per file, and the JSON data came in at 325k; by the time it had been converted to the XML above, it had ballooned to 1.16MB.

My aim was to compact the XML sufficiently to allow a single transformation to accept the contents of about 1800 such files. So here is the data before the compacting transformation:

ihe@ihe-ThinkPad-T410:~/film$ ls rawFreebase/1.xml -l
-rw-r--r-- 1 ihe ihe 1160691 Aug 29 06:46 rawFreebase/1.xml
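Why one transformation over all the files could not work in this form is plain arithmetic. A quick sketch, assuming (my assumption, not measured) that all ~1800 files are roughly the size of this one:

```python
# Size of one naive XML file of 100 movies, from `ls -l` above.
NAIVE_FILE_BYTES = 1_160_691
# "about 1800" files; assumes every file is roughly this size.
FILE_COUNT = 1800


def total_bytes(per_file: int, files: int) -> int:
    """Extrapolated size of all files taken together as one input."""
    return per_file * files


def per_movie_bytes(per_file: int, movies_per_file: int = 100) -> float:
    """Average serialized size of one movie record."""
    return per_file / movies_per_file


if __name__ == "__main__":
    total = total_bytes(NAIVE_FILE_BYTES, FILE_COUNT)
    print(f"{total:,} bytes (~{total / 1e9:.2f} GB)")
    print(f"{per_movie_bytes(NAIVE_FILE_BYTES):,.0f} bytes per movie")
```

That comes to roughly 2.1 GB of serialized XML before parsing, and an in-memory tree in the processor will usually need more than the text it was built from.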

After compacting, which was supposed to be lossless, the data looked like this:

<movie imdb="tt0259822" name=".45" mid="/m/0c2l1s" date="2006-11-30">
<actor name="Milla Jovovich">
<alias>Milla</alias>
<alias>Milica Natasha Jovovich</alias>
<alias>Milica Jovović</alias>
<alias>Milla Yovovich</alias>
<alias>Reigning Queen of Kick-Butt</alias>
<alias>Milica Nataša Jovović</alias>
</actor>
<actor name="Angus Macfadyen">
<alias>Angus McFadyen</alias>
<alias>Angus MacFadyen</alias>
</actor>
<actor name="Aisha Tyler">
<alias>Aisha N. Tyler</alias>
</actor>
<actor name="Stephen Dorff">
<alias>Stephen Dorff Jr.</alias>
<alias>Brad Matlock</alias>
</actor>
<actor name="Sarah Strange"/>
<actor name="Vincent Laresca">
<alias>Vincent LaResca</alias>
<alias>Vinnie the kid</alias>
</actor>
<actor name="Dawn Greenhalgh">
<alias>Dawn Greenhall</alias>
</actor>
<actor name="Nola Augustson">
<alias>Nola Auguston</alias>
</actor>
<actor name="Kay Hawtrey">
<alias>Katherine Mary Craven Hawtrey</alias>
<alias>Kay Hartrey</alias>
<alias>Kay Hawtry</alias>
<alias>Katherine Hawtrey</alias>
</actor>
<actor name="Shawn Campbell"/>
<director name="Gary Lennon"/>
</movie>

and the size of a file of 100 such entries....

ihe@ihe-ThinkPad-T410:~/film$ ls freebase/1.xml -l
-rw-r--r-- 1 ihe ihe 351067 Aug 29 06:53 freebase/1.xml

350k compared to 324k of JSON.
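My compaction was done in XSLT and the stylesheet is not shown here. Purely to illustrate the mapping from the naive form to the compact form, here is a sketch using Python's xml.etree.ElementTree instead. The wrapper element name `movies` is hypothetical, and dropping `type`, `rottentomatoes_id` and the IMDb `namespace` is an assumption read off the two samples above, not a statement of what the original stylesheet does.

```python
import xml.etree.ElementTree as ET


def compact_movie(item: ET.Element) -> ET.Element:
    """Rewrite one naive <item> movie record into the compact <movie> form."""
    movie = ET.Element("movie")
    # Scalar fields become attributes; findtext returns "" when a path is absent.
    movie.set("imdb", item.findtext("key/item/value", ""))
    movie.set("name", item.findtext("name", ""))
    movie.set("mid", item.findtext("mid", ""))
    movie.set("date", item.findtext("initial_release_date", ""))
    for actor in item.findall("starring/item/actor/item"):
        el = ET.SubElement(movie, "actor", name=actor.findtext("name", ""))
        # <alias><item>..</item></alias> collapses to repeated <alias> elements.
        for alias in actor.findall("alias/item"):
            ET.SubElement(el, "alias").text = alias.text
    for director in item.findall("directed_by/item"):
        el = ET.SubElement(movie, "director", name=director.findtext("name", ""))
        for alias in director.findall("alias/item"):
            ET.SubElement(el, "alias").text = alias.text
    return movie


def compact(naive_xml: str) -> str:
    """Compact every movie <item> under the naive <result> root."""
    root = ET.fromstring(naive_xml)
    out = ET.Element("movies")  # hypothetical wrapper element
    for item in root.findall("item"):
        out.append(compact_movie(item))
    return ET.tostring(out, encoding="unicode")
```

Most of the saving comes from turning single-valued children into attributes and removing the <item> wrappers that the naive conversion adds for every array.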

I then decided to see what would happen to the file sizes after they were compressed.

Here are the results. This is the naive XML:

ihe@ihe-ThinkPad-T410:~/film$ ls -l rawFreebase/*.zip
-rw-r--r-- 1 ihe ihe 83833 Sep 30 12:41 rawFreebase/1.xml.zip

This is the compacted XML
ihe@ihe-ThinkPad-T410:~/film$ ls -l freebase/*.zip
-rw-r--r-- 1 ihe ihe 69058 Sep 30 12:41 freebase/1.xml.zip

This is the compressed JSON
-rw-r--r-- 1 ihe ihe 61528 Sep 30 12:42 Downloads/films.json.zip

83k for the naive XML,
69k for the compacted XML and
61k for the JSON.
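To put those numbers side by side, here is a small sketch that computes the ratios from the byte counts reported by ls above, plus a helper that measures a zipped size in memory with Python's zipfile module. The helper is an illustration only: its sizes will not exactly match the zip command-line tool, since deflate settings and archive headers differ.

```python
import io
import zipfile

# Byte counts reported by `ls -l` above (one file of 100 movies).
RAW = {"naive XML": 1_160_691, "compacted XML": 351_067}
ZIPPED = {"naive XML": 83_833, "compacted XML": 69_058, "JSON": 61_528}


def zipped_size(name: str, data: bytes) -> int:
    """Size of a one-member, deflate-compressed .zip archive built in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    return len(buf.getvalue())


def ratios_to(sizes: dict, base: str) -> dict:
    """Each size divided by the size of the `base` entry."""
    return {k: v / sizes[base] for k, v in sizes.items()}


if __name__ == "__main__":
    print("compaction factor:", round(RAW["naive XML"] / RAW["compacted XML"], 2))
    for k, r in ratios_to(ZIPPED, "JSON").items():
        print(f"{k}: {r:.2f}x the zipped JSON")
```

On the reported figures, compaction shrinks the raw XML by a factor of about 3.3, while after zipping the naive XML is about 1.36x the size of the zipped JSON and the compacted XML about 1.12x.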

Hmmmmmmmmmmm!