XML.org OASIS Mailing List Archives
Re: [xml-dev] JSON - The Fat Free Alternative - Redux

I think the "fat" referred to by the phrase "fat-free alternative to XML" has very little to do with data size; it has much more to do with complexity: the fact that the JSON specification fits on one sheet of paper, whereas XML extends to thousands of pages if you include the whole stack.

Michael Kay
Saxonica
mike@saxonica.com
+44 (0) 118 946 5893




On 30 Sep 2014, at 19:25, Ihe Onwuka <ihe.onwuka@gmail.com> wrote:


So the story goes something like this.
I get into one of these "JSON is better/slimmer/faster", "oh no it isn't" arguments, and we are getting entrenched in our respective positions when I realise that I actually have data that can test the slimmer argument, created not for the purpose of a benchmark but as a solution to a problem I had.
I am creating a movie data mashup and need to integrate movie data from a JSON repository with some XML movie data. The mashup is an XSLT transformation, so the JSON has to be converted. The problem I had was that after download and conversion the XML file was too big for the XSLT processor, and I was getting heap space errors.
Here is a snippet of JSON data for one movie.
{
  "result": [
    {
      "initial_release_date": "2006-11-30",
      "rottentomatoes_id": [],
      "key": [{
        "namespace": "/authority/imdb/title",
        "value": "tt0259822"
      }],
      "name": ".45",
      "type": "/film/film",
      "starring": [
        {
          "actor": [{
            "/common/topic/alias": [
              "Milla",
              "Milica Natasha Jovovich",
              "Milica Jovović",
              "Milla Yovovich",
              "Reigning Queen of Kick-Butt",
              "Milica Nataša Jovović"
            ],
            "name": "Milla Jovovich"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Angus McFadyen",
              "Angus MacFadyen"
            ],
            "name": "Angus Macfadyen"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Aisha N. Tyler"
            ],
            "name": "Aisha Tyler"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Stephen Dorff Jr.",
              "Brad Matlock"
            ],
            "name": "Stephen Dorff"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [],
            "name": "Sarah Strange"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Vincent LaResca",
              "Vinnie the kid"
            ],
            "name": "Vincent Laresca"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Dawn Greenhall",
              "Hazel Dawn Greenhalgh"
            ],
            "name": "Dawn Greenhalgh"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Nola Auguston"
            ],
            "name": "Nola Augustson"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [
              "Katherine Mary Craven Hawtrey",
              "Kay Hartrey",
              "Kay Hawtry",
              "Katherine Hawtrey"
            ],
            "name": "Kay Hawtrey"
          }]
        },
        {
          "actor": [{
            "/common/topic/alias": [],
            "name": "Shawn Campbell"
          }]
        }
      ],
      "mid": "/m/0c2l1s",
      "directed_by": [{
        "/common/topic/alias": [],
        "name": "Gary Lennon"
      }]
    }

Following a naive JSON to XML conversion by yours truly, I produced this.

    <result>
        <item>
            <key>
                <item>
                    <value>tt0259822</value>
                    <namespace>/authority/imdb/title</namespace>
                </item>
            </key>
            <type>/film/film</type>
            <name>.45</name>
            <starring>
                <item>
                    <actor>
                        <item>
                            <alias>
                                <item>Milla</item>
                                <item>Milica Natasha Jovovich</item>
                                <item>Milica Jovović</item>
                                <item>Milla Yovovich</item>
                                <item>Reigning Queen of Kick-Butt</item>
                                <item>Milica Nataša Jovović</item>
                            </alias>
                            <name>Milla Jovovich</name>
                        </item>
                    </actor>
                </item>
                <item>
                    <actor>
                        <item>
                            <alias>
                                <item>Angus McFadyen</item>
                                <item>Angus MacFadyen</item>
                            </alias>
                            <name>Angus Macfadyen</name>
                        </item>
                    </actor>
                </item>
                <item>
                    <actor>
                        <item>
                            <alias>
                                <item>Aisha N. Tyler</item>
                            </alias>
                            <name>Aisha Tyler</name>
                        </item>
                    </actor>
                </item>
                <item>
                    <actor>
                        <item>
                            <alias>
                                <item>Stephen Dorff Jr.</item>
                                <item>Brad Matlock</item>
                            </alias>
                            <name>Stephen Dorff</name>
                        </item>
                    </actor>
                </item>
                <item>
                    <actor>
                        <item>
                            <alias/>
                            <name>Sarah Strange</name>
                        </item>
                    </actor>
                </item>
                <item>
                    <actor>
                        <item>
                            <alias>
                                <item>Vincent LaResca</item>
                                <item>Vinnie the kid</item>
                            </alias>
                            <name>Vincent Laresca</name>
                        </item>
                    </actor>
                </item>
                <item>
                    <actor>
                        <item>
                            <alias>
                                <item>Dawn Greenhall</item>
                            </alias>
                            <name>Dawn Greenhalgh</name>
                        </item>
                    </actor>
                </item>
                <item>
                    <actor>
                        <item>
                            <alias>
                                <item>Nola Auguston</item>
                            </alias>
                            <name>Nola Augustson</name>
                        </item>
                    </actor>
                </item>
                <item>
                    <actor>
                        <item>
                            <alias>
                                <item>Katherine Mary Craven Hawtrey</item>
                                <item>Kay Hartrey</item>
                                <item>Kay Hawtry</item>
                                <item>Katherine Hawtrey</item>
                            </alias>
                            <name>Kay Hawtrey</name>
                        </item>
                    </actor>
                </item>
                <item>
                    <actor>
                        <item>
                            <alias/>
                            <name>Shawn Campbell</name>
                        </item>
                    </actor>
                </item>
            </starring>
            <directed_by>
                <item>
                    <alias/>
                    <name>Gary Lennon</name>
                </item>
            </directed_by>
            <initial_release_date>2006-11-30</initial_release_date>
            <alias/>
            <mid>/m/0c2l1s</mid>
            <rottentomatoes_id/>
        </item>
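
For readers who want to reproduce this kind of output, here is a minimal Python sketch of a naive converter. It is not the converter used above, which is not shown. The function names local_name and to_element, the use of xml.etree.ElementTree, and the rule that a key such as "/common/topic/alias" becomes an element named "alias" are all assumptions inferred from the sample. Every array member is wrapped in an <item> element, which is where much of the extra bulk comes from.

```python
import json
import xml.etree.ElementTree as ET


def local_name(key):
    # Assumption: a key such as "/common/topic/alias" maps to its last
    # path segment ("alias"), as the XML sample appears to do.
    return key.rsplit("/", 1)[-1]


def to_element(name, value):
    """Convert one JSON value into an element called `name`."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(local_name(key), child))
    elif isinstance(value, list):
        # Each array member becomes an <item> child; [] yields an empty element.
        for child in value:
            elem.append(to_element("item", child))
    elif value is not None:
        elem.text = str(value)
    return elem


# A cut-down stand-in for the movie record, not the full data.
sample = json.loads("""
{"result": [{"name": ".45",
             "rottentomatoes_id": [],
             "directed_by": [{"/common/topic/alias": [], "name": "Gary Lennon"}]}]}
""")

result = to_element("result", sample["result"])
print(ET.tostring(result, encoding="unicode"))
```

Note that the one-line director entry already needs three levels of wrapper elements in this scheme.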

There were a hundred movies per file, and the JSON data came in at 325k; by the time it had been converted to the XML above it had ballooned to 1.16MB.

My aim was to compact the XML sufficiently to allow a single transformation to accept the contents of about 1800 or so such files. So here is the data before the compacting transformation:

ihe@ihe-ThinkPad-T410:~/film$ ls rawFreebase/1.xml -l
-rw-r--r-- 1 ihe ihe 1160691 Aug 29 06:46 rawFreebase/1.xml

After the compacting, which was supposed to be lossless, the data looked like this:

   <movie imdb="tt0259822" name=".45" mid="/m/0c2l1s" date="2006-11-30">
      <actor name="Milla Jovovich">
         <alias>Milla</alias>
         <alias>Milica Natasha Jovovich</alias>
         <alias>Milica Jovović</alias>
         <alias>Milla Yovovich</alias>
         <alias>Reigning Queen of Kick-Butt</alias>
         <alias>Milica Nataša Jovović</alias>
      </actor>
      <actor name="Angus Macfadyen">
         <alias>Angus McFadyen</alias>
         <alias>Angus MacFadyen</alias>
      </actor>
      <actor name="Aisha Tyler">
         <alias>Aisha N. Tyler</alias>
      </actor>
      <actor name="Stephen Dorff">
         <alias>Stephen Dorff Jr.</alias>
         <alias>Brad Matlock</alias>
      </actor>
      <actor name="Sarah Strange"/>
      <actor name="Vincent Laresca">
         <alias>Vincent LaResca</alias>
         <alias>Vinnie the kid</alias>
      </actor>
      <actor name="Dawn Greenhalgh">
         <alias>Dawn Greenhall</alias>
      </actor>
      <actor name="Nola Augustson">
         <alias>Nola Auguston</alias>
      </actor>
      <actor name="Kay Hawtrey">
         <alias>Katherine Mary Craven Hawtrey</alias>
         <alias>Kay Hartrey</alias>
         <alias>Kay Hawtry</alias>
         <alias>Katherine Hawtrey</alias>
      </actor>
      <actor name="Shawn Campbell"/>
      <director name="Gary Lennon"/>
   </movie>

and the size of a file of 100 such entries...

ihe@ihe-ThinkPad-T410:~/film$ ls freebase/1.xml -l
-rw-r--r-- 1 ihe ihe 351067 Aug 29 06:53 freebase/1.xml

350k compared to 324k of JSON.
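
The compacting step can be sketched as follows. This is a Python illustration under stated assumptions, not the transformation actually used for the mashup, which is not shown. The helper names person and compact are hypothetical, and the sketch assumes the only <key> entry is the IMDb id. Like the compacted sample, it keeps only the id, name, mid, date, cast and director, and drops the remaining movie-level elements.

```python
import xml.etree.ElementTree as ET

# A cut-down stand-in for one naive record, not the full data.
NAIVE = """
<result>
  <item>
    <key><item><value>tt0259822</value><namespace>/authority/imdb/title</namespace></item></key>
    <name>.45</name>
    <starring>
      <item><actor><item>
        <alias><item>Aisha N. Tyler</item></alias>
        <name>Aisha Tyler</name>
      </item></actor></item>
      <item><actor><item>
        <alias/>
        <name>Sarah Strange</name>
      </item></actor></item>
    </starring>
    <directed_by><item><alias/><name>Gary Lennon</name></item></directed_by>
    <initial_release_date>2006-11-30</initial_release_date>
    <mid>/m/0c2l1s</mid>
  </item>
</result>
"""


def person(tag, src):
    """Build <actor name=...> or <director name=...> with <alias> children."""
    out = ET.Element(tag, name=src.findtext("name", default=""))
    for alias in src.findall("alias/item"):
        ET.SubElement(out, "alias").text = alias.text
    return out


def compact(movie):
    """Turn one naive <item> into the attribute-based <movie> form."""
    out = ET.Element("movie", {
        "imdb": movie.findtext("key/item/value", default=""),
        "name": movie.findtext("name", default=""),
        "mid": movie.findtext("mid", default=""),
        "date": movie.findtext("initial_release_date", default=""),
    })
    for actor in movie.findall("starring/item/actor/item"):
        out.append(person("actor", actor))
    for director in movie.findall("directed_by/item"):
        out.append(person("director", director))
    return out


movies = [compact(m) for m in ET.fromstring(NAIVE).findall("item")]
print(ET.tostring(movies[0], encoding="unicode"))
```

Moving single-valued fields into attributes and naming elements after what they hold removes both the closing-tag repetition and the <item> wrappers.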

I then decided to see what would happen to the file sizes after they were compressed.

Here are the results.

ihe@ihe-ThinkPad-T410:~/film$ ls -l rawFreebase/*.zip
-rw-r--r-- 1 ihe ihe 83833 Sep 30 12:41 rawFreebase/1.xml.zip

This is the compacted XML
ihe@ihe-ThinkPad-T410:~/film$ ls -l freebase/*.zip
-rw-r--r-- 1 ihe ihe 69058 Sep 30 12:41 freebase/1.xml.zip

This is the compressed JSON
-rw-r--r-- 1 ihe ihe   61528 Sep 30 12:42 Downloads/films.json.zip
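
A comparable measurement can be scripted rather than done by hand. The sketch below assumes Python's standard zipfile module with DEFLATE compression as a stand-in for whatever zip tool produced the archives above. The function name zipped_size and the sample byte strings are hypothetical, so the numbers it prints say nothing about the real files.

```python
import io
import zipfile


def zipped_size(name, data):
    """Return the size in bytes of a one-member zip archive holding `data`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    return len(buf.getvalue())


# Hypothetical stand-in data: the same record in verbose and compact markup.
verbose = b"<item><actor><item><name>Aisha Tyler</name></item></actor></item>" * 1000
compact = b'<actor name="Aisha Tyler"/>' * 1000

for label, data in (("verbose", verbose), ("compact", compact)):
    print(label, len(data), zipped_size(label + ".xml", data))
```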

83k for the naive XML,
69k for the compacted XML and
61k for the JSON.

Hmmmmmmmmmmm!
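
To put numbers on that, here is a short Python calculation using only the byte counts from the ls listings above. The exact uncompressed JSON byte count is not listed, so it is left out. The variable names are illustrative.

```python
# Byte counts copied from the `ls -l` listings in this message.
sizes = {
    "naive XML": {"raw": 1160691, "zipped": 83833},
    "compacted XML": {"raw": 351067, "zipped": 69058},
}
json_zipped = 61528  # Downloads/films.json.zip

ratios = {}
for label, s in sizes.items():
    shrink = s["raw"] / s["zipped"]      # how much zip shrank this file
    vs_json = s["zipped"] / json_zipped  # zipped size relative to zipped JSON
    ratios[label] = (shrink, vs_json)
    print(f"{label}: zip shrinks it {shrink:.1f}x; {vs_json:.2f}x the zipped JSON")
```

On these figures zip shrinks the naive XML by about 13.8x but the compacted XML by only about 5.1x, leaving them at roughly 1.36x and 1.12x the size of the zipped JSON. Most of the naive format's overhead is repetitive markup that compresses well.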







Copyright 1993-2007 XML.org. This site is hosted by OASIS