xml-dev - Re: [xml-dev] XML / HTML Transport size


To: James Clark <jjc@jclark.com>
Subject: Re: [xml-dev] XML / HTML Transport size
From: Benjamin Franz <snowhare@nihongo.org>
Date: Mon, 18 Nov 2002 07:27:55 -0800 (PST)
Cc: John Cowan <jcowan@reutershealth.com>, "Gerben Rampaart (Casnet Mechelen)" <Gerben@Casnet.net>, <xml-dev@lists.xml.org>
In-reply-to: <139597501.1037649224@[192.168.0.222]>

On Mon, 18 Nov 2002, James Clark wrote:

> 
> > A variety
> > of small-scale studies have shown that general-purpose compression is
> > generally as good as, or better than, some scheme that knows it's
> > compressing XML.
> 
> Hmmm.  What about
> 
>   http://www.research.att.com/sw/tools/xmill/

Nice try - but no cigar.

Using a real file from my office (I chose a nice big one at random to
test):

-rw-rw----    1 snowhare snowhare  7267803 Nov 18 07:15 recipes.xml

xmill returned:

-rw-rw-r--    1 snowhare snowhare  1993131 Nov 18 07:15 recipes.xmi

but a simple bzip2 returned:

-rw-rw----    1 snowhare snowhare  1922919 Nov 18 07:15 recipes.xml.bz2

So the _optimized_ XML compressor did worse than a default general-purpose
compressor: its output is about 3.7% larger (at least for this data).
Their description sounds remarkably similar to a block-sort compressor -
which is _precisely_ what bzip2 is (minus the 'patent rights' language
from AT&T ;) ). That is probably why the end sizes are fairly close for
both.
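
For anyone who wants to check the arithmetic or repeat the comparison on
their own data, here is a minimal Python sketch. The three byte counts are
copied from the listings above; everything else (the function names
ratio() and compare(), and the use of zlib as a second general-purpose
compressor) is an assumption made for illustration and was not part of the
original test. It does not run xmill itself.

```python
import bz2
import zlib

# Sizes in bytes, copied from the directory listings in the message.
ORIGINAL = 7267803   # recipes.xml
XMILL = 1993131      # recipes.xmi
BZIP2 = 1922919      # recipes.xml.bz2


def ratio(compressed: int, original: int) -> float:
    """Return the compressed size as a fraction of the original size."""
    return compressed / original


def compare(data: bytes) -> dict:
    """Return compressed sizes of `data` for two general-purpose compressors.

    Hypothetical helper: zlib (deflate) and bz2 (block sorting), both at
    their highest compression level.
    """
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
    }


if __name__ == "__main__":
    print(f"xmill: {ratio(XMILL, ORIGINAL):.1%} of original")
    print(f"bzip2: {ratio(BZIP2, ORIGINAL):.1%} of original")
    print(f"xmill output is {XMILL / BZIP2 - 1:.1%} larger than bzip2 output")
```

To try it on a real file, read the file in binary mode and pass the bytes
to compare(); results will of course vary with the data.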

It's _DAMNED HARD_ to improve on a modern general-purpose compression
program for textual data.

-- 
Jerry

I should either have been less specific or more correct ...

                ---Andy Armstrong <andy@wonderworks.co.uk>