   Re: [xml-dev] XML / HTML Transport size

On Mon, 18 Nov 2002, James Clark wrote:

> 
> > A variety
> > of small-scale studies have shown that general-purpose compression is
> > generally as good as, or better than, some scheme that knows it's
> > compressing XML.
> 
> Hmmm.  What about
> 
>   http://www.research.att.com/sw/tools/xmill/

Nice try - but no cigar.

Using a real file from my office (I chose a nice big one at random to
test):

-rw-rw----    1 snowhare snowhare  7267803 Nov 18 07:15 recipes.xml

xmill returned:

-rw-rw-r--    1 snowhare snowhare  1993131 Nov 18 07:15 recipes.xmi

but a simple bzip2 returned:

-rw-rw----    1 snowhare snowhare  1922919 Nov 18 07:15 recipes.xml.bz2

So the _optimized_ XML compressor did worse than a general-purpose
compressor run with its default settings: its output is about 3.7% larger
(at least for this data). Their description sounds remarkably similar to
a block-sorting compressor, which is _precisely_ what bzip2 is (minus the
'patent rights' language from AT&T ;) ). That is probably why the end
sizes are fairly close for both.
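
For anyone who wants to check the arithmetic, here is a minimal Python
sketch that works only from the three byte counts in the listings above.
The variable and function names are mine, chosen for illustration; nothing
here runs xmill or bzip2.

```python
# Sizes in bytes, copied from the directory listings in this message.
ORIGINAL = 7267803   # recipes.xml
XMILL = 1993131      # recipes.xmi
BZIP2 = 1922919      # recipes.xml.bz2


def ratio(compressed, original=ORIGINAL):
    """Return the compressed size as a fraction of the original size."""
    return compressed / original


xmill_ratio = ratio(XMILL)
bzip2_ratio = ratio(BZIP2)

# How much larger the xmill output is than the bzip2 output.
extra_bytes = XMILL - BZIP2
relative_gap = extra_bytes / BZIP2

print(f"xmill output: {xmill_ratio:.1%} of the original")
print(f"bzip2 output: {bzip2_ratio:.1%} of the original")
print(f"xmill is {extra_bytes} bytes ({relative_gap:.1%}) larger than bzip2")
```

Both tools bring the file down to a little over a quarter of its original
size, and the gap between them is small compared with that overall saving.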

It's _DAMNED HARD_ to improve on a modern general-purpose compression
program for textual data.
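
If you want to repeat this kind of comparison on your own data, the sketch
below may be a starting point. It is not what I ran above: it uses the
general-purpose compressors in Python's standard library (zlib, bz2, lzma)
at their default settings, it does not include xmill, and the function
name and the sample file name are hypothetical.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data):
    """Return the compressed size in bytes for each stdlib compressor.

    `data` is the raw bytes of the document; every compressor is run
    with its default settings.
    """
    return {
        "zlib": len(zlib.compress(data)),
        "bz2": len(bz2.compress(data)),
        "lzma": len(lzma.compress(data)),
    }


def report(path):
    """Print each compressor's output size for the file at `path`."""
    with open(path, "rb") as handle:
        data = handle.read()
    print(f"original: {len(data)} bytes")
    for name, size in sorted(compressed_sizes(data).items()):
        print(f"{name}: {size} bytes ({size / len(data):.1%} of original)")


# Example call (hypothetical file name):
# report("recipes.xml")
```

Results will vary with the data, so treat any single file as one data
point and not as a general ranking.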

-- 
Jerry

I should either have been less specific or more correct ...

                ---Andy Armstrong <andy@wonderworks.co.uk>

Copyright 2001 XML.org. This site is hosted by OASIS