   Re: [xml-dev] Xqueeze: Compact XML Alternative


Alaric B. Snell wrote:

>>* Competition with compression: xqML in its current format is as
>>  structured as XML so it too compresses well. In an experiment[1] a
>>  12 kB HTML document zipped to 2 kB. The (handwritten) BE for the
>>  same document took 3 kB and when zipped, it took less than 1000
>>  bytes.
>Mmmm, it bugs me when people compare gzipped XML with $binary_format. They 
>should compare XML with $binary_format and gzipped XML with gzipped 
>$binary_format. gzipped $binary_format will, in general, be the smallest of 
>them all, and yet faster to read/write than gzipped XML.

It doesn't necessarily (or even generally) work that way - compact 
binary formats don't generally compress down as well as text, so you end 
up with size(text) > size(binary) > size(compressed-binary) > 
size(compressed-text). That seemed to be the case with my XMLS format 
(http://www.sosnoski.com/opensrc/xmls - still on hold, though I hope to 
get back to it soon). One of the oddities of how compression works... 
David Mertz has done some research in this area - see his article at 
http://www-106.ibm.com/developerworks/library/x-matters13.html Also see 
James Cheney's paper "Compressing XML with Multiplexed Hierarchical PPM 
Models" at http://www.cs.cornell.edu/People/jcheney/xmlppm/paper/paper.html
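
One way to check that size ordering on your own documents is to measure
each encoding raw and compressed, side by side. The following is only a
minimal sketch in Python using the standard zlib and bz2 modules; the
function name size_report and the idea of passing each encoding in as a
byte string are assumptions made for illustration, not part of XMLS,
xqML, or any tool mentioned in this thread.

```python
import bz2
import sys
import zlib


def size_report(payloads):
    """Map each payload name to its raw, zlib and bz2 sizes in bytes.

    `payloads` is a dict of {name: bytes}, e.g. the text XML form and a
    binary encoding of the same document (hypothetical inputs).
    """
    report = {}
    for name, data in payloads.items():
        report[name] = {
            "raw": len(data),
            "zlib": len(zlib.compress(data, 9)),  # deflate, as used by gzip/zip
            "bz2": len(bz2.compress(data, 9)),    # bzip2 at maximum block size
        }
    return report


if __name__ == "__main__":
    # Usage: python size_report.py doc.xml doc.bin ...
    payloads = {}
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            payloads[path] = f.read()
    for name, sizes in size_report(payloads).items():
        print(name, sizes)
```

Running this over the text and binary forms of several documents gives
the four numbers per compressor needed to see which ordering actually
holds for a given data set.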

Try a range of documents and see how the compression works out before 
making any claims. For compression of XML text bzip2 looks like the best 
choice from what I've seen, so that should probably be the basis for 

  - Dennis



Copyright 2001 XML.org. This site is hosted by OASIS