Alaric B. Snell wrote:
>>* Competition with compression: xqML in its current format is as
>> structured as XML so it too compresses well. In an experiment[1] a
>> 12 kB HTML document zipped to 2 kB. The (handwritten) BE for the
>> same document took 3 kB and when zipped, it took less than 1000
>> bytes.
>>
>>
>
>Mmmm, it bugs me when people compare gzipped XML with $binary_format. They
>should compare XML with $binary_format and gzipped XML with gzipped
>$binary_format. gzipped $binary_format will, in general, be the smallest of
>them all, and yet faster to read/write than gzipped XML.
>
It doesn't necessarily (or even generally) work that way - compact
binary formats don't generally compress down as well as text, so you end
up with size(text) > size(binary) > size(compressed-binary) >
size(compressed-text). That seemed to be the case with my XMLS format
(http://www.sosnoski.com/opensrc/xmls - still on hold, though I hope to
get back to it soon). One of the oddities of how compression works...
David Mertz has done some research in this area - see his article at
http://www-106.ibm.com/developerworks/library/x-matters13.html . Also see
James Cheney's paper "Compressing XML with Multiplexed Hierarchical PPM
Models" at http://www.cs.cornell.edu/People/jcheney/xmlppm/paper/paper.html .
Try a range of documents and see how the compression works out before
making any claims. For compression of XML text, bzip2 looks like the best
choice from what I've seen, so that should probably be the basis for
comparison.
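
A rough sketch of that kind of check, assuming Python and its standard gzip
and bz2 modules (the function names compressed_sizes and report are made up
for illustration and are not part of XMLS or xqML). It only measures sizes,
so the text and binary encodings of each document have to be produced
separately and passed in as files:

```python
import bz2
import gzip


def compressed_sizes(data: bytes) -> dict:
    """Return the byte count of data uncompressed, gzipped, and bzip2-compressed."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }


def report(paths) -> None:
    """Print one line of sizes per file so text and binary encodings can be compared."""
    for path in paths:
        with open(path, "rb") as f:
            sizes = compressed_sizes(f.read())
        print(f"{path}: raw={sizes['raw']} gzip={sizes['gzip']} bzip2={sizes['bzip2']}")
```

Calling report() with the XML file and its binary equivalent for each of
several documents gives the four numbers needed to see which ordering
actually holds for that document.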
- Dennis