On Friday 07 February 2003 09:24, Robin Berjon wrote:
> > I've always found that compressed binary is smaller than compressed
> > text, as Tahir found. That makes sense logically too; both the binary
> > and text formats contain the same CDATA, but the binary format has more
> > compact representations of the elements and so on.
>
> That's not necessarily the case; it very much depends on the binarisation
> process. It is not necessary that both have the same CDATA, especially if
> said CDATA is information available from a schema.
Yep, but in the latter case the benefit above will still occur, plus the
extra magic of not obfuscating the patterns and skewed distributions in the
underlying data, so the compressor can get to work on them!
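(Quick illustration in Python, with made-up data and a deliberately naive
binary encoding rather than any real binarisation format: serialise the
same records as tagged XML text and as schema-implied fixed-width binary,
gzip both, and compare the eventual sizes as well as the ratios.)

    import gzip, struct

    # Made-up sample data: 1000 integer coordinate pairs.
    points = [(i, (i * 37) % 1000) for i in range(1000)]

    # Textual XML serialisation: the structure is spelled out as tags.
    text = ("<points>" +
            "".join("<p x='%d' y='%d'/>" % p for p in points) +
            "</points>").encode("ascii")

    # Naive binary serialisation: two 4-byte little-endian ints per
    # pair; the structure is implied by a (hypothetical) schema.
    binary = b"".join(struct.pack("<ii", x, y) for x, y in points)

    for label, blob in (("text", text), ("binary", binary)):
        gz = gzip.compress(blob)
        print("%-6s %6d -> %5d bytes (%.0f%% saved)" %
              (label, len(blob), len(gz), 100.0 * (1 - len(gz) / len(blob))))

The text side will usually show the better *ratio*, because tags are so
redundant; the number that matters is the one after the arrow.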
> > Of course, one could design binary formats which compress badly, but I've
> > never found that they do by default.
>
> I certainly hope that future improvements on our binary format will in fact
> make it compress badly :) That should happen by making it more compact than
> it currently is (while keeping similar speed, which is why compression is
> not always an option).
Nooo! It's not the compression *ratio* that matters here. It's the eventual
size.
If a binary encoding of 10k gzips to 9k, saving 10%, that's better than a
textual encoding of the same data at 20k gzipping to 15k, saving 25%!
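The same arithmetic in Python, using those numbers:

    for fmt, raw, gz in (("binary", 10000, 9000), ("text", 20000, 15000)):
        print("%s: %d bytes on the wire (%.0f%% saved)" %
              (fmt, gz, 100.0 * (1 - gz / raw)))
    # binary: 9000 bytes on the wire (10% saved)
    # text: 15000 bytes on the wire (25% saved)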
> It's true, however, that binary infosets do tend to compress further. In
> yet another benchmark I read yesterday, the smallest results were
> bin-xml+gz and bin-xml+bz2 (well, excluding the same ones with SVG
> quantize codecs; lossy compression of XML documents still scares me ;).
That's not compressing XML any more... it's compressing a higher-level data
model, I think! :-)
ABS
--
A city is like a large, complex, rabbit
- ARP