On Wednesday 27 March 2002 12:35, you wrote:
> Sorry, I was using "gzip" in a vague way to represent modern compression
> libraries. My assertion -- which is just my sense of
> the previous discussions, not a competent professional opinion --
> was that it's unlikely that a "binary XML" scheme would compress
> data significantly better than an off the shelf text compression algorithm.
They're orthogonal... binary storage of XML is about using fewer bits for the
structure and for storing integers. Compressing a file is something that can
be done to both textual and binary XML files! Compression is about efficient
storage of redundant strings of bits; it's not about 'text' at all.
Executable code compresses quite well, in fact.
Pedants will be eager to point out that in executable files, the executable
code is often referred to as 'text' anyway. Yes, I know! Shut up!
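
A quick way to see that compression works on redundant bits rather than on
'text' is to feed a compressor some purely binary data. The sketch below uses
Python's zlib (the same DEFLATE algorithm gzip uses); the sample data, a run of
packed 32-bit integers, is an assumption chosen only for illustration.

```python
import struct
import zlib

# Hypothetical sample: 10,000 small integers packed as 32-bit little-endian
# values. Nothing in this buffer is "text".
values = [i % 50 for i in range(10_000)]
binary = struct.pack("<%dI" % len(values), *values)

compressed = zlib.compress(binary, 9)

# Print the raw and compressed sizes in bytes.
print(len(binary), len(compressed))
```

The compressor neither knows nor cares what the bytes mean; it only exploits
the repetition in them.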
Gzipped textual XML (TXML) will still end up bigger than gzipped binary XML
(BXML), since there is more redundancy in the textual XML to begin with - and
since gzip doesn't know that the element name in a closing tag is redundant,
it will faithfully record (for each and every one) that the output should
contain that string, even if it refers to the string by a sliding window
reference. The difference between gzipped TXML and gzipped BXML will not be
large if the underlying XML is mainly text anyway, like XHTML or DocBook, but
it will make more of a difference if the underlying XML is something like
XSLT or XSD or SOAP that's mainly elements and attributes and numbers.
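
One rough way to measure what the closing-tag names cost after compression is
to strip them and compress both versions. This is only a sketch: the sample
document and its element names are made up, the `</>` form is not well-formed
XML (it just stands in for "a close marker with no name"), and how much
difference it makes will depend heavily on the document.

```python
import re
import zlib

# Hypothetical markup-heavy document: many small elements, very little text.
rows = "".join(
    f"<record><identifier>{i}</identifier>"
    f"<quantity>{i * 7 % 100}</quantity></record>"
    for i in range(2000)
)
txml = f"<records>{rows}</records>".encode("utf-8")

# Drop the name from every closing tag; a parser could recover it from nesting.
stripped = re.sub(rb"</[^>]+>", b"</>", txml)

gz_full = len(zlib.compress(txml, 9))
gz_stripped = len(zlib.compress(stripped, 9))

# Sizes in bytes: plain, plain without closing names, and both compressed.
print(len(txml), len(stripped), gz_full, gz_stripped)
```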
So comparing gzipped TXML to plain BXML isn't particularly fair! Not least
because the gzipping involves quite vast CPU and memory costs which the BXML
does not. The BXML parser involves less CPU/memory cost than a TXML parser.
Compare plain TXML with plain BXML. Compare gzipped TXML with gzipped BXML.
Please stop comparing gzipped TXML with plain BXML, everyone!!!!
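
For anyone who wants to try the like-for-like comparison, here is a minimal
sketch in Python. The binary encoding is a toy invented for this example, not
any real binary XML format: names go into a table and are referenced by
number, closing tags become a single zero byte, and plain non-negative
integers are stored as variable-length integers. It is an encoder only, and
the sample document is made up; real documents will give different numbers.

```python
import xml.etree.ElementTree as ET
import zlib

def varint(n):
    """Encode a non-negative integer in 7-bit groups, low group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)
            return bytes(out)

def string(s):
    """Length-prefixed UTF-8 string."""
    data = s.encode("utf-8")
    return varint(len(data)) + data

def value(s):
    """Canonical non-negative integers become varints, the rest strings."""
    if s.isascii() and s.isdigit() and str(int(s)) == s:
        return b"\x03" + varint(int(s))
    return b"\x02" + string(s)

def encode(xml_text):
    """Toy binary encoding: name table followed by the element tree."""
    names = {}
    body = bytearray()

    def name_id(name):
        # Assign ids in order of first appearance.
        return varint(names.setdefault(name, len(names)))

    def walk(elem):
        body.extend(b"\x01" + name_id(elem.tag) + varint(len(elem.attrib)))
        for key, val in elem.attrib.items():
            body.extend(name_id(key) + value(val))
        if elem.text:
            body.extend(value(elem.text))
        for child in elem:
            walk(child)
            if child.tail:
                body.extend(value(child.tail))
        body.append(0x00)  # end-of-element marker; the name is not repeated

    walk(ET.fromstring(xml_text))
    table = varint(len(names)) + b"".join(string(n) for n in names)
    return table + bytes(body)

# Hypothetical markup-heavy sample: elements, attributes and numbers.
rows = "".join(
    f'<item id="{i}"><quantity>{i * 7 % 100}</quantity>'
    f"<price>{i * 13 % 1000}</price></item>"
    for i in range(2000)
)
txml = f"<order>{rows}</order>"

text_bytes = txml.encode("utf-8")
bxml = encode(txml)
sizes = {
    "plain TXML": len(text_bytes),
    "plain BXML": len(bxml),
    "gzipped TXML": len(zlib.compress(text_bytes, 9)),
    "gzipped BXML": len(zlib.compress(bxml, 9)),
}
for label, size in sizes.items():
    print(f"{label:>13}: {size} bytes")
```

The fair readings are the first pair against each other and the second pair
against each other.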
Grrr... I'll write a simple binary XML transcoder this weekend and run some
tests, both with and without gzipping the result, OK?
Alaric B. Snell
http://www.alaric-snell.com/ http://RFC.net/ http://www.warhead.org.uk/
Any sufficiently advanced technology can be emulated in software