Re: almost four years ago....
- From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- To: The Deviants <xml-dev@lists.xml.org>
- Date: Tue, 19 Jun 2001 08:45:54 -0400
At 4:09 PM +0100 6/16/01, Alaric Snell wrote:
>This is easy to do. GZIP is massively crippled by having no information about
>the structure of the file - it's just a string of bytes that it has to make
>some assumptions about the probable structure of with regards to frequency
>distributions that won't even apply very well to XML; it's trivial to write
>something that compresses better, especially if you use gzip for what it's
>best at (the CDATA) and handle the <> bits yourself.
>
I've heard that one before too. In practice, it isn't nearly as easy
as people think it is. After a great deal of effort, you may be
able to shrink some files by another 1% or 2%. However, most people who
try this end up producing something that is noticeably larger than
gzip.
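For anyone who wants to measure this rather than argue about it, here is a minimal Python sketch of the "handle the <> bits yourself" idea. It is only one naive reading of that proposal, not anybody's actual compressor: it splits a document into a markup stream and a text stream, gzips each separately, and compares the total against gzipping the whole file. The names split_streams, join_streams, and compare are made up for illustration, and the NUL separator assumes the input is legal XML (which cannot contain NUL characters).

```python
import gzip
import re

# Anything that looks like <...> counts as markup. This is a crude
# assumption: it does not understand comments or CDATA sections, but the
# split is still lossless because the pieces are interleaved back in order.
TAG = re.compile(r"(<[^>]*>)")


def split_streams(xml: str) -> tuple[bytes, bytes]:
    """Separate a document into a markup stream and a text stream."""
    parts = TAG.split(xml)          # text, tag, text, tag, ..., text
    text = "\0".join(parts[0::2]).encode("utf-8")
    tags = "\0".join(parts[1::2]).encode("utf-8")
    return tags, text


def join_streams(tags: bytes, text: bytes) -> str:
    """Rebuild the original document from the two streams."""
    texts = text.decode("utf-8").split("\0")
    marks = tags.decode("utf-8").split("\0") if tags else []
    out = []
    for i, chunk in enumerate(texts):
        out.append(chunk)
        if i < len(marks):
            out.append(marks[i])
    return "".join(out)


def compare(xml: str) -> dict[str, int]:
    """Byte counts for the raw file, plain gzip, and the two-stream split."""
    raw = xml.encode("utf-8")
    tags, text = split_streams(xml)
    return {
        "raw": len(raw),
        "gzip_whole": len(gzip.compress(raw, compresslevel=9)),
        "split_streams": len(gzip.compress(tags, compresslevel=9))
        + len(gzip.compress(text, compresslevel=9)),
    }
```

Run compare() on the text of a real document and look at "gzip_whole" against "split_streams". The sketch makes no promise about which number comes out smaller; that depends on the file, which is the point of measuring.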
Of course you could use a better general purpose compression
algorithm. bzip can grab you 5% or so a lot of the time, though it
isn't as widely supported. Frankly, if you can't provide at least a
10% improvement then it's not worth my time to worry about.
I don't think you can do better than 10% smaller without a lossy
algorithm. You simply run into the limits of information theory.
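The 5% and 10% figures are easy to check against your own files. The following is a rough sketch using Python's standard gzip and bz2 modules; note that bz2 implements bzip2, which is assumed here to stand in for "bzip". The helper names sizes and percent_saved are hypothetical, and the numbers you get will vary from file to file.

```python
import bz2
import gzip


def sizes(data: bytes) -> dict[str, int]:
    """Byte counts for the raw input and each general-purpose compressor."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }


def percent_saved(baseline: int, candidate: int) -> float:
    """How much smaller candidate is than baseline, as a percentage.

    A negative result means the candidate is larger than the baseline.
    """
    return 100.0 * (baseline - candidate) / baseline
```

Given s = sizes(data) for the bytes of some XML file, percent_saved(s["gzip"], s["bzip2"]) is the improvement over gzip. By the threshold above, anything under 10.0 is not worth worrying about.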
>> 3. Human legible/human editable data doesn't matter.
>
>Indeed, we must never use image files, filesystems, or gzip - they'll never
>take off :-)
>
This is a canard. Nobody uses XML for this stuff anyway.
>> All three beliefs have been empirically proven false time and time
>> again.
>
>Chuckle!
>
Hey, don't let me stop you from trying! I could be wrong, in which
case we can all benefit from your efforts. But I think that if you're
really smart and try really hard and devote months of your life to
this problem, you aren't even going to get a 10% improvement over
gzip. (You might not get any improvement at all.) And even if you do
get that 10% improvement, I suspect you'll discover your system is
so inconvenient compared to plain or gzipped XML that nobody will use
it. But after all, it's your life. If you've got the time to spend on
this, feel free to try. I'm just afraid you'll get the same results
as the last two dozen people who tried this.
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible (IDG Books, 1999) |
| http://metalab.unc.edu/xml/books/bible/ |
| http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://metalab.unc.edu/javafaq/ |
| Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/ |
+----------------------------------+---------------------------------+