RE: [xml-dev] Is it time for the binary XML permathread to start up again?
- From: noah_mendelsohn@us.ibm.com
- To: "Alexander Philippou" <alex@noemax.com>
- Date: Fri, 20 Jul 2007 12:44:02 -0400
Alexander Philippou writes:
> And since the processing penalty of compression is proportional to doc size,
Yes, typically, at least to a first approximation (actually, some
compression algorithms do a bit better on large documents, because the
overhead of building dictionaries of commonly used terms is paid toward
the beginning and then leveraged throughout).
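As a rough illustration (the sample XML below is made up, and the exact
numbers will vary with the zlib build), a few lines of Python show that
fixed per-stream cost being amortized as the document grows:

    import gzip

    record = b"<item><name>widget</name><qty>42</qty></item>\n"
    for n in (10, 1000, 100000):
        doc = b"<items>" + record * n + b"</items>"
        packed = gzip.compress(doc)
        # the ratio improves (and then levels off) as the fixed header
        # and dictionary-warmup cost is spread over more input
        print(n, len(doc), len(packed), round(len(packed) / len(doc), 3))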
> using FI instead of text makes sense even when doing http+gzip.
To the extent FI itself is already a compressed encoding, that's
surprising. I'm not disagreeing that gzip might run faster on the FI form
than on the larger text form; I'm surprised that size(gzip(FI)) <<
size(FI). You wouldn't
expect compression systems like gzip to do well on things that are already
tightly coded. On the contrary, many compression algorithms will actually
somewhat expand things that are already compressed using other algorithms.
Basically, compression algorithms take a gamble that they can recognize
some form(s) of redundancy and get them out. If the input doesn't have
redundancy in such forms, then you tend to wind up at best restating the
input, plus a bit of overhead for the compression framework itself.
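A quick way to see the effect, if anyone wants to check (again, a made-up
payload and Python's bundled gzip module, not FI itself):

    import gzip

    text = b"<order><id>1</id><status>shipped</status></order>\n" * 5000
    once = gzip.compress(text)
    twice = gzip.compress(once)    # compressing the already-compressed form
    # expect len(twice) >= len(once), give or take a few bytes of framing
    print(len(text), len(once), len(twice))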
If gzip is going to make the FI form larger, or not much smaller, then
it's a bad use of time to run it, even if the time to gzip the FI is
indeed much lower than the time to gzip the original text.
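If in doubt, this is easy to measure on representative payloads before
committing to the extra pass; a sketch (with 'payload' standing in for
the FI bytes, nothing specific to any FI library):

    import gzip
    import time

    def gzip_payoff(payload):
        # returns (size ratio, seconds spent): bytes saved vs. time burned
        start = time.perf_counter()
        packed = gzip.compress(payload)
        return len(packed) / len(payload), time.perf_counter() - start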
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------