RE: [xml-dev] Is it time for the binary XML permathread to start up again?
- From: Alessandro Triglia <sandro@mclink.it>
- To: xml-dev@lists.xml.org, noah_mendelsohn@us.ibm.com
- Date: Fri, 20 Jul 2007 20:25:10 +0200
> -----Original Message-----
> From: noah_mendelsohn@us.ibm.com [mailto:xml-dev@lists.xml.org]
> Sent: Friday, July 20, 2007 12:44
> To: Alexander Philippou
> Cc: 'Costello, Roger L.'; xml-dev@lists.xml.org
> Subject: RE: [xml-dev] Is it time for the binary XML permathread to start up again?
>
> Alexander Philippou writes:
>
> > And since the processing penalty of compression is proportional to doc
> > size,
>
> Yes, typically, at least to a first approximation (actually, some
> compression algorithms do a bit better on large documents, to the extent
> that the overhead of building dictionaries of commonly used terms gets
> done toward the beginning, and leveraged throughout).
>
> > using FI instead of text makes sense even when doing http+gzip.
>
> To the extent FI itself compresses, that's surprising. I'm not
> disagreeing that gzip might run faster on the FI form than on the larger
> text form; I'm surprised that size(gzip(FI)) << size(FI). You wouldn't
> expect compression systems like gzip to do well on things that are already
> tightly coded.
Fast Infoset doesn't try to be extremely tightly coded. We tried to find a good balance between ease of implementation, encoding/decoding speed, and compactness. So there is still room for gzip to remove some of the residual redundancy.
Alessandro Triglia
> On the contrary, many compression algorithms will actually
> somewhat expand things that are already compressed using other algorithms.
> Basically, compression algorithms take a gamble that they can recognize
> some form(s) of redundancy and get them out. If the input doesn't have
> redundancy in such forms, then you tend to wind up at best restating the
> input, plus a bit of overhead for the compression framework itself.
>
> If gzip is going to make the FI form larger, or not much smaller, then
> it's a bad use of time to run it, even if the time to gzip the FI is
> indeed much lower than the time to gzip the original text.
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
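The two extremes discussed above can be checked with a small experiment. The following is a minimal sketch in Python, not a Fast Infoset benchmark. The sample document, the word list, and the helper name gzip_ratio are all hypothetical, and already-gzipped bytes stand in for a "tightly coded" input. It compares how much gzip gains on redundant XML text with how much it gains on input that has little redundancy left.

```python
import gzip
import random


def gzip_ratio(data: bytes) -> float:
    """Return compressed size / original size (below 1.0 means gzip shrank it)."""
    return len(gzip.compress(data)) / len(data)


# Hypothetical sample document: repetitive markup around varying values.
rng = random.Random(0)  # fixed seed so the run is repeatable
words = ["bolt", "nut", "gear", "cog", "shaft", "valve", "pump", "seal"]
items = [
    f"<item><name>{rng.choice(words)}</name><price>{rng.randrange(100000)}</price></item>"
    for _ in range(2000)
]
xml_text = ("<items>" + "".join(items) + "</items>").encode("ascii")

# Assumption: gzip output stands in for a tightly coded form of the document.
tight = gzip.compress(xml_text)

print(f"text form:  {len(xml_text)} bytes, gzip ratio {gzip_ratio(xml_text):.3f}")
print(f"tight form: {len(tight)} bytes, gzip ratio {gzip_ratio(tight):.3f}")
```

On the text form the ratio should come out well below 1.0, while on the tight form it should stay close to 1.0, which is the "restating the input, plus a bit of overhead" case. A format that deliberately leaves some residual redundancy, as described for Fast Infoset in the reply above, would be expected to land somewhere between the two. Measuring that would require encoding real documents with an actual Fast Infoset implementation.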