RE: [xml-dev] Is it time for the binary XML permathread to start up again?

From: Alessandro Triglia <sandro@mclink.it>
To: xml-dev@lists.xml.org, noah_mendelsohn@us.ibm.com
Date: Fri, 20 Jul 2007 20:25:10 +0200

 

> -----Original Message-----
> From: noah_mendelsohn@us.ibm.com [mailto:xml-dev@lists.xml.org] 
> Sent: Friday, July 20, 2007 12:44
> To: Alexander Philippou
> Cc: 'Costello, Roger L.'; xml-dev@lists.xml.org
> Subject: RE: [xml-dev] Is it time for the binary XML permathread to start up again?
> 
> Alexander Philippou writes:
> 
> > And since the processing penalty of compression is proportional to
> > doc size,
> 
> Yes, typically, at least to a first approximation (actually, some
> compression algorithms do a bit better on large documents, to the
> extent that the overhead of building dictionaries of commonly used
> terms gets done toward the beginning, and leveraged throughout).
> 
> > using FI instead of text makes sense even when doing http+gzip.
> 
> To the extent FI itself compresses, that's surprising.  I'm not
> disagreeing that gzip might run faster on the FI form than on the
> larger text form;  I'm surprised that size(gzip(FI)) << size(FI).
> You wouldn't expect compression systems like gzip to do well on
> things that are already tightly coded.


Fast Infoset doesn't try to be extremely tightly coded.  We tried to find a good balance between ease of implementation, encoding/decoding speed, and compactness.  So there is still room for gzip to remove some of the residual redundancy.
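
As a rough illustration of "residual redundancy", here is a minimal Python sketch. It does not implement Fast Infoset: `encode_toy_binary` is a hypothetical stand-in that only replaces tag names with one-byte codes and leaves the values verbatim, and the record data is made up for the example. It simply shows that a moderately compact encoding can still leave plenty for gzip to remove.

```python
import gzip

def encode_text(records):
    """Serialize the records as plain XML text."""
    body = "".join(
        f"<order><item>{item}</item><quantity>{qty}</quantity></order>"
        for item, qty in records
    )
    return f"<orders>{body}</orders>".encode("utf-8")

def encode_toy_binary(records):
    """Hypothetical compact form (NOT Fast Infoset):
    one-byte tag codes, length-prefixed values kept verbatim."""
    out = bytearray([0x01])                      # 0x01 stands for <orders>
    for item, qty in records:
        out.append(0x02)                         # 0x02 stands for <order>
        for code, value in ((0x03, item), (0x04, str(qty))):
            data = value.encode("utf-8")
            out += bytes([code, len(data)]) + data
    return bytes(out)

# Made-up sample data with repeating values, as in many real documents.
records = [(f"widget-{i % 7}", i % 5 + 1) for i in range(500)]
text = encode_text(records)
binary = encode_toy_binary(records)

sizes = {
    "text": len(text),
    "toy binary": len(binary),
    "gzip(text)": len(gzip.compress(text)),
    "gzip(toy binary)": len(gzip.compress(binary)),
}
for label, size in sizes.items():
    print(f"{label:>18}: {size} bytes")
```

The toy form is smaller than the text because the tag names are gone, but the repeated values are still there, and gzip shrinks it further. How much a real FI document shrinks depends on the document and the encoder options; this sketch says nothing about that.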

Alessandro Triglia


> On the contrary, many compression algorithms will actually somewhat
> expand things that are already compressed using other algorithms.
> Basically, compression algorithms take a gamble that they can
> recognize some form(s) of redundancy and get them out.  If the input
> doesn't have redundancy in such forms, then you tend to wind up at
> best restating the input, plus a bit of overhead for the compression
> framework itself.
> 
> If gzip is going to make the FI form larger, or not much smaller,
> then it's a bad use of time to run it, even if the time to gzip the
> FI is indeed much lower than the time to gzip the original text.
> 
> --------------------------------------
> Noah Mendelsohn 
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
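
The quoted point about inputs with no recognizable redundancy is easy to check. The sketch below is only an illustration: the pseudo-random bytes are an assumed stand-in for "already tightly coded" data, not a real compressed or FI document, and the sample strings are made up.

```python
import gzip
import random

def gzip_ratio(data: bytes) -> float:
    """Return size(gzip(data)) / size(data); above 1.0 means expansion."""
    return len(gzip.compress(data)) / len(data)

# Highly redundant input: the same short markup fragment repeated.
redundant = b"<item>widget</item>" * 1000

# Stand-in for tightly coded input: seeded pseudo-random bytes, which
# contain no repetition that gzip can exploit.
rng = random.Random(2007)
dense = bytes(rng.getrandbits(8) for _ in range(len(redundant)))

redundant_ratio = gzip_ratio(redundant)
dense_ratio = gzip_ratio(dense)

print(f"redundant input: ratio {redundant_ratio:.3f}")
print(f"dense input:     ratio {dense_ratio:.3f}")
```

For the dense input gzip ends up restating the bytes and adding its own framing, so the ratio comes out slightly above 1. Where a given FI document falls between these two extremes has to be measured on that document.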

Follow-Ups:
- RE: [xml-dev] Is it time for the binary XML permathread to start up again?
  - From: noah_mendelsohn@us.ibm.com
