
   Re: [xml-dev] [Shannon: information ~ uncertainty] Ramifications to XML d


Rick Marshall wrote:

> actually modern compressors don't know about the representation and 
> don't much mind. they actually work on the entropy of the message and 
> the message as a bit stream - ie they don't know there are tags, ascii 
> data, binary, data, schema etc. there's not room to go into it here but 
> they will compress a message fairly consistently based on the entropy of 
> the message, not the representation. different algorithms are marginally 
> better than others (bzip2 vs gzip eg), but seem to give proportionally 
> similar results.


There have been a number of compressors designed specifically for XML, 
though, that take advantage of knowledge of the structure of XML 
documents to gain some benefit in compression compared to generic 
algorithms. AT&T Labs' XMill is one example:

http://www.research.att.com/sw/tools/xmill/

Personally, though, I doubt even the best of these justify the added 
cost and complexity compared to a generic algorithm like deflate.
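
For anyone who wants a baseline to measure an XML-specific compressor 
against, here is a minimal sketch of how the generic algorithms mentioned 
in this thread could be compared on a document. It uses only Python's 
standard zlib (deflate) and bz2 (bzip2) modules. The sample document, the 
element names, and the helper functions build_sample_xml and 
compressed_sizes are made up for illustration; substitute your own data 
before drawing any conclusions. It does not exercise XMill or any other 
XML-aware tool.

```python
import bz2
import zlib


def build_sample_xml(n_records=2000):
    """Build a synthetic, highly repetitive XML document (hypothetical data)."""
    rows = [
        f'<record id="{i}"><name>item{i}</name><price>{i % 97}.50</price></record>'
        for i in range(n_records)
    ]
    return ("<catalog>" + "".join(rows) + "</catalog>").encode("utf-8")


def compressed_sizes(data):
    """Return the byte counts of the raw input and of each compressed form."""
    return {
        "raw": len(data),
        # zlib.compress emits a deflate stream inside a small zlib wrapper.
        "deflate": len(zlib.compress(data, 9)),
        "bzip2": len(bz2.compress(data, 9)),
    }


sizes = compressed_sizes(build_sample_xml())
for name, size in sizes.items():
    print(f"{name:8s} {size:8d} bytes  ({size / sizes['raw']:.1%} of raw)")
```

Running the same function over the bytes of a real document gives the 
figure that a specialized compressor would have to beat by enough to 
justify its extra cost.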

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN%3D0596007647/cafeaulaitA/ref%3Dnosim




 


Copyright 2001 XML.org. This site is hosted by OASIS