xml-dev - Re: [xml-dev] Data streams

To: Rick Marshall <rjm@zenucom.com>
Subject: Re: [xml-dev] Data streams
From: Bob Foster <bob@objfac.com>
Date: Mon, 06 Dec 2004 16:52:58 -0800
Cc: "Stephen E. Beller" <sbeller@nhds.com>, xml-dev@lists.xml.org
In-reply-to: <41B4F55F.8050806@zenucom.com>
References: <00cd01c4dbbe$8c3ca3b0$6501a8c0@dell8100> <41B4F55F.8050806@zenucom.com>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113

I don't think so. Did you look at the sample he posted?

 > Every one of the XML data elements had this tagging structure:
 > <Row>
 >    <Cell><Data ss:Type="Number">1</Data></Cell>
 > </Row>
 >
 > In contrast, the CSV had this structure: 1,

Since the "more information" in the XML (and it adds precious little)
is identical for every data value, the XML format has approximately the
same entropy as the CSV file. This looks more like a failure of the
compression algorithm than a difference in information content.
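
A rough way to check this, as a minimal sketch rather than a rerun of the original experiment: wrap the same values in both formats and compare compressed sizes. The data here is an assumption (20,000 random single-digit values), as is the choice of Python's standard zlib, bz2 and lzma modules; the original file and compressor are not shown in this thread.

```python
import bz2
import lzma
import random
import zlib

# Assumed test data: 20,000 random single-digit values (not the original file).
random.seed(0)  # fixed seed so repeated runs see the same values
values = [str(random.randint(0, 9)) for _ in range(20000)]

# CSV form quoted above: each value followed by a comma.
csv_bytes = "".join(v + "," for v in values).encode("ascii")

# XML form quoted above: the same wrapper text around every value.
ROW = '<Row>\n   <Cell><Data ss:Type="Number">{}</Data></Cell>\n</Row>\n'
xml_bytes = "".join(ROW.format(v) for v in values).encode("ascii")

COMPRESSORS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}


def compressed_sizes(data):
    """Return {compressor name: compressed size in bytes} for data."""
    return {name: len(fn(data)) for name, fn in COMPRESSORS.items()}


csv_sizes = compressed_sizes(csv_bytes)
xml_sizes = compressed_sizes(xml_bytes)

print(f"raw: csv={len(csv_bytes)} xml={len(xml_bytes)} "
      f"ratio={len(xml_bytes) / len(csv_bytes):.1f}")
for name in COMPRESSORS:
    c, x = csv_sizes[name], xml_sizes[name]
    print(f"{name}: csv={c} xml={x} ratio={x / c:.2f}")
```

Uncompressed, the XML spends 61 bytes per value against 2 for the CSV. If the entropy argument holds, the compressed ratio should land far below that, and whatever gap remains reflects how well each compressor models the repeated markup.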

Bob Foster

Rick Marshall wrote:
> all you've done is shown that the entropy of the xml file is 
> significantly lower than the csv file. that would mean it carries 
> significantly more information and as others have pointed out, when 
> inspecting the xml, this is indeed the case.
> 
> put another way the correct interpretation of your experiment is that 
> the ratio of the compressed file sizes points to a significant 
> difference in information content. the csv file and the xml file aren't 
> the same stuff.
> 
> rick

Follow-Ups:
- Re: [xml-dev] Data streams
  - From: Elliotte Harold <elharo@metalab.unc.edu>
- Re: [xml-dev] Data streams
  - From: Rick Marshall <rjm@zenucom.com>

References:
- RE: [xml-dev] Data streams
  - From: "Stephen E. Beller" <sbeller@nhds.com>
- Re: [xml-dev] Data streams
  - From: Rick Marshall <rjm@zenucom.com>
