xml-dev - RE: [xml-dev] Data streams

To: 'Peter Hunsberger' <peter.hunsberger@gmail.com>
Subject: RE: [xml-dev] Data streams
From: "Stephen E. Beller" <sbeller@nhds.com>
Date: Mon, 06 Dec 2004 18:17:25 -0500
Cc: xml-dev@lists.xml.org
Importance: Normal
In-reply-to: <cc159a4a04120614233f7bcc6@mail.gmail.com>
Organization: NHDS, Inc.

As I said initially, larger data elements do change the ratios. To go to the
opposite extreme, large blocks of text can actually be handled MORE
efficiently with XML than CSV.

On the other hand, the larger the attributes and other tag "labels," the
greater the ratio, and vice versa.

So, all I'm saying is that there are times when XML makes more sense than
CSV, and certain situations make CSV superior. No one solution is right for
all circumstances. 

Choosing the method that fits the data most sensibly will help alleviate
some of the XML backlash. A good rule of thumb seems to be that, everything
else being equal, (a) the longer the tags or the shorter the data elements,
the less sense it makes to transport the data via XML and (b) the shorter
the tags or the longer the data elements, the more sense it makes to
transport the data via XML. Anyone disagree?
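
To make the rule of thumb concrete, here is a minimal Python sketch. It
assumes the simplest possible case, a single element with no attributes,
and compares the size of the tagged element with the size of the bare
value. The function name and the sample tag names are hypothetical and
chosen only for illustration.

```python
def xml_overhead_ratio(tag: str, value: str) -> float:
    """Characters in <tag>value</tag> divided by characters in the bare value.

    Assumes a non-empty value and an element with no attributes.
    """
    element = f"<{tag}>{value}</{tag}>"
    return len(element) / len(value)


# (b) short tag, long data element: the markup is almost free.
short_tag_long_data = xml_overhead_ratio("p", "x" * 500)

# (a) long tag, short data element: the markup dominates.
long_tag_short_data = xml_overhead_ratio("CustomerAccountBalance", "7")

print(f"short tag, long data: {short_tag_long_data:.2f} to 1")
print(f"long tag, short data: {long_tag_short_data:.2f} to 1")
```

The closer the ratio is to 1, the less the tags cost relative to the data
they carry.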

Steve

-----Original Message-----
From: Peter Hunsberger [mailto:peter.hunsberger@gmail.com] 
Sent: Monday, December 06, 2004 5:24 PM
To: Stephen E. Beller
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Data streams

On Mon, 06 Dec 2004 16:35:48 -0500, Stephen E. Beller <sbeller@nhds.com>
wrote:
> In consideration of Elliotte's reply, I went back and looked at the XML
> file Excel generated. Here's what I found ...
> 
> Every one of the XML data elements had this tagging structure:
> <Row>
>    <Cell><Data ss:Type="Number">1</Data></Cell>
> </Row>
> 
> In contrast, the CSV had this structure: 1,
> 
> That's a 50 characters to 1 difference for each data element.
> 
> I doubt that all those XML tags are necessary if you're rendering the data
> in something other than a spreadsheet. But if you are planning to use a
> spreadsheet, then the 50 to 1 ratio is valid, it seems to me.

Use the number 10 and the difference is 51 to 2, or a ratio of ~26 to
1.  Use the number 100 and the ratio is 52 to 3, or ~17 to 1.  Six
digits? 55 to 6, or ~9 to 1. Now add multiple columns of data (as any
realistic example would do) and the ratio falls even further.
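
This arithmetic can be checked with a short Python sketch. It assumes the
figures used in this thread: about 49 characters of fixed markup per Excel
cell (so a one-digit value costs 50 characters in XML) against the digits
alone in CSV. The constant and the function name are illustrative and are
not measured from a real file.

```python
# Fixed markup per cell implied by the thread's "50 to 1" figure for a
# one-digit value. This is an assumption taken from the discussion, not a
# measurement of a real Excel XML file.
MARKUP_OVERHEAD = 49


def xml_to_csv_ratio(digits: int) -> float:
    """Ratio of XML characters to CSV characters for one value of the given width."""
    xml_chars = MARKUP_OVERHEAD + digits
    csv_chars = digits
    return xml_chars / csv_chars


for digits in (1, 2, 3, 6):
    print(f"{digits} digit(s): {MARKUP_OVERHEAD + digits} to {digits}, "
          f"ratio ~{xml_to_csv_ratio(digits):.1f} to 1")
```

Because the overhead is fixed per cell, the ratio drops quickly as the
values get wider.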

<snip/>
> 
> So, this benchmark test still points to a huge difference in file size and
> in unzipping and parsing time when you compare a large data array in CSV
> compared to XML.

Maybe, maybe not. The benchmark needs to be more realistic before you
draw any conclusions about "huge".

-- 
Peter Hunsberger

