xml-dev - Re: [xml-dev] Data streams


To: "Stephen E. Beller" <sbeller@nhds.com>, <xml-dev@lists.xml.org>
Subject: Re: [xml-dev] Data streams
From: "Bill Kearney" <wkearney@syndic8.com>
Date: Mon, 6 Dec 2004 15:56:11 -0500
Organization: http://www.ideaspace.net/users/wkearney/foaf.xrdf
References: <00cd01c4dbbe$8c3ca3b0$6501a8c0@dell8100>


This also speaks to the somewhat verbose form of XML that Office might be
producing.

It's certainly no surprise to anyone that the data was larger and compressed
differently in XML than CSV.  Especially not with the example you proposed.
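
A rough way to see the size gap at small scale, offered as a sketch only and
not as a reproduction of the original Excel test: the element names below
(Table, Row, Cell, Data) are a hypothetical tag-heavy layout, not Office's
actual output, and zlib stands in for WinZip.

```python
import csv
import io
import zlib
from xml.sax.saxutils import escape


def to_csv(rows):
    """Serialize rows as comma-delimited text, one line per row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode("utf-8")


def to_xml(rows):
    """Serialize rows with a hypothetical, deliberately verbose layout."""
    parts = ["<Table>"]
    for row in rows:
        parts.append("<Row>")
        for value in row:
            text = escape(str(value))
            parts.append('<Cell><Data Type="Number">%s</Data></Cell>' % text)
        parts.append("</Row>")
    parts.append("</Table>")
    return "".join(parts).encode("utf-8")


def size_report(rows):
    """Return raw and deflate-compressed byte counts for both encodings."""
    csv_bytes = to_csv(rows)
    xml_bytes = to_xml(rows)
    return {
        "csv": len(csv_bytes),
        "xml": len(xml_bytes),
        "csv_deflated": len(zlib.compress(csv_bytes, 9)),
        "xml_deflated": len(zlib.compress(xml_bytes, 9)),
    }


if __name__ == "__main__":
    # A grid filled with a single-digit number, far smaller than the original.
    grid = [[7] * 50 for _ in range(200)]
    print(size_report(grid))
```

The absolute numbers depend entirely on the chosen layout and compressor, so
they should be read as a way to compare encodings, not as a benchmark.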

I think your conclusion about CSV effectiveness is short-sighted.  While CSV
can certainly be "bit stingy", that often comes at the considerable cost of
being brittle.  Without effective metadata those numbers just become
gibberish.  While it's fair to say an XML file may be larger, it is larger in
a remarkably self-documenting way.  Where's the balance to be struck?  In
lightweight CSV that's fraught with processing perils?  Or in methodically
documented XML that simply takes a few cycles longer?  CPU and disk are
cheap; programming time and budget to work around crappy, brittle data
aren't.
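
One kind of brittleness can be sketched as follows. The record, its field
names (quantity, price) and the column swap are all made up for illustration;
the point is only that a positional CSV read silently returns the wrong value
when the producer reorders columns, while a lookup by element name does not.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record before and after the producer swaps two columns.
CSV_V1 = "12,3\n"   # quantity, price
CSV_V2 = "3,12\n"   # price, quantity

XML_V1 = "<item><quantity>12</quantity><price>3</price></item>"
XML_V2 = "<item><price>3</price><quantity>12</quantity></item>"


def quantity_from_csv(text):
    """Positional read: assumes quantity is always the first column."""
    row = next(csv.reader(io.StringIO(text)))
    return int(row[0])


def quantity_from_xml(text):
    """Named read: finds the value by element name, wherever it appears."""
    return int(ET.fromstring(text).findtext("quantity"))


if __name__ == "__main__":
    print(quantity_from_csv(CSV_V1), quantity_from_csv(CSV_V2))
    print(quantity_from_xml(XML_V1), quantity_from_xml(XML_V2))
```

A header row would give CSV some of the same protection, but nothing in the
format requires one, which is the metadata problem in a nutshell.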

It might be a more interesting experiment to discuss using more
purpose-built XML schemas, ones that do a better job of describing the data
in XML without being so verbose.  While Office may not offer that at this
point, that doesn't preclude others from doing a better job of it.
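
As a sketch of what "purpose-built" could mean, the two layouts below carry
the same grid of integers. Both vocabularies are invented for illustration
(neither is Office's format nor any published schema): the compact one
declares the type once on the root and packs each row into a single element.

```python
import xml.etree.ElementTree as ET


def verbose_xml(rows):
    """Hypothetical tag-heavy layout: one Cell/Data pair per value."""
    out = ["<Table>"]
    for row in rows:
        out.append("<Row>")
        for value in row:
            out.append('<Cell><Data Type="Number">%d</Data></Cell>' % value)
        out.append("</Row>")
    out.append("</Table>")
    return "".join(out)


def compact_xml(rows):
    """Hypothetical purpose-built layout: type stated once, one element per row."""
    body = "".join("<r>%s</r>" % " ".join(str(v) for v in row) for row in rows)
    return '<grid type="int">%s</grid>' % body


def read_verbose(text):
    root = ET.fromstring(text)
    return [[int(d.text) for d in row.iter("Data")] for row in root.iter("Row")]


def read_compact(text):
    root = ET.fromstring(text)
    # An empty row has no text node, so fall back to an empty string.
    return [[int(v) for v in (r.text or "").split()] for r in root.iter("r")]


if __name__ == "__main__":
    rows = [[1, 2, 3], [4, 5, 6]]
    print(len(verbose_xml(rows)), len(compact_xml(rows)))
```

The compact form gives up per-cell typing and attributes, so whether the
trade is worthwhile depends on what the data actually needs to say.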

-Bill Kearney
Syndic8.com

----- Original Message ----- 
From: "Stephen E. Beller" <sbeller@nhds.com>


> I tried Steven's experiment from a different angle. I filled an Excel XP
> spreadsheet with a single-digit number, saved it in both XML and in a
> comma-delimited text file (CSV). I then compressed both with WinZip and
> then opened both with Excel. Here's what I found:
>
> The XML file was 840MB, the CSV 34MB -- a 2,500% difference.
> Compressed, the XML file was 2.5MB, the CSV 0.15MB (150KB) -- a 1,670%
> difference.
>
> Equally dramatic is the time it took to uncompress and render the files as
> an Excel spreadsheet: It took about 20 minutes with the XML file; the CSV
> took 1 minute -- a 2,000% difference.
>
> My conclusion is that delimited text files handle large arrays of data more
> efficiently. This stems, in part, from the fact that a comma delimiter (or
> some other single character) carries much less overhead than tags; CSV
> requires only a comma, while XML requires a minimum of 5 characters (<></>)
> -- that makes CSV a minimum of 500% more efficient ... and when you add
> the semantic labels and attributes to the tags, the size of XML
> increases dramatically.
>
> Note, however, that when dealing with large blocks of text instead of
> numbers (or small text strings), the difference between XML and delimited
> text files is considerably less.
>
> Of course, XML offers benefits that a plain data array in a CSV file does
> not, such as attribute definitions and hierarchical associations between
> the data (if that's necessary) ... even though there are ways comma-delimited
> data can be used to perform the same functions of XML when rendering
> serialized data arrays as charts.
