In consideration of Elliotte's reply, I went back and looked at the XML file
Excel generated. Here's what I found ...
Every one of the XML data elements had this tagging structure:
In contrast, the CSV had this structure: 1,
That's a 50-to-1 difference in character count for each data element.
I doubt that all those XML tags are necessary if you're rendering the data
in something other than a spreadsheet. But if you are planning to use a
spreadsheet, then the 50 to 1 ratio is valid, it seems to me.
Does anyone know what a reasonable tagging equivalent might be if you're,
say, distributing a data array in XML for SVG rendering? It might be fewer
than 50, but it will still be a lot more than 1, especially if you have data
In addition, the XML doc had about 50 lines of additional tags at the
beginning and end of the file, which was Microsoft Office metadata not in
the CSV. While some are certainly necessary for a valid XML doc, I'm sure
some are superfluous. But even if you subtracted all those lines from the
total characters, it would have almost no effect on the size comparisons when
you're dealing with a large data array.
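
To make the point about the fixed metadata concrete, here is a rough back-of-the-envelope sketch. The 50 and 1 characters per element are the figures quoted above; the 4000-character overhead (about 50 lines at roughly 80 characters each) and the function name size_ratio are assumptions for illustration only, not measurements from the Excel file.

```python
def size_ratio(n_cells, xml_per_cell=50, csv_per_cell=1, xml_overhead=4000):
    """Approximate XML-to-CSV size ratio for n_cells data elements.

    xml_overhead is an assumed fixed cost for the leading and trailing
    metadata lines; it is paid once, however many cells there are.
    """
    xml_total = xml_overhead + n_cells * xml_per_cell
    csv_total = n_cells * csv_per_cell
    return xml_total / csv_total


if __name__ == "__main__":
    # As the array grows, the fixed overhead is amortized and the ratio
    # approaches the per-element ratio (xml_per_cell / csv_per_cell).
    for n in (100, 10_000, 1_000_000):
        print(f"{n:>9} cells: ratio {size_ratio(n):.3f}")
```

Under these assumed numbers the overhead matters only for small arrays; for a large array the per-element tagging dominates.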
So, this benchmark test still points to a huge difference in file size and
in unzipping and parsing time between a large data array stored as CSV and
the same array stored as XML.
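
For anyone who wants to repeat the size comparison without Excel or WinZip, a minimal sketch follows. It is not the test I ran: the per-cell markup is a made-up verbose structure (not Excel's actual output), zlib's DEFLATE stands in for WinZip, and the array dimensions and the names to_csv, to_xml and report are all assumptions.

```python
import csv
import io
import zlib

ROWS, COLS = 1000, 50  # assumed array size, chosen only for illustration


def to_csv(rows, cols, value=1):
    """Serialize a rows x cols array of one repeated value as CSV bytes."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for _ in range(rows):
        writer.writerow([value] * cols)
    return buf.getvalue().encode("utf-8")


def to_xml(rows, cols, value=1):
    """Serialize the same array with verbose, hypothetical per-cell tags."""
    cell = f'<cell><data type="number">{value}</data></cell>'
    row = "<row>" + cell * cols + "</row>\n"
    return ("<table>\n" + row * rows + "</table>\n").encode("utf-8")


def report(rows=ROWS, cols=COLS):
    """Return raw and DEFLATE-compressed byte counts for both encodings."""
    csv_bytes = to_csv(rows, cols)
    xml_bytes = to_xml(rows, cols)
    return {
        "csv_raw": len(csv_bytes),
        "xml_raw": len(xml_bytes),
        "csv_deflated": len(zlib.compress(csv_bytes)),
        "xml_deflated": len(zlib.compress(xml_bytes)),
    }


if __name__ == "__main__":
    for name, size in report().items():
        print(f"{name:>13}: {size:>9} bytes")
```

This measures size only. Timing the unzip and parse steps would need a separate harness, and the results will depend heavily on the tag structure you choose.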
From: Elliotte Harold [mailto:firstname.lastname@example.org]
Sent: Monday, December 06, 2004 2:43 PM
To: Stephen E. Beller
Subject: Re: [xml-dev] Data streams
Stephen E. Beller wrote:
> I tried Steven's experiment from a different angle. I filled an Excel XP
> spreadsheet with a single-digit number, saved it in both XML and in a
> comma-delimited text file (CSV). I then compressed both with WinZip and
> opened both with Excel. Here's what I found:
That sounds like a bad test. The XML file contains a lot more
information than the CSV file. Specifically it contains a lot of
Microsoft Office metadata about things like the name of the person who
created the file that are not in the CSV file. There is information in
the XML file that is not present in the CSV file.
Elliotte Rusty Harold email@example.com
XML in a Nutshell 3rd Edition Just Published!