RE: [xml-dev] Data streams

Following up on Elliotte's reply, I went back and looked at the XML file
Excel generated. Here's what I found:

Every one of the XML data elements had this tagging structure:
<Row>
    <Cell><Data ss:Type="Number">1</Data></Cell>
</Row>

In contrast, the CSV had this structure: 1,

That's roughly a 50-to-1 difference in characters for each data element.
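To make the per-element count concrete, the sketch below measures the two fragments quoted above with Python's `len`. It assumes the markup is exactly as shown (no extra attributes such as style IDs, four-space indentation), so the counts are illustrative. The total also depends on whether the `<Row>` wrapper and whitespace are charged to a single cell.

```python
# Fragments copied from the Excel-generated XML and the CSV quoted above.
XML_CELL = '<Cell><Data ss:Type="Number">1</Data></Cell>'
XML_ROW = '<Row>\n    ' + XML_CELL + '\n</Row>'  # one-cell row, 4-space indent as shown
CSV_CELL = '1,'  # the value plus its delimiter


def markup_overhead(fragment: str, value: str = '1') -> int:
    """Characters in the fragment that are not the data value itself."""
    return len(fragment) - len(value)


for name, frag in [('XML cell', XML_CELL), ('XML row', XML_ROW), ('CSV', CSV_CELL)]:
    print(f'{name:8} total={len(frag):3}  overhead={markup_overhead(frag):3}')
```

With these fragments, a lone cell takes 44 characters against the CSV's 2, and a one-cell row takes 61. In a wide sheet the `<Row>` tags are shared across many cells, so the per-cell figure is the better guide there.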

I doubt that all those XML tags are necessary if you're rendering the data
in something other than a spreadsheet. But if you are planning to use a
spreadsheet, then it seems to me the 50-to-1 ratio holds.

Does anyone know what a reasonable tagging equivalent might be if you're,
say, distributing a data array in XML for SVG rendering? It might be fewer
than 50, but it will still be a lot more than 1, especially if you have data
type attributes.
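One way to explore the question is to generate a few candidate encodings and measure them. The sketch below is hypothetical. The element names `array`, `r`, and `c` and the `type` attribute are made up for illustration and are not SVG or SpreadsheetML vocabulary. It compares CSV against two XML forms. The first uses one element per value. The second stores each row as a whitespace-separated list, similar in spirit to how SVG packs coordinates into a polyline's `points` attribute, and declares the data type once on the root instead of on every cell.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_csv(rows):
    buf = io.StringIO()
    csv.writer(buf, lineterminator='\n').writerows(rows)
    return buf.getvalue()


def to_per_value_xml(rows):
    # One element per value: <array><r><c>1</c>...</r>...</array>
    root = ET.Element('array')
    for row in rows:
        r = ET.SubElement(root, 'r')
        for v in row:
            ET.SubElement(r, 'c').text = str(v)
    return ET.tostring(root, encoding='unicode')


def to_list_xml(rows, dtype='int'):
    # One element per row; values space-separated, type declared once on the root.
    root = ET.Element('array', type=dtype)
    for row in rows:
        ET.SubElement(root, 'r').text = ' '.join(map(str, row))
    return ET.tostring(root, encoding='unicode')


def from_list_xml(text):
    # Reads the list form back; assumes integer values and ignores the type attribute.
    root = ET.fromstring(text)
    return [[int(tok) for tok in r.text.split()] for r in root.iter('r')]


rows = [[1] * 10 for _ in range(10)]  # 10x10 array of single-digit numbers
for name, doc in [('CSV', to_csv(rows)),
                  ('XML, one element per value', to_per_value_xml(rows)),
                  ('XML, one list per row', to_list_xml(rows))]:
    print(f'{name:28} {len(doc):5} chars')
```

For this 10x10 array of 1s, the three forms come to 200, 885, and 286 characters. The list form narrows the gap considerably but does not close it, and the figures will differ for wider values or real data.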

In addition, the XML doc had about 50 lines of extra tags at the beginning
and end of the file, containing Microsoft Office metadata that was not in
the CSV. While some of these are certainly necessary for a valid XML doc,
I'm sure some are superfluous. But even if you subtract all those lines from
the total character count, it has almost no effect on the size comparison
when you're dealing with a large data array.
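The point about the fixed metadata can be checked with a little arithmetic. A fixed header of H characters adds H/(N·c) to the XML-to-CSV size ratio, where N is the number of cells and c is the CSV size per cell. That term shrinks toward zero as N grows. The sketch below assumes a 2,000-character header, which is a guess of roughly 50 lines at about 40 characters each and not a measured figure. It also assumes the 44- and 2-character per-cell sizes of the fragments quoted above.

```python
def size_ratio(n_cells, per_cell_xml=44, per_cell_csv=2, header_chars=0):
    """XML-to-CSV size ratio for n_cells values, with an optional fixed XML header.

    Defaults are illustrative: 44 and 2 are the lengths of the quoted
    <Cell>...</Cell> fragment and of '1,'; header_chars is an assumption.
    """
    xml_size = header_chars + n_cells * per_cell_xml
    csv_size = n_cells * per_cell_csv
    return xml_size / csv_size


ASSUMED_HEADER = 2_000  # hypothetical: ~50 lines x ~40 chars of Office metadata
for n in (10, 1_000, 1_000_000):
    print(f'{n:>9} cells: without header {size_ratio(n):.3f}, '
          f'with header {size_ratio(n, header_chars=ASSUMED_HEADER):.3f}')
```

Under those assumptions the header dominates at 10 cells, giving a ratio of 122 instead of 22. At a million cells it adds only about 0.001 to the ratio, which is consistent with the observation that it barely matters for a large array.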

So this benchmark still points to a huge difference in file size, and in
unzipping and parsing time, between a large data array stored as CSV and
the same array stored as XML.
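To repeat this kind of comparison without Excel and WinZip, the sketch below substitutes Python's `zlib` for compression and the standard `csv` and `xml.etree.ElementTree` parsers for opening the files. It uses a made-up, simplified SpreadsheetML-like layout with no `ss:` namespace and no Office metadata, so its numbers are not comparable to the figures above. Timings will vary by machine. The goal is only to show how to measure raw size, compressed size, and decompress-plus-parse time side by side.

```python
import csv
import io
import time
import xml.etree.ElementTree as ET
import zlib

ROWS, COLS = 1_000, 20  # arbitrary array size for the experiment

# Build the same grid of single-digit numbers in both formats.
csv_text = '\n'.join(','.join('1' for _ in range(COLS)) for _ in range(ROWS)) + '\n'
cell = '<Cell><Data Type="Number">1</Data></Cell>'  # simplified: no ss: namespace
xml_text = '<Table>' + ''.join('<Row>' + cell * COLS + '</Row>' for _ in range(ROWS)) + '</Table>'


def parse_csv(text):
    return [[int(v) for v in row] for row in csv.reader(io.StringIO(text))]


def parse_xml(text):
    root = ET.fromstring(text)
    return [[int(d.text) for d in row.iter('Data')] for row in root.iter('Row')]


def load(compressed, parser):
    # Decompress, decode, and parse: the "unzip, then open" step.
    return parser(zlib.decompress(compressed).decode('utf-8'))


results = {}
for name, text, parser in [('CSV', csv_text, parse_csv), ('XML', xml_text, parse_xml)]:
    raw = text.encode('utf-8')
    packed = zlib.compress(raw)
    start = time.perf_counter()
    grid = load(packed, parser)
    elapsed = time.perf_counter() - start
    results[name] = grid
    print(f'{name}: {len(raw):>8} bytes raw, {len(packed):>6} compressed, '
          f'{elapsed * 1000:.1f} ms to decompress and parse')
```

Both paths recover the same grid, so any difference in the reported numbers comes from the encoding rather than the data. Real spreadsheet exports with varied values will compress less well than this uniform grid.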

Steve


-----Original Message-----
From: Elliotte Harold [mailto:elharo@metalab.unc.edu] 
Sent: Monday, December 06, 2004 2:43 PM
To: Stephen E. Beller
Cc: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Data streams

Stephen E. Beller wrote:

> I tried Steven's experiment from a different angle. I filled an Excel XP
> spreadsheet with a single-digit number, saved it in both XML and in a
> comma-delimited text file (CSV). I then compressed both with WinZip and
> then opened both with Excel. Here's what I found:

That sounds like a bad test. The XML file contains a lot more 
information than the CSV file. Specifically it contains a lot of 
Microsoft Office metadata about things like the name of the person who 
created the file that are not in the CSV file. There is information in 
the XML file that is not present in the CSV file.

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim

Copyright 2001 XML.org. This site is hosted by OASIS