OASIS Mailing List Archives

   RE: [xml-dev] Data streams



I think delta data files (sending diffs) make sense in certain situations,
such as data updates, which can be done in both XML and CSV. Of course, there
would be more processing time involved in both constructing and parsing the
data file.


But since a 150KB compressed CSV file can store nearly 17 million data
elements, that extra resource consumption may not be necessary in most
situations.
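As a rough check on the "nearly 17 million" figure: Excel XP worksheets were limited to 65,536 rows by 256 columns, i.e. 16,777,216 cells. The sketch below assumes every cell holds the digit 1, with comma separators and CRLF line endings (assumptions; the exact bytes Excel wrote may differ), and compresses the result with Python's zlib, which implements DEFLATE, the method used by standard ZIP archives. The compressed size will not match the 150KB WinZip figure exactly.

```python
import zlib

# Assumed Excel XP worksheet limits: 65,536 rows x 256 columns.
ROWS, COLS = 65_536, 256

def build_csv(rows: int, cols: int, value: str = "1") -> bytes:
    """Build a CSV grid where every cell holds the same single-digit value.

    Assumes comma separators and CRLF line endings.
    """
    row = ",".join([value] * cols) + "\r\n"
    return (row * rows).encode("ascii")

cells = ROWS * COLS
raw = build_csv(ROWS, COLS)
packed = zlib.compress(raw, 9)  # DEFLATE at maximum compression level

print(f"cells:        {cells:,}")
print(f"raw CSV:      {len(raw):,} bytes")
print(f"deflated CSV: {len(packed):,} bytes")
```

Under these assumptions the grid has 16,777,216 cells and the raw CSV is 33,619,968 bytes (about 34MB, in line with the uncompressed figure reported in this thread). Because every row is identical, DEFLATE shrinks it by well over a hundredfold.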

Steve   

-----Original Message-----
From: david.lyon@computergrid.net [mailto:david.lyon@computergrid.net] 
Sent: Monday, December 06, 2004 7:22 PM
To: Stephen E. Beller
Subject: RE: [xml-dev] Data streams


Hi Stephen,
> << Put another way, the compressed xml file was 2.5MB and the
> CSV file was 34MB. >>
>
> This is incorrect. The compressed CSV was 150 KB.
>

It is correct. I am comparing a compressed XML file to an
uncompressed CSV file. Maybe in those days it could all
fit on a floppy disk and didn't need to be zipped.
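The disagreement comes down to which pair of numbers is compared. Using only the sizes reported in this thread (and reading 150KB as 0.15MB), a quick calculation puts both views side by side:

```python
# Sizes reported in this thread, in megabytes (150KB taken as 0.15MB).
sizes_mb = {
    "xml_raw": 840.0,
    "xml_zipped": 2.5,
    "csv_raw": 34.0,
    "csv_zipped": 0.15,
}

def ratio(a: str, b: str) -> float:
    """How many times larger file `a` is than file `b`."""
    return sizes_mb[a] / sizes_mb[b]

# Compressed XML against uncompressed CSV (the comparison made here).
print(f"csv_raw / xml_zipped    = {ratio('csv_raw', 'xml_zipped'):.1f}x")
# Like-for-like comparisons: compressed vs compressed, raw vs raw.
print(f"xml_zipped / csv_zipped = {ratio('xml_zipped', 'csv_zipped'):.1f}x")
print(f"xml_raw / csv_raw       = {ratio('xml_raw', 'csv_raw'):.1f}x")
```

Compressed XML comes out about 13.6 times smaller than uncompressed CSV, but about 16.7 times larger than compressed CSV; uncompressed, the XML is about 24.7 times larger.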

And only 20 minutes to process... so fast. It used
to take me 6 1/2 days to process CSV files in dBase
way back when.

But to make this apples-and-oranges comparison really
fair in today's world, we should introduce XML diffing,
that is, transmitting only the changes from one version
to the next.

When that is introduced, and compared with "just send a
new copy of the file", the CSV technology starts looking
like a steam engine.

The classic case is the old price list, sent over and
over in CSV until it grows to 40,000 items.

But if only the diffs are sent, the transmission takes
a second or two.

It's not exactly a new technology, but think how much faster
that is than sending the whole file.
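A minimal sketch of the diff idea, assuming a hypothetical price list keyed by item code with one price per item. The `diff_prices` and `apply_diff` names and the dict format are illustrative only, not any particular XML diffing tool: the sender transmits just the added, changed and removed items, and the receiver applies them to its previous copy.

```python
from typing import Dict, List, Tuple

PriceList = Dict[str, float]  # hypothetical format: item code -> price

def diff_prices(old: PriceList, new: PriceList) -> Tuple[PriceList, List[str]]:
    """Return (upserts, deletions) needed to turn `old` into `new`."""
    # New or changed items: absent from `old`, or present with another price.
    upserts = {code: price for code, price in new.items()
               if old.get(code) != price}
    # Items that were dropped from the list.
    deletions = [code for code in old if code not in new]
    return upserts, deletions

def apply_diff(old: PriceList, upserts: PriceList, deletions: List[str]) -> PriceList:
    """Rebuild the new price list from the old one plus a diff."""
    result = dict(old)
    result.update(upserts)
    for code in deletions:
        result.pop(code, None)
    return result

# A 40,000-item list in which only three entries change between versions.
old = {f"ITEM{i:05d}": 10.0 + i % 7 for i in range(40_000)}
new = dict(old)
new["ITEM00042"] = 99.95   # price change
new["ITEM40000"] = 5.25    # new item
del new["ITEM00007"]       # discontinued item

upserts, deletions = diff_prices(old, new)
print(len(new), "items in full list;", len(upserts) + len(deletions), "entries in diff")
assert apply_diff(old, upserts, deletions) == new
```

Here the full list still has 40,000 entries while the diff carries only 3; the trade-off, as noted at the top of this thread, is the extra work to compute and apply the diff.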

Maybe that is not what you are doing, but maybe looking
into this stuff is something that might be worth the
time.

I agree, 20 minutes to process an XML file in today's world
is just a tad on the slow side! (But it's happened to me.)
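For anyone who wants to measure rather than guess, here is a small sketch that times parsing the same grid as CSV (stdlib `csv`) and as XML (stdlib `xml.etree.ElementTree`). The `<table>/<row>/<c>` markup is a hypothetical minimal layout, not Excel's actual SpreadsheetML, and the grid is scaled down; absolute timings depend on the machine and parser, so treat it only as a starting point.

```python
import csv
import io
import time
import xml.etree.ElementTree as ET

ROWS, COLS = 2_000, 50  # scaled-down grid; adjust as needed

def make_csv(rows: int, cols: int) -> str:
    return (",".join(["1"] * cols) + "\n") * rows

def make_xml(rows: int, cols: int) -> str:
    # Hypothetical minimal markup, not Excel's SpreadsheetML schema.
    row = "<row>" + "<c>1</c>" * cols + "</row>"
    return "<table>" + row * rows + "</table>"

def parse_csv(text: str) -> list:
    return list(csv.reader(io.StringIO(text)))

def parse_xml(text: str) -> list:
    root = ET.fromstring(text)
    return [[cell.text for cell in row] for row in root]

def timed(fn, arg):
    """Run fn(arg) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start

csv_text, xml_text = make_csv(ROWS, COLS), make_xml(ROWS, COLS)
csv_rows, csv_secs = timed(parse_csv, csv_text)
xml_rows, xml_secs = timed(parse_xml, xml_text)

assert csv_rows == xml_rows  # both yield the same grid of "1" strings
print(f"CSV: {len(csv_text):,} chars parsed in {csv_secs:.3f}s")
print(f"XML: {len(xml_text):,} chars parsed in {xml_secs:.3f}s")
```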

David


> And XML took 5 minutes simply to uncompress (unzip) and another 10 minutes
> to parse. The CSV did both in about 1 minute.
>
> <<Most business apps need to hold multiple sets of arrays
> and thus the need for something like xml.>>
>
> A CSV can hold many different arrays in a single file.
>
> Steve
>
>
> -----Original Message-----
> From: david.lyon@computergrid.net [mailto:david.lyon@computergrid.net]
> Sent: Monday, December 06, 2004 6:19 PM
> To: Stephen E. Beller
> Cc: xml-dev@lists.xml.org
> Subject: RE: [xml-dev] Data streams
>
>
> All,
>
> Mind if I pull apart this report for some further analysis?
>
> Quoting "Stephen E. Beller" <sbeller@nhds.com>:
>
> > I tried Steven's experiment from a different angle. I filled an Excel XP
> > spreadsheet with a single-digit number, saved it in both XML and in a
> > comma-delimited text file (CSV). I then compressed both with WinZip and then
> > opened both with Excel. Here's what I found:
> >
> > The XML file was 840MB, the CSV 34MB -- a 2,500% difference.
> > Compressed, the XML file was 2.5MB, the CSV 0.15MB (150KB) -- a 1,670%
> > difference.
>
> True. XML files are usually bigger.
>
> > Equally dramatic is the time it took to uncompress and render the files as
> > an Excel spreadsheet: it took about 20 minutes with the XML file; the CSV
> > took 1 minute -- a 2,000% difference.
>
> True. The old parts of Excel are written in assembly language
> by true masters. They are efficient. The CSV era coincided
> with the era of assembly-language coding.
>
> The new XML parts are written by programmers of the bloatware
> era. They are not optimised to the same degree.
>
> They are probably written in high-level languages and, I would
> guess, have never been "profiled". That's an old word... maybe
> it's something that is never done with XML... I wouldn't be surprised.
>
> In perspective, Excel isn't a tool (IMHO) that a user would
> use to deal with XML data in a commercial environment, as
> rendering tags is of absolutely no use to a business user. They
> want product data printed like a price list, or a purchase
> order printed like a purchase order. XML tags are alien-speak,
> or geek-speak at best.
>
> But some people do optimise and profile their XML. I would bet
> a "real" XML trading app would fare better than Excel.
>
> > My conclusion is that delimited text files handle large
> > arrays of data more efficiently.
>
> Maybe, but only provided a single array is used.
>
> Most business apps need to hold multiple sets of arrays,
> hence the need for something like XML.
>
> Finally...
>
> > The XML file was 840MB, the CSV 34MB -- a 2,500% difference.
> > Compressed, the XML file was 2.5MB, the CSV 0.15MB (150KB) -- a 1,670%
> > difference.
>
> Put another way, the compressed XML file was 2.5MB and the
> CSV file was 34MB.
>
> Therefore, sending compressed XML data is more efficient
> than using CSV and requires fewer resources to transmit.
>
> David
>
>
>
>
>





Copyright 2001 XML.org. This site is hosted by OASIS