   RE: [xml-dev] Data streams

David said:

>Any dbms writer can tell you adding indexes, which take
>up more space can dramatically increase speed.
>My final point is that 20 minutes to process a few hundred
>megabytes is not slow....
>I remember when it used to take about 6 days to process 5MB
>of CSVs on a old 486 with dbase......
>So with a bit of perspective, xml and what we have now,
>even with 20 minutes... that's really flying along...

Frank said:

>If you really care, goto single letter tag names, and even for single digit
>data (how real is that?) you have '<d>1</d>' vs '1,' or 8 to 2, not 50 to 1.
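
Frank's character count can be checked with a few lines of Python. This
is only a sketch of the arithmetic; the function name markup_overhead is
made up for illustration and is not from anyone's code in this thread.

```python
def markup_overhead(value: str, tag: str = "d") -> tuple[int, int]:
    """Return (length with XML tags, length as a comma-delimited field)."""
    marked_up = f"<{tag}>{value}</{tag}>"   # e.g. <d>1</d>
    delimited = f"{value},"                 # e.g. 1,
    return len(marked_up), len(delimited)


if __name__ == "__main__":
    for tag in ("d", "data"):
        xml_len, csv_len = markup_overhead("1", tag)
        print(f"<{tag}>: {xml_len} vs {csv_len} characters")
```

For a single-digit value this gives 8 vs 2 with <d> and 14 vs 2 with
<data>, before any compression is applied.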


Considering that I started this thread, let me ask if we are all on 
the same page.

In keeping with the direction of the thread, I created an XML file that 
consisted of one million data entries of:


The size was correspondingly 171.6 Megs, and this is consistent with 
17 characters per entry, including a CR.

This file compressed down to 4.7 Megs in a zip file, which I believe 
uses a Huffman compression technique. As such, Frank, I would not 
expect to see much difference between using <d> vs <data> tags -- 
after all, the most redundant strings are reduced to the smallest 
replacement codes. My tests show less than 1 percent difference 
in compressed size between the two.
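
For anyone who wants to try the tag-length comparison themselves, here
is a minimal sketch. It assumes Python's zlib module (DEFLATE, the
method zip normally uses) and made-up numeric values, so it does not
reproduce my file and the exact percentages will differ with the data.
The names build_document and compare are hypothetical.

```python
import zlib


def build_document(tag: str, entries: int) -> bytes:
    """Build `entries` lines of <tag>N</tag>; the values N are made up."""
    lines = [f"<{tag}>{i % 1000}</{tag}>" for i in range(entries)]
    return ("\n".join(lines) + "\n").encode("ascii")


def compare(entries: int = 100_000) -> dict:
    """Map each tag name to (raw size, DEFLATE-compressed size) in bytes."""
    result = {}
    for tag in ("d", "data"):
        raw = build_document(tag, entries)
        result[tag] = (len(raw), len(zlib.compress(raw, 6)))
    return result


if __name__ == "__main__":
    for tag, (raw, packed) in compare().items():
        print(f"<{tag}>: {raw} bytes raw, {packed} bytes compressed")
```

The raw sizes differ by exactly 6 bytes per entry (3 extra characters
in each of the two tags); how much of that survives compression is
what the test measures.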

It's interesting to note that the time for me to compress my file was 
3 minutes and 14 seconds, not 20 minutes. The time to decompress this 
file was under 6 seconds.
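
The gap between compression and decompression time can be measured
with a sketch like the one below. It assumes zlib as a stand-in for
the zip tool I used, and the 17-byte sample entry is made up, so the
absolute times will not match mine. The name time_roundtrip is
hypothetical.

```python
import time
import zlib


def time_roundtrip(data: bytes, level: int = 6) -> tuple[float, float, int]:
    """Return (compress seconds, decompress seconds, compressed size)."""
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    t1 = time.perf_counter()
    restored = zlib.decompress(packed)
    t2 = time.perf_counter()
    if restored != data:
        raise ValueError("round trip did not restore the original data")
    return t1 - t0, t2 - t1, len(packed)


if __name__ == "__main__":
    # Made-up entry: 16 characters plus a line terminator = 17 bytes.
    sample = b"<data>123</data>\n" * 1_000_000
    c, d, size = time_roundtrip(sample)
    print(f"compress {c:.3f}s, decompress {d:.3f}s, {size} bytes")
```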

David, was the processing time you mentioned the time it took for 
you to compress your file?

As for your comment about dbms indexing, you are correct, David. I 
just wrote a set of b-tree routines which use splaying, and the access 
times for a 10 million node db were on the order of 4 million random 
searches per second.
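
A rough way to measure random searches per second against an in-memory
index is sketched below. Note that this is not my b-tree code: it
swaps in a sorted list searched with Python's bisect module, purely to
show how such a figure can be timed. The names build_index and
searches_per_second are hypothetical, and the rate will depend heavily
on machine and language.

```python
import bisect
import random
import time


def build_index(n: int, seed: int = 0) -> list[int]:
    """Return n distinct keys in sorted order (a stand-in for a real index)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n * 10), n))


def searches_per_second(index: list[int], lookups: int = 100_000,
                        seed: int = 1) -> tuple[int, float]:
    """Search for `lookups` random existing keys; return (hits, rate)."""
    rng = random.Random(seed)
    probes = [rng.choice(index) for _ in range(lookups)]
    hits = 0
    start = time.perf_counter()
    for key in probes:
        pos = bisect.bisect_left(index, key)   # binary search
        if pos < len(index) and index[pos] == key:
            hits += 1
    elapsed = time.perf_counter() - start
    return hits, lookups / elapsed


if __name__ == "__main__":
    idx = build_index(1_000_000)
    hits, rate = searches_per_second(idx)
    print(f"{hits} hits, about {rate:,.0f} searches per second")
```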

Please forgive me, as I do not want to start a "which is best" 
controversy, but what kind of machines are you people using? I'm 
using a two-year-old Macintosh running OS X.


Copyright 2001 XML.org. This site is hosted by OASIS