- From: Andrew Layman <andrewl@microsoft.com>
- To: xml-dev@ic.ac.uk
- Date: Wed, 22 Sep 1999 11:19:04 -0700
These results are consistent with tests that I have run
against actual XML files generated from databases. After compression, there is little difference
between different syntactic families.
At 12:16 PM 09/22/99 -0400, Hunter, David wrote:
So even if you compress the files, the attribute version will be able to
compress to 50% smaller than the other file. Again, 2KB isn't a lot, but
if we're talking megabytes in size, 50% is a lot.

I wrote a quick Perl script to take /usr/dict/words and turn it into an
XML file, with some artificially generated "attributes". In the resulting
file, named attrib.xml, each <word> tag contains the additional
information as attributes. I did the same thing to produce a file called
child.xml, except that the additional information is presented as child
elements instead of as attributes.
Here are the results:
$ ./make.pl
$ ls -l
total 13004
-rw-rw-r--   1 mnutter  mnutter   5811852 Sep 22 13:16 attrib.xml
-rw-rw-r--   1 mnutter  mnutter   7445892 Sep 22 13:16 child.xml
-rwxr-xr-x   1 mnutter  mnutter       976 Sep 22 13:16 make.pl
$ gzip attrib.xml
$ gzip child.xml
$ ls -l
total 1127
-rw-rw-r--   1 mnutter  mnutter    671039 Sep 22 13:16 attrib.xml.gz
-rw-rw-r--   1 mnutter  mnutter    472394 Sep 22 13:16 child.xml.gz
-rwxr-xr-x   1 mnutter  mnutter       976 Sep 22 13:16 make.pl
I used gzip as an example of off-the-shelf compression technology. As
you can see, even though the raw child.xml file is larger (7445892 bytes
versus 5811852), the compressed version is *smaller* than the
corresponding implementation with attributes (472394 bytes versus
671039, roughly 30% smaller).
This may not be true in all cases, of course, but I expect it often will
be, due to the way such compression algorithms work.
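
For readers who want to try the comparison on their own data, here is a
minimal sketch of the same experiment. It is not the script used for the
numbers above: it is written in Python rather than Perl, it uses
synthetic words instead of /usr/dict/words, and the names
build_documents and sizes, the seed, and the starting timestamp are all
hypothetical choices made for illustration. Unlike the posted script, it
writes the word itself into both variants so that the two files carry
the same information. Which variant ends up smaller after gzip depends
on the data, so the sketch only reports the sizes and makes no claim
about the outcome.

```python
import gzip
import random
import time

TWENTY_STRINGS = (
    "one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty"
).split()


def build_documents(words, seed=0, start=937_000_000):
    """Return (attrib_bytes, child_bytes) carrying the same data per word."""
    rng = random.Random(seed)  # seeded so repeated runs build identical files
    attrib_rows, child_rows = [], []
    for i, word in enumerate(words):
        stamp = start + i // 1000  # hypothetical clock: one tick per 1000 words
        timestr = time.asctime(time.gmtime(stamp))
        n = rng.randrange(20)  # integer in 0..19
        name = TWENTY_STRINGS[n]
        # Variant 1: extra information as attributes on <word>.
        attrib_rows.append(
            f'<word time="{stamp}" timestr="{timestr}" '
            f'twenty="{n}" twentystr="{name}">{word}</word>\n'
        )
        # Variant 2: the same information as child elements of <word>.
        child_rows.append(
            f"<word>{word}\n"
            f"<time>{stamp}</time>\n"
            f"<timestr>{timestr}</timestr>\n"
            f"<twenty>{n}</twenty>\n"
            f"<twentystr>{name}</twentystr>\n"
            f"</word>\n"
        )
    attrib = "<attrib>\n" + "".join(attrib_rows) + "</attrib>\n"
    child = "<child>\n" + "".join(child_rows) + "</child>\n"
    return attrib.encode("ascii"), child.encode("ascii")


def sizes(data):
    """Return (raw_bytes, gzip_bytes) for one document."""
    return len(data), len(gzip.compress(data, compresslevel=9))


if __name__ == "__main__":
    sample = [f"word{i}" for i in range(20000)]  # synthetic stand-in words
    for label, doc in zip(("attrib", "child"), build_documents(sample)):
        raw, packed = sizes(doc)
        print(f"{label}: {raw} bytes raw, {packed} bytes gzipped")
```

To test real data, pass in your own word list (for example, the lines of
a dictionary file) in place of the synthetic sample.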
For your reference, here is the Perl script I used to create the
two files:
open WORDS, "</usr/dict/words" or die "Couldn't open dictionary.\n";
open ATTRIB, ">attrib.xml" or die "Couldn't open attrib.xml\n";
open CHILD, ">child.xml" or die "Couldn't open child.xml\n";

@twenty_strings = qw(one two three four five six seven eight nine ten
                     eleven twelve thirteen fourteen fifteen sixteen
                     seventeen eighteen nineteen twenty);

print ATTRIB "<attrib>\n";
print CHILD "<child>\n";

while($word = <WORDS>) {
    $time = time();
    $timestr = localtime($time);
    # As posted; int(rand(20)) may have been intended, to pick an index 0..19.
    $twenty = rand % 20;
    $twentystr = $twenty_strings[$twenty];

    # Attribute variant: the extra information goes on the <word> tag.
    print ATTRIB <<EOM;
<word time="$time" timestr="$timestr"
twenty="$twenty"
twentystr="$twentystr">$word</word>
EOM

    # Child-element variant: the extra information goes in nested elements.
    print CHILD <<EOM;
<word>
<time>$time</time>
<timestr>$timestr</timestr>
<twenty>$twenty</twenty>
<twentystr>$twentystr</twentystr>
</word>
EOM
}

print ATTRIB "</attrib>\n";
print CHILD "</child>\n";

close CHILD;
close ATTRIB;
close WORDS;
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
Mark Nutter, <mnutter@fore.com>
Internet Applications Developer
FORE Systems
Some people are atheists 'til the day they
die.