- From: Mark Nutter <mnutter@fore.com>
- To: <xml-dev@ic.ac.uk>
- Date: Wed, 22 Sep 1999 13:25:36 -0400
At 12:16 PM 09/22/99 -0400, Hunter, David wrote:
> So even if you compress the files, the attribute version will be able to
> compress to 50% smaller than the other file. Again, 2KB isn't a lot, but
> if we're talking megabytes in size, 50% is a lot.
I wrote a quick Perl script to take /usr/dict/words and turn it into an
XML file, with some artificially generated "attributes" for each word.
In the resulting file named attrib.xml, each <word> tag contains
the additional information as attributes. I did the same thing to
produce a file called child.xml, except that the additional information
is presented as a child element instead of as an attribute. Here
are the results:
$ ./make.pl
$ ls -l
total 13004
-rw-rw-r--   1 mnutter  mnutter   5811852 Sep 22 13:16 attrib.xml
-rw-rw-r--   1 mnutter  mnutter   7445892 Sep 22 13:16 child.xml
-rwxr-xr-x   1 mnutter  mnutter       976 Sep 22 13:16 make.pl
$ gzip attrib.xml
$ gzip child.xml
$ ls -l
total 1127
-rw-rw-r--   1 mnutter  mnutter    671039 Sep 22 13:16 attrib.xml.gz
-rw-rw-r--   1 mnutter  mnutter    472394 Sep 22 13:16 child.xml.gz
-rwxr-xr-x   1 mnutter  mnutter       976 Sep 22 13:16 make.pl
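The byte counts in the two listings above can be turned into ratios. This is a rough sketch of that arithmetic, assuming a POSIX shell with awk; the `report` function name is only illustrative.

```shell
#!/bin/sh
# Ratios computed from the byte counts shown in the two ls listings.
report() {
    awk 'BEGIN {
        attrib_raw = 5811852; attrib_gz = 671039
        child_raw  = 7445892; child_gz  = 472394
        # Compressed size as a percentage of the raw size.
        printf "attrib.xml compresses to %.1f%% of its raw size\n", 100 * attrib_gz / attrib_raw
        printf "child.xml compresses to %.1f%% of its raw size\n", 100 * child_gz / child_raw
        # How much smaller the compressed child file is than the compressed attribute file.
        printf "child.xml.gz is %.1f%% smaller than attrib.xml.gz\n", 100 * (1 - child_gz / attrib_gz)
    }'
}

report
```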
I used gzip as an example of off-the-shelf compression
technology. As you can see, even though the raw child.xml file is
larger, the compressed version is *smaller* than the corresponding
implementation with attributes.
This may not be true in all cases, of course, but I expect it often will,
since gzip's DEFLATE algorithm encodes repeated strings, such as element
tags, as short back-references.
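The comparison can be repeated on any pair of files without overwriting them. The following is a minimal sketch, assuming a POSIX shell with gzip and mktemp available. The `size_report` helper and the tiny sample records are hypothetical stand-ins, not the output of make.pl, so the numbers it prints will differ from the ones above.

```shell
#!/bin/sh
# Sketch: compare raw and gzip-compressed sizes of two small sample files.
# The sample records are made-up stand-ins, not the output of make.pl.

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# size_report FILE: print raw and compressed byte counts (hypothetical helper).
# gzip -c writes to stdout, so the original file is left untouched.
size_report() {
    raw=$(wc -c < "$1" | tr -d ' ')
    packed=$(gzip -c "$1" | wc -c | tr -d ' ')
    echo "$(basename "$1"): raw=$raw gzip=$packed"
}

# Write 1000 records in each style: attributes versus child elements.
i=0
while [ "$i" -lt 1000 ]; do
    printf '<word time="%s" twenty="3">w%s</word>\n' "$i" "$i" >> "$tmp/attrib.xml"
    printf '<word>\n<time>%s</time>\n<twenty>3</twenty>\nw%s\n</word>\n' "$i" "$i" >> "$tmp/child.xml"
    i=$((i + 1))
done

size_report "$tmp/attrib.xml"
size_report "$tmp/child.xml"
```

Pointing `size_report` at real data instead of the generated samples is the more useful test, since the outcome depends on how repetitive the content is.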
For your reference, here is the Perl script I used to create the two
files:
# Input word list and the two output files.
open WORDS, "</usr/dict/words" or die "Couldn't open dictionary.\n";
open ATTRIB, ">attrib.xml" or die "Couldn't open attrib.xml\n";
open CHILD, ">child.xml" or die "Couldn't open child.xml\n";

@twenty_strings = qw(one two three four five six seven eight nine ten
                     eleven twelve thirteen fourteen fifteen sixteen
                     seventeen eighteen nineteen twenty);

# Root elements.
print ATTRIB "<attrib>\n";
print CHILD "<child>\n";

while ($word = <WORDS>)
{
    $time = time();
    $timestr = localtime($time);
    # Pick a random index 0..19 (rand alone returns a fraction in [0,1)).
    $twenty = int(rand(20));
    $twentystr = $twenty_strings[$twenty];

    # Attribute version: the extra data goes on the <word> tag.
    print ATTRIB <<EOM;
<word time="$time" timestr="$timestr"
twenty="$twenty"
twentystr="$twentystr">$word</word>
EOM

    # Child version: the extra data goes in child elements.
    print CHILD <<EOM;
<word>
<time>$time</time>
<timestr>$timestr</timestr>
<twenty>$twenty</twenty>
<twentystr>$twentystr</twentystr>
</word>
EOM
}

print ATTRIB "</attrib>\n";
print CHILD "</child>\n";
close CHILD;
close ATTRIB;
close WORDS;
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
Mark Nutter, <mnutter@fore.com>
Internet Applications Developer
FORE Systems
Some people are atheists 'til the day they die.