xml-dev - RE: RFC: Attributes and XML-RPC

From: Andrew Layman <andrewl@microsoft.com>
To: xml-dev@ic.ac.uk
Date: Wed, 22 Sep 1999 11:19:04 -0700

These results are consistent with tests that I have run against actual XML files generated from databases. After compression, there is little difference between different syntactic families.

-----Original Message-----
From: Mark Nutter [mailto:mnutter@fore.com]
Sent: Wednesday, September 22, 1999 10:26 AM
To: xml-dev@ic.ac.uk
Subject: RE: RFC: Attributes and XML-RPC

At 12:16 PM 09/22/99 -0400, Hunter, David wrote:

So even if you
compress the files, the attribute version will be able to compress to 50%
smaller than the other file. Again, 2KB isn't a lot, but if we're talking
megabytes in size, 50% is a lot.

I wrote a quick perl script to take /usr/dict/words and turn it into an XML file, with some artificially generated "attributes". In the resulting file named attrib.xml, each <word> tag contains the additional information as attributes. I did the same thing to produce a file called child.xml, except that the additional information is presented as a child element instead of as an attribute. Here are the results:

$ ./make.pl
$ ls -l
total 13004
-rw-rw-r--   1 mnutter mnutter   5811852 Sep 22 13:16 attrib.xml
-rw-rw-r--   1 mnutter mnutter   7445892 Sep 22 13:16 child.xml
-rwxr-xr-x   1 mnutter mnutter       976 Sep 22 13:16 make.pl
$ gzip attrib.xml
$ gzip child.xml
$ ls -l
total 1127
-rw-rw-r--   1 mnutter mnutter    671039 Sep 22 13:16 attrib.xml.gz
-rw-rw-r--   1 mnutter mnutter    472394 Sep 22 13:16 child.xml.gz
-rwxr-xr-x   1 mnutter mnutter       976 Sep 22 13:16 make.pl

I used gzip as an example of off-the-shelf compression technology. As you can see, even though the raw child.xml file is larger, the compressed version is *smaller* than the corresponding implementation with attributes.
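To put the listing in proportion, the byte counts can be turned into percentages. This is a minimal sketch in shell, assuming awk is available; the helper name `pct` is hypothetical, and the numbers are copied from the ls output in this message.

```shell
# pct A B: print A as a percentage of B, to one decimal place.
pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}

# Byte counts copied from the two ls -l listings.
attrib_raw=5811852; attrib_gz=671039
child_raw=7445892;  child_gz=472394

echo "attrib.xml compresses to $(pct "$attrib_gz" "$attrib_raw")% of its raw size"
echo "child.xml compresses to $(pct "$child_gz" "$child_raw")% of its raw size"
echo "child.xml.gz is $(pct "$child_gz" "$attrib_gz")% the size of attrib.xml.gz"
```

By these figures the raw child.xml is roughly 28% larger than attrib.xml, yet its gzip output is roughly 30% smaller.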

This may not be true in all cases, of course, but I expect it often will, due to the way such compression algorithms work.
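The reasoning is that gzip's DEFLATE algorithm replaces repeated byte sequences with short back-references, so tag names that recur on every record cost very little once compressed. Here is a rough way to see the effect, assuming gzip and awk are installed. The sample record and the names `record` and `repeat` are invented for illustration and are not taken from the files above.

```shell
# One record in child-element style; the values are made up.
record='<word><time>938020560</time><twenty>0</twenty></word>'

# Emit the same record 1000 times, mimicking highly repetitive markup.
repeat() {
    awk -v r="$record" 'BEGIN { for (i = 0; i < 1000; i++) print r }'
}

# Count bytes before and after compression (tr strips padding some wc builds add).
raw_bytes=$(repeat | wc -c | tr -d ' ')
gz_bytes=$(repeat | gzip -c | wc -c | tr -d ' ')

echo "raw: $raw_bytes bytes, gzipped: $gz_bytes bytes"
```

Identical records are the extreme case. Real data, where the values differ from record to record, will compress less dramatically.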

For your reference, here is the Perl script I used to create the two files:

open WORDS, "</usr/dict/words" or die "Couldn't open dictionary.\n";
open ATTRIB, ">attrib.xml" or die "Couldn't open attrib.xml\n";
open CHILD, ">child.xml" or die "Couldn't open child.xml\n";

@twenty_strings = qw(one two three four five six seven eight nine ten
                     eleven twelve thirteen fourteen fifteen sixteen
                     seventeen eighteen nineteen twenty);

print ATTRIB "<attrib>\n";
print CHILD "<child>\n";

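while ($word = <WORDS>)
{
    chomp $word;                  # drop the trailing newline read from the dictionary
    $time = time();
    $timestr = localtime($time);
    $twenty = int(rand(20));      # random integer 0..19, a valid index into @twenty_strings
    $twentystr = $twenty_strings[$twenty];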
    print ATTRIB <<EOM;
<word time="$time" timestr="$timestr" twenty="$twenty"
        twentystr="$twentystr">$word</word>
EOM
    # Write the word itself too, so that both files carry the same data.
    print CHILD <<EOM;
<word>$word
    <time>$time</time>
    <timestr>$timestr</timestr>
    <twenty>$twenty</twenty>
    <twentystr>$twentystr</twentystr>
</word>
EOM
}

print ATTRIB "</attrib>\n";
print CHILD "</child>\n";

close CHILD;
close ATTRIB;
close WORDS;

-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-

Mark Nutter, <mnutter@fore.com>

Internet Applications Developer

FORE Systems

Some people are atheists 'til the day they die.

