
   Re: Free Tool for Efficient XML Data Compression

  • From: "Thomas B. Passin" <tpassin@idsonline.com>
  • To: <xml-dev@ic.ac.uk>
  • Date: Sun, 19 Dec 1999 13:58:07 -0500


Philip Boutros wrote:
>
>
> Thomas B. Passin wrote:
>
> > I also took a 2-column Word97 document - 63.5K -
> > and opened it with Abiword (www.abisource.com)
> > then saved it.  Abiword uses an XML file format as
> > its native format.  Abiword XML file size: 48.8K.
>
> While I would love to chime in on the whole compressed XML discussion (my
> guess is word dictionaries and skip lists should slaughter gzip in terms of
> compression size) I would like to address this statement in particular.
>
> 1.
> Comparing Word97's file size to that of the current version of Abiword is a
> ludicrous exercise. I am very familiar with the Microsoft Word file format
> (no, I don't work for Microsoft and never have) and while it contains a
> number of inefficiencies and chunks of legacy garbage, given how much it
> encodes it is reasonably efficient for large documents. I can name at least
> a hundred features (styles, page layout, frames, borders, backgrounds,
> graphics, fields, properties, etc.) that Word97 must deal with in its file
> format that Abiword's format does not address. In fact, given how little
> Abiword encodes, I was surprised that Abiword wasn't 10 times as efficient.
> See #2 for a tirade about that.
>
<snip/>
Well, of course I know Word documents include a ton of stuff that, say,
Abiword files don't.  And this particular file doesn't even have any VBA
macros of my own in it.  And I'm not arguing for Abiword's file format,
either.  In this case, though, my document doesn't need the rest of the
stuff Word includes.  So doing this conversion gave me a rough way to
compare sizes for a document typical of the kind I often write.
With XML, you could just as well argue that some other format doesn't
need end tags, so of course XML would be bigger.  That's not the point.  The
point is that, for many actual cases (and I think it will turn out this way),
the supposed size disadvantage of an XML document will be relatively small
or non-existent.

Of course, the XML standard was developed under the guideline that
"terseness ... is of minimal importance", so if file size alone is going to
be the driver, XML might not be a favorable candidate.

Tom Passin



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

Copyright 2001 XML.org. This site is hosted by OASIS