   RE: standard compressed XML format?

  • From: "John Evdemon" <john.evdemon@xmls.com>
  • To: "'XML-Dev Mailing list'" <xml-dev@xml.org>
  • Date: Thu, 16 Mar 2000 15:17:07 -0500

We have a free XML compression tool called XMLZip.  XMLZip can be
incorporated into the DOM, avoiding having to learn any product-specific
syntax.   XMLZip is written entirely in Java and has been tested on NT,
Linux (Red Hat 5.2 and higher) and Solaris (2.5.1 and higher).

XMLZip files are constructed based upon a level parameter. The level
parameter specifies how many nested element levels down from the document
root to go before compressing subtrees. For example, in a level 2 XMLZip
file each child of the document root is replaced with a reference to its
compressed subtree.

For example:

  <?xml version="1.0"?>
  <root>
    <child id="1">
    </child>
    <child id="2">
      <grandchild id="1">
      </grandchild>
    </child>
  </root>

This document is represented as follows (this example is a level 2 tree):

  Tree prefix:
  <?xml version="1.0"?>
  <root>
    <xmlzip id="1"/>
    <xmlzip id="2"/>
  </root>

  Subtree fragment 1:
  <child id="1">
  </child>

  Subtree fragment 2:
  <child id="2">
    <grandchild id="1">
    </grandchild>
  </child>
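
The split shown above can be sketched in a few lines. This is not XMLZip's
code (XMLZip itself is written in Java); it is a minimal Python illustration
using the standard xml.dom.minidom module. The function name split_at_level
and the sequential numbering of the id values are assumptions made for the
sketch.

```python
from xml.dom import minidom


def split_at_level(xml_text, level):
    """Return (tree_prefix_xml, {id: fragment_xml}) for the given level.

    Hypothetical helper: the document root is level 1, so level 2 cuts out
    each child of the root, level 3 each grandchild, and so on.
    """
    if level < 2:
        raise ValueError("level must be 2 or greater in this sketch")
    doc = minidom.parseString(xml_text)
    fragments = {}

    def walk(node, depth):
        # Copy the child list because we replace children while iterating.
        for child in list(node.childNodes):
            if child.nodeType != child.ELEMENT_NODE:
                continue  # keep text, comments, etc. in the prefix
            if depth + 1 == level:
                frag_id = len(fragments) + 1  # assumed numbering scheme
                placeholder = doc.createElement("xmlzip")
                placeholder.setAttribute("id", str(frag_id))
                node.replaceChild(placeholder, child)
                fragments[frag_id] = child.toxml()
            else:
                walk(child, depth + 1)

    walk(doc.documentElement, 1)
    return doc.documentElement.toxml(), fragments


SAMPLE = """<?xml version="1.0"?>
<root>
  <child id="1">
  </child>
  <child id="2">
    <grandchild id="1">
    </grandchild>
  </child>
</root>"""

prefix, fragments = split_at_level(SAMPLE, 2)
print(prefix)        # the root with two <xmlzip/> placeholders
print(fragments[2])  # the second child, including its grandchild
```

With level 3 the same function would leave both child elements in the tree
prefix and cut out only the grandchild.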

An XMLZip file is a ZIP format file. The first entry in the file is the XML
document tree prefix. The tree prefix is the original XML file with selected
subtrees replaced with a single, empty tag. The subtrees that are removed
are all the sibling subtrees at a particular level, defined by the user
during the compression process. The first entry is denoted in the XMLZip
file as a ZIP entry with a name ending in ".0".

The last entry in the XMLZip file is the index. It lists all the compressed
XMLZip file entries and the ID attribute of the corresponding <xmlzip> tag.
The index entry in the XMLZip file has a name ending in ".i".

Both the tree prefix (the .0 entry) and the index (the .i entry) are
uncompressed for quick extraction.

The remaining entries in the XMLZip file are the compressed, or deflated,
subtree fragments. Each entry has a name that ends in a dot followed by a
number greater than 0.
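
The entry layout described above can be illustrated with Python's standard
zipfile module. This is only a sketch of the idea, not the XMLZip
implementation: the base name "doc", the function names, and the index
format (one "entry name, tab, id" pair per line) are assumptions, since the
actual index format is not given here.

```python
import io
import zipfile


def write_xmlzip(stream, base_name, prefix_xml, fragments):
    """Write a ZIP laid out as described: .0 prefix, .N fragments, .i index."""
    index_lines = []
    with zipfile.ZipFile(stream, "w") as zf:
        # First entry: the tree prefix, stored uncompressed.
        zf.writestr(base_name + ".0", prefix_xml,
                    compress_type=zipfile.ZIP_STORED)
        # Middle entries: one deflated subtree fragment per id.
        for frag_id, frag_xml in sorted(fragments.items()):
            name = "%s.%d" % (base_name, frag_id)
            zf.writestr(name, frag_xml, compress_type=zipfile.ZIP_DEFLATED)
            index_lines.append("%s\t%d" % (name, frag_id))
        # Last entry: the index, stored uncompressed (format is assumed).
        zf.writestr(base_name + ".i", "\n".join(index_lines),
                    compress_type=zipfile.ZIP_STORED)


def read_fragment(stream, frag_id):
    """Look up one fragment through the index and inflate only that entry."""
    with zipfile.ZipFile(stream) as zf:
        index_name = zf.namelist()[-1]  # the index is the last entry
        for line in zf.read(index_name).decode("utf-8").splitlines():
            name, listed_id = line.split("\t")
            if listed_id == str(frag_id):
                return zf.read(name).decode("utf-8")
    raise KeyError(frag_id)


buf = io.BytesIO()
write_xmlzip(buf, "doc",
             '<root><xmlzip id="1"/><xmlzip id="2"/></root>',
             {1: '<child id="1"></child>',
              2: '<child id="2"><grandchild id="1"></grandchild></child>'})
print(zipfile.ZipFile(buf).namelist())
print(read_fragment(buf, 2))
```

Keeping the prefix and index uncompressed means a reader can inspect the
document outline and locate a subtree without inflating anything, then
decompress only the fragment it needs.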

PlanetIT published a review of XMLZip at:
http://www.planetit.com/techcenters/docs/internet_&_intranet/product/PIT19991028S0044

You can download a copy of XMLZip for free at:
http://www.xmls.com/products/xmlzip/xmlzipPricing.html

John Evdemon
Chief Architect
XML Solutions
http://www.xmls.com

-----Original Message-----
From: owner-xml-dev@xml.org [mailto:owner-xml-dev@xml.org]On Behalf Of
Simon St.Laurent
Sent: Thursday, March 16, 2000 12:34 PM
To: XML-Dev Mailing list
Subject: standard compressed XML format?


Is anyone doing any work on a standard compression format for XML documents?

I'm starting to get concerned about the volume of complaints I'm getting
from readers and folks in Web development forums who are starting to argue
that XML's verbosity is a problem, especially for things like transmitting
vector graphics information.  There are a lot of wasted bits in XML
documents - and of course in HTML and other text documents as well.

I'm not happy about the prospect of sending documents to browsers as .zip
or some other compressed format and making users go through multiple steps
to decompress and view the content.  I'd like to think that we could come
up with a compression/decompression algorithm for markup (maybe just XML,
maybe all text) that we can use transparently.  Ideally, it would be an
algorithm explicitly placed in the public domain, avoiding licensing and
legal battles.

Some folks have argued that this belongs in transfer protocols, while
others have argued that it should be a 3rd party function, like .zip and
.sit are today.  I'm not convinced by the first because so many competing
formats (gif, jpeg, flash, etc.) already include compression, and I'm not
convinced by the second because I don't think users are willing to
micromanage such a process.

It also has an impact on some of the discussions on the IETF-XML-MIME
discussion (see http://www.imc.org/ietf-xml-mime/ for archives and
information) because we're already discussing how best to mark information
as XML for possible generic processing.  If a compression standard emerged,
it might well have an impact on MIME types - and I'd like to see that
discussion start before we settle the MIME types for XML debate.

Any thoughts?  I like the fact that XML is verbose when I'm editing and
processing, but it's not so good in transmission.  I'd like to think that
there's a good _general_ solution that will let us have the best of both
worlds.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************




Copyright 2001 XML.org. This site is hosted by OASIS