OASIS Mailing List Archives



RE: Request: Techniques for reducing the size of XML instances

On July 26, 2001 12:07 PM Roger L. Costello wrote:
> Does anyone have a summary of techniques for reducing
> the size of XML instances (as would be required in limited 
> bandwidth applications)?
I have tried to include a summary of some of the approaches I am aware of
(much of this stuff comes from a presentation I gave last year):

Several people mentioned gzip/jar so I won't repeat it.

HTTP supports compression via the Accept-Encoding and Content-Encoding
headers (although I'm not sure how many people actually use it).  Three
content codings are available: gzip, compress and deflate ("zlib" format).
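
To make the header negotiation concrete, here is a minimal sketch in Python
(standard library only).  The function name encode_body and the sample
payload are made up for illustration, and the sketch ignores q-values in
Accept-Encoding, so treat it as an outline rather than a complete
implementation.

```python
import gzip
import zlib
from typing import Tuple

def encode_body(body: bytes, accept_encoding: str) -> Tuple[bytes, str]:
    """Pick a coding the client lists and return (payload, Content-Encoding)."""
    # Simplification: q-values such as "gzip;q=0" are stripped, not honoured.
    offered = {t.split(";")[0].strip().lower() for t in accept_encoding.split(",")}
    if "gzip" in offered:
        return gzip.compress(body), "gzip"
    if "deflate" in offered:
        # HTTP "deflate" is the zlib container, which zlib.compress produces.
        return zlib.compress(body), "deflate"
    return body, "identity"

# Hypothetical repetitive XML payload.
xml = b"<orders>" + b"<order><id>1</id><qty>2</qty></order>" * 200 + b"</orders>"
payload, coding = encode_body(xml, "gzip, deflate")
print(coding, len(xml), "->", len(payload))
```

The receiver reverses the step named in Content-Encoding (gzip.decompress or
zlib.decompress here) before handing the bytes to the XML parser.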

Alain Trottier introduced an approach that he calls "zxml".  Zxml simply
eliminates duplicate/redundant tags and adds a header and template area
that list the tag names and values.  This is really only useful for simple
structures, since it requires a separator character to delineate values
(e.g. what if the separator also appears in an element value?).  See
5JViLUq6XGetnKFfr for the original article.
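
The separator problem is easy to demonstrate.  The sketch below is NOT the
actual zxml layout (which I am not reproducing here); it is a hypothetical
header-plus-values format with "|" as an assumed separator, written only to
show the idea and where it breaks.

```python
import xml.etree.ElementTree as ET

SEP = "|"  # assumed separator, chosen for illustration only

def pack(xml_text: str) -> str:
    """Flatten identically shaped records into one header line plus value lines."""
    root = ET.fromstring(xml_text)
    records = list(root)
    fields = [child.tag for child in records[0]]
    header = SEP.join([root.tag, records[0].tag] + fields)
    rows = [SEP.join((child.text or "") for child in rec) for rec in records]
    return "\n".join([header] + rows)

def unpack(packed: str) -> str:
    """Rebuild the XML from the header (tag names) and the value lines."""
    header, *rows = packed.split("\n")
    root_tag, rec_tag, *fields = header.split(SEP)
    root = ET.Element(root_tag)
    for row in rows:
        rec = ET.SubElement(root, rec_tag)
        for tag, value in zip(fields, row.split(SEP)):
            ET.SubElement(rec, tag).text = value
    return ET.tostring(root, encoding="unicode")

doc = ("<items><item><sku>A1</sku><qty>3</qty></item>"
       "<item><sku>B2</sku><qty>7</qty></item></items>")
assert unpack(pack(doc)) == doc      # tag names are stored once, round trip works

bad = "<items><item><sku>A|1</sku><qty>3</qty></item></items>"
assert unpack(pack(bad)) != bad      # a value containing the separator is corrupted
```

Escaping the separator would fix the second case, at the cost of some of the
simplicity that makes the format attractive.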

Rick Jelliffe developed the Short Tagged XML (STAX) approach based on
ISO 8879 short-tag minimization.  Compression is accomplished by eliminating
redundant element names.  Element names can be omitted from start tags if
there are no attributes (since the name is preserved in the first occurrence
of the element).  There are sample implementations available here:
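
As an illustration only (this is not one of the sample implementations
referred to above, and it is not the STAX rule set itself), the sketch below
shows the simplest ISO 8879 short-tag feature, the empty end tag "</>", which
closes the most recently opened element.  It assumes input without comments,
processing instructions, CDATA sections or ">" inside attribute values.

```python
import re

TAG = re.compile(r"<[^>]+>")

def shorten(xml_text: str) -> str:
    """Replace every named end tag </name> with the empty end tag </>."""
    return re.sub(r"</[^>]+>", "</>", xml_text)

def expand(short_text: str) -> str:
    """Restore end-tag names by tracking the open elements on a stack."""
    stack = []

    def fix(match):
        tag = match.group(0)
        if tag.startswith("</"):
            # Any end tag closes the element on top of the stack.
            return "</%s>" % stack.pop()
        if not tag.endswith("/>"):
            # Start tag: remember its name (text up to the first whitespace).
            stack.append(tag[1:-1].split()[0])
        return tag

    return TAG.sub(fix, short_text)

doc = '<list><item id="1">a</item><item>b</item><empty/></list>'
short = shorten(doc)
print(short)
print(len(doc), "->", len(short))
```

The saving grows with the length of the element names and the number of
elements, since every end tag shrinks to three characters.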

XMill is a free compression tool for XML developed by the University of
Pennsylvania and AT&T Labs.  XMill groups data items according to their
elements (XPath can be used to define the grouping).  Each group is then
compressed separately via gzip.  The original paper describing XMill
reported that converting proprietary data formats into XML for use with
XMill improves the rate of compression for non-XML data formats (i.e. the
compressed XML file is smaller than the compressed non-XML file).  XMill is
available at:
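
A rough sketch of the grouping idea, in Python with the standard library.
This is not XMill's container format or its path language: it assumes one
container per element name, ignores attributes and mixed content, and uses
zlib (the DEFLATE library behind gzip) for each container.  The function
names and the "#structure" key are hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict

def split_containers(xml_text: str):
    """Separate the tag structure from the text values, grouped by tag name."""
    structure = []                   # element names, "/" marks an element end
    containers = defaultdict(list)   # tag name -> text values in document order

    def walk(elem):
        structure.append(elem.tag)
        if elem.text and elem.text.strip():
            containers[elem.tag].append(elem.text)
        for child in elem:
            walk(child)
        structure.append("/")

    walk(ET.fromstring(xml_text))
    return structure, dict(containers)

def compress_groups(structure, containers):
    """Compress the structure and each value container as separate streams."""
    blobs = {"#structure": zlib.compress("\n".join(structure).encode("utf-8"))}
    for tag, values in containers.items():
        blobs[tag] = zlib.compress("\n".join(values).encode("utf-8"))
    return blobs

doc = ("<log><e><ip>10.0.0.1</ip><code>200</code></e>"
       "<e><ip>10.0.0.2</ip><code>404</code></e></log>")
structure, containers = split_containers(doc)
blobs = compress_groups(structure, containers)
print(containers)
print({name: len(blob) for name, blob in blobs.items()})
```

Values of the same kind (all IP addresses, all status codes) end up next to
each other, which is what gives the compressor more redundancy to exploit on
large inputs.  On a document as small as this sample, the per-stream overhead
outweighs any gain.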

XMLZip is a free tool that reduces the size of XML files while retaining the
accessibility of the DOM API, allowing applications to access data in its
compressed form.  (Note: I have an unsupported enhancement to Xerces that
embeds XMLZip support within the parser itself - email me directly if you
would like more information.)  You can get a free copy of XMLZip at the
following URL:

ASN.1 defines Basic Encoding Rules (BER) and Packed Encoding Rules (PER)
that translate into compressed bytes 'over the wire'.  XML Encoding Rules
(XER) provide guidelines for mapping ASN.1-defined data to XML.  More on
XER:
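
To give a feel for the size difference, here is a small hand-rolled BER
encoder in Python.  It covers only INTEGER, UTF8String and SEQUENCE with
definite lengths, and the "person" record is a made-up example; a real
application would use an ASN.1 compiler or library rather than code like
this.

```python
def ber_length(n: int) -> bytes:
    """Encode a BER length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw

def ber_integer(value: int) -> bytes:
    """Encode an ASN.1 INTEGER (universal tag 0x02) as tag, length, value."""
    magnitude = value if value >= 0 else ~value
    size = magnitude.bit_length() // 8 + 1   # leaves room for the sign bit
    content = value.to_bytes(size, "big", signed=True)
    return b"\x02" + ber_length(len(content)) + content

def ber_utf8string(text: str) -> bytes:
    """Encode an ASN.1 UTF8String (universal tag 0x0C)."""
    content = text.encode("utf-8")
    return b"\x0c" + ber_length(len(content)) + content

def ber_sequence(*items: bytes) -> bytes:
    """Wrap already encoded items in a SEQUENCE (tag 0x30, constructed)."""
    content = b"".join(items)
    return b"\x30" + ber_length(len(content)) + content

# Hypothetical record: the same data as XML text and as BER bytes.
xml = b"<person><name>Ann</name><age>42</age></person>"
ber = ber_sequence(ber_utf8string("Ann"), ber_integer(42))
print(len(xml), "bytes of XML vs", len(ber), "bytes of BER:", ber.hex())
```

The saving comes from the element names living in the ASN.1 schema instead
of in every message, so both ends must share that schema.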

WAP Binary XML (WBXML) defines a compact binary representation of XML for
transmission of WML content (requires a WML Encoder/Decoder).  WBXML
preserves element structure but the encoding process removes DOCTYPEs,
comments and INCLUDE/IGNORE sections.  There is a W3C Note available at:

Millau defines an encoding format that extends WBXML.  Millau preserves the
XML structure and compresses the data separately (rather than in-line like
WBXML).  A paper explaining Millau is available at:

Hope this helps!

John Evdemon
CTO - XML/Director of Engineering