OASIS Mailing List Archives

RE: Binary XML Progress

> If the biggest performance difference happens at the low end of
> the document size scale, it may not be that useful for HTTP document
> transport. Any documents that would benefit would probably fit into a
> single TCP segment, making the added compression for small docs not as
> important.

Yep, I would have to agree. I cannot see XMLSoC being used to speed up
delivery of "regular" web content such as XHTML documents. After all, it's
proprietary (at least, from my understanding, that's the type of license
Syntion has in mind), and gzip, which is even in the HTTP/1.1 RFC AFAIK,
works almost as well for these document types. On the other hand, for
protocols like SOAP that often run over HTTP, where messages are typically
small but may occur in large numbers, it may yet make sense, especially
because decompressing such a message, which takes the place of parsing it,
is cheaper than parsing the equivalent XML text.
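To illustrate why gzip gains little at the small end, here is a minimal sketch using Python's standard gzip module. The two sample documents are made up and only meant to contrast a tiny SOAP-style message with a large, repetitive XHTML page. gzip adds a fixed 10-byte header and 8-byte trailer, and a short message offers little repetition to exploit, so the relative gain shrinks as documents get smaller.

```python
import gzip


def gzip_ratio(doc: str) -> float:
    """Return compressed size / original size for a UTF-8 encoded document."""
    raw = doc.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


# Hypothetical sample documents, chosen only to contrast sizes.
small_soap = (
    '<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">'
    "<env:Body><getQuote><symbol>ACME</symbol></getQuote></env:Body>"
    "</env:Envelope>"
)
large_xhtml = "<html><body>" + "<p>Some paragraph text.</p>" * 2000 + "</body></html>"

for name, doc in [("small SOAP message", small_soap), ("large XHTML page", large_xhtml)]:
    print(f"{name}: {len(doc.encode('utf-8'))} bytes, gzip ratio {gzip_ratio(doc):.2f}")
```

The exact ratios depend on the input; the point is only that the fixed overhead and the lack of internal repetition leave gzip little to gain on a message of a hundred-odd bytes.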

> As an aside, almost all dictionary-based compression schemes perform poorly
> on small files. It just takes time for them to build up any useful
> information in their dictionary. It is possible to preload dictionaries,
> something I've done for the compression of database pages in a static
> database intended for CD-ROM.
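A minimal sketch of a preloaded dictionary, using the `zdict` parameter of Python's standard zlib module (DEFLATE; this is not necessarily the scheme used for the CD-ROM database). The dictionary contents, the message, and the helper names `compress`/`decompress` are hypothetical. Both sides must hold the identical dictionary; the compressor can then refer back into it from the very first byte instead of waiting for its own history to fill up.

```python
import zlib
from typing import Optional

# Hypothetical preset dictionary: markup fragments expected to recur in small
# messages. zlib's documentation recommends putting the most commonly used
# strings towards the END of the dictionary.
PRESET = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">'
    b"<env:Body></env:Body></env:Envelope>"
)


def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """DEFLATE-compress data, optionally primed with a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return c.compress(data) + c.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Inverse of compress(); must be given the same dictionary, if any."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


msg = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">'
    b"<env:Body><getQuote><symbol>ACME</symbol></getQuote></env:Body></env:Envelope>"
)
without_dict = compress(msg)
with_dict = compress(msg, PRESET)
print(len(msg), len(without_dict), len(with_dict))
```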

Yup, I know :) I've looked into quite a lot of this, and the application
examples I collected include a large number of small XML documents. I
identified this as a market where a format-specific compression scheme such
as XMLSoC may be helpful.

> Do you have any further information that you could provide on your method?
> I would certainly like to see what you have in mind and potentially give
> feedback.

Sure. The method is based on a patented technology called Syntax-oriented
Compression (SoC). In a nutshell, the idea behind the original SoC is that
both encoder and decoder know the context-free grammar (CFG) of the
document, so the encoder only has to encode the decisions the grammar leaves
open, i.e. which alternative was taken wherever the grammar allows more than
one. Textual parts (attribute values and character data) are currently
encoded using regular entropy coders, mostly dictionary-based. One key
benefit of XMLSoC is that it integrates very well with existing applications
(the prototype uses the SAX2 API). It is also quite adaptable: different
components, e.g. different string coders, can be plugged in to meet
different requirements.
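A toy sketch of the general idea of encoding only grammar decisions. This is not Syntion's patented SoC: the grammar below is a hypothetical, hand-written table of allowed child sequences (a stand-in for a DTD or CFG), and the names `GRAMMAR`, `encode` and `decode` are made up for illustration. Element names never appear in the output. The encoder writes only the index of the alternative chosen where the grammar permits more than one, using zero bits where there is no choice, while text content is diverted to a separate list that a real implementation would hand to a string coder.

```python
from typing import Iterator, List, Tuple, Union

# An element is (name, content); content is either text or a list of elements.
Element = Tuple[str, Union[str, list]]

# Hypothetical grammar known to BOTH encoder and decoder:
# element name -> list of alternative child sequences ("#text" = character data).
GRAMMAR = {
    "Envelope": [["Header", "Body"], ["Body"]],
    "Header": [["#text"]],
    "Body": [["getQuote"], ["getPrice"]],
    "getQuote": [["symbol"]],
    "getPrice": [["symbol", "currency"]],
    "symbol": [["#text"]],
    "currency": [["#text"]],
}
ROOT = "Envelope"


def width(n_alternatives: int) -> int:
    """Bits needed to pick one of n alternatives (0 if there is no choice)."""
    return (n_alternatives - 1).bit_length()


def encode(elem: Element, bits: List[int], strings: List[str]) -> None:
    name, content = elem
    alts = GRAMMAR[name]
    shape = ["#text"] if isinstance(content, str) else [child[0] for child in content]
    choice = alts.index(shape)  # raises ValueError if the document violates the grammar
    w = width(len(alts))
    bits.extend((choice >> i) & 1 for i in reversed(range(w)))  # MSB first
    if isinstance(content, str):
        strings.append(content)  # text goes to a separate string stream
    else:
        for child in content:
            encode(child, bits, strings)


def decode(name: str, bits: Iterator[int], strings: Iterator[str]) -> Element:
    alts = GRAMMAR[name]
    choice = 0
    for _ in range(width(len(alts))):
        choice = (choice << 1) | next(bits)
    shape = alts[choice]
    if shape == ["#text"]:
        return (name, next(strings))
    return (name, [decode(child, bits, strings) for child in shape])


doc = ("Envelope", [("Body", [("getPrice", [("symbol", "ACME"), ("currency", "EUR")])])])
bits: List[int] = []
strings: List[str] = []
encode(doc, bits, strings)
print(bits, strings)  # [1, 1] ['ACME', 'EUR']
assert decode(ROOT, iter(bits), iter(strings)) == doc
```

Here the whole structure costs two bits. The trade-off is that a document which violates the grammar cannot be encoded at all (`alts.index` raises `ValueError`).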

Stefan Zier
Software Developer
Syntion AG - http://www.syntion.com
Leonrodplatz 2 - 80636 Munich/Germany
Phone +49 89 52 30 45-0
Fax +49 89 52 30 45-20