STAX Re: Abbreviated Tag Names
- From: Rick Jelliffe <ricko@allette.com.au>
- To: xml-dev@lists.xml.org
- Date: Tue, 23 Jan 2001 01:34:59 +0800
If anyone is interested, there is a very simple compression scheme possible,
called STAX.
It could be built into XML parsers trivially, or run as a separate layer on
top of XML.
It converts
<?xml version="1.0"?>
<x>
<y>aaaa</y>
<y>aaaa</y>
</x>
to
<?stax?>
<?xml version="1.0"?>
<x>
<y>aaaa</>
<>aaaa</>
</x>
with each </y> end-tag shortened to </> and the repeated <y> start-tag
shortened to <>. It does not need a stack or a large reserved header (though
it could use a stack; e.g. a fixed-size stack of the deepest 16 elements
would be nice).
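As a rough illustration of the stack-free idea, here is a minimal Python
sketch. It is not the original implementation, and the exact rule is an
assumption read off the example above: an end-tag becomes </> only when the
previous tag was the matching start-tag, and an attribute-free start-tag
becomes <> only when the previous tag was the matching end-tag. The function
name stax_compress is hypothetical, and the sketch assumes well-formed input
with no ">" inside attribute values and no DOCTYPE internal subset.

```python
import re

# Markup this sketch recognises: comments, CDATA sections, PIs, then tags.
MARKUP = re.compile(r"<!--.*?-->|<!\[CDATA\[.*?\]\]>|<\?.*?\?>|<[^>]*>", re.S)
# Splits a tag into optional "/", element name, and whatever follows the name.
TAG = re.compile(r"<(/?)([^\s/>]+)([^>]*)>", re.S)


def stax_compress(xml):
    """Shorten tags using only the previous tag as state (no stack)."""
    out = ["<?stax?>\n"]
    prev = None  # ("start" | "end", name) of the previous tag, if any
    pos = 0
    for m in MARKUP.finditer(xml):
        out.append(xml[pos:m.start()])  # character data is copied unchanged
        pos = m.end()
        tok = m.group()
        if tok.startswith(("<!", "<?")):
            out.append(tok)  # comments, CDATA, PIs: copied, state untouched
            continue
        tag = TAG.fullmatch(tok)
        if tag is None or tag.group(3).endswith("/"):
            out.append(tok)  # unrecognised or self-closing tag: play safe
            prev = None
            continue
        slash, name, rest = tag.groups()
        if slash:
            # End-tag directly closing the element that was just opened.
            out.append("</>" if prev == ("start", name) else tok)
            prev = ("end", name)
        else:
            # Attribute-free start-tag repeating the element just closed.
            bare = rest.strip() == ""
            out.append("<>" if bare and prev == ("end", name) else tok)
            prev = ("start", name)
    out.append(xml[pos:])
    return "".join(out)
```

Under these assumptions the only state is the previous tag, which is why
</x> in the example stays long: the tag before it is an end-tag, not <x>.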
It would work best on documents with long names/data, repeated elements, and
fairly blunt nesting. Obviously it doesn't give great compression except for
those documents, and even then it cannot compare with binary compression.
But there are three other considerations: first, it does not compress to
binary but keeps the document as text (recoverable, readable, and MIME email
does not have to base64-encode it); second, the code is trivial to implement
on even a very lightweight system (e.g., rolled into the parser, it is just
an extra transition or two); third, one can use text-processing tools (e.g.
perl) to perform the decompression without going into a binary mode.
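To illustrate the third point, here is a minimal sketch of decompression as a
single regular-expression substitution, written in Python rather than perl.
It is not the original uncompressor. It assumes the compressor only emits
<> or </> when the immediately preceding tag carries the wanted name, so the
only state is that name; stax_expand is a hypothetical name, and ">" inside
attribute values is not handled.

```python
import re

# Markup this sketch recognises: comments, CDATA sections, PIs, then tags.
MARKUP = re.compile(r"<!--.*?-->|<!\[CDATA\[.*?\]\]>|<\?.*?\?>|<[^>]*>", re.S)
NAME = re.compile(r"</?([^\s/>]+)")  # element name of a start- or end-tag
HEADER = "<?stax?>"


def stax_expand(stax):
    """Expand <> and </> back to full tags with one text substitution."""
    if stax.startswith(HEADER):
        stax = stax[len(HEADER):].lstrip("\r\n")
    prev_name = None  # name carried by the most recent named tag

    def expand(m):
        nonlocal prev_name
        tok = m.group()
        if tok.startswith(("<!", "<?")):
            return tok  # comments, CDATA, PIs pass through unchanged
        if tok in ("<>", "</>"):
            if prev_name is None:
                raise ValueError("short tag with no preceding named tag")
            return tok[:-1] + prev_name + ">"
        found = NAME.match(tok)
        prev_name = found.group(1) if found else None
        return tok

    return MARKUP.sub(expand, stax)
```

The input and output are both ordinary strings, so the result can be handed
to any XML parser without a binary decoding step.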
Obviously there are lots of other extensions possible, but I wanted to stay
SGML-compliant (STAX is still SGML, caveat emptor) and avoid headers (to
keep it streaming and lightweight).
Fairly old source code for a compressor and uncompressor based on this (STAX
format = ShortTAgged Xml) is at
http://www.ascc.net/~ricko/src/short-tag-compress.c
http://www.ascc.net/~ricko/src/short-tag-uncompress.c
I think it would be good to have (something like) this kind of ultra-low-end
compression available (i.e. as a matter of compression negotiation), because
I think many servers are too busy to compress data going out (STAX can be
generated by the XML-generating API, and read directly into a SAX stream).
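As a sketch of what generation by the XML-generating API might look like,
here is a tiny Python writer that emits STAX directly, so no separate
compression pass is needed. The class StaxWriter and its methods are
hypothetical, it handles only attribute-free elements and text, and it
assumes the same previous-tag rule as the example earlier in this post.

```python
import io
from xml.sax.saxutils import escape


class StaxWriter:
    """Writes attribute-free elements, shortening tags as it goes."""

    def __init__(self, out):
        self.out = out
        self.prev = None  # ("start" | "end", name) of the last tag written
        out.write('<?stax?>\n<?xml version="1.0"?>\n')

    def start(self, name):
        # Repeat of the element just closed: the empty start-tag suffices.
        self.out.write("<>" if self.prev == ("end", name) else f"<{name}>")
        self.prev = ("start", name)

    def text(self, data):
        self.out.write(escape(data))  # escapes &, < and >

    def end(self, name):
        # Closing the element just opened: the empty end-tag suffices.
        self.out.write("</>" if self.prev == ("start", name) else f"</{name}>")
        self.prev = ("end", name)


buf = io.StringIO()
writer = StaxWriter(buf)
writer.start("x")
for _ in range(2):
    writer.start("y")
    writer.text("aaaa")
    writer.end("y")
writer.end("x")
print(buf.getvalue())
```

The extra cost per tag is one comparison against the previous tag, which is
the sense in which a busy server could afford it.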
I think it would be useful to have several different compression methods
widely deployed to suit different situations, with STAX fitting in at the
extreme low end.
If anyone is interested in taking this further, I think it would be good.
It is probably the kind of small infrastructure upgrade that could be
fun and doable for open-source and collaborative development.
Cheers
Rick Jelliffe