Elliotte Rusty Harold wrote:
> I expect any plausible binary compression scheme to be lossless with
> respect to the infoset (not the PSVI, mind you, but the I). I don't
> expect to lose any significant data just because:
>
> 1. The data is invalid
> 2. I happen to use a different schema for decoding than you used for
> encoding
>
> If the binary compression fails these tests, I cry shenanigans on you.
> :-)
For an example of encoding XML documents without loss of data, see my
old XMLS project at http://www.sosnoski.com/opensrc/xmls/index.html
It's designed for serialization/deserialization speed rather than
maximum compression. Even so, it reduced document sizes by about 40%
overall for the set of documents I used in testing, and it ran several
times faster than text-based XML when converting to and from dom4j and
JDOM document models. I didn't compare parsing speed directly (this was
originally intended as an alternative to Java serialization for moving
document models over the wire, not as a general-purpose XML transport),
but I suspect it's at least twice as fast as any parser. In answer to
your earlier email about actual results, the page at
http://www.sosnoski.com/opensrc/xmls/results.html gives full benchmark
information.
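For a rough idea of why this kind of tokenized binary encoding is both
smaller and faster to process than text, here's a minimal Java sketch
of handle-based name output: the first occurrence of an element or
attribute name goes out as text with a handle assigned, and every later
occurrence sends only the handle. The opcodes, the TokenWriter class,
and the writeName method below are all illustrative inventions, not the
actual XMLS wire format.

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.HashMap;
    import java.util.Map;

    public class TokenWriter {
        private static final byte DEFINE_NAME = 1; // opcode: new name, assigns next handle
        private static final byte USE_NAME = 2;    // opcode: reference to an assigned handle
        private final DataOutputStream out;
        private final Map<String, Integer> handles = new HashMap<>();

        public TokenWriter(OutputStream stream) {
            out = new DataOutputStream(stream);
        }

        // Write an element or attribute name, defining a handle on first use.
        public void writeName(String name) throws IOException {
            Integer handle = handles.get(name);
            if (handle == null) {
                handle = handles.size();
                handles.put(name, handle);
                out.writeByte(DEFINE_NAME);
                out.writeUTF(name);     // full text goes out only once per document
            } else {
                out.writeByte(USE_NAME);
                out.writeShort(handle); // three bytes total replace the repeated name
            }
        }
    }

Since each repeated name costs a map lookup plus a few fixed bytes
instead of a full string scan, both the output size and the decode work
drop sharply for markup-heavy documents.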
I've thought about extending this to full Infoset compatibility, and
while I'm at it there are a few more optimizations I could make for
faster handling of character data content. I don't know when, or if,
I'll get back to it as things stand right now, but if anyone is
interested, let me know.
- Dennis