   Re: [xml-dev] Parsing efficiency? - why not 'compile'????


Elliotte Rusty Harold wrote:

> I expect any plausible binary compression scheme to be lossless with 
> respect to the infoset, not the PSVI mind you but the I. I don't 
> expect to lose any significant data just because:
> 1. The data is invalid
> 2. I happen to use a different schema for decoding than you used for 
> encoding
> If the binary compression fails these tests, I cry shenanigans on you. 
> :-) 

For an example of encoding XML documents without loss of data, you can 
see my old XMLS project 
(http://www.sosnoski.com/opensrc/xmls/index.html). It is designed for 
serialization/deserialization speed rather than maximum compression. 
Even so, it reduced sizes by about 40% overall for the set of documents 
I used in testing. It also ran several times faster than text XML when 
converting to and from dom4j and JDOM document models. I didn't compare 
parsing speed directly (this was originally intended as an alternative 
to Java serialization for moving document models over the wire, not as a 
general-purpose XML transport), but I'd suspect it's at least twice as 
fast as any parser. In answer to your earlier email about actual 
results, the page at http://www.sosnoski.com/opensrc/xmls/results.html 
gives full benchmark information.
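
To make "lossless without a schema" concrete, here is a rough Python 
sketch of the general idea: element and attribute names go into a table 
that is built on the fly, so the decoder needs nothing but the byte 
stream itself. This is an illustration only and is not the XMLS wire 
format; the function names (to_binary, from_binary) and the byte layout 
are made up for the example. It covers elements, attributes and 
character data as seen by ElementTree's default parser, which drops 
comments and processing instructions, so it falls short of the full 
Infoset.

```python
import io
import struct
import xml.etree.ElementTree as ET


def _write_str(out, text):
    # Length-prefixed UTF-8; None (no text) is stored as an empty string.
    data = (text or "").encode("utf-8")
    out.write(struct.pack(">I", len(data)))
    out.write(data)


def _read_str(src):
    (length,) = struct.unpack(">I", src.read(4))
    return src.read(length).decode("utf-8")


def _write_name(out, name, table):
    # A name seen for the first time gets the next free index and is
    # written inline; later uses write only the 2-byte index.
    index = table.get(name)
    is_new = index is None
    if is_new:
        index = len(table)
        table[name] = index
    out.write(struct.pack(">H", index))
    if is_new:
        _write_str(out, name)


def _read_name(src, table):
    (index,) = struct.unpack(">H", src.read(2))
    if index == len(table):  # first use: the name text follows inline
        table.append(_read_str(src))
    return table[index]


def _encode(elem, out, table):
    _write_name(out, elem.tag, table)
    out.write(struct.pack(">H", len(elem.attrib)))
    for key, value in elem.attrib.items():
        _write_name(out, key, table)
        _write_str(out, value)
    _write_str(out, elem.text)
    out.write(struct.pack(">I", len(elem)))
    for child in elem:
        _encode(child, out, table)
    _write_str(out, elem.tail)


def _decode(src, table):
    elem = ET.Element(_read_name(src, table))
    (attr_count,) = struct.unpack(">H", src.read(2))
    for _ in range(attr_count):
        key = _read_name(src, table)
        elem.set(key, _read_str(src))
    elem.text = _read_str(src)
    (child_count,) = struct.unpack(">I", src.read(4))
    for _ in range(child_count):
        elem.append(_decode(src, table))
    elem.tail = _read_str(src)
    return elem


def to_binary(root):
    out = io.BytesIO()
    _encode(root, out, {})
    return out.getvalue()


def from_binary(blob):
    return _decode(io.BytesIO(blob), [])
```

Because nothing in the stream depends on a schema, an invalid document 
round-trips exactly like a valid one, which is the test Elliotte is 
asking for. Whether it ends up smaller or faster than text depends on 
the documents; the sketch makes no attempt to show that.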

I've thought about extending this to full Infoset compatibility, and 
while I'm at it there are still a few optimizations I can make for 
faster handling of character data content. Don't know when/if I'll ever 
get back to it as things sit right now, but if anyone is interested let 
me know.

  - Dennis



Copyright 2001 XML.org. This site is hosted by OASIS