xml-dev - Re: [xml-dev] Parsing efficiency?

Re: [xml-dev] Parsing efficiency? - why not 'compile'????


To: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Subject: Re: [xml-dev] Parsing efficiency? - why not 'compile'????
From: Dennis Sosnoski <dms@sosnoski.com>
Date: Thu, 27 Feb 2003 11:52:55 -0800
Cc: xml-dev@lists.xml.org
In-reply-to: <p0433010aba83dd0e8813@[192.168.254.4]>
References: <OF047F14BF.0A3C2919-ONCA256CD8.00031C17@facs.gov.au> <20030227073457.2C45F5542@calm.warhead.org.uk> <p04330106ba83be615336@[192.168.254.4]> <E18oPCa-0001s1-00@calvin.frontwire.com> <p0433010aba83dd0e8813@[192.168.254.4]>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021130

Elliotte Rusty Harold wrote:

> I expect any plausible binary compression scheme to be lossless with 
> respect to the infoset, not the PSVI mind you but the I. I don't 
> expect to lose any significant data just because:
>
> 1. The data is invalid
> 2. I happen to use a different schema for decoding than you used for 
> encoding
>
> If the binary compression fails these tests, I cry shenanigans on you. 
> :-) 

For an example of encoding XML documents without loss of data, see my 
old XMLS project at http://www.sosnoski.com/opensrc/xmls/index.html. It 
is designed for serialization/deserialization speed rather than maximum 
compression. Even so, it reduced sizes by about 40% overall for the set 
of documents I used in testing. It also ran several times faster than 
the text representation for going to and from dom4j and JDOM document 
models. I didn't actually compare parsing speed directly (this was 
originally intended as an alternative to Java serialization for moving 
document models over the wire, not as a general-purpose XML transport), 
but I'd suspect it's at least twice as fast as any parser. In answer to 
your earlier email about actual results, the page at 
http://www.sosnoski.com/opensrc/xmls/results.html gives full benchmark 
information.
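
To make the "lossless round trip" idea concrete, here is a minimal sketch in Python. It is not the XMLS format, and every name in it (encode, decode, the byte layout) is made up for illustration. It assumes a tree holding only elements, attributes and character data, so it falls short of the full Infoset (no comments, processing instructions or namespace declarations). Element and attribute names go into a table and are written once, and the structure is written with explicit counts, so no schema is needed to decode.

```python
import io
import struct
import xml.etree.ElementTree as ET


def _write_str(out, s):
    # Length-prefixed UTF-8 string (4-byte big-endian length).
    data = s.encode("utf-8")
    out.write(struct.pack(">I", len(data)))
    out.write(data)


def _read_str(inp):
    (n,) = struct.unpack(">I", inp.read(4))
    return inp.read(n).decode("utf-8")


def encode(root):
    """Encode an element tree as bytes: a name table, then the tree body."""
    names = {}            # name -> index, in first-seen order
    body = io.BytesIO()

    def name_id(name):
        if name not in names:
            names[name] = len(names)
        return names[name]

    def write_opt(text):
        # A flag byte keeps "no text" (None) distinct from empty text.
        if text is None:
            body.write(b"\x00")
        else:
            body.write(b"\x01")
            _write_str(body, text)

    def walk(el):
        # Header: name index, attribute count, child count.
        body.write(struct.pack(">HHI", name_id(el.tag), len(el.attrib), len(el)))
        for key, value in el.attrib.items():
            body.write(struct.pack(">H", name_id(key)))
            _write_str(body, value)
        write_opt(el.text)
        for child in el:
            walk(child)
        write_opt(el.tail)

    walk(root)
    out = io.BytesIO()
    out.write(struct.pack(">H", len(names)))
    for name in names:    # dicts keep insertion order, so position == index
        _write_str(out, name)
    out.write(body.getvalue())
    return out.getvalue()


def decode(data):
    """Rebuild the element tree from bytes produced by encode()."""
    inp = io.BytesIO(data)
    (count,) = struct.unpack(">H", inp.read(2))
    names = [_read_str(inp) for _ in range(count)]

    def read_opt():
        return _read_str(inp) if inp.read(1) == b"\x01" else None

    def read_element():
        tag, n_attrs, n_children = struct.unpack(">HHI", inp.read(8))
        el = ET.Element(names[tag])
        for _ in range(n_attrs):
            (key,) = struct.unpack(">H", inp.read(2))
            el.set(names[key], _read_str(inp))
        el.text = read_opt()
        for _ in range(n_children):
            el.append(read_element())
        el.tail = read_opt()
        return el

    return read_element()


# Round trip: the decoded tree serializes to the same text as the original.
doc = '<order id="7"><item sku="a1">Widget</item><item sku="b2"/> tail</order>'
original = ET.fromstring(doc)
restored = decode(encode(original))
assert ET.tostring(restored) == ET.tostring(original)
```

Because the decoder relies only on the bytes themselves, an invalid document or a different schema on the receiving side cannot cause data to be dropped, which is the test proposed in the quoted message. The sketch makes no claim about size or speed; those depend on the real format and the documents used.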

I've thought about extending this to full Infoset compatibility, and 
while I'm at it there are still a few optimizations I can make for 
faster handling of character data content. I don't know when, or if, 
I'll get back to it as things stand right now, but if anyone is 
interested, let me know.

  - Dennis
