xml-dev - Re: [xml-dev] Binary XML (AKA "spawn of the devil")


To: Rick Jelliffe <ricko@allette.com.au>
Subject: Re: [xml-dev] Binary XML (AKA "spawn of the devil")
From: Dennis Sosnoski <dms@sosnoski.com>
Date: Wed, 17 Sep 2003 23:51:55 -0700
Cc: xml-dev@lists.xml.org
In-reply-to: <3F692EF2.4040001@allette.com.au>
References: <3F689FE0.5000602@sosnoski.com> <10811402305.20030917224143@systemwire.com> <3F692EF2.4040001@allette.com.au>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312

I was actually pretty surprised to see how little difference there was 
between Piccolo and Xerces using the Sun JVM. This used to be much 
worse, based on my past tests (including the ones in the JavaWorld 
article you linked - thanks for the plug, Rick!).

I don't see any reason to expect that multiple concurrent parsers and 
pooling would make a difference. The tests reuse a single parser (and a 
single instance of the XBIS reader), and data is read into memory ahead 
of time so there's no I/O involved in the timings. That seems to 
eliminate any possible gains from running multiple copies simultaneously 
(and, incidentally, it also weakens the XBIS advantage of reduced 
document size for these tests, as opposed to real-world applications 
where I/O is a factor).
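
A minimal sketch of that kind of setup, assuming a current JDK and the standard JAXP SAX API: one parser instance is reused for every pass, and the document is already a byte array before the clock starts. The class and method names (InMemoryParseTiming, timeParses, countElements) and the sample document are hypothetical and are not taken from the actual XBIS test code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness for illustration; not the actual XBIS test code.
class InMemoryParseTiming {

    /** Counts start tags so each parse produces a checkable result. */
    static final class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    /** Creates one non-validating parser from the JAXP implementation in use. */
    static SAXParser newParser() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            return factory.newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException("could not create SAX parser", e);
        }
    }

    /** Parses the in-memory document once and returns the number of elements seen. */
    static int countElements(SAXParser parser, byte[] doc) {
        CountingHandler handler = new CountingHandler();
        try {
            parser.parse(new ByteArrayInputStream(doc), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.elements;
    }

    /** Times the given number of parses of the same bytes with the same parser, in nanoseconds. */
    static long timeParses(SAXParser parser, byte[] doc, int passes) {
        long start = System.nanoTime();
        for (int i = 0; i < passes; i++) {
            countElements(parser, doc);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // The document is fully in memory before any timing begins.
        byte[] doc = "<order><item id=\"1\"/><item id=\"2\"/></order>"
                .getBytes(StandardCharsets.UTF_8);
        SAXParser parser = newParser();
        timeParses(parser, doc, 1_000);               // warm-up, result discarded
        long nanos = timeParses(parser, doc, 10_000); // measured passes
        System.out.println("ns per parse: " + nanos / 10_000);
    }
}
```

Because nothing here blocks on I/O, a single thread keeps the CPU busy for the whole measured interval, which is why extra parser instances on the same processor would not be expected to raise throughput in a test of this shape.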

As for the settings issue that Christian raised, I haven't looked into 
whether any parser settings would improve performance. The documents are 
all deliberately kept pretty simple, though (no external entities or 
such), and with validation off there doesn't seem to be a lot else to 
tune. Everything is in the download from the SourceForge site except the 
restricted-license xml.xml file (modified from the W3C original to 
eliminate an external DTD, hence not redistributable), so it's pretty 
easy to try it out for yourself if you'd like to experiment.
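
For anyone who does want to check the settings, one possible way to record them is to query the parser for its feature values before a run. The sketch below assumes the standard SAX2 feature names plus two Xerces-specific ones from the Xerces2-J feature list; the class name FeatureDump and the particular choice of features are hypothetical, and a non-Xerces parser may simply report the Xerces names as unrecognized.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper for illustration; not part of the benchmark download.
class FeatureDump {

    /** Features to report: standard SAX2 names first, then two Xerces-specific ones. */
    static final String[] FEATURES = {
        "http://xml.org/sax/features/namespaces",
        "http://xml.org/sax/features/namespace-prefixes",
        "http://xml.org/sax/features/validation",
        "http://xml.org/sax/features/external-general-entities",
        "http://xml.org/sax/features/external-parameter-entities",
        "http://apache.org/xml/features/validation/schema",
        "http://apache.org/xml/features/nonvalidating/load-external-dtd",
    };

    /** Returns feature name -> "true", "false", or "not recognized", in listed order. */
    static Map<String, String> dump(XMLReader reader) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : FEATURES) {
            try {
                out.put(name, String.valueOf(reader.getFeature(name)));
            } catch (SAXException e) {
                // Covers SAXNotRecognizedException and SAXNotSupportedException.
                out.put(name, "not recognized");
            }
        }
        return out;
    }

    /** Obtains an XMLReader with factory defaults from the JAXP implementation in use. */
    static XMLReader defaultReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("could not create XMLReader", e);
        }
    }

    public static void main(String[] args) {
        dump(defaultReader()).forEach((name, value) ->
                System.out.println(name + " = " + value));
    }
}
```

Printing this map alongside the timing results would show at a glance whether validation or external entity loading was left on for a given run.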

  - Dennis

Rick Jelliffe wrote:

>Good to see Dennis using Piccolo as well as Xerces. The tests confirm
>that Xerces-J is still one of the worst-performing XML parsers* (though,
>of course, things change).
>
>It would be useful to also have a test of scalability: what the rates
>are for multiple concurrent parsers, with pooling. Or do you think the
>results should scale?
>
>Cheers
>Rick Jelliffe
>
>* http://piccolo.sourceforge.net/bench.html
>http://www.extreme.indiana.edu/~aslom/exxp/
>http://archive.devx.com/xml/articles/sf020101/sf0201-3.asp
>http://www.javaworld.com/javaworld/jw-04-2002/jw-0426-xmljava3-p2.html
>http://forum.java.sun.com/thread.jsp?forum=34&thread=338038&message=1386505
>
>
>Christian Nentwich wrote:
>
>>This is really cool, finally some hard data. One thing that would make
>>your presentation even more convincing is if you dumped the property
>>and feature values set in Xerces at the time of running into your
>>experiment information.
>>(http://xml.apache.org/xerces2-j/properties.html)
>>
>>This would help to convince the reader that you didn't overlook
>>anything, e.g. leave some kind of whitespace normalisation turned on,
>>didn't leave any schema validation turned on, etc.
>>
>>Christian

References:
- Binary XML (AKA "spawn of the devil")
  - From: Dennis Sosnoski <dms@sosnoski.com>
- Re: [xml-dev] Binary XML (AKA "spawn of the devil")
  - From: Christian Nentwich <christian@systemwire.com>
- Re: [xml-dev] Binary XML (AKA "spawn of the devil")
  - From: Rick Jelliffe <ricko@allette.com.au>
