   Re: [xml-dev] Fast text output from SAX?

Thomas B. Passin wrote:

>Dennis Sosnoski wrote:
>>The problem, which I've expressed more than once, is to compare the 
>>performance for the alternatives of using text XML vs. some post-parse 
>>representation of XML documents. For the reasons given in my earlier 
>>email I'm choosing to base my timing comparisons on the parse event 
>
>But presumably the alternative "quasi-xml" you will be testing will not 
>be producing SAX events, but some proprietary parse system instead. 
>Do you mean that you want to write a proprietary-to-textual xml file 
>and compare that with writing a SAX-to-textual xml file?

The focus of what I'm doing is looking at general XML document 
interchange performance. The XBIS results I posted prior to the W3C 
workshop last fall (http://xbis.sourceforge.net/performance.html) are 
from a similar set of tests, which I'm now extending in a couple of ways.

As I see it the most useful comparisons to be made for general XML 
document interchange performance are (1) how much time is required to 
convert an incoming document to a form usable by the application, (2) 
how much time is required to convert from the internal form used by the 
application to a form that's serialized for transmission, and (3) what's 
the size of the serialized form. For the specific tests I'm running now 
I want to compare XBIS, text, and zipped text.
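
As a rough illustration of measurements (1) and (3) for the text and zipped
text cases, here is a minimal Java sketch. It is not the actual test
harness: the class name InterchangeTiming, the generated sample document,
and the use of gzip for "zipped text" are all assumptions made for the
example, and XBIS itself is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class InterchangeTiming {
    // Compress a text document with gzip (assumed stand-in for "zipped text").
    public static byte[] gzip(byte[] text) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text);
        }
        return out.toByteArray();
    }

    // Nanoseconds needed to turn a serialized document into SAX events.
    // The events go to a do-nothing handler, so only parsing is measured.
    public static long timeParse(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        long start = System.nanoTime();
        parser.parse(in, new DefaultHandler());
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        // Generated sample document (hypothetical data, for illustration only).
        StringBuilder sb = new StringBuilder("<orders>");
        for (int i = 0; i < 1000; i++) {
            sb.append("<order id=\"").append(i).append("\"><qty>1</qty></order>");
        }
        byte[] text = sb.append("</orders>").toString().getBytes(StandardCharsets.UTF_8);
        byte[] zipped = gzip(text);

        long textNanos = timeParse(new ByteArrayInputStream(text));
        long zipNanos = timeParse(new GZIPInputStream(new ByteArrayInputStream(zipped)));

        System.out.println("text:   " + text.length + " bytes, parse " + textNanos + " ns");
        System.out.println("zipped: " + zipped.length + " bytes, parse " + zipNanos + " ns");
    }
}
```

A single timed pass like this is dominated by JIT warm-up and class
loading, so any real comparison would repeat each measurement many times
and discard the early passes. Measurement (2), the output direction, would
be timed the same way around the serialization step.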

Obviously I can't test every possible internal form that might be used 
by an application. However, the vast majority of XML document processing 
in Java is currently built on the event streams produced by SAX parsers. 
Any general XML format should be convertible to and from an event stream 
of this type, and in practice that's the way any alternative general 
formats are likely to be used (at least in the near term).
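
To make the "to and from an event stream" point concrete, here is a small
Java sketch that parses text XML into SAX events and serializes those
events straight back to text, using the standard JAXP TransformerHandler.
The class name SaxRoundTrip and the sample document are made up for the
example; an alternative format would plug in by replacing either end (a
reader that emits SAX events, or a ContentHandler that consumes them).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxRoundTrip {
    // Parse text XML into SAX events and write those events back out as text.
    public static String roundTrip(String xml) throws Exception {
        // An identity TransformerHandler is a ContentHandler that serializes
        // whatever SAX events it receives to the configured Result.
        SAXTransformerFactory tf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = tf.newTransformerHandler();
        serializer.getTransformer()
                  .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        // Event source: an ordinary namespace-aware SAX2 parser.
        SAXParserFactory pf = SAXParserFactory.newInstance();
        pf.setNamespaceAware(true);
        XMLReader reader = pf.newSAXParser().getXMLReader();
        reader.setContentHandler(serializer);
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("<doc><item id=\"1\">text</item></doc>"));
    }
}
```

This uses the general-purpose JAXP serializer, which is convenient but not
necessarily the fastest way to get text out of SAX events.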

This type of testing is admittedly only relevant for general-purpose 
formats. Schema-specific formats (such as the ASN.1 schema 
representation) get into a whole separate set of issues and should be 
compared differently. You *can* compare time and space performance for 
schema-specific formats vs. text (as Sun did in their "Fast Web 
Services" paper) or alternatives such as XBIS, but it's in some sense an 
apples-to-oranges comparison. Schema-specific formats are best suited to 
use with data binding type approaches, where the application doesn't 
really see XML as such, only objects that are mapped to XML components. 
I would expect that in these circumstances schema-specific formats would 
always be able to deliver better performance than general-purpose 
formats such as XBIS, let alone text. However, schema-specific formats 
are only usable when the documents being exchanged are known to follow 
those particular schemas. Even then there can be problems: since the 
schema-specific formats do not preserve raw text, they generally won't be 
usable with signing, for instance. General-purpose formats such 
as XBIS would not have this problem.
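
A toy Java illustration of the raw-text issue, under stated assumptions:
typedRoundTrip is a hypothetical stand-in for a schema-aware encoding that
stores an integer element as a binary int, and a plain SHA-256 digest
stands in for a signature. Neither reflects any particular format or
signing implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class LexicalFormDigest {
    // Hypothetical schema-aware round trip: the value is kept as an int,
    // so only the number survives, not the characters that spelled it.
    public static String typedRoundTrip(String lexical) {
        int value = Integer.parseInt(lexical.trim());
        return Integer.toString(value);
    }

    // Digest over the UTF-8 bytes of the text form.
    public static byte[] sha256(String s) throws Exception {
        return MessageDigest.getInstance("SHA-256")
                            .digest(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        String original = "<count>007</count>";
        String restored = "<count>" + typedRoundTrip("007") + "</count>";
        System.out.println(restored);                                      // <count>7</count>
        System.out.println(Arrays.equals(sha256(original), sha256(restored))); // false
    }
}
```

The value is the same before and after, but the text is not, so a digest
computed over the original text no longer matches.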

On the other hand, I think this *would* be a fair comparison test for 
the ASN.1 "fast infoset" approach that's been mentioned in related 
emails. If there's an implementation of this available for Java (that 
goes to and from SAX2) I'd be very interested in including it in my tests.

  - Dennis

Dennis M. Sosnoski
Enterprise Java, XML, and Web Services
Training and Consulting
Redmond, WA  425.885.7197

