Thomas B. Passin wrote:
>Dennis Sosnoski wrote:
>>The problem, which I've expressed more than once, is to compare the
>>performance for the alternatives of using text XML vs. some post-parse
>>representation of XML documents. For the reasons given in my earlier
>>email I'm choosing to base my timing comparisons on the parse event
>But presumably the alternative "quasi-xml" you will be testing is not
>likely to produce SAX events, but rather to use some proprietary parse
>system. Do you mean that you want to write a proprietary-to-textual XML
>file and compare that with writing a SAX-to-textual XML file?
The focus of what I'm doing is looking at general XML document
interchange performance. The XBIS results I posted prior to the W3C
workshop last fall (http://xbis.sourceforge.net/performance.html) are
from a similar set of tests, which I'm now extending in a couple of ways.
As I see it the most useful comparisons to be made for general XML
document interchange performance are (1) how much time is required to
convert an incoming document to a form usable by the application, (2)
how much time is required to convert from the internal form used by the
application to a form that's serialized for transmission, and (3) what's
the size of the serialized form. For the specific tests I'm running now
I want to compare XBIS, text, and zipped text.
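To make measurements (1) and (3) concrete, here is a minimal sketch covering only the text and zipped-text cases, with gzip standing in for "zipped". The class name, the generated sample document, and the single-pass timing are all my own assumptions for illustration; this is not the actual test code, and a real benchmark would need warm-up passes and many repetitions. XBIS is left out because its API is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness: compares text XML against gzipped text XML.
public class InterchangeTiming {

    /** Counts startElement events so the parse has an observable result. */
    static class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Compress a byte array with gzip (stands in for "zipped text"). */
    public static byte[] gzip(byte[] data) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    /** Parse the stream into SAX events and return the element count. */
    public static int parse(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(in, handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document; real tests would use real documents.
        StringBuilder sb = new StringBuilder("<orders>");
        for (int i = 0; i < 1000; i++) {
            sb.append("<order id=\"").append(i).append("\"><item>widget</item></order>");
        }
        sb.append("</orders>");
        byte[] text = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] zipped = gzip(text);

        // Single-pass timings include parser creation and JIT effects.
        long t0 = System.nanoTime();
        int n1 = parse(new ByteArrayInputStream(text));
        long t1 = System.nanoTime();
        int n2 = parse(new GZIPInputStream(new ByteArrayInputStream(zipped)));
        long t2 = System.nanoTime();

        System.out.println("text:   " + text.length + " bytes, " + n1
                + " elements, " + (t1 - t0) / 1000 + " us");
        System.out.println("zipped: " + zipped.length + " bytes, " + n2
                + " elements, " + (t2 - t1) / 1000 + " us");
    }
}
```

Both runs deliver the same SAX events to the handler, so any difference in time is the cost of the serialized form rather than of the application's processing.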
Obviously I can't test every possible internal form that might be used
by an application. However, the vast majority of XML document processing
in Java is currently built on the event streams produced by SAX parsers.
Any general XML format should be convertible to and from an event stream
of this type, and in practice that's the way any alternative general
formats are likely to be used (at least in the near term).
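As a rough illustration of what "convertible to and from an event stream" means for the text case, the sketch below parses text into SAX2 events and feeds those events straight into a serializer that writes text again. It uses only the standard JAXP classes; the class and method names are hypothetical. An alternative format would plug in at the same two points, with its own reader producing the events and its own ContentHandler consuming them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical example: text -> SAX events -> text.
public class SaxRoundTrip {

    /** Parse text XML into SAX events and serialize those events back to text. */
    public static String roundTrip(String xml) throws Exception {
        SAXParserFactory pf = SAXParserFactory.newInstance();
        pf.setNamespaceAware(true);
        XMLReader reader = pf.newSAXParser().getXMLReader();

        // An identity TransformerHandler is a ContentHandler that serializes
        // whatever events it receives to the configured Result.
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = tf.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        reader.setContentHandler(serializer);
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("<a><b x=\"1\">hi</b></a>"));
    }
}
```

Timing the parse half and the serialize half separately gives measurements (1) and (2) for text; the same handler interfaces are where another format's numbers would be taken.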
This type of testing is admittedly only relevant for general-purpose
formats. Schema-specific formats (such as the ASN.1 schema
representation) get into a whole separate set of issues and should be
compared differently. You *can* compare time and space performance for
schema-specific formats vs. text (as Sun did in their "Fast Web
Services" paper) or alternatives such as XBIS, but it's in some sense an
apples-to-oranges comparison. Schema-specific formats are best suited to
use with data-binding approaches, where the application doesn't really
see XML as such, only objects that are mapped to XML components.
I would expect that in these circumstances schema-specific formats would
always be able to deliver better performance than general-purpose
formats such as XBIS, let alone text. However, schema-specific formats
are only usable when the documents being exchanged are known to follow
those particular schemas. Even then there can be problems: because
schema-specific formats do not preserve the raw text, they generally won't
be usable with signing and similar features. General-purpose formats such
as XBIS would not have this problem.
On the other hand, I think this *would* be a fair comparison test for
the ASN.1 "fast infoset" approach that's been mentioned in related
emails. If there's an implementation of this available for Java (that
goes to and from SAX2) I'd be very interested in including it in my tests.
Dennis M. Sosnoski
Enterprise Java, XML, and Web Services
Training and Consulting
Redmond, WA 425.885.7197