   Re: [xml-dev] Fast text output from SAX?

Elliotte Rusty Harold wrote:

>What are you actually trying to do? Serialization is not normally the 
>concern here. Is the question how fast this stuff can be generated? 
>Or something else?

Yes, the question is how fast this stuff - text XML - can be generated. 
I'm interested in XML interchange costs, which include both the input 
and output processing overhead and the actual document size. To make 
the comparison as fair as possible I'm using SAX2 parse event streams as 
the common base document representation. That also corresponds to actual 
usage: if you were using XBIS as the transport for an XML document 
representation between programs, you'd generally plug in an encoder that 
takes the place of text serialization at the sending end and a decoder 
that takes the place of a parser at the receiving end.
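
For reference, the text serialization side of that comparison can be sketched as a SAX2 handler that writes markup as events arrive. This is only a minimal illustration, not the code being benchmarked: the class name TextXmlWriter and the roundTrip helper are hypothetical, it ignores comments, processing instructions, and the XML declaration, and it assumes a non-namespace-aware JAXP parser so that xmlns declarations show up as ordinary attributes. An XBIS encoder would sit in the same handler slot.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical minimal serializer: turns SAX2 events back into text XML.
class TextXmlWriter extends DefaultHandler {
    private final Writer out;

    TextXmlWriter(Writer out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        try {
            out.write('<');
            out.write(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.write(' ');
                out.write(atts.getQName(i));
                out.write("=\"");
                // Attribute values arrive as Strings, so copy to chars to escape.
                String value = atts.getValue(i);
                writeEscaped(value.toCharArray(), 0, value.length(), true);
                out.write('"');
            }
            out.write('>');
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        try {
            out.write("</");
            out.write(qName);
            out.write('>');
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            // Character data arrives as a char[] slice, so no String is needed.
            writeEscaped(ch, start, length, false);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    private void writeEscaped(char[] ch, int start, int length, boolean inAttribute)
            throws IOException {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (c == '&') {
                out.write("&amp;");
            } else if (c == '<') {
                out.write("&lt;");
            } else if (c == '>') {
                out.write("&gt;");
            } else if (c == '"' && inAttribute) {
                out.write("&quot;");
            } else {
                out.write(c);
            }
        }
    }

    // Parses text XML and serializes the resulting event stream back to text.
    static String roundTrip(String xml) {
        StringWriter buffer = new StringWriter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new TextXmlWriter(buffer));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<a x=\"1 &lt; 2\"><b>t &amp; u</b></a>"));
    }
}
```

Timing the handler alone, fed from a prerecorded event stream rather than a live parser, would isolate the output cost from the parsing cost.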

SAX2 is actually not ideal for this purpose, from either the usability 
(as discussed in the past on this list) or the performance standpoint. 
In terms of performance, SAX2 avoids Strings for actual character data 
content and comments, but still requires them for attribute values. This 
means that an immutable String object has to be created for each 
attribute accessed on the receiving end, when often a temporary 
reference (using a JDK 1.4 CharBuffer, for instance) is all that's 
really needed. Another unfortunate performance aspect of SAX2 is that 
namespace processing often needs to be done twice: once by the parser, 
then again by the application in many cases, since the prefix-URI 
connection will otherwise be lost (unless the application just ignores 
namespaces and works directly with qualified names). If the API were to 
use Namespace objects directly (as implemented by any of the document 
models) and report names as (Namespace, String) or (Namespace, 
CharBuffer) pairs, it would help eliminate some of this extra processing.
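
To illustrate the duplicated namespace work, here is a rough sketch of the application-side half: a handler that rebuilds the prefix-URI mapping from startPrefixMapping events so it can resolve a qualified name that appears inside an attribute value. The class name PrefixTracker, the unqualified "type" attribute, and the urn:example:p URI are all made up for the example; NamespaceSupport is the standard org.xml.sax.helpers class. The parser has already done the same bookkeeping internally to report element names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.NamespaceSupport;

// Hypothetical handler that repeats the parser's prefix tracking so that
// QNames inside attribute values can still be resolved to namespace URIs.
class PrefixTracker extends DefaultHandler {
    private final NamespaceSupport namespaces = new NamespaceSupport();
    private boolean contextPushed = false;
    final List<String> resolved = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // Mappings are reported before startElement, so open the scope here.
        if (!contextPushed) {
            namespaces.pushContext();
            contextPushed = true;
        }
        namespaces.declarePrefix(prefix, uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        if (!contextPushed) {
            namespaces.pushContext();
        }
        contextPushed = false;

        // The attribute value is handed over as a String, prefix and all.
        String type = atts.getValue("", "type");
        if (type != null) {
            int colon = type.indexOf(':');
            String prefix = colon < 0 ? "" : type.substring(0, colon);
            String name = type.substring(colon + 1);
            // getURI returns null when the prefix is not in scope.
            resolved.add("{" + namespaces.getURI(prefix) + "}" + name);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        namespaces.popContext();
    }

    // Parses the document and returns every resolved "type" value.
    static List<String> resolveTypes(String xml) {
        PrefixTracker tracker = new PrefixTracker();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), tracker);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return tracker.resolved;
    }

    public static void main(String[] args) {
        System.out.println(resolveTypes(
            "<root xmlns:p=\"urn:example:p\"><e type=\"p:item\"/></root>"));
    }
}
```

With names reported as (Namespace, String) pairs and the in-scope Namespace objects exposed, the push/pop bookkeeping above would not need to be repeated by the application.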

Thoughts for SAX3, I suppose.

  - Dennis

Dennis M. Sosnoski
Enterprise Java, XML, and Web Services
Training and Consulting
Redmond, WA  425.885.7197

