
   Re: [xml-dev] Fast text output from SAX?


At 10:00 AM -0400 4/14/04, Stephen D. Williams wrote:


>The fact is that creating, populating, and manipulating a data model 
>has costs.  This is true of DOM, SAX (where the data model is 
>managed by the application), esXML (where the data model is also the 
>'serialized' format so all costs are manipulation), and all other 
>applications that involve internal and external data (Corba, DCOM, 
>ONC-RPC, ASN.1/xER, etc.).  It's not fair to ignore part of the 
>processing cycle for a format (esXML) that trades some manipulation 
>overhead for all parsing/serialization/object creation/object 
>population overhead.
>

I consider creating and populating the data model to be part of 
parsing if it's done from an event stream. For instance, the time to 
build a DOM document object is significant. Sorry if that wasn't 
clear. My point is that once the object exists in memory the 
manipulations from that point until you start serializing are 
irrelevant. In my tests with my model, parsing/object creation is 
about 2/3 of the time, serialization is about 1/3, and manipulation 
is unmeasurable. Various optimizations adjust the absolute numbers, 
but the 2-1-0 ratio seems pretty consistent. Possibly other formats 
have different ratios. However, given that real world programs read 
data from input streams and write them to output streams rather than 
byte arrays like benchmarks do, it doesn't seem credible that 
in-memory XML operations like add and remove are worth optimizing.
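
A rough way to check this kind of split on one's own data is sketched below. It is not the test harness referred to above: it assumes Python's standard `xml.dom.minidom` as the object model, and the `build_sample` and `time_phases` helpers and the 3000-item document are hypothetical, chosen only for illustration. The ratios it reports will vary with the parser, the model, and the document.

```python
import time
from xml.dom import minidom


def build_sample(n=3000):
    # Hypothetical test document: n small child elements under one root.
    items = "".join(f"<item id='{i}'>value {i}</item>" for i in range(n))
    return f"<root>{items}</root>"


def time_phases(xml_text, edits=5):
    """Time parse/object creation, in-memory manipulation, and serialization."""
    t0 = time.perf_counter()
    doc = minidom.parseString(xml_text)        # parsing + object creation
    t1 = time.perf_counter()

    # Manipulate only a handful of nodes that already exist in memory.
    for node in doc.documentElement.childNodes[:edits]:
        node.setAttribute("touched", "yes")
    t2 = time.perf_counter()

    out = doc.toxml()                          # serialization
    t3 = time.perf_counter()

    phases = {
        "parse": t1 - t0,
        "manipulate": t2 - t1,
        "serialize": t3 - t2,
    }
    return phases, out


if __name__ == "__main__":
    phases, _ = time_phases(build_sample())
    total = sum(phases.values())
    for name, seconds in phases.items():
        print(f"{name:>10}: {seconds:.6f}s ({seconds / total:.1%})")
```

Note that this sketch, like the benchmarks criticized above, works on an in-memory string; reading from and writing to real streams would add I/O cost to the parse and serialize phases only.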

>Additionally, the whole parsing etc. stream for XML must be 
>completely performed, in DOM cases and many SAX cases, for every 
>element of a document/object.  With esXML, if a 3000 element 
>document/object were read in and 5 elements manipulated, you only 
>spend 5*element-manipulation-overhead.

I flat out don't believe this. I think there's an underlying 
assumption here (and in some of the other binary formats) which once 
again demonstrates that they are not as much like XML as they claim. 
The only way you can limit this is by assuming the data in your 
stream is well-formed. In XML, we don't assume that. One of the 3000 
nodes you don't process may be malformed. You're assuming that's not 
the case, and therefore avoiding a lot of overhead in checking for 
it. A large chunk of any speed gain such a format achieves over real 
XML is by cutting corners on well-formedness checking.
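
To make the point concrete, here is a minimal sketch using Python's standard `xml.sax` parser. The `make_doc` and `check_well_formed` helpers and the document shape are hypothetical, invented for illustration; nothing here describes how esXML actually works. One mismatched end tag deep inside the document makes the whole document malformed, and a conforming parser can only find that out by scanning every element, including those the application never touches.

```python
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Counts start tags so we can see how much of the input was scanned."""

    def __init__(self):
        super().__init__()
        self.seen = 0

    def startElement(self, name, attrs):
        self.seen += 1


def make_doc(n=3000, bad_index=None):
    # Hypothetical document of n items; optionally one item gets a
    # mismatched end tag, which is a well-formedness error.
    parts = []
    for i in range(n):
        end_tag = "</itm>" if i == bad_index else "</item>"
        parts.append(f"<item id='{i}'>value {i}{end_tag}")
    return "<root>" + "".join(parts) + "</root>"


def check_well_formed(xml_text):
    """Return (is_well_formed, start_tags_seen_before_stopping)."""
    handler = CountingHandler()
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), handler)
    except xml.sax.SAXParseException:
        return False, handler.seen
    return True, handler.seen


if __name__ == "__main__":
    print(check_well_formed(make_doc()))                  # every element scanned
    print(check_well_formed(make_doc(bad_index=2500)))    # rejected as malformed
```

An application interested only in the first 5 items would get the same answer from both documents if the remaining elements were skipped, which is exactly the check being cut.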

If this is not the case for esXML, and it does indeed make all the 
mandated well-formedness checks, then please correct my error. 
However, I'd be very surprised if, in that case, one could still 
limit parsing overhead to the raw I/O.
-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA

