Use cases for parsing efficiency (was Re: [xml-dev] Parsing efficiency? -


On Wed, 26 Feb 2003 07:44:31 -0500, David Megginson <david@megginson.com> wrote:

> In the past, I've observed that actual XML parsing generally accounts
> for under 1% of a batch application's running time (much less, if
> you're building a big object tree or doing any database access).  That
> means that if you speed up the XML parsing by 10%, you might have sped
> up your application by less than 0.1% (or realistically, not at all,
> if the parser was already idling waiting for data over the network).

I for one wouldn't dispute that most people on this list have had similar 
experiences, and we all know that you won't get much real speedup by 
optimizing non-bottlenecks.  As a matter of fact, until a few months ago I 
was as much a scoffer at the arguments that Al and Robin raise as any of 

My day job colleagues changed my mind by pointing out that in industrial- 
strength, native XML processing environments, nothing much is happening 
besides XML being parsed, processed (stored, queried, transformed) and 
serialized again.  The better the code gets and the more efficient customers 
get at using it (e.g. building DB indexes and optimizing queries, in 
our case), the more the parse/serialization step becomes a 
bottleneck.  I've heard the same thing from industrial-strength SOAP 
developers -- as the volume of messages goes up and processing resources 
get dedicated to XML (i.e., no application logic or DB access happening on 
the machine parsing, processing, serializing the XML), then the bottlenecks 
in XML parsing become increasingly apparent.  Sure, Father Moore will 
ultimately solve this problem with faster hardware, but that's not a great 
marketing pitch for software people.
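
As a rough illustration of why the same parser speedup matters so 
differently in the two settings, here is a minimal sketch of the 
arithmetic.  The helper name overall_time_saved is hypothetical, the 1% 
and 10% figures are the ones David quotes, and the 50% parse share for a 
dedicated XML pipeline is an assumed, illustrative number, not a 
measurement.

```python
def overall_time_saved(parse_fraction, parse_time_cut):
    """Fraction of total running time saved when the parsing step,
    which accounts for `parse_fraction` of the run, has its own time
    reduced by `parse_time_cut` (both given as fractions in [0, 1])."""
    return parse_fraction * parse_time_cut


# Batch application from the quoted message: parsing is about 1% of the
# run, and the parser's time is cut by 10%.
batch = overall_time_saved(0.01, 0.10)

# Dedicated XML pipeline: ASSUMED 50% of the run is parse/serialize
# (illustrative figure only), same 10% cut in parser time.
pipeline = overall_time_saved(0.50, 0.10)

print(f"batch app: {batch:.2%} saved, XML pipeline: {pipeline:.2%} saved")
```

The point of the sketch is only that the payoff scales linearly with the 
share of time spent parsing, so the conclusion flips once little else is 
running on the machine.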

So why should you all care about standardization of processing pipelines 
that are generally *internal* to products?  I'm not completely sure you 
should.  One might argue that you as customers of / developers for 
enterprise-class XML processing software may wish to tap into the pipelines 
at a lower level, e.g. grab the rawest Infoset data out of a DBMS before it 
gets sanitized and standardized by the API level, or insert your own 
specialized SOAP processors (e.g. to support a new choreography standard) 
deep into IBM or Microsoft's architecture.  If the vendors all go their 
separate ways on efficient infoset representations, we're back to the Bad 
Old Days (e.g., where SQL is today) in which "standards" are more or less 
conceptual frameworks rather than the basis for interoperable code, at 
least at the down-n-dirty level. Another argument for standardization of 
this stuff is that -- as Robin points out repeatedly -- lots and lots of 
wheels are being reinvented daily.  There's something to be said for 
cooperation and joint research / development / testing under the aegis of a 
standards body (perhaps like XQuery, which is also more of a joint research 
project than a standardization of existing practice).

So, I'm not at all sure that standardization of efficient infoset 
serializations is something that the W3C or anyone else should undertake at 
this time. But I don't want to see the W3C preclude it (or XML geeks 
conclude that it is evil) either.  XML processing is moving more and more 
into the core of real enterprises. We'll see the previous situation where 
XML is just a transient serialization format between DBs and applications 
turned around, so that most of the components of a processing pipeline are 
taking XML in, storing/processing it natively, and putting XML out.  In 
that scenario, lots of people are going to be looking for ways to reduce 
the parsing bottlenecks ... either by subsetting (entity expansion is a 
notorious bottleneck in high-performance XML processors, to the point where 
the SOAP community simply refuses to do it), by exploring "binary" 
serializations, or both.  I don't want to see this "pollute" document XML, 
but some of the assumptions of what is universal across document and data 
XML will probably have to change to make this happen without a major fork.

