OASIS Mailing List Archives

   Re: [xml-dev] Use cases for parsing efficiency (was Re: [xml-dev]Parsing

Mike Champion wrote:

> My day job colleagues changed my mind by pointing out that in 
> industrial-strength, native XML processing environments, nothing much 
> is happening besides XML being parsed, processed (stored, queried, 
> transformed) and serialized again.  

That's quite a lot happening (other than parsing). I mean what else 
/could/ happen?

> The better code gets and the more 
> efficient customers get in using the code (e.g. building DB indexes and 
> optimizing queries, in our case), the more and more that 
> parse/serialization step becomes a bottleneck.  I've heard the same 
> thing from industrial-strength SOAP developers -- as the volume of 
> messages goes up and processing resources get dedicated to XML (i.e., no 
> application logic or DB access happening on the machine parsing, 
> processing, serializing the XML), then the bottlenecks in XML parsing 
> become increasingly apparent.  Sure, Father Moore will ultimately solve 
> this problem with faster hardware, but that's not a great marketing 
> pitch for software people.

So, following David, in one hundred seconds, you spend one second 
parsing XML and 99 seconds doing something else. Suppose you get a 
tenfold speedup doing something else (cigars all round). You're down 
to about 11 seconds. Parsing is approaching 10%. A tenfold speedup in 
parsing only saves you 0.9 of a second, or a little over 8%, now. And 
because it's /still/ the wrong side of the 80/20 split, it's /still/ 
not the place to be looking, unless you know that processing time is 
evenly distributed through the code base (but that would be rare, 
and probably worth writing a paper on). The same reasoning applies 
if parsing takes 10% of the time to begin with.
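
The arithmetic in this paragraph can be checked with a short sketch. 
The function name and the 1 second / 99 second split are illustrative 
assumptions taken from the example figures, not measurements. With 
these inputs the tenfold parsing speedup works out to a saving of 0.9 
seconds.

```python
def total_time(parse_s, other_s, parse_speedup=1.0, other_speedup=1.0):
    """Wall-clock time after applying independent speedups to each phase.

    Hypothetical helper: assumes the two phases run one after the other
    and that a speedup simply divides the time spent in that phase.
    """
    return parse_s / parse_speedup + other_s / other_speedup


PARSE_S, OTHER_S = 1.0, 99.0  # assumed split from the example in the text

baseline = total_time(PARSE_S, OTHER_S)
after_other = total_time(PARSE_S, OTHER_S, other_speedup=10.0)
after_both = total_time(PARSE_S, OTHER_S, parse_speedup=10.0, other_speedup=10.0)

# Share of the run spent parsing once everything else is ten times faster.
parse_share = PARSE_S / after_other

# What a further tenfold parsing speedup buys, in seconds and as a fraction.
saved_s = after_other - after_both
saved_fraction = saved_s / after_other

print(f"baseline:            {baseline:.1f} s")
print(f"other work 10x:      {after_other:.1f} s (parsing share {parse_share:.1%})")
print(f"parsing 10x as well: {after_both:.1f} s (saves {saved_s:.1f} s, {saved_fraction:.1%})")
```

Changing PARSE_S and OTHER_S to 10.0 and 90.0 runs the same calculation 
for the 10% starting point.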

> So, I'm not at all sure that standardization of efficient infoset 
> serializations is something that the W3C or anyone else should undertake 
> at this time. But I don't want to see the W3C preclude it (or XML geeks 
> to conclude that it is evil) either.  XML processing is moving more and 
> more into the core of real enterprises. We'll see the previous situation 
> where XML is just a transient serialization format between DBs and 
> applications turned around, so that most of the components of a 
> processing pipeline are taking XML in, storing/processing it natively,  
> and putting XML out.    In that scenario, lots of people are going to be 
> looking for ways to reduce the parsing bottlenecks ... 

Performance arguments mean nothing without measurement. And even if 
parsing is a problem, it does not follow that XML requires 
subsetting. For example, you might be better off with an 'enterprise 
class parser', or an 'enterprise class datamodel' than with an 
'industrial class subset'.

I'd like to see some numbers on parsing concerns, so we could 
figure out a) is there a problem, b) where is the problem, c) what's 
the solution.
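
As a rough idea of what such numbers might look like, here is a minimal 
sketch that times the parse, process and serialize steps separately, 
using Python's standard xml.etree.ElementTree and time.perf_counter. 
The synthetic document, the make_doc and measure names, and the trivial 
"processing" step are all assumptions made for illustration; real 
figures would have to come from real documents and real workloads.

```python
import time
import xml.etree.ElementTree as ET


def make_doc(n):
    """Build a synthetic document with n items (hypothetical test data)."""
    items = "".join(f"<item id='{i}'><v>{i}</v></item>" for i in range(n))
    return f"<root>{items}</root>"


def measure(doc):
    """Time parse, process and serialize as three separate phases."""
    t0 = time.perf_counter()
    root = ET.fromstring(doc)                      # parse
    t1 = time.perf_counter()
    total = sum(int(item.find("v").text)           # stand-in for real processing
                for item in root.iter("item"))
    t2 = time.perf_counter()
    out = ET.tostring(root)                        # serialize
    t3 = time.perf_counter()
    return {
        "parse": t1 - t0,
        "process": t2 - t1,
        "serialize": t3 - t2,
        "total_value": total,
        "bytes_out": len(out),
    }


if __name__ == "__main__":
    result = measure(make_doc(10_000))
    elapsed = result["parse"] + result["process"] + result["serialize"]
    for phase in ("parse", "process", "serialize"):
        print(f"{phase:>9}: {result[phase]:.4f} s ({result[phase] / elapsed:.0%})")
```

The percentages it prints depend entirely on how heavy the processing 
step is, which is the point: the split has to be measured per 
application before deciding where to optimize.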

Bill de hÓra


Copyright 2001 XML.org. This site is hosted by OASIS