OASIS Mailing List Archives





   Re: [xml-dev] Validation vs performance - was Re: [xml-dev] Fast text ou


Stephen D. Williams wrote:

> Processing overhead, including the major components of parsing / object 
> creation / data copies / serialization, is not a 'future problem'.  It 
> has always been a problem.

We don't know how much and what kind of a problem XML will be until we've
had time to gain experience -- if we try to optimize too early, we'll end up
optimizing the wrong thing.

For example, I set up a test for a customer a while back to see how fast
Expat could parse documents.  On my 900 MHz Dell notebook, with 256MB RAM
and Gnome, Mozilla, and XEmacs competing for memory and CPU, Expat could
parse about 3,000 1K XML documents per second (if memory does not fail me).
If I had tried to, say, build DOM trees from that, I expect that the number
would have fallen into the double digits (in C++) or worse.  In this case,
obviously, there would be far more to be gained from optimizing the code on
the other side of the parser (say, by implementing a reusable object pool or
lazy tree building) than there would be from replacing XML with something
that parsed faster.
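
A rough way to reproduce this kind of comparison is sketched below. It is not the original test: it assumes Python's standard-library Expat binding (xml.parsers.expat) and xml.dom.minidom as a stand-in for DOM building, and the helper names (make_document, parse_events_only, parse_to_dom, docs_per_second) and the synthetic 1K document are made up for illustration. Absolute numbers will vary by machine; only the ratio between the two rates is of interest.

```python
import time
import xml.dom.minidom
import xml.parsers.expat


def make_document(size=1024):
    """Build a synthetic XML document of roughly `size` bytes (hypothetical test data)."""
    item = "<item id='1'>some text</item>"
    body = item * max(1, (size - len("<root></root>")) // len(item))
    return ("<root>" + body + "</root>").encode("utf-8")


def parse_events_only(doc):
    """Run Expat with a trivial handler that only counts start tags."""
    count = 0

    def start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(doc, True)  # True marks this as the final chunk
    return count


def parse_to_dom(doc):
    """Build a full DOM tree for the same document and count its elements."""
    dom = xml.dom.minidom.parseString(doc)
    return len(dom.getElementsByTagName("*"))


def docs_per_second(fn, doc, repeats=200):
    """Time `repeats` calls of fn(doc) and return the rate in documents per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(doc)
    return repeats / (time.perf_counter() - start)


if __name__ == "__main__":
    doc = make_document()
    print("events only:", round(docs_per_second(parse_events_only, doc)), "docs/s")
    print("DOM build:  ", round(docs_per_second(parse_to_dom, doc)), "docs/s")
```

If the DOM rate comes out much lower than the event-only rate, that points at object creation on the application side of the parser, not at the parsing itself.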

I have never benchmarked SOAP implementations, so I have no idea how well
they perform, but my Expat datapoint suggests that XML parsing is unlikely
to be the bottleneck.  In fact, you might be able to gain more by writing an
optimized HTTP library that fed content as a stream rather than doing an
extra buffer copy.
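
As a minimal sketch of the streaming idea, the following feeds a byte stream to Expat chunk by chunk instead of first collecting the whole body into one buffer. It assumes Python's xml.parsers.expat binding; count_elements_streaming and the chunk size are hypothetical, and any binary file-like object with a read() method (a socket file, an HTTP response body) could stand in for the stream.

```python
import io
import xml.parsers.expat


def count_elements_streaming(stream, chunk_size=4096):
    """Parse XML from a binary stream in chunks and count start tags."""
    count = 0

    def start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # more data may follow
    parser.Parse(b"", True)  # signal end of input
    return count


if __name__ == "__main__":
    # io.BytesIO stands in for a network stream here.
    body = io.BytesIO(b"<a><b/><b/></a>")
    print(count_elements_streaming(body, chunk_size=4))
```

The parser copes with chunk boundaries that fall in the middle of a tag, so the caller never needs to hold the full document in memory.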

> The scarce resource is time.  Anything that eats time is bad.  This could
> be bandwidth usage, CPU, memory, or suboptimal communication and semantic
> models.

I have some experience with high-volume, high-speed systems as well.  They
tend to be so finely hand-tuned that they couldn't use *any* off-the-shelf
format or protocol, much less XML or SOAP -- even HTTP (or in some cases,
TCP) is out of the question.  These are the kinds of people who will use
deltas to avoid wasting four bytes on every number.
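
To illustrate what "using deltas" can look like, here is a small sketch under stated assumptions: the first value is stored as a 4-byte integer and every later value as a 1-byte signed difference from its predecessor. The byte layout and the names delta_encode and delta_decode are invented for this example and are not taken from any particular system; the input is assumed to be non-empty, with neighbouring values differing by no more than a signed byte can hold.

```python
import struct


def delta_encode(values):
    """Pack the first value in 4 bytes, then each difference in 1 signed byte."""
    out = struct.pack("<i", values[0])
    for prev, cur in zip(values, values[1:]):
        # struct.error is raised if the difference falls outside -128..127.
        out += struct.pack("<b", cur - prev)
    return out


def delta_decode(data):
    """Rebuild the original list from the packed form."""
    (first,) = struct.unpack_from("<i", data, 0)
    values = [first]
    for (diff,) in struct.iter_unpack("<b", data[4:]):
        values.append(values[-1] + diff)
    return values


if __name__ == "__main__":
    readings = [100000, 100003, 100001, 100010]
    packed = delta_encode(readings)
    print(len(packed), "bytes instead of", 4 * len(readings))
    print(delta_decode(packed) == readings)
```

A format like this only pays off when consecutive values stay close together, which is exactly the sort of data-specific assumption a general-purpose format cannot make.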

All the best,




Copyright 2001 XML.org. This site is hosted by OASIS