David Megginson wrote:
> Stephen D. Williams wrote:
>> Processing overhead, including the major components of parsing /
>> object creation / data copies / serialization, is not a 'future
>> problem'. It has always been a problem.
> We don't know how much and what kind of a problem XML will be until we've
> had time to gain experience -- if we try to optimize too early, we'll
> end up optimizing the wrong thing.
I suppose "early" and "time to gain experience" are relative.
> For example, I set up a test for a customer a while back to see how fast
> Expat could parse documents. On my 900 MHz Dell notebook, with 256MB RAM
> and Gnome, Mozilla, and XEmacs competing for memory and CPU, Expat could
> parse about 3,000 1K XML documents per second (if memory does not fail
> me). If I had tried to, say, build DOM trees from that, I expect that the
> rate would have fallen into the double digits (in C++) or worse. In this
> case, obviously, there would be far more to be gained from optimizing the
> code on the other side of the parser (say, by implementing a reusable
> object pool or lazy tree building) than there would be from replacing XML
> with something that parsed faster.
Why make the assumption that "optimizing the code on the other side of
the parser" is the first or only step? I posit that this is not the
best way to proceed and artificially narrows possible solutions. The
steps needed to parse XML, such as processing Expat events, impose a
minimum amount of work. Once the data has been parsed, it must be put into
a usable form, and data in a usable form must be serialized again at some
point. The gap between the wire format and in-memory formats sets a lower
bound on the work required. Other data formats have lower minimum bounds.
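
A rough way to see where the time goes is to measure both sides of the
parser on the same input. The sketch below is only an illustration, not the
test described above: it assumes Python's standard-library Expat binding
and minidom (which is itself built on Expat), and the helper names and the
synthetic 1K document are made up for the example. Absolute rates will vary
by machine, so no figures are claimed here.

```python
import time
import xml.dom.minidom
import xml.parsers.expat


def make_document(size=1024):
    """Build a synthetic XML document of at most `size` bytes (hypothetical test data)."""
    record = "<item id='{0}'><name>item{0}</name><value>{0}</value></item>"
    parts = ["<items>"]
    length = len("<items></items>")
    i = 0
    while True:
        rec = record.format(i)
        if length + len(rec) > size:
            break
        parts.append(rec)
        length += len(rec)
        i += 1
    parts.append("</items>")
    return "".join(parts).encode("utf-8")


def count_events(doc):
    """Event-only parse: count start-element events and build nothing else."""
    counts = {"start": 0}

    def on_start(name, attrs):
        counts["start"] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(doc, True)  # True marks this as the final chunk
    return counts["start"]


def build_tree(doc):
    """Full tree build: parse into a DOM and count its elements."""
    dom = xml.dom.minidom.parseString(doc)
    return len(dom.getElementsByTagName("*"))


def docs_per_second(func, doc, repeat=2000):
    """Call func(doc) `repeat` times and return the observed rate."""
    start = time.perf_counter()
    for _ in range(repeat):
        func(doc)
    return repeat / (time.perf_counter() - start)


if __name__ == "__main__":
    doc = make_document()
    print("event-only parse: %.0f docs/s" % docs_per_second(count_events, doc))
    print("DOM tree build:   %.0f docs/s" % docs_per_second(build_tree, doc))
```

The difference between the two rates approximates the cost of object
creation on top of parsing. The event-only rate is the floor that the format
itself imposes in this setup, whatever is done on the other side of the
parser.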
>> The scarce resource is time. Anything that eats time is bad. This can
>> be bandwidth usage, CPU, memory, or suboptimal communication and
> I have some experience with high-volume, high-speed systems as well.
> They tend to be so finely hand-tuned that they couldn't use *any*
> standard format or protocol, much less XML or SOAP -- even HTTP (or in
> some cases, TCP) is out of the question. These are the kinds of people
> who will use deltas to avoid wasting four bytes on every number.
Of course ;-).
I'm just trying to spread the efficiency to something standard.
> All the best,
firstname.lastname@example.org http://www.hpti.com Per: email@example.com http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw