OASIS Mailing List Archives

   Re: [xml-dev] Validation vs performance - was Re: [xml-dev] Fast text ou


David Megginson wrote:

> Stephen D. Williams wrote:
>> Processing overhead, including the major components of parsing / 
>> object creation / data copies / serialization, is not a 'future 
>> problem'.  It has always been a problem.
> We don't know how much and what kind of a problem XML will be until we've
> had time to gain experience -- if we try to optimize too early, we'll end
> up optimizing the wrong thing.

I suppose "early" and "time to gain experience" are relative.

> For example, I set up a test for a customer a while back to see how fast
> Expat could parse documents.  On my 900 MHz Dell notebook, with 256MB RAM
> and Gnome, Mozilla, and XEmacs competing for memory and CPU, Expat could
> parse about 3,000 1K XML documents per second (if memory does not fail me).
> If I had tried to, say, build DOM trees from that, I expect that the number
> would have fallen into the double digits (in C++) or worse.  In this case,
> obviously, there would be far more to be gained from optimizing the code on
> the other side of the parser (say, by implementing a reusable object pool
> or lazy tree building) than there would be from replacing XML with
> something that parsed faster.
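The gap David describes -- events are cheap, trees are expensive -- is easy to demonstrate with the stdlib binding to Expat. A rough sketch (the document shape, element names, and tree layout here are my own invention, not from his test):

```python
import xml.parsers.expat

# A roughly 1 KB document, standing in for those in David's benchmark.
DOC = b"<doc>" + b"<item n='1'>text</item>" * 40 + b"</doc>"

def parse_events_only(data):
    """Fire Expat events and build nothing -- the cheap path."""
    p = xml.parsers.expat.ParserCreate()
    seen = [0]
    def start(name, attrs):
        seen[0] += 1
    p.StartElementHandler = start
    p.Parse(data, True)
    return seen[0]

def parse_and_build(data):
    """Build a (name, attrs, children, text) tree -- the expensive path."""
    p = xml.parsers.expat.ParserCreate()
    stack = [("#root", {}, [], [])]
    def start(name, attrs):
        node = (name, attrs, [], [])
        stack[-1][2].append(node)   # attach to current parent's children
        stack.append(node)
    p.StartElementHandler = start
    p.EndElementHandler = lambda name: stack.pop()
    p.CharacterDataHandler = lambda text: stack[-1][3].append(text)
    p.Parse(data, True)
    return stack[0][2][0]
```

Timing the two with `timeit` over a few thousand iterations makes the point: the event-only path allocates nothing per element, while the tree path pays for a tuple, a dict, and two lists per element -- which is exactly the cost an object pool or lazy tree building would try to amortize.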

Why assume that "optimizing the code on the other side of the parser" is 
the first or only step?  I posit that this is not the best way to proceed 
and that it artificially narrows the space of possible solutions.  The 
steps needed to parse XML, such as processing Expat events, impose a 
minimum amount of work.  Once the data has been parsed it must be in a 
usable form, and data in a usable form must eventually be serialized 
again.  The format itself, and the distance between it and in-memory 
representations, set a lower bound on the least amount of work that is 
theoretically required.  Other data formats have lower bounds that are 
smaller.
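That lower bound can be made concrete by round-tripping the same record through XML text and through a format that already matches the memory layout. A sketch (the record contents and element names are invented for illustration):

```python
import struct
import xml.parsers.expat

# Hypothetical record: three integers, carried either as XML text or
# as packed little-endian 32-bit binary.
record = (12345, 67890, 24680)

def xml_roundtrip(rec):
    """Serialize to XML text, then parse the text back into integers:
    every value crosses an int -> text -> int conversion."""
    doc = "<r><a>%d</a><b>%d</b><c>%d</c></r>" % rec
    values, buf = [], []
    p = xml.parsers.expat.ParserCreate()
    p.CharacterDataHandler = buf.append
    def end(name):
        if buf:
            values.append(int("".join(buf)))
            buf.clear()
    p.EndElementHandler = end
    p.Parse(doc, True)
    return tuple(values)

def binary_roundtrip(rec):
    """The wire format is already close to the memory format, so the
    round trip is a fixed-size pack/unpack with no text conversion."""
    return struct.unpack("<3i", struct.pack("<3i", *rec))
```

However well the consuming code is tuned, the XML path cannot avoid the text conversions and per-element event dispatch; the binary path's floor is essentially a memory copy. That difference in floors is the point.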

> ...
>> The scarce resource is time.  Anything that eats time is bad.  This could
>> be bandwidth usage, CPU, memory, or suboptimal communication and semantic
>> models.
> I have some experience with high-volume, high-speed systems as well.  They
> tend to be so finely hand-tuned that they couldn't use *any* off-the-shelf
> format or protocol, much less XML or SOAP -- even HTTP (or in some cases,
> TCP) is out of the question.  These are the kinds of people who will use
> deltas to avoid wasting four bytes on every number.
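The delta trick is simple enough to sketch: encode differences between consecutive values as variable-length integers, so that a slowly changing sequence (timestamps, sequence numbers) costs one byte per value instead of four. A minimal version, assuming a non-decreasing sequence so deltas are never negative:

```python
def delta_encode(values):
    """Encode non-decreasing ints as varint deltas (7 bits per byte,
    high bit marks continuation)."""
    out, prev = bytearray(), 0
    for v in values:
        d = v - prev          # assumes v >= prev
        prev = v
        while d >= 0x80:
            out.append((d & 0x7F) | 0x80)
            d >>= 7
        out.append(d)
    return bytes(out)

def delta_decode(data):
    """Reverse delta_encode: accumulate varint deltas back into values."""
    values, prev, i = [], 0, 0
    while i < len(data):
        shift = d = 0
        while True:
            b = data[i]; i += 1
            d |= (b & 0x7F) << shift
            if b < 0x80:
                break
            shift += 7
        prev += d
        values.append(prev)
    return values
```

Five values near 1000 encode to six bytes rather than twenty -- which is exactly the kind of saving that makes a hand-tuned wire format hard to displace with anything general-purpose.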

Of course ;-).
I'm just trying to bring that kind of efficiency to something standard.

> All the best,
> David


swilliams@hpti.com http://www.hpti.com Per: sdw@lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw




Copyright 2001 XML.org. This site is hosted by OASIS