On Wed, 26 Feb 2003 07:44:31 -0500, David Megginson <david@megginson.com>
wrote:
>
> In the past, I've observed that actual XML parsing generally accounts
> for under 1% of a batch application's running time (much less, if
> you're building a big object tree or doing any database access). That
> means that if you speed up the XML parsing by 10%, you might have sped
> up your application by less than 0.1% (or realistically, not at all,
> if the parser was already idling waiting for data over the network).
I for one wouldn't dispute that most people on this list have had similar
experiences, and we all know that you won't get much real speedup by
optimizing non-bottlenecks. As a matter of fact, until a few months ago I
was as much a scoffer at the arguments that Al and Robin raise as any of
you.
My day job colleagues changed my mind by pointing out that in industrial-
strength, native XML processing environments, nothing much is happening
besides XML being parsed, processed (stored, queried, transformed) and
serialized again. The better the code gets, and the more efficient customers
get at using it (e.g. building DB indexes and optimizing queries, in our
case), the more that parse/serialization step becomes the bottleneck. I've
heard the same thing from industrial-strength SOAP developers -- as the
volume of messages goes up and processing resources get dedicated to XML
(i.e., no application logic or DB access happening on the machine doing the
parsing, processing, and serializing), the bottlenecks
in XML parsing become increasingly apparent. Sure, Father Moore will
ultimately solve this problem with faster hardware, but that's not a great
marketing pitch for software people.
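To make that "nothing but XML" scenario concrete, here's a minimal sketch
(plain JAXP/SAX; the toy document and iteration count are made up for
illustration) of a box whose only job is parsing: the handler discards every
event, so whatever time the loop takes is essentially all parser time.

    import java.io.ByteArrayInputStream;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.helpers.DefaultHandler;

    // Hypothetical parse-only micro-benchmark: the handler throws every
    // event away, so the loop measures (almost) nothing but parser time.
    public class ParseOnlyTiming {
        public static void main(String[] args) throws Exception {
            byte[] doc = ("<?xml version=\"1.0\"?>"
                    + "<orders>"
                    + "<order id=\"1\"><item>widget</item><qty>3</qty></order>"
                    + "<order id=\"2\"><item>gadget</item><qty>7</qty></order>"
                    + "</orders>").getBytes("UTF-8");

            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            DefaultHandler noOp = new DefaultHandler();   // discard all SAX events

            int iterations = 100000;                      // made-up workload size
            long start = System.currentTimeMillis();
            for (int i = 0; i < iterations; i++) {
                parser.parse(new ByteArrayInputStream(doc), noOp);
            }
            long elapsed = System.currentTimeMillis() - start;
            System.out.println("Parsed " + iterations + " documents in "
                    + elapsed + " ms");
        }
    }

Swap the no-op handler for real storage or query work and you can see for
yourself how much of the budget parsing alone eats in your own environment.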
So why should you all care about standardization of processing pipelines
that are generally *internal* to products? I'm not completely sure you
should. One might argue that you as customers of / developers for
enterprise-class XML processing software may wish to tap into the pipelines
at a lower level, e.g. grab the rawest Infoset data out of a DBMS before it
gets sanitized and standardized by the API level, or insert your own
specialized SOAP processors (e.g. to support a new choreography standard)
deep into IBM or Microsoft's architecture. If the vendors all go their
separate ways on efficient infoset representations, we're back to the Bad
Old Days (e.g., where SQL is today) in which "standards" are more or less
conceptual frameworks rather than the basis for interoperable code, at
least at the down-n-dirty level. Another argument for standardization of
this stuff is that -- as Robin points out repeatedly -- lots and lots of
wheels are being reinvented daily. There's something to be said for
cooperation and joint research / development / testing under the aegis of a
standards body (perhaps like XQuery, which is also more of a joint research
project than a standardization of existing practice).
So, I'm not at all sure that standardization of efficient infoset
serializations is something that the W3C or anyone else should undertake at
this time. But I don't want to see the W3C preclude it (or XML geeks
conclude that it is evil) either. XML processing is moving more and more
into the core of real enterprises. We'll see the previous situation -- XML
as just a transient serialization format between DBs and applications --
turned around, so that most of the components of a processing pipeline take
XML in, store/process it natively, and put XML out. In
that scenario, lots of people are going to be looking for ways to reduce
the parsing bottlenecks ... either by subsetting (entity expansion is a
notorious bottleneck in high-performance XML processors, to the point where
the SOAP community simply refuses to do it), by exploring "binary"
serializations, or both. I don't want to see this "pollute" document XML,
but some of the assumptions of what is universal across document and data
XML will probably have to change to make this happen without a major fork.
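On the subsetting point: much of that "refuse to do it" posture can be
expressed as plain parser configuration. Below is a hedged sketch in
Java/SAX2 -- the two http://xml.org/sax/... features are standard SAX2,
while the disallow-doctype-decl feature is Xerces-specific and won't be
recognized by every parser; SOAP gets the same effect at the spec level by
simply prohibiting DTDs in messages.

    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;

    // Sketch of "subsetting by configuration": turn off the expensive
    // machinery rather than optimizing it.
    public class RestrictedParser {
        public static XMLReader restrictedReader() throws Exception {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();

            // Standard SAX2 features: don't fetch or expand external entities.
            reader.setFeature(
                "http://xml.org/sax/features/external-general-entities", false);
            reader.setFeature(
                "http://xml.org/sax/features/external-parameter-entities", false);

            // Go further and reject any DOCTYPE at all -- roughly what the
            // SOAP spec does by prohibiting DTDs in messages.  This feature
            // is Xerces-specific; other parsers may not recognize it.
            try {
                reader.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
            } catch (SAXException notSupported) {
                // Fall back to the two standard switches above.
            }
            return reader;
        }
    }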