Hi,
Mike said:
Most of you probably write a lot more code than I do these days, so
forgive this brief pontification: I'm not sure about
Roger Costello's situation, but there may be a general "antipattern"
that assumes a priori that the overhead of XML text
parsing is a significant bottleneck and goes to great lengths to
preserve the parsed representation somehow. Of
course, you'll just have to profile your own applications to figure this
out, but I know I have wasted months trying to
figure out how to avoid the "inefficiency" of parsing XML data multiple
times as it flowed between different modules of a
system. Subsequent analysis showed the overhead of parsing the XML to be
roughly similar to the overhead of converting
binary data structures back and forth, and sending the text around
greatly simplified the architecture.
Didier replies:
I agree with your conclusion; let me share some of the experience I gained in Didier's labs.
You know, when you integrate systems you have to decide whether this is done
A) through function calls (i.e. integration through functions/procedures), or
B) through messaging (i.e. integration through data).
What is funny is that, from a processing standpoint, RPCs (case A) and XML documents (case B) have about the same overhead. If you use RPCs you have to use some marshaling protocol, which means the system has to encode, package and decode each procedure/function call. This is not free in time or CPU, since the overhead is paid on every call. If you use an XML document, or more generally if you choose to integrate through data, then you have the document parsing overhead instead. The funny thing is that either way you pay some processing overhead.
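To make the comparison concrete, here is a minimal sketch of the two kinds of overhead, using Python's standard library; the record and message contents are made up for illustration and are not from the lab experiments:

    import timeit
    import xmlrpc.client
    import xml.etree.ElementTree as ET

    record = {"id": 42, "name": "widget", "price": 9.99}

    def rpc_round_trip():
        # Case A: the RPC stack must encode (marshal) the call parameters
        # and decode (unmarshal) them again on every single call.
        payload = xmlrpc.client.dumps((record,), methodname="update")
        xmlrpc.client.loads(payload)

    message = "<update><id>42</id><name>widget</name><price>9.99</price></update>"

    def message_round_trip():
        # Case B: the receiving system parses the XML message it was handed.
        ET.fromstring(message)

    print("RPC marshal/unmarshal:", timeit.timeit(rpc_round_trip, number=10000))
    print("XML message parse:    ", timeit.timeit(message_round_trip, number=10000))

Neither path is free; which one costs more depends on your data and your libraries, which is why, as Mike says, you have to profile your own application instead of assuming the parsing is the bottleneck.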
In Didier's lab I have found that the rule of thumb is to think at the global level: look at the entire system, not at one particular element of it. For instance, at the global level, if you pick integration through functions/procedures, you may end up with a lot of interactions between the two systems. This can translate into a much longer overall processing time once you count network latencies. In general, both from the theoretical and the practical point of view, it is better to limit the interactions between the systems, since accessing local memory and doing local processing is a lot faster than transmitting data over the network. The image I keep in my mind to remember this is a processing unit waiting for something to do. I can clearly see the processor getting what it needs a lot faster from local memory than from a remote machine, especially when the data has to travel lands that sometimes resemble a Lord of the Rings adventure.
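As a back-of-the-envelope illustration of why limiting the interactions matters (the latency and bandwidth figures below are assumptions of mine, roughly a WAN round trip and a T1 line, not measurements from the lab):

    # Total time = one round trip per interaction + time to move the data.
    LATENCY_S = 0.040         # assumed round-trip latency over the WAN
    BANDWIDTH_BPS = 1500000   # assumed link speed, roughly a T1 line

    def total_time(interactions, total_bytes):
        return interactions * LATENCY_S + (total_bytes * 8) / BANDWIDTH_BPS

    one_megabyte = 1000000
    print("1000 small calls:", total_time(1000, one_megabyte), "s")  # about 45 s
    print("1 bulk document :", total_time(1, one_megabyte), "s")     # about 5 s

Moving the same megabyte in a thousand small exchanges costs roughly eight times as much wall-clock time as sending it once, and the difference is almost entirely latency.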
Conclusion: it is better to use local processing and minimize data transmission, whether as data or as function/procedure calls. However, I have never done any experiments involving huge or very huge XML documents. As a good (or fair) mathematician, I know that the whole system may behave differently in that case, and that we cannot necessarily infer that the behavior will scale linearly from what we observed with smaller documents. All my experiments involved XML documents with a maximum size of 1 MByte, over DSL, T1, Fast Ethernet (100 Mbit/s) and cable broadband lines.
Cheers
Didier PH Martin