At 20:16 06/03/2002 +1100, Rick Jelliffe wrote:
>From: "Sean McGrath" <email@example.com>
> > At 18:32 06/03/2002 +1100, Rick Jelliffe wrote:
> > >There is a case to be made that, for implementability reasons, it is
> > >good to bundle together as many orthogonal functions that can act as
> > >visitors on the same traversal of the infoset.
> > This makes no sense to me. I must be missing something. What do you
> > mean "for implementability reasons". My gut tells me quite the
> > opposite!
>Oops, I meant "for efficiency reasons". Rather than traversing an infoset
>many times, it can be more efficient if different functions can be performed
>in a single pass through a document. *
Rick, I agree with you most of the time but I have to totally disagree
with this line of thinking.
(This started out as a short reply and sort of grew and grew. Sorry.)
Efficiency of XML processing is classic premature optimization territory.
I've learned from bitter experience that doing XML processing monolithically
for efficiency reasons is almost always a bad idea. It's bad design, leads
to poor evolvability and - more often than not - is based on a false impression
of where the bottlenecks really are. I find again and again that if you design
loosely coupled XML systems - ignoring efficiency concerns - efficiency has
a way of sorting itself out without adversely impacting the design.
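
To make "loosely coupled" concrete, here is a minimal sketch in Python. It is not XPipe code; the stage names (uppercase_titles, number_items), the run_pipeline helper and the element names are all hypothetical, chosen only for illustration. Each stage makes its own pass over the tree and knows nothing about the others, so stages can be added, removed or reordered without touching the rest.

```python
import xml.etree.ElementTree as ET

# Hypothetical stages: each one takes a tree, does one job in its own
# traversal, and hands the tree on. None of them knows about the others.

def uppercase_titles(root):
    """Stage: normalize the text of every <title> element to upper case."""
    for title in root.iter("title"):
        title.text = (title.text or "").upper()
    return root

def number_items(root):
    """Stage: add a sequential 'n' attribute to every <item> element."""
    for n, item in enumerate(root.iter("item"), start=1):
        item.set("n", str(n))
    return root

def run_pipeline(xml_text, stages):
    """Parse once, then apply each stage in order (one traversal per stage)."""
    root = ET.fromstring(xml_text)
    for stage in stages:
        root = stage(root)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    doc = "<doc><item><title>a</title></item><item><title>b</title></item></doc>"
    print(run_pipeline(doc, [uppercase_titles, number_items]))
```

This deliberately traverses the document once per stage. The cost is the extra passes; the gain is that each stage stays small and replaceable.
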
Two examples germane to this discussion. In XPipe, we are prototyping some
XPipe compilers that are looking very promising. High efficiency execution
but loose coupling of processing types. Also in XPipe, we are working
on some P2P execution environments (XGrids) that allow multiple
processors to cooperate to perform XML processing. This exhibits
efficiencies that will bring tears to your eyes - yet not at the expense
of monolithic designs.
XGrid shows up one significant foible in software developers - a predisposition
to thinking in von Neumann architecture terms, i.e. if it takes 1 minute to
process 1 XML document and I have 100,000 documents, then the processing
will take 100,000 minutes = about 70 days.
Now, in many cases, the processing is trivially parallelizable. In the
case where there are no interdependencies between the XML instances, the total
processing time can get as close to 1 minute as you like with the aid of
multiple processors. With the aid of some judicious domain decomposition, we
find that *most* XML processing can be made trivially parallelizable.
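
The arithmetic above can be sketched as follows. This is not XGrid; it uses Python's standard concurrent.futures as a stand-in, and estimated_minutes, count_elements and process_all are hypothetical names. The estimate assumes fully independent documents, equal cost per document and no coordination overhead. A thread pool keeps the sketch runnable anywhere; for CPU-bound work one would presumably swap in separate processes or separate machines.

```python
import math
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

def estimated_minutes(n_docs, minutes_per_doc, workers):
    """Idealized wall-clock time: independent documents, no overhead."""
    return math.ceil(n_docs / workers) * minutes_per_doc

def count_elements(xml_text):
    """Stand-in for per-document processing: count elements in one document."""
    return sum(1 for _ in ET.fromstring(xml_text).iter())

def process_all(docs, workers=4):
    """Process each document independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_elements, docs))

if __name__ == "__main__":
    # One processor: 100,000 minutes, i.e. roughly 69 days.
    print(estimated_minutes(100_000, 1, 1) / (60 * 24))
    # 1,000 processors: 100 minutes. 100,000 processors: 1 minute.
    print(estimated_minutes(100_000, 1, 1_000))
    print(estimated_minutes(100_000, 1, 100_000))
    print(process_all(["<a><b/></a>", "<a/>"]))
```

Because no document depends on another, the only thing that changes between the 70-day run and the 1-minute run is the number of workers; the per-document code is untouched.
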
Optimizing the 1 minute figure for processing a single XML instance - the
end-to-end "make time" of a single XML document - only makes sense in
near-realtime scenarios. For everything else, XGrid style distributed
computing will beat hand-crafted "optimized" single pass systems
hands down, both in throughput terms and in evolvability terms.
Any schema language, query language or any other XML technology that
justifies the complexity that is concomitant with monolithic design on
efficiency grounds should be treated skeptically. Remember this - complexity
is a well established business model. Justifying complexity on the
grounds of efficiency plays on the collective weakness of us practitioners
in the software engineering profession to spot the baselessness of most
"for efficiency" pitches.