On Apr 16, 2004, at 2:30 PM, Elliotte Rusty Harold wrote:
> [...] I have seen any number of binary formats that achieve speed gains
> precisely by doing this. And it is my contention that if this is
> disallowed (as I think it should be) much, perhaps all, of the speed
> advantages of these binary formats disappears.
>
Well, there is an immense amount of truth in this, but I have to take
issue with the "as I think it should be" aside. For example, there are
AFAIK plenty of enterprise systems out there that do a billion
transactions a day during peak times. That works out to well over
10,000 transactions per second, and even on big honking hardware that
doesn't leave many cycles per transaction for data validation.
As best I understand it, people get this kind of performance in an
enterprise environment by various methods, including a) doing the
business-rule validation and data cleansing earlier in the pipeline;
b) trusting the overall business process to have produced valid data at
crunch time; and c) auditing the results so that if somebody tries to
exploit this trust, sooner or later they will be caught. The same
basic approaches are available in "XML" environments, e.g. validating
and optimizing the data early in the pipeline, and using efficiently
formatted and trusted data for downstream processing. AFAIK
essentially everyone using XML in a performance-critical environment
(such as a DBMS or an enterprise messaging system) does something along
these lines, including a couple of mega-corporations who do not see the
value of *standardizing* the efficient XML formats. <duck>
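
To make that concrete, here's a rough sketch in Java of the
"validate early, trust downstream" shape I'm describing. The class
name, method names, and file arguments are placeholders I made up for
illustration, not anything from a real system or from this thread:

import java.io.File;
import java.io.FileInputStream;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class PipelineSketch {

    // Step (a): validate (and, in real life, cleanse) the data once,
    // at the front of the pipeline, against an agreed schema.
    static boolean validateAtIngestion(File doc, File schemaFile) {
        try {
            SchemaFactory sf =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(schemaFile);
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(doc));
            return true;
        } catch (Exception e) {
            // Invalid or corrupt data gets rejected (or sent back) here,
            // not in the performance-critical path downstream.
            return false;
        }
    }

    // Step (b): downstream stages trust the data and read it with a
    // plain streaming parser -- no per-message schema check in the hot path.
    static void processTrusted(File doc) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader reader =
            xif.createXMLStreamReader(new FileInputStream(doc));
        while (reader.hasNext()) {
            reader.next();   // business logic would hang off these events
        }
        reader.close();
    }
}

The point being that the expensive checking happens once per document
at the boundary, not once per hop in the pipeline.
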
Echoes of the great RSS well-formedness debate: the choice isn't
between unquestioningly accepting whatever data you are given and doing
draconian checking at every single step in the pipeline; it's a
question of how to set up the pipeline to detect corrupt data early on
and do what it takes to get it fixed or rejected, and then efficiently
process the data in those parts of the pipeline where speed is
critical. Sometimes XML syntax level validation against a DTD or
schema is useful as part of this, sometimes not. Sometimes double and
triple checking of data validity against business rules by procedural
code makes good business sense, sometimes not. Sometimes you can get
away with throwing the data back at the originator to fix, and
sometimes you gotta fix it yourself.
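
For what it's worth, that "fix it or throw it back" decision usually
lives in a little front-door router, something like the sketch below
(again Java, and again the class and queue names are hypothetical
stand-ins for whatever messaging fabric you actually have):

import java.io.StringReader;
import java.util.LinkedList;
import java.util.Queue;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class FrontDoorRouter {

    // Hypothetical queues: "downstream" is the fast, trusted part of the
    // pipeline; "repair" goes back to the originator or to a fix-up step.
    static final Queue<String> downstream = new LinkedList<String>();
    static final Queue<String> repair = new LinkedList<String>();

    // Cheap check at the front of the pipeline: corrupt messages are
    // detected early and routed for fixing; good ones flow on untouched.
    static void route(String message) {
        if (isWellFormed(message)) {
            downstream.add(message);
        } else {
            repair.add(message);
        }
    }

    static boolean isWellFormed(String xml) {
        try {
            XMLInputFactory xif = XMLInputFactory.newInstance();
            XMLStreamReader reader =
                xif.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}

Whether "repair" means bouncing the message back to the sender or
patching it up yourself is exactly the case-by-case business judgment
I'm talking about.
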
I cringe at the "Right Thing vs the Cowboy Way" characterizations at
various points in these threads. There are a lot of ways to set up
a business process or transformation/aggregation pipeline to get both
scalability and validity, and recommendations "disallowing" particular
approaches at one step by global fiat are certain to be ignored. It
would be nice to get these threads turned into a discussion of best
practices that people see in real life to find the optimal tradeoffs
between desirable but somewhat incompatible properties such as loose
coupling and high performance ... and away from discussion of alleged
universal principles that should be promoted or disallowed.