RE: Textual transmission typically faster?
- From: Eric Bohlman <ebohlman@earthlink.net>
- To: Danny Ayers <danny@panlanka.net>, xml-dev@lists.xml.org
- Date: Sun, 21 Jan 2001 17:34:47 -0600
1/21/01 6:48:39 PM, Danny Ayers <danny@panlanka.net> wrote:
>An example of a binary format reducing interoperability is MS compressed
>HTML. This is used in some Windows help systems, but is out of reach to
>other systems such as JavaHelp. It seems unlikely however that MS came up
>with this format with efficiency in mind, copyright paranoia perhaps being a
>more likely motivation.
This is just speculation, but I suspect the motivation was a sort of inertia:
the new MS HTML Help system uses a highly compressed format because the old MS
RTF-based help system used one, and so compression became The Way We've Always
Done It. That happened despite the fact that the conditions that made heavy
compression a sensible decision Way Back When (memory and disk space were
expensive, and you needed tricky code to access any data structure over 64K)
no longer exist.
As far as "optimization" of data formats goes in general, remember that the
law of evolution known as Fisher's Fundamental Theorem of Natural Selection
applies to human inventions as well as biological organisms. It says that the
better adapted an organism is to its current environment, the less change in
its environment it can survive. In the realm of data formats, this means that
the effort spent optimizing a format is likely to be wasted as soon as the
data to be conveyed changes, because the optimization took advantage of what
were effectively limitations on that data. Gerald Weinberg
tells the story of a group that was writing an assembler for an early
computer. They managed to come up with a perfect hash function that could map
all the machine's opcodes into single-byte codes, thus avoiding the overhead
of chaining or probing. They were really proud of their ingenuity, but then
the next version of the hardware came out, adding a couple of new opcodes, and
there was no longer any perfect hash for the full set. The group had to
replace their carefully optimized code with a standard hash-table lookup; had
they done that in the first place, they would have saved themselves a great
deal of effort.
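
To make the moral concrete, here's a minimal sketch in C (the mnemonics, the
hash function, and the table size are all invented for illustration; Weinberg
doesn't record the actual details): a hash that happens to be perfect for a
four-opcode machine, and the collision that appears the moment the vendor adds
a fifth instruction.

#include <stdio.h>

#define TABLE_SIZE 11

/* (s[0] + s[2]) % 11 happens to be collision-free for the original
 * instruction set {LDA, STA, ADD, JMP}: slots 9, 5, 1, 0.  No
 * chaining, no probing -- a "perfect" hash. */
static unsigned perfect_hash(const char *mnemonic)
{
    return ((unsigned char)mnemonic[0]
          + (unsigned char)mnemonic[2]) % TABLE_SIZE;
}

int main(void)
{
    const char *v1_opcodes[] = { "LDA", "STA", "ADD", "JMP" };
    const char *table[TABLE_SIZE] = { 0 };
    size_t i;
    unsigned slot;

    /* Version 1 of the hardware: every opcode gets its own slot. */
    for (i = 0; i < sizeof v1_opcodes / sizeof v1_opcodes[0]; i++) {
        slot = perfect_hash(v1_opcodes[i]);
        table[slot] = v1_opcodes[i];
        printf("%s -> slot %u\n", v1_opcodes[i], slot);
    }

    /* Version 2 of the hardware adds AND ... */
    slot = perfect_hash("AND");
    if (table[slot] != NULL)
        printf("AND -> slot %u, colliding with %s: no longer perfect\n",
               slot, table[slot]);
    return 0;
}

A plain chained hash table (or, for a few dozen opcodes, even a linear scan)
would have absorbed the new instruction without a single line changing; the
occasional extra probe costs far less than rewriting the lookup every time the
instruction set grows.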