OASIS Mailing List Archives

RE: Textual transmission typically faster?

1/21/01 6:48:39 PM, Danny Ayers <danny@panlanka.net> wrote:
>An example of a binary format reducing interoperability is MS compressed
>HTML. This is used in some Windows help systems, but is out of reach to
>other systems such as JavaHelp. It seems unlikely however that MS came up
>with this format with efficiency in mind, copyright paranoia perhaps being a
>more likely motivation.

This is just speculation, but I suspect the motivation was a sort of inertia: 
the new MS HTML help system uses a highly compressed format because the old MS 
RTF-based help system used a highly compressed format so compression became 
The Way We've Always Done It, despite the fact that the conditions that made 
heavy compression a sensible decision Way Back When (memory and disk space 
were expensive, and you needed tricky code to access any data structure over 
64K) no longer exist.

As far as "optimization" of data formats goes in general, remember that the 
law of evolution known as Fisher's Fundamental Theorem of Natural Selection 
applies to human inventions as well as biological organisms.  It says that the 
better adapted an organism is to its current environment, the less change in 
its environment it can survive.  In the realm of data formats, this means that 
the effort spent optimizing a data format is likely to be wasted as soon as 
the data to be conveyed changes, because the optimization took advantage of 
what were effectively limitations on the data to be conveyed.  Gerald Weinberg 
tells the story of a group that was writing an assembler for an early 
computer.  They managed to come up with a perfect hash function that could map 
all the machine's opcodes into single-byte codes, thus avoiding the overhead 
of chaining or probing.  They were really proud of their ingenuity, but then 
the next version of the hardware came out, adding a couple of new opcodes, and 
there was no longer any perfect hash for the full set.  The group had to replace their 
carefully optimized code with a standard hash-table lookup; if they had done 
that in the first place, a lot of wasted effort would have been saved.
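To make the trade-off in the Weinberg story concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reconstruction of that assembler: the opcode names are hypothetical, SHA-256 stands in for whatever hash family the group actually used, and the table is sized to exactly one slot per opcode (a "minimal" perfect hash). A seed search finds a collision-free hash for the original opcodes; once one more opcode is added, a table of that size cannot be collision-free at all, whereas an ordinary dict simply accepts the new key.

```python
import hashlib
from typing import Dict, List, Optional, Sequence

# Hypothetical opcode mnemonics; NOT the instruction set from the story.
OPCODES = ["LDA", "STA", "ADD", "SUB", "JMP", "HLT"]


def slot(key: str, seed: int, size: int) -> int:
    """Seeded hash of `key` into `size` slots.

    SHA-256 is only an assumed stand-in for a family of hash
    functions indexed by `seed`.
    """
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


def is_perfect(keys: Sequence[str], seed: int, size: int) -> bool:
    """True if every key lands in a distinct slot (no collisions)."""
    return len({slot(k, seed, size) for k in keys}) == len(keys)


def find_seed(keys: Sequence[str], size: int, max_seed: int = 100_000) -> Optional[int]:
    """Brute-force search for a seed that hashes exactly these keys perfectly."""
    for seed in range(max_seed):
        if is_perfect(keys, seed, size):
            return seed
    return None


def build_table(keys: Sequence[str], seed: int, size: int) -> List[Optional[str]]:
    """One opcode per slot, so a lookup needs no chaining or probing."""
    table: List[Optional[str]] = [None] * size
    for k in keys:
        table[slot(k, seed, size)] = k
    return table


def perfect_lookup(table: List[Optional[str]], key: str, seed: int) -> bool:
    """A single probe decides membership."""
    return table[slot(key, seed, len(table))] == key


# The "carefully optimized" version: one slot per opcode, tuned to this exact set.
size = len(OPCODES)
seed = find_seed(OPCODES, size)
if seed is None:
    raise RuntimeError("no perfect seed found in the searched range")
table = build_table(OPCODES, seed, size)

# A new hardware revision adds an opcode (hypothetical "NOP"). With more opcodes
# than slots, some pair must collide (pigeonhole principle), so the tuned table
# cannot simply be extended; it has to be redesigned.
expanded = OPCODES + ["NOP"]
still_perfect = is_perfect(expanded, seed, size)  # always False here

# The "standard hash-table lookup": Python's dict absorbs the change trivially.
codes: Dict[str, int] = {op: i for i, op in enumerate(OPCODES)}
codes["NOP"] = len(codes)
```

The perfect hash buys a guaranteed single probe only by baking the current opcode set into the seed and the table size; the dict pays a little more per lookup but has no such dependency on what the data happens to be today.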