Paul Prescod wrote:
> The costs and benefits of UTF-8 are well-known. Random-access at the
> character level becomes quite inefficient. Neither UCS-2 nor UTF-8 are
> right as the in-memory model for all applications.
I find that I use UTF-8 more & more even for internal processing. I
suspect that some of the shock & horror I first felt upon encountering
this severe bit-munging lives on somewhere in the Web to be thrown in my
face at some future point.
Seems weird, but I just *never* seem to need direct indexing into
character buffers any more. I seem to remember that I used to do this a
lot... don't know what changed. Also, the notion of building a
fast-searchable page table for enabling quick lookup of variable-size
whatevers has become an awfully common idiom; it's not constant time, but
O(log(N)) is pretty damn good in RAM.
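To make the page-table idiom concrete, here is a rough sketch in Python of one way it might look for a UTF-8 buffer. Everything named here (`PAGE_BYTES`, `build_page_table`, `char_at`) is made up for illustration, the buffer is assumed to hold valid UTF-8, and this is only one of several reasonable layouts. The table records, for each fixed-size page of bytes, the character index and byte offset of the first character starting in that page. A lookup binary-searches the table and then scans forward within a single page.

```python
from bisect import bisect_right

PAGE_BYTES = 64  # hypothetical page size; tune to taste


def is_lead(b: int) -> bool:
    # UTF-8 continuation bytes look like 0b10xxxxxx; anything else starts a character.
    return (b & 0xC0) != 0x80


def build_page_table(buf: bytes, page_bytes: int = PAGE_BYTES):
    """One pass over buf (assumed valid UTF-8), one table entry per page."""
    char_starts = []  # character index of the first character starting in each page
    byte_starts = []  # byte offset of that character
    char_index = 0
    next_boundary = 0
    for offset, b in enumerate(buf):
        if is_lead(b):
            if offset >= next_boundary:
                char_starts.append(char_index)
                byte_starts.append(offset)
                next_boundary = (offset // page_bytes + 1) * page_bytes
            char_index += 1
    return char_starts, byte_starts, char_index


def char_at(buf: bytes, table, i: int) -> str:
    """Return character i: binary search over pages, then a short scan."""
    char_starts, byte_starts, total = table
    if not 0 <= i < total:
        raise IndexError(i)
    page = bisect_right(char_starts, i) - 1   # the O(log N) step
    offset = byte_starts[page]
    for _ in range(i - char_starts[page]):    # bounded by the page size
        offset += 1
        while not is_lead(buf[offset]):
            offset += 1
    end = offset + 1
    while end < len(buf) and not is_lead(buf[end]):
        end += 1
    return buf[offset:end].decode("utf-8")


text = "naïve café ☃ 𝄞 " * 50
buf = text.encode("utf-8")
table = build_page_table(buf)
```

With the table built once, `char_at(buf, table, i)` gives the same answer as `text[i]` without decoding the whole buffer. The cost is the binary search plus a scan of at most one page.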
I'm out of touch with academe... I wonder if the focus of data
structures courses has changed as the price of RAM storage
asymptotically approaches zero. -Tim