xml-dev - Re: [xml-dev] Specifying a Unicode subset


To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Specifying a Unicode subset
From: Tim Bray <tbray@textuality.com>
Date: Wed, 23 Oct 2002 14:46:24 -0700
In-reply-to: <AF104122-E511-11D6-BFB3-0030657E2F34@mac.com>
References: <AF104122-E511-11D6-BFB3-0030657E2F34@mac.com> <200210211640.MAA28778@mail2.reutershealth.com> <20021022173710.E12115@redhat.com> <3DB5E10F.2010507@prescod.net>
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.2b) Gecko/20021016

Paul Prescod wrote:

> The costs and benefits of UTF-8 are well-known. Random access at the
> character level becomes quite inefficient. Neither UCS-2 nor UTF-8 is
> right as the in-memory model for all applications.

I find that I use UTF-8 more & more even for internal processing.  I 
suspect that some of the shock & horror I first felt upon encountering 
this severe bit-munging lives on somewhere in the Web to be thrown in my 
face at some future point.

Seems weird, but I just *never* seem to need direct indexing into 
character buffers any more.  I seem to remember that I used to do this a 
lot... don't know what changed.  Also, building a fast-searchable page 
table to enable quick lookup of variable-size whatevers has become an 
awfully common idiom; it's not constant time, but O(log(N)) is pretty 
damn good in RAM.
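
To make the page-table idiom concrete, here is a minimal sketch in Python. It is only one reading of the idea, not necessarily the structure described above: the class name Utf8Index, the PAGE_BYTES constant and the method names are all hypothetical, and the input is assumed to be valid UTF-8. The buffer is cut into pages of roughly fixed byte size, the index of the first character in each page is recorded, and a lookup binary-searches that table and then scans at most one page.

```python
from bisect import bisect_right

PAGE_BYTES = 64  # hypothetical page size; tune for the workload


def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


class Utf8Index:
    """Page table mapping character indexes to byte offsets (assumes valid UTF-8)."""

    def __init__(self, data: bytes, page_bytes: int = PAGE_BYTES):
        self.data = data
        self.page_offsets = []     # byte offset at which each page starts
        self.page_first_char = []  # character index of each page's first character
        char_count = 0
        next_page_at = 0
        for offset, byte in enumerate(data):
            if is_continuation(byte):
                continue
            # Pages always start on a lead byte, never inside a character.
            if offset >= next_page_at:
                self.page_offsets.append(offset)
                self.page_first_char.append(char_count)
                next_page_at = offset + page_bytes
            char_count += 1
        self.length = char_count

    def byte_offset(self, char_index: int) -> int:
        """Binary-search the page table, then scan within a single page."""
        if not 0 <= char_index < self.length:
            raise IndexError(char_index)
        page = bisect_right(self.page_first_char, char_index) - 1
        offset = self.page_offsets[page]
        remaining = char_index - self.page_first_char[page]
        while remaining:
            offset += 1
            if not is_continuation(self.data[offset]):
                remaining -= 1
        return offset

    def char_at(self, char_index: int) -> str:
        """Decode the single character at the given character index."""
        start = self.byte_offset(char_index)
        end = start + 1
        while end < len(self.data) and is_continuation(self.data[end]):
            end += 1
        return self.data[start:end].decode("utf-8")
```

The search over the table is logarithmic in the number of pages, and the scan that follows is bounded by the page size, so the table costs a couple of integers per page rather than per character.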

I'm out of touch with academe... I wonder if the focus of data 
structures courses has changed as the price of RAM storage 
asymptotically approaches zero. -Tim
