xml-dev - Re: [xml-dev] Specifying a Unicode subset


To: veillard@redhat.com
Subject: Re: [xml-dev] Specifying a Unicode subset
From: Paul Prescod <paul@prescod.net>
Date: Tue, 22 Oct 2002 16:36:47 -0700
Cc: xml-dev@lists.xml.org
References: <AF104122-E511-11D6-BFB3-0030657E2F34@mac.com> <200210211640.MAA28778@mail2.reutershealth.com> <20021022173710.E12115@redhat.com>
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.1) Gecko/20020826

Daniel Veillard wrote:
> ...
> 
>   And using UCS-2 for memory encoding is also in a lot of cases
> a really bad choice. Processor performances are cache related nowadays.
> Filling them up with 0 for half of your data processed can simply
> trash your caches. I will stick to UTF8 internally, it also allows
> some processor to use hardcoded CISC instructions for 0 terminated C
> strings (IIRC the Power line of processors have such a set of instructions).

The costs and benefits of UTF-8 are well known. Random access at the 
character level becomes quite inefficient, because locating the Nth 
character means scanning the bytes that precede it. Neither UCS-2 nor 
UTF-8 is right as the in-memory model for all applications.
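
To make the trade-off concrete, here is a minimal sketch in Python (not taken from any particular XML parser). The function names utf8_offset_of_char and ucs2_offset_of_char are hypothetical, chosen for illustration only. It shows why indexing by character is a linear scan over UTF-8 bytes but simple arithmetic over fixed-width UCS-2 code units, and why UCS-2 doubles the footprint of ASCII-heavy text.

```python
def utf8_offset_of_char(data: bytes, index: int) -> int:
    """Return the byte offset of the code point at position `index`.

    Hypothetical helper: it walks the buffer from the start, so the cost
    grows with `index` (O(n)), unlike a fixed-width encoding.
    """
    if index < 0:
        raise IndexError("negative index")
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes have the form 0b10xxxxxx; any other byte
        # begins a new code point.
        if (byte & 0xC0) != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError("character index out of range")


def ucs2_offset_of_char(index: int) -> int:
    """Byte offset of the code unit at `index` in UCS-2: constant time."""
    return 2 * index


sample = "naïve"
utf8_bytes = sample.encode("utf-8")
# "ï" occupies two bytes in UTF-8, so "v" (character 3) starts at byte 4.
v_offset_utf8 = utf8_offset_of_char(utf8_bytes, 3)
v_offset_ucs2 = ucs2_offset_of_char(3)

# Memory cost for pure ASCII text: every second byte in UCS-2 is zero.
ascii_text = "hello"
utf8_size = len(ascii_text.encode("utf-8"))
ucs2_size = len(ascii_text.encode("utf-16-le"))
```

An application that mostly streams through text pays little for the UTF-8 scan, while one that indexes heavily by character position may prefer the fixed-width layout despite the larger footprint.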

  Paul Prescod
