Daniel Veillard wrote:
> ...
>
> And using UCS-2 for memory encoding is also in a lot of cases
> a really bad choice. Processor performances are cache related nowadays.
> Filling them up with 0 for half of your data processed can simply
> trash your caches. I will stick to UTF8 internally, it also allows
> some processor to use hardcoded CISC instructions for 0 terminated C
> strings (IIRC the Power line of processors have such a set of instructions).
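The costs and benefits of UTF-8 are well known. Because characters vary
in width, random access at the character level becomes quite inefficient:
reaching the Nth character means scanning from the start of the string.
Neither UCS-2 nor UTF-8 is right as the in-memory model for all applications.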
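
A minimal sketch of that trade-off, assuming well-formed input and characters
within the Basic Multilingual Plane. The helper names (utf8_seq_len,
utf8_char_at, ucs2_char_at) are hypothetical, chosen for illustration only,
and are not taken from any parser discussed in this thread:

```python
def utf8_seq_len(buf: bytes, pos: int) -> int:
    """Length in bytes of the UTF-8 sequence whose lead byte is at buf[pos]."""
    if pos >= len(buf):
        raise IndexError("character index out of range")
    lead = buf[pos]
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def utf8_char_at(buf: bytes, index: int) -> str:
    """O(index): every preceding character must be stepped over."""
    if index < 0:
        raise IndexError("character index out of range")
    pos = 0
    for _ in range(index):
        pos += utf8_seq_len(buf, pos)
    width = utf8_seq_len(buf, pos)
    return buf[pos:pos + width].decode("utf-8")


def ucs2_char_at(buf: bytes, index: int) -> str:
    """O(1): each character occupies exactly two bytes (little-endian here)."""
    start = 2 * index
    if index < 0 or start + 2 > len(buf):
        raise IndexError("character index out of range")
    return buf[start:start + 2].decode("utf-16-le")


text = "naïve €uro"
utf8 = text.encode("utf-8")
ucs2 = text.encode("utf-16-le")  # same bytes as UCS-2 for BMP-only text
```

For this string the UTF-8 buffer is 13 bytes and the UCS-2 buffer is 20, so
the fixed-width form buys constant-time indexing at the cost of memory. For
pure ASCII text the UCS-2 buffer is exactly twice the size, and every other
byte is zero.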
Paul Prescod