OASIS Mailing List Archives

   Re: [xml-dev] Specifying a Unicode subset


Daniel Veillard wrote:
> ...
>   And using UCS-2 as the in-memory encoding is also, in a lot of cases,
> a really bad choice. Processor performance is cache-bound nowadays.
> Filling the caches with zeros for half of the data you process can simply
> trash them. I will stick to UTF-8 internally; it also allows
> some processors to use hardcoded CISC instructions for 0-terminated C
> strings (IIRC the Power line of processors has such a set of instructions).
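
To put a rough number on the cache argument quoted above, here is a minimal sketch that compares the in-memory size of the same ASCII-only markup in UTF-8 and in a 16-bit encoding. The sample string and variable names are made up for illustration, and Python's "utf-16-le" codec stands in for UCS-2, which is byte-identical only while the text stays inside the Basic Multilingual Plane.

```python
# Hypothetical sample: ASCII-only markup, the common case the quote has in mind.
text = "<doc>hello</doc>" * 1000

utf8 = text.encode("utf-8")
# Assumption: UTF-16LE is used as a stand-in for UCS-2 (same bytes for BMP-only text).
ucs2 = text.encode("utf-16-le")

# Every ASCII character becomes one payload byte plus one zero byte in UCS-2.
zero_bytes = ucs2.count(0)

print("UTF-8 bytes:", len(utf8))
print("UCS-2 bytes:", len(ucs2))
print("zero bytes in UCS-2:", zero_bytes)
```

For text that is mostly non-ASCII (CJK, for example) the comparison can go the other way, since those characters take three bytes in UTF-8 and two in UCS-2.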

The costs and benefits of UTF-8 are well known. Random access at the 
character level becomes quite inefficient, because finding the nth 
character means scanning the bytes before it. Neither UCS-2 nor UTF-8 is 
right as the in-memory model for all applications.
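
The random-access cost can be sketched as follows. This is only an illustration, not code from any parser: `utf8_offset` and `ucs2_offset` are hypothetical helper names, and the UCS-2 version assumes no characters outside the Basic Multilingual Plane.

```python
def utf8_offset(buf: bytes, n: int) -> int:
    """Byte offset of the n-th character (0-based) in a UTF-8 buffer.

    Characters have variable width, so the buffer is scanned from the start:
    the cost grows with n.
    """
    seen = 0
    for offset, byte in enumerate(buf):
        # Continuation bytes have the form 0b10xxxxxx; any other byte starts a character.
        if (byte & 0xC0) != 0x80:
            if seen == n:
                return offset
            seen += 1
    raise IndexError(n)


def ucs2_offset(n: int) -> int:
    """Byte offset of the n-th character in a UCS-2 buffer: constant time."""
    return 2 * n


sample = "aé€b".encode("utf-8")  # character widths: 1, 2, 3 and 1 bytes
print([utf8_offset(sample, i) for i in range(4)])
print([ucs2_offset(i) for i in range(4)])
```

An application that mostly walks text sequentially never pays the scanning cost, which is one reason neither encoding wins everywhere.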

  Paul Prescod



Copyright 2001 XML.org. This site is hosted by OASIS