On Mon, Oct 21, 2002 at 12:27:15PM -0400, John Cowan wrote:
> email@example.com scripsit:
> > Let's move on. UTF-8 is your transfer encoding, use UCS-2 in memory
> > (unless planning to process ancient Sumerian or something - then use
> > UCS-4) and let's all move on to something remotely interesting.
> In CJK environments, using UTF-16 for transfer makes sense, because UTF-8
> imposes a 50% growth in the size of native-language characters.
> That's basically why XML requires both UTF-8 and UTF-16 support of all
> conforming parsers.
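The 50% figure is easy to check. A minimal sketch, assuming an arbitrary
Japanese sample string of my own choosing (all of its characters are in the
BMP, so each takes 3 bytes in UTF-8 and 2 bytes in UTF-16):

```python
# Compare the encoded size of a short CJK sample in UTF-8 and UTF-16.
# The sample text is an illustrative assumption, not taken from the thread.
sample = "日本語のテキスト"  # 8 characters, all in the BMP

utf8_size = len(sample.encode("utf-8"))       # 3 bytes per character here
utf16_size = len(sample.encode("utf-16-be"))  # 2 bytes per character, no BOM

# Relative growth of UTF-8 over UTF-16 for this sample.
growth = (utf8_size - utf16_size) / utf16_size
print(utf8_size, utf16_size, f"{growth:.0%}")
```

Real documents mix in ASCII markup, so the actual growth depends on the
ratio of markup to native-language text.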
And using UCS-2 as the in-memory encoding is also a really bad choice
in a lot of cases. Processor performance is largely cache-bound nowadays.
Filling the caches with zero bytes for half of the data you process can
simply thrash them. I will stick to UTF-8 internally; it also allows
some processors to use hardcoded CISC instructions for 0-terminated C
strings (IIRC the Power line of processors has such a set of instructions).
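To make the zero-filling point concrete, here is a rough sketch that counts
the zero bytes in a 2-byte-per-character encoding of ASCII-only markup. The
sample document is a made-up assumption, and "utf-16-le" stands in for
UCS-2, which is byte-identical for BMP-only text. It only measures memory
footprint, not actual cache behaviour.

```python
# Measure how much of a UCS-2 style buffer is zero padding for ASCII markup.
# The sample document is hypothetical; "utf-16-le" matches UCS-2 for BMP text.
doc = "<doc><title>Hello</title><p>plain ASCII markup</p></doc>"

utf8 = doc.encode("utf-8")       # 1 byte per ASCII character
ucs2 = doc.encode("utf-16-le")   # 2 bytes per character, no BOM

# Every ASCII character contributes exactly one zero high byte.
zero_bytes = ucs2.count(0)
print(len(utf8), len(ucs2), zero_bytes)
```

For input like this, half of the wider buffer is zeros, so the same cache
holds half as many characters.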
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
firstname.lastname@example.org | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/