OASIS Mailing List Archives

   Re: [xml-dev] Specifying a Unicode subset


Uh, yeah, it sort of depends on your processing model, I guess - the main 
reason I use UCS-2 is that I can get to character n in constant time.

With UTF-8, character n is reached in time proportional to n.  Maybe 
that's OK for you - I don't find it so great, though.

On Tuesday, October 22, 2002, at 11:37  PM, Daniel Veillard wrote:

> On Mon, Oct 21, 2002 at 12:27:15PM -0400, John Cowan wrote:
>> tblanchard@mac.com scripsit:
>>> Lets move on.  UTF-8 is your transfer encoding, use UCS-2 in memory
>>> (unless planning to process ancient Sumerian or something - then use
>>> UCS-4) and lets all move on to something remotely interesting.
>> In CJK environments, using UTF-16 for transfer makes sense, because 
>> UTF-8
>> imposes a 50% growth in the size of native-language characters.
>> That's basically why XML requires both UTF-8 and UTF-16 support of all
>> conforming parsers.
>   And using UCS-2 for memory encoding is also, in a lot of cases,
> a really bad choice. Processor performance is cache-bound nowadays.
> Filling the caches with zero bytes for half of the data you process
> can simply thrash them. I will stick to UTF-8 internally; it also
> allows some processors to use hardcoded CISC instructions for
> 0-terminated C strings (IIRC the POWER line of processors has such
> a set of instructions).
> Daniel
> -- 
> Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
> veillard@redhat.com  | libxml GNOME XML XSLT toolkit  
> http://xmlsoft.org/
> http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

