xml-dev - Re: [xml-dev] Microsoft FUD on binary XML...


To: xml-dev@lists.xml.org
Subject: Re: [xml-dev] Microsoft FUD on binary XML...
From: Tony Graham <Tony.Graham@Sun.COM>
Date: Fri, 21 Nov 2003 16:03:49 +0000 (GMT)
In-reply-to: <3FBE14D8.7040405@alaric-snell.com>
References: <3FBDF386.9090307@alaric-snell.com> <20031121.121152.50253888.Tony.Graham@Sun.COM> <3FBE14D8.7040405@alaric-snell.com>

Alaric B Snell <alaric@alaric-snell.com> wrote at Fri, 21 Nov 2003 13:36:24 +0000:
> Tony Graham wrote:
> 
> > Changing UTF-16 Chinese to UTF-8 means a 50% size increase for the
> > Chinese characters in the Basic Multilingual Plane (i.e., most of the
> > Chinese characters in the message), since as UTF-16 one Chinese
> > character is two bytes, and as UTF-8 one Chinese character is three
> > bytes.
> 
> Exactly - efficient representation of Unicode text currently sadly 
> involves the user or the application doing a frequency analysis and 
> deciding whether to use UTF-8 or UTF-16... I think very, very few do 
> this right now; UTF-8 seems the almost ubiquitous choice, mainly due to 
> the software industry being driven from places that use the Roman alphabet.
> 
> Perhaps we need a new UTF that loses many of UTF-8's nice properties with 
> respect to lexical sorting and so on, but is less discriminatory against 
> character sets that live far into the BMP, perhaps working along the 
> lines of:

For a moment there, I thought you were inventing SCSU [1].

You might also be interested in BOCU-1 [2].
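
To put numbers on the quoted 50% figure, here is a rough sketch in Python. The language, the helper name encoded_sizes, and the sample strings are my own choices for illustration and are not taken from the thread; it only compares plain UTF-8 against UTF-16 and says nothing about what SCSU or BOCU-1 would achieve.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text.

    UTF-16 is encoded big-endian without a byte order mark, so the
    count reflects the characters only.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


# Illustrative sample: four Chinese characters, all in the BMP.
chinese = "中文字符"
utf8_len, utf16_len = encoded_sizes(chinese)

# Each BMP Chinese character is 3 bytes in UTF-8 and 2 bytes in UTF-16.
increase = (utf8_len - utf16_len) / utf16_len
print(utf8_len, utf16_len, f"{increase:.0%}")  # 12 8 50%

# ASCII text goes the other way: 1 byte in UTF-8, 2 bytes in UTF-16.
print(encoded_sizes("hello"))  # (5, 10)
```

Which encoding comes out smaller therefore depends on the mix of characters in the message, which is the frequency analysis mentioned in the quoted text.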

Regards,


Tony Graham
------------------------------------------------------------------------
XML Technology Center - Dublin
Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3            x(70)19708

[1] http://www.unicode.org/reports/tr6/
[2] http://www.unicode.org/notes/tn6/
