xml-dev - Re: [xml-dev] Use of UTF-8 and UTF-16


To: Elliotte Harold <elharo@metalab.unc.edu>
Subject: Re: [xml-dev] Use of UTF-8 and UTF-16
From: Philippe Poulard <Philippe.Poulard@sophia.inria.fr>
Date: Wed, 02 Nov 2005 15:04:51 +0100
Cc: Rick Jelliffe <rjelliffe@allette.com.au>, Xml-Dev <xml-dev@lists.xml.org>
In-reply-to: <43660951.3090707@metalab.unc.edu>
References: <NBBBIBMKFOFCNEBAKDPLCEKIKDAA.xml-dev@boynings.co.uk> <39325.203.51.20.11.1130759065.squirrel@intranet.allette.com.au> <43660951.3090707@metalab.unc.edu>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050511

Elliotte Harold wrote:
> Rick Jelliffe wrote:
> 
>> For CJK (Chinese, Japanese, Korean) XML documents, where three (or six)
>> bytes may be used by UTF-8 instead of UCS-16's two (or four), UTF-16 files
>> will usually be smaller.
> 
> 
> First a correction: UTF-8 never uses six bytes for anything. The largest 
> UTF-8 character you'll ever see is 4 bytes wide.
> 

Hi,

I read somewhere that:

UTF-8 uses up to 6 bytes for ISO/IEC 10646
UTF-8 uses up to 4 bytes for Unicode

Unicode is a subset of ISO/IEC 10646 (in terms of addressing)
ISO/IEC 10646 is a subset of Unicode (in terms of semantics)

XML uses Unicode
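
To make the two byte counts concrete, here is a rough Python sketch of how the length of a UTF-8 sequence follows from the code point value. The function name utf8_length and the allow_31_bit flag are my own inventions for illustration. The sketch assumes the original 31-bit UTF-8 layout (1 to 6 bytes) for the ISO/IEC 10646 case and the range up to U+10FFFF for the Unicode case, and it ignores surrogate code points entirely.

```python
def utf8_length(code_point: int, allow_31_bit: bool = False) -> int:
    """Return how many bytes UTF-8 needs for a single code point.

    allow_31_bit=False: Unicode range U+0000..U+10FFFF (at most 4 bytes).
    allow_31_bit=True:  assumed original 31-bit range (at most 6 bytes).
    Hypothetical helper, not part of any library.
    """
    limit = 0x7FFFFFFF if allow_31_bit else 0x10FFFF
    if not 0 <= code_point <= limit:
        raise ValueError(f"code point out of range: {code_point:#x}")
    # Largest value each length can carry: 7, 11, 16, 21, 26, 31 payload bits.
    upper_bounds = (0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)
    for length, upper in enumerate(upper_bounds, start=1):
        if code_point <= upper:
            return length
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # ASCII letter, accented Latin letter, CJK ideograph, supplementary plane.
    for cp in (0x41, 0xE9, 0x4E2D, 0x1F600):
        ch = chr(cp)
        # Cross-check against Python's own encoder for the Unicode range.
        assert utf8_length(cp) == len(ch.encode("utf-8"))
        utf16_bytes = len(ch.encode("utf-16-be"))
        print(f"U+{cp:04X}: UTF-8 {utf8_length(cp)} bytes, "
              f"UTF-16 {utf16_bytes} bytes")
```

With the flag left off, nothing in the Unicode range needs more than 4 bytes; the 5- and 6-byte forms only appear for values above U+10FFFF, which is why the two figures differ. The loop also shows the CJK case from the quoted text: the ideograph U+4E2D takes 3 bytes in UTF-8 and 2 in UTF-16.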

-- 
Cordialement,

            ///
           (. .)
  -----ooO--(_)--Ooo-----
|   Philippe Poulard    |
  -----------------------

Follow-Ups:
- Re: [xml-dev] Use of UTF-8 and UTF-16
  - From: Chris Gray <cpgray@library.uwaterloo.ca>
- Re: [xml-dev] Use of UTF-8 and UTF-16
  - From: richard@inf.ed.ac.uk (Richard Tobin)
