

Re: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?

From: Philippe Poulard <philippe.poulard@sophia.inria.fr>
To: Richard Tobin <richard@inf.ed.ac.uk>
Date: Mon, 01 Oct 2007 10:28:22 +0200

Richard Tobin wrote:
> In article <006501c80214$2a2a5d00$8901a8c0@aldebaran> you write:
> 
>>>> It is not correct to say that a Unicode character can be either an
>>>> "ASCII character" or a "non-ASCII character".  It is better to say
>>>> that some Unicode characters (those with codes below 128) have a
>>>> corresponding character in ASCII.
> 
>>> On what do you base this assertion?  Why do you think the 
>>> ASCII characters are not the same characters that appear in 
>>> Unicode?  
> 
>> That's not what I said nor what I think.
> 
> So if the ASCII characters *are* the same ones that appear in Unicode,
> why is it not correct to say that Unicode characters are either ASCII
> or non-ASCII characters?

Because US-ASCII is both a character set and an encoding method, whereas
Unicode is only a character set, which can be encoded in several ways
(UCS-4, UTF-8). Character sets are usually subsets of Unicode (do you
know of a character set with a character that is not in Unicode?). Some
character sets are compatible with Unicode by zero-extension, and
US-ASCII is one of them:

Bits  Encoding     Hex  Dec  Char                               Binary
   7  US-ASCII      41   65  A                                 1000001
   8  ASCII 8-bit   41   65  A                                01000001
  16  UCS-2         41   65  A                       00000000 01000001
  32  UCS-4         41   65  A     00000000 00000000 00000000 01000001
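
The zero-extension in the table above can be checked with a minimal
Python sketch. The helper name encoded_hex is hypothetical, and Python
has no codecs called "UCS-2" or "UCS-4", so this assumes UTF-16BE and
UTF-32BE as stand-ins (they produce the same bytes for a character such
as "A"). UTF-8 is included to show that it leaves ASCII characters as a
single byte while "é" (U+00E9) takes two:

```python
def encoded_hex(ch):
    """Return the bytes of one character, as hex, in several encodings.

    Hypothetical helper for illustration only. UTF-16BE and UTF-32BE
    stand in for UCS-2 and UCS-4 (an assumption that holds for
    characters in the Basic Multilingual Plane).
    """
    result = {}
    for label, codec in [
        ("US-ASCII", "ascii"),
        ("UCS-2", "utf-16-be"),
        ("UCS-4", "utf-32-be"),
        ("UTF-8", "utf-8"),
    ]:
        try:
            result[label] = ch.encode(codec).hex()
        except UnicodeEncodeError:
            # The character has no representation in this encoding.
            result[label] = None
    return result


for ch in ("A", "\u00e9"):  # "\u00e9" is e with acute accent
    print("U+%04X" % ord(ch), encoded_hex(ch))
```

For "A" the wider encodings only add leading zero bytes (41, 0041,
00000041). For "é" the US-ASCII entry is None because the code point is
above 127, and the UTF-8 entry is the two bytes c3 a9.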

-- 
Regards,

               ///
              (. .)
  --------ooO--(_)--Ooo--------
|      Philippe Poulard       |
  -----------------------------
  http://reflex.gforge.inria.fr/
        Have the RefleX !
