Re: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?
- From: Philippe Poulard <philippe.poulard@sophia.inria.fr>
- To: Richard Tobin <richard@inf.ed.ac.uk>
- Date: Mon, 01 Oct 2007 10:28:22 +0200
Richard Tobin wrote:
> In article <006501c80214$2a2a5d00$8901a8c0@aldebaran> you write:
>
>>>> It is not correct to say that a Unicode character can be either an
>>>> "ASCII character" or a "non-ASCII character". It is better to say that
>>>> some Unicode characters (those with codes below 128) have a
>>>> corresponding character in ASCII.
>
>>> On what do you base this assertion? Why do you think the
>>> ASCII characters are not the same characters that appear in
>>> Unicode?
>
>> That's not what I said nor what I think.
>
> So if the ASCII characters *are* the same ones that appear in Unicode,
> why is it not correct to say that Unicode characters are either ASCII
> or non-ASCII characters?
Because US-ASCII is both a character set and an encoding method, whereas
Unicode is just a character set that can be encoded in several ways (UCS-4,
UTF-8). Character sets are usually subsets of Unicode (do you know of a
charset that has a character that is not in Unicode?). Some charsets are
compatible with Unicode by zero-extension; this is the case for US-ASCII:
Bits  Encoding      Hex  Dec  Char  Binary
   7  US-ASCII      41   65   A     1000001
   8  ASCII 8 bits  41   65   A     01000001
  16  UCS-2         41   65   A     00000000 01000001
  32  UCS-4         41   65   A     00000000 00000000 00000000 01000001
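
The zero-extension in the table above can be checked with a short Python
sketch. It rests on a few assumptions that are not part of the original
message: Python's "utf-16-be" and "utf-32-be" codecs stand in for big-endian
UCS-2 and UCS-4 (they agree for a character like "A"), and the function name
zero_extended_forms is made up for illustration. The last lines also show the
case from the subject line: "é" needs two bytes in UTF-8.

```python
def zero_extended_forms(ch):
    """Return the big-endian bytes of ch in several encodings."""
    return {
        "US-ASCII": ch.encode("ascii"),
        # Assumption: utf-16-be stands in for UCS-2 (same for BMP characters).
        "UCS-2": ch.encode("utf-16-be"),
        # Assumption: utf-32-be stands in for UCS-4.
        "UCS-4": ch.encode("utf-32-be"),
        "UTF-8": ch.encode("utf-8"),
    }


forms = zero_extended_forms("A")
for name, raw in forms.items():
    # bytes.hex(sep) needs Python 3.8 or later.
    print(f"{name:9} {raw.hex(' ')}")

# A character outside US-ASCII: U+00E9 takes two bytes in UTF-8.
e_acute = "\u00e9"
print("UTF-8 length of e-acute:", len(e_acute.encode("utf-8")))
```

For "A", every wider form is the ASCII byte 41 padded with leading zero
bytes, and the UTF-8 form is the single ASCII byte itself.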
--
Regards,
///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !