RE: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?

From: Amelia A Lewis <amyzing@talsever.com>
To: xml-dev@lists.xml.org
Date: Sat, 29 Sep 2007 13:59:28 -0400

On 2007-09-29 10:51:36 -0400 "Michael Kay" <mike@saxonica.com> wrote:
>> I read "ASCII character" in a similar way as I read "TCP/IP packet" or 
>> "SOAP envelope" or "HTTP header".  Perhaps other people read it 
>> differently.
> No, I read it the same.
> 
> I think that an ASCII character is a Unicode character in the same way that
> an XML document is an SGML document. One thing can conform to more than one
> description.

We were speaking specifically of "ASCII" and "UTF-8", no?

The ASCII character set is a proper subset of the repertoire that UTF-8 encodes (and of the ISO-8859-x repertoires, and of several other encoding schemes).  Within that subset, identical bit-patterns identify identical characters.
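A minimal sketch of that claim, assuming Python 3 and its standard codec names. The helper name `ascii_bytes_agree` is made up for illustration. It checks that each of the 128 ASCII code points encodes to the same single byte under ASCII and under a few of the larger encodings.

```python
def ascii_bytes_agree(encodings=("utf-8", "iso-8859-1", "iso-8859-2", "iso-8859-8")):
    """Return True if every ASCII code point encodes to the same bytes
    under plain ASCII and under each encoding in `encodings`."""
    for code_point in range(128):
        ch = chr(code_point)
        reference = ch.encode("ascii")  # the single 7-bit byte
        for enc in encodings:
            if ch.encode(enc) != reference:
                return False
    return True


print(ascii_bytes_agree())
```

The property is specific to encodings designed around ASCII. Passing `("utf-16-le",)` makes the check fail, because UTF-16 uses two bytes even for "A".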

So I agree that it is over-precise, tending toward confusion, to claim that the "A" in UTF-8 encoding is something different from the "A" in ASCII encoding, or from the "A" in ISO-8859-1, -2, -8, or whatever, since *the designers of those larger character repertoires deliberately and consciously left the ASCII subset unchanged.*  Consequently it is perfectly correct to say that "A" is an ASCII character, but Á is not.  (In this email, if I recall how I set up the client correctly, the latter is a UTF-8 encoded Latin capital A with acute accent.  The same character is also found in the repertoire of ISO-8859-1, but it is encoded differently there, so it is far more justifiable to claim that it is in some sense a "different" character; it is, at least, a different encoding of the character.)
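To make the contrast concrete, here is a small sketch, again assuming Python 3. The function name `encoded_bytes` is hypothetical. It shows the bytes each encoding produces for "A" and for Á (U+00C1), with `None` where the character cannot be encoded at all.

```python
def encoded_bytes(ch):
    """Map each encoding name to the hex bytes it produces for `ch`,
    or None if the character is outside that encoding's repertoire."""
    result = {}
    for name in ("ascii", "utf-8", "iso-8859-1"):
        try:
            result[name] = ch.encode(name).hex()
        except UnicodeEncodeError:
            result[name] = None
    return result


for ch in ("A", "\u00c1"):  # "\u00c1" is Latin capital A with acute accent
    print(repr(ch), encoded_bytes(ch))
# 'A' {'ascii': '41', 'utf-8': '41', 'iso-8859-1': '41'}
# 'Á' {'ascii': None, 'utf-8': 'c381', 'iso-8859-1': 'c1'}
```

"A" is the single byte 0x41 everywhere. Á has no ASCII encoding, takes two bytes (0xC3 0x81) in UTF-8, and takes one byte (0xC1) in ISO-8859-1.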

Amy!
-- 
Amelia A. Lewis                    amyzing {at} talsever.com
A hundred thousand lemmings can't be wrong.

Follow-Ups:
- RE: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?
  - From: "Michael Kay" <mike@saxonica.com>
- RE: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?
  - From: "Alessandro Triglia" <sandro@mclink.it>

References:
- RE: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?
  - From: "Michael Kay" <mike@saxonica.com>
