RE: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?
- From: Amelia A Lewis <amyzing@talsever.com>
- To: xml-dev@lists.xml.org
- Date: Sat, 29 Sep 2007 13:59:28 -0400
On 2007-09-29 10:51:36 -0400 "Michael Kay" <mike@saxonica.com> wrote:
>> I read "ASCII character" in a similar way as I read "TCP/IP packet" or
>> "SOAP envelope" or "HTTP header". Perhaps other people read it
>> differently.
> No, I read it the same.
>
> I think that an ASCII character is a Unicode character in the same way that
> an XML document is an SGML document. One thing can conform to more than one
> description.
We were speaking specifically of "ASCII" and "UTF-8", no?
The ASCII character set is a proper subset of what UTF-8 encodes (and a proper subset of ISO-8859-x, and of several other encoding schemes). For every ASCII character, the identical bit pattern identifies the identical character in each of them.
So I agree that it is over-precise, tending toward confusion, to claim that the "A" in UTF-8 encoding is something different from the "A" in ASCII encoding, or from the "A" in ISO-8859-1, -2, -8, or whatever, since *the designers of those larger character repertoires deliberately and consciously left the ASCII subset unchanged.* Consequently, it is perfectly correct to say that "A" is an ASCII character, but Á is not.
(In this email, if I recall how I set up the client correctly, the latter is a UTF-8 encoded Latin capital A with acute accent. This character is also found in the repertoire of ISO-8859-1, but it is encoded differently there, so it is far more justifiable to claim that it is in some sense a "different" character; it is, at least, a different encoding of the character.)
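The point about "A" versus Á can be checked byte by byte. The following is a minimal sketch in Python, not part of the original thread; the helper name `encoded_forms` is hypothetical and exists only for illustration. It encodes a character under ASCII, UTF-8, and ISO-8859-1 and records the resulting bytes, or `None` where the encoding has no representation for that character.

```python
# Hypothetical helper for illustration: show how one character is
# represented under several encodings.
def encoded_forms(ch, encodings=("ascii", "utf-8", "iso-8859-1")):
    forms = {}
    for name in encodings:
        try:
            forms[name] = ch.encode(name)
        except UnicodeEncodeError:
            # The character is outside this encoding's repertoire.
            forms[name] = None
    return forms


# "A" is U+0041; "\u00c1" is LATIN CAPITAL LETTER A WITH ACUTE.
for ch in ("A", "\u00c1"):
    print(repr(ch), encoded_forms(ch))
```

Under these assumptions, "A" comes out as the single byte 0x41 in all three encodings, while Á has no ASCII form, is the single byte 0xC1 in ISO-8859-1, and is the two-byte sequence 0xC3 0x81 in UTF-8.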
Amy!
--
Amelia A. Lewis amyzing {at} talsever.com
A hundred thousand lemmings can't be wrong.