Re: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?

From: Bill Kearney <wkearney99@hotmail.com>
To: xml-dev@lists.xml.org
Date: Sat, 29 Sep 2007 14:42:40 -0400

Alessandro Triglia wrote:
> It is not correct to say that a Unicode character can be either an "ASCII
> character" or a "non-ASCII character".  It is better to say that some
> Unicode characters (those with codes below 128) have a corresponding
> character in ASCII.
>   
Who said anything about ASCII?  That just muddies the water.

The representation of that character as E9 presumably comes from the 
editor in question assuming one of the ISO-8859-x encodings (and only 
SOME of them put e-acute at E9; ISO-8859-1 does).  Not ASCII.
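
To make the byte values concrete, here is a minimal Python sketch. The helper name encode_or_none is made up for illustration; the codec names are the ones Python's standard library ships. It shows e-acute as the single byte E9 in ISO-8859-1, as the two bytes C3 A9 in UTF-8, and as unencodable in ASCII and in ISO-8859-5 (the Cyrillic part, used here as an example of an ISO-8859-x part where E9 is a different character).

```python
E_ACUTE = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, code point U+00E9


def encode_or_none(char, encoding):
    """Return the bytes for char in the given encoding, or None if unencodable."""
    try:
        return char.encode(encoding)
    except UnicodeEncodeError:
        return None


# Same character, different byte sequences (or none at all).
for name in ("ascii", "iso-8859-1", "iso-8859-5", "utf-8"):
    print(name, encode_or_none(E_ACUTE, name))

# The lone byte E9 decodes to a different code point in each ISO-8859 part.
for name in ("iso-8859-1", "iso-8859-5"):
    print(name, hex(ord(b"\xe9".decode(name))))
```

An editor that shows E9 for this character is therefore most likely saving in a single-byte encoding such as ISO-8859-1 rather than in UTF-8.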

It's not uncommon for text editors to get this wrong, or to make 
assumptions about the encoding based on several other factors.  If your 
underlying OS is 'misconfigured' it can get even more confusing.  The 
tools start trying to "help you" by translating things.  This is almost 
never helpful for developers trying to wrestle with encoding.  For the 
average wage slave just trying to cut-and-paste between different 
applications it's usually not (too much of) a problem.

And to throw another monkey into the wrench, when you use numeric 
character references in XML they ALWAYS refer to ISO 10646 (Unicode) 
code points, regardless of the document's encoding declaration.  Thus 
even in a UTF-8 XML document you would write &#xE9; for it, not 
&#xC3A9; (C3 A9 is the UTF-8 byte sequence, not the code point).
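
One way to check what a numeric character reference resolves to is to ask a parser. The sketch below uses Python's standard xml.etree.ElementTree; the function name parse_with_encoding and the element names lit, ref and bad are invented for this example. It parses the same small document once as ISO-8859-1 and once as UTF-8, comparing a literal e-acute with the references &#xE9; and &#xC3A9;.

```python
import xml.etree.ElementTree as ET

TEMPLATE = (
    '<?xml version="1.0" encoding="{enc}"?>'
    "<t><lit>\u00e9</lit><ref>&#xE9;</ref><bad>&#xC3A9;</bad></t>"
)


def parse_with_encoding(encoding):
    """Serialize the test document in the given encoding and parse it back.

    Returns the text of the literal character, of &#xE9; and of &#xC3A9;.
    """
    raw = TEMPLATE.format(enc=encoding).encode(encoding)
    root = ET.fromstring(raw)
    return root.findtext("lit"), root.findtext("ref"), root.findtext("bad")


for enc in ("ISO-8859-1", "UTF-8"):
    lit, ref, bad = parse_with_encoding(enc)
    # Print code points rather than glyphs so the output is terminal-safe.
    print(enc, hex(ord(lit)), hex(ord(ref)), hex(ord(bad)))
```

In both runs the literal and &#xE9; come back as U+00E9, while &#xC3A9; comes back as U+C3A9, which is a Hangul syllable and not e-acute. The reference names a code point, so the document's declared encoding does not change it.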

Encoding, it's turtles ALL the way down.

But none of this really has anything to do with "ASCII" so just ditch 
that nonsense.

-Bill Kearney
Syndic8.com

Follow-Ups:
- Re: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?
  - From: Julian Reschke <julian.reschke@gmx.de>
- RE: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?
  - From: "Alessandro Triglia" <sandro@mclink.it>

References:
- UTF-8 Question: e with acute accent should require two bytes, right?
  - From: "Costello, Roger L." <costello@mitre.org>
- [Summary] UTF-8 Question: e with acute accent should require two bytes, right?
  - From: "Costello, Roger L." <costello@mitre.org>
- RE: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?
  - From: "Alessandro Triglia" <sandro@mclink.it>
