Re: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?
- From: Bill Kearney <wkearney99@hotmail.com>
- To: xml-dev@lists.xml.org
- Date: Sat, 29 Sep 2007 14:42:40 -0400
Alessandro Triglia wrote:
> It is not correct to say that a Unicode character can be either an "ASCII
> character" or a "non-ASCII character". It is better to say that some
> Unicode characters (those with codes below 128) have a corresponding
> character in ASCII.
>
Who said anything about ASCII? That just muddies up the water.
The representation of that character as the single byte E9 presumably
comes from the editor in question using one of the ISO-8859-x encodings
(and only SOME of them put é at E9). Not ASCII.
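To make the byte-level difference concrete, here is a minimal Python sketch using only the standard codecs. The selection of encodings to compare is mine, not from the thread; ISO-8859-5 (Cyrillic) is included as an example of an ISO-8859 part with no é at all.

```python
# Minimal sketch: how U+00E9 (e-acute) comes out under a few standard codecs.
E_ACUTE = "\u00e9"

def encode_or_none(text: str, encoding: str):
    """Return text encoded with `encoding`, or None if the codec can't represent it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# ISO-8859-5 (Cyrillic) is included as an ISO-8859 part that has no e-acute.
results = {
    enc: encode_or_none(E_ACUTE, enc)
    for enc in ("iso-8859-1", "iso-8859-15", "iso-8859-5", "utf-8")
}

for enc, data in results.items():
    shown = data.hex(" ").upper() if data is not None else "(not representable)"
    print(f"{enc:12} {shown}")
```

The two Latin encodings should give the single byte E9, UTF-8 should give the two bytes C3 A9, and ISO-8859-5 should report that é is not representable.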
It's not uncommon for text editors to get this wrong, or make
assumptions about the encoding based on several other factors. If your
underlying OS is 'misconfigured' it can get even more confusing. The
tools start trying to "help you" by translating things. This is almost
never helpful for developers trying to wrestle with encoding. For the
average wage slave just trying to cut-and-paste between different
applications it's usually not (too much) of a problem.
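A typical symptom of a tool guessing the wrong encoding is mojibake. The following is a minimal Python sketch, assuming a file saved as UTF-8 is then read by a tool that assumes ISO-8859-1 and later saves it again; the word "café" is just an illustrative sample.

```python
# Bytes as a UTF-8-aware tool would save them.
saved = "café".encode("utf-8")            # b'caf\xc3\xa9'

# A tool that wrongly assumes ISO-8859-1 maps each byte to one character,
# so the two-byte UTF-8 sequence C3 A9 shows up as two characters.
misread = saved.decode("iso-8859-1")      # 'cafÃ©'

# If that misread text is then "helpfully" saved again as UTF-8,
# the damage is baked in: both Ã and © become two bytes each.
resaved = misread.encode("utf-8")

# Decoding the original bytes with the right codec recovers the text.
correct = saved.decode("utf-8")

print(misread, len(saved), len(resaved), correct)
```

Note that the round trip only goes wrong at the decode step; the bytes on disk were fine until the misread text was written back out.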
And to throw another monkey into the wrench, when you use numeric
character references in XML they ALWAYS refer to ISO 10646 code points,
regardless of the document's encoding declaration. Thus é is &#xE9;
(or &#233;) whether the document is declared as ISO-8859-1 or UTF-8;
you would never write &#xC3A9; (its UTF-8 byte sequence), which is
actually the unrelated character U+C3A9.
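The rule that character references name ISO 10646 code points, whatever the declared encoding, can be checked with Python's standard `xml.etree.ElementTree`. This minimal sketch parses the same reference from documents declared as ISO-8859-1 and as UTF-8, plus one that mistakenly uses the UTF-8 byte values; the element name `t` and the helper `parse_text` are arbitrary, for illustration only.

```python
import xml.etree.ElementTree as ET

def parse_text(xml_bytes: bytes) -> str:
    """Parse a tiny XML document and return the text of its root element."""
    return ET.fromstring(xml_bytes).text

# The same reference under two different encoding declarations.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><t>&#xE9;</t>'
utf8_doc = b'<?xml version="1.0" encoding="UTF-8"?><t>&#xE9;</t>'
# Writing the UTF-8 byte values as if they were a code point does NOT give e-acute.
wrong_doc = b'<?xml version="1.0" encoding="UTF-8"?><t>&#xC3A9;</t>'

for doc in (latin1_doc, utf8_doc, wrong_doc):
    text = parse_text(doc)
    print(f"U+{ord(text):04X}", text)
```

The first two documents should both yield U+00E9; the third yields U+C3A9, a Hangul syllable, because the parser never consults the declared encoding when resolving a character reference.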
Encoding, it's turtles ALL the way down.
But none of this really has anything to do with "ASCII" so just ditch
that nonsense.
-Bill Kearney
Syndic8.com