Re: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?
- From: Bill Kearney <wkearney99@hotmail.com>
- To: Alessandro Triglia <sandro@mclink.it>, xml-dev@lists.xml.org, costello@mitre.org
- Date: Sat, 29 Sep 2007 16:42:03 -0400
Alessandro Triglia wrote:
> The whole discussion is about Unicode and ASCII! It started with the
> following sentence in Roger's document: "Here is a simple XML document.
> Most of its characters are ASCII, but there is one non-ASCII character, the
> é character"
No, it's not about ASCII. If it were then the accented character would
never have come up, as it CANNOT BE REPRESENTED IN ASCII. Not at all.
I haven't minutely examined the entire thread of messages, but I believe
it was YOU, not Roger's original post, that brought up this whole ASCII
nonsense.
The accented characters *CAN* be represented in ISO-8859 (most of them,
anyway) and that's probably what his underlying OS and tools are assuming.
So I think the answer to Roger's question depends on what encoding he
thinks he's using, and what the tools think. It would appear to me that
his example was from a document using ISO-8859, not actually UTF-8. If
that were the case, then seeing that accented character as E9 would be
completely correct. But if he WANTED to be using UTF-8 then E9 would
not be correct.
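A quick way to check the byte-level claim, sketched in Python (the language is an assumption for illustration and does not come from the thread; the codec names are Python's standard ones). It encodes the single character U+00E9 under each encoding and also confirms that a lone E9 byte does not decode as UTF-8.

```python
# Encode the single character U+00E9 (e with acute accent) under each encoding.
ch = "\u00e9"

latin1_bytes = ch.encode("iso-8859-1")  # one byte
utf8_bytes = ch.encode("utf-8")         # two bytes

print(latin1_bytes.hex())  # e9
print(utf8_bytes.hex())    # c3a9

# ASCII has no code for this character at all, so encoding fails.
try:
    ch.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# A lone E9 byte is not valid UTF-8: E9 is the lead byte of a
# three-byte sequence, and the continuation bytes are missing.
try:
    b"\xe9".decode("utf-8")
    lone_e9_is_utf8 = True
except UnicodeDecodeError:
    lone_e9_is_utf8 = False

print(ascii_ok, lone_e9_is_utf8)  # False False
```

So a file that declares UTF-8 but contains a bare E9 for the accented e is really an ISO-8859 file with the wrong label.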
And note I'm using the label ISO-8859 without the trailing -digit. Not
to be inaccurate but to avoid opening up a whole other huge can of
worms. The accented e character can be represented as E9 in most, but
not all, of the ISO-8859 variants. I don't know which one he's using,
and frankly don't think it would be useful to assume. But it's probably
not important, certainly not as important as quashing this whole ASCII
nonsense.
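To illustrate the "most, but not all" point, here is a small Python sketch (again an assumption for illustration, not something from the thread) that decodes the same E9 byte under a handful of ISO-8859 parts. The selection of parts is arbitrary and not exhaustive.

```python
# Decode the single byte E9 under several ISO-8859 parts.
variants = ["iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7", "iso-8859-15"]

decoded = {name: b"\xe9".decode(name) for name in variants}

for name, char in decoded.items():
    # Show the resulting character and its Unicode code point.
    print(name, repr(char), f"U+{ord(char):04X}")
```

In parts 1, 2 and 15 the byte comes out as the accented e (U+00E9); in part 5 (Cyrillic) and part 7 (Greek) it is a different letter entirely.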
When dealing with encodings the devil is in the details and there are a
LOT of subtle nuances that make HUGE differences. The last thing anyone
SERIOUS about character handling needs to be basing their thinking on is
ASCII. Just stop.
-Bill Kearney
Syndic8.com