OASIS Mailing List Archives


Re: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?

Alessandro Triglia wrote:
> The whole discussion is about Unicode and ASCII!  It started with the
> following sentence in Roger's document:  "Here is a simple XML document.
> Most of its characters are ASCII, but there is one non-ASCII character, the
> é character"

No, it's not about ASCII.  If it were then the accented character would 
never have come up, as it CANNOT BE REPRESENTED IN ASCII.  Not at all.  
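
A minimal sketch in Python makes the point. The language and the variable names are assumptions for illustration, not something from the thread. Asking for the accented e as ASCII simply fails, because ASCII stops at 7F:

```python
# ASCII covers only 0x00-0x7F; the accented e is U+00E9, so it has no ASCII form.
e_acute = "\u00e9"  # the accented e, written as an escape to sidestep file-encoding issues

try:
    e_acute.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # Raised because the code point is outside the ASCII range.
    ascii_ok = False

print(hex(ord(e_acute)), ascii_ok)
```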

I haven't minutely examined the entire thread of messages, but I believe 
it was YOU, not Roger's original post, that brought up this whole ASCII 
nonsense. 

The accented characters *CAN* be represented in ISO-8859 (most of them, 
anyway) and that's probably what his underlying OS and tools are assuming.

So I think the answer to Roger's question depends on what encoding he 
thinks he's using, and what the tools think.  It would appear to me that 
his example was from a document using ISO-8859, not actually UTF-8.   If 
that were the case, then seeing that accented character as E9 would be 
completely correct.  But if he WANTED to be using UTF-8 then E9 would 
not be correct. 
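
The difference can be checked directly. What follows is a rough Python sketch, assuming the ISO-8859 variant in question is ISO-8859-1 (which is only a guess about Roger's setup); the variable names are made up for the example:

```python
e_acute = "\u00e9"  # the accented e

# In ISO-8859-1 the character is the single byte E9.
latin1_bytes = e_acute.encode("iso-8859-1")

# In UTF-8 the same character takes two bytes, C3 A9.
utf8_bytes = e_acute.encode("utf-8")

print(latin1_bytes.hex(), utf8_bytes.hex())

# A lone E9 is not valid UTF-8: it is the lead byte of a three-byte
# sequence, and here no continuation bytes follow it.
try:
    b"\xe9".decode("utf-8")
    lone_e9_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_e9_is_valid_utf8 = False

print(lone_e9_is_valid_utf8)
```

So a file that contains a bare E9 for that character is consistent with ISO-8859-1 but cannot be well-formed UTF-8.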

And note I'm using the label ISO-8859 without the trailing -digit.  Not 
to be inaccurate but to avoid opening up a whole other huge can of 
worms.  The accented e character can be represented as E9 in most, but 
not all, of the ISO-8859 variants.  I don't know which one he's using, 
and frankly don't think it would be useful to assume.  But it's probably 
not important, certainly not as important as quashing this whole ASCII 
nonsense.
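
For anyone who wants to see the "most, but not all" part, here is a small Python sketch. The list of variants is an arbitrary sample chosen for illustration, not a claim about which one Roger is using:

```python
# Decode the single byte E9 under a handful of ISO-8859 variants.
# The sample of variants is arbitrary; the names are Python codec names.
variants = ["iso8859_1", "iso8859_2", "iso8859_15", "iso8859_5", "iso8859_7"]

decoded = {name: b"\xe9".decode(name) for name in variants}

for name, ch in decoded.items():
    # True where E9 is the accented e, False where the variant puts another letter there.
    print(name, repr(ch), ch == "\u00e9")
```

The Latin variants in this sample agree on E9, while the Cyrillic (ISO-8859-5) and Greek (ISO-8859-7) parts map the same byte to a different letter.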

When dealing with encodings the devil is in the details and there are a 
LOT of subtle nuances that make HUGE differences.  The last thing anyone 
SERIOUS about character handling needs to be basing their thinking on is 
ASCII.  Just stop.

-Bill Kearney

Copyright 1993-2007 XML.org. This site is hosted by OASIS