OASIS Mailing List Archives



Re: [xml-dev] [Summary] UTF-8 Question: e with acute accent should require two bytes, right?

Alessandro Triglia wrote:
> It is not correct to say that a Unicode character can be either an "ASCII
> character" or a "non-ASCII character".  It is better to say that some
> Unicode characters (those with codes below 128) have a corresponding
> character in ASCII.
Who said anything about ASCII?  That just muddies up the water. 

The representation of that character as the byte E9 presumably comes
from the editor in question using one of the ISO-8859-x encodings (and
only SOME of them put é there).  Not ASCII.
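A quick way to see the byte-level difference is to ask Python's built-in codecs. This is only a sketch: the codec names are Python's, and the helpers `hex_bytes` and `code_point` are made up for illustration. It shows é as the single byte E9 in ISO-8859-1 and ISO-8859-15, as two bytes (C3 A9) in UTF-8, and that byte E9 means an entirely different letter in some other ISO-8859 parts.

```python
E_ACUTE = "\u00e9"  # é, code point U+00E9

def hex_bytes(ch: str, encoding: str) -> str:
    """Hypothetical helper: bytes of ch in the given encoding, as hex."""
    return ch.encode(encoding).hex()

def code_point(ch: str) -> str:
    """Hypothetical helper: format a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"

# é is the single byte E9 in ISO-8859-1 and ISO-8859-15 ...
print(hex_bytes(E_ACUTE, "iso-8859-1"))   # e9
print(hex_bytes(E_ACUTE, "iso-8859-15"))  # e9
# ... but takes two bytes in UTF-8.
print(hex_bytes(E_ACUTE, "utf-8"))        # c3a9

# Other ISO-8859 parts put a different letter at byte E9.
print(code_point(b"\xe9".decode("iso-8859-5")))  # U+0449 (Cyrillic shcha)
print(code_point(b"\xe9".decode("iso-8859-7")))  # U+03B9 (Greek iota)

# And ISO-8859-5 has no é at all, so encoding it fails.
try:
    E_ACUTE.encode("iso-8859-5")
except UnicodeEncodeError:
    print("no e-acute in ISO-8859-5")
```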

It's not uncommon for text editors to get this wrong, or make 
assumptions about the encoding based on several other factors.  If your 
underlying OS is 'misconfigured' it can get even more confusing.  The 
tools start trying to "help you" by translating things.  This is almost 
never helpful for developers trying to wrestle with encoding.  For the 
average wage slave just trying to cut-and-paste between different 
applications it's usually not (too much) of a problem.
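That kind of "help" often boils down to a tool guessing the wrong encoding for the bytes it was handed. A minimal Python sketch of the two classic failures, using a hypothetical helper `misread` that writes text in one encoding and reads the bytes back as another (the scenario is illustrative, not taken from any particular editor):

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Hypothetical helper: encode text one way, decode the bytes another way.

    errors="replace" substitutes U+FFFD for bytes the reader cannot decode,
    roughly what many tools display instead of raising an error.
    """
    return text.encode(written_as).decode(read_as, errors="replace")

word = "caf\u00e9"  # café

# UTF-8 bytes C3 A9 read as ISO-8859-1 become two characters, Ã and ©.
print(ascii(misread(word, "utf-8", "iso-8859-1")))  # 'caf\xc3\xa9'

# The lone ISO-8859-1 byte E9 is not valid UTF-8 here, so it is replaced.
print(ascii(misread(word, "iso-8859-1", "utf-8")))  # 'caf\ufffd'
```

`ascii()` is used only so the output prints the same on any console.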

And to throw another monkey into the wrench, when you use numeric 
character references in XML they're ALWAYS interpreted as ISO 10646 
(Unicode) code points, regardless of the document's encoding declaration.  
Thus &#xE9; is é in an ISO-8859-1 document and in a UTF-8 document alike; 
you would not write the UTF-8 bytes as &#xC3A9;, since that is U+C3A9 
(쎩), not é.
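A small sketch with Python's standard-library `xml.etree.ElementTree` illustrates the point: raw bytes in the content depend on the declared encoding, but a numeric character reference names the same code point whatever the declaration says. The helper `text_of` and the sample documents are hypothetical, chosen only for this demonstration.

```python
import xml.etree.ElementTree as ET

def text_of(xml_bytes: bytes) -> str:
    """Hypothetical helper: parse a one-element XML document, return its text."""
    return ET.fromstring(xml_bytes).text

LATIN1 = b'<?xml version="1.0" encoding="ISO-8859-1"?>'
UTF8 = b'<?xml version="1.0" encoding="UTF-8"?>'

# Raw bytes DO depend on the declared encoding: é is one byte (E9) in
# ISO-8859-1 but two bytes (C3 A9) in UTF-8.
raw_latin1 = text_of(LATIN1 + b"<p>\xe9</p>")
raw_utf8 = text_of(UTF8 + b"<p>\xc3\xa9</p>")

# A character reference does NOT: &#xE9; is U+00E9 under either declaration.
ref_latin1 = text_of(LATIN1 + b"<p>&#xE9;</p>")
ref_utf8 = text_of(UTF8 + b"<p>&#xE9;</p>")

# Writing the UTF-8 bytes as a single reference names an unrelated code point.
wrong = text_of(UTF8 + b"<p>&#xC3A9;</p>")

for label, s in [("raw ISO-8859-1", raw_latin1), ("raw UTF-8", raw_utf8),
                 ("ref, ISO-8859-1", ref_latin1), ("ref, UTF-8", ref_utf8),
                 ("&#xC3A9;", wrong)]:
    print(f"{label:16} U+{ord(s):04X}")
```

The first four lines of output all report U+00E9; only the last one differs, at U+C3A9.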

Encoding, it's turtles ALL the way down.

But none of this really has anything to do with "ASCII" so just ditch 
that nonsense.

-Bill Kearney



Copyright 1993-2007 XML.org. This site is hosted by OASIS