RE: [xml-dev] UTF-8 Question: e with acute accent should require two bytes, right?

Hex editors show you what they've got in memory, not what's on the disk. So
this tells you that the editor has converted the data to iso-8859-1 or
something similar for processing in memory.
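One way to settle which bytes are actually on disk is to skip the editor and compare the two candidate encodings directly. The following is only a sketch in Python: the variable names and the is_valid_utf8 helper are made up for illustration and are not part of any tool mentioned in this thread.

```python
# Sketch: the same text produces different bytes under different encodings.
text = "My Resum\u00e9"  # \u00e9 is the precomposed e with acute accent

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

print(utf8_bytes.hex(" ").upper())    # 4D 79 20 52 65 73 75 6D C3 A9
print(latin1_bytes.hex(" ").upper())  # 4D 79 20 52 65 73 75 6D E9


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone E9 byte is a truncated multi-byte sequence, so it is not valid UTF-8.
print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False
```

To apply the same check to a real file, read it in binary mode, for example open(path, "rb").read(), so that no decoding happens before you look at the bytes.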

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Costello, Roger L. [mailto:costello@mitre.org] 
> Sent: 28 September 2007 16:13
> To: xml-dev@lists.xml.org
> Subject: [xml-dev] UTF-8 Question: e with acute accent should 
> require two bytes, right?
> 
> Hi Folks,
>  
> Consider this element:
>  
> <title>My Resumé</title>
> 
> Notice: é (the character "e" with an acute accent). It is U+00E9.
> 
> Since its code point is greater than U+0080, it requires more 
> than one byte. 
> 
> Hex E9 = Decimal 233.  This has the binary: 11101001
> 
> I believe that it is encoded in UTF-8 as two bytes:
> 
>   11000011 10101001
> 
> These bytes correspond to hex C3 and hex A9.
> 
> Thus, é should be encoded in UTF-8 as:
> 
>   C3A9
> 
> The code points of the other characters (My Resum) are all 
> less than U+0080, and so the UTF-8 encoding of those 
> characters should be only one byte.
> 
> So, this is what I believe should be the bytes:
> 
>  M y    R  e s  u m   é
> 4D79 2052 6573 756D C3A9
> 
> Do you agree?
> 
> However, when I view the bytes in my hex editor I get this:
> 
>  M y    R  e s  u m  é
> 4D79 2052 6573 756D E9
> 
> Notice that é uses only one byte.
> 
> Something is wrong.  Here's what I think may be wrong:
> - the editor that I am using to display the hex values is 
> displaying the code points and not the encoded bytes. However, I 
> have now tried two editors, and they both display the same 
> thing (E9).  So perhaps the editor isn't the problem.  
> Perhaps I'm the problem, and am misunderstanding something.  Help!
> 
> /Roger
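
The two-byte arithmetic in the quoted message can be checked by hand. Below is a rough sketch in Python, assuming the standard two-byte UTF-8 layout (110xxxxx 10xxxxxx) for code points U+0080 through U+07FF. The function name encode_two_byte_utf8 is invented for this illustration.

```python
def encode_two_byte_utf8(code_point: int) -> bytes:
    """Pack a code point in U+0080..U+07FF into the 110xxxxx 10xxxxxx layout."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point is outside the two-byte UTF-8 range")
    lead = 0b11000000 | (code_point >> 6)           # top 5 bits of the code point
    trail = 0b10000000 | (code_point & 0b00111111)  # low 6 bits of the code point
    return bytes([lead, trail])


encoded = encode_two_byte_utf8(0xE9)

print(format(0xE9, "08b"))                          # 11101001
print(" ".join(format(b, "08b") for b in encoded))  # 11000011 10101001
print(encoded.hex().upper())                        # C3A9

# Cross-check against Python's built-in encoder.
assert encoded == "\u00e9".encode("utf-8")
```

This agrees with the C3 A9 worked out in the message, so the expected UTF-8 bytes are right and the single E9 byte reflects a different encoding.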




Copyright 1993-2007 XML.org. This site is hosted by OASIS