XML.org
OASIS Mailing List Archives

Re: [xml-dev] UTF-8 Question: e with acute accent should require two bytes, right?

Hi Roger,

UTF-8 uses an 8-bit encoding. E9 fits in 8 bits. It doesn't fit in 7 bits,
but the 7-bit limit you are thinking of is an ASCII problem. Since 8 bits
can represent twice as many characters as 7 bits, that is enough to
represent most European languages using one byte per character.

Jonathan

Costello, Roger L. wrote:
> Hi Folks,
>  
> Consider this element:
>  
> <title>My Resumé</title>
>
> Notice the é (the character "e" with an acute accent). It is U+00E9.
>
> Since its code point is U+0080 or higher, it requires more than one
> byte.
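The byte-count rule used here can be checked with a short Python sketch. The boundary is U+007F: code points at or below it take one byte, and U+0080 and above take two or more. The function name utf8_length is hypothetical; the ranges are the ones defined for UTF-8 in RFC 3629, and the loop cross-checks them against Python's built-in codec.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a code point (hypothetical helper)."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point <= 0x7F:        # U+0000..U+007F: ASCII, one byte
        return 1
    if code_point <= 0x7FF:       # U+0080..U+07FF: two bytes
        return 2
    if code_point <= 0xFFFF:      # U+0800..U+FFFF: three bytes
        return 3
    if code_point <= 0x10FFFF:    # U+10000..U+10FFFF: four bytes
        return 4
    raise ValueError("code point beyond U+10FFFF")


for ch in "My Resum\u00e9":       # "My Resumé"
    # Cross-check the range table against Python's own UTF-8 encoder.
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} {ch!r}: {utf8_length(ord(ch))} byte(s)")
```

Every character in "My Resum" reports one byte; only U+00E9 reports two.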
>
> Hex E9 = decimal 233, which in binary is 11101001.
>
> I believe that it is encoded in UTF-8 as two bytes:
>
>   11000011 10101001
>
> These bytes correspond to hex C3 and hex A9.
>
> Thus, é should be encoded in UTF-8 as:
>
>   C3A9
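The bit arithmetic above is the standard two-byte UTF-8 pattern: up to 11 payload bits are split into 110xxxxx 10xxxxxx, with the top 5 bits in the lead byte and the low 6 bits in the continuation byte. A minimal Python sketch (encode_two_byte is a hypothetical helper name) reproduces C3 A9 for U+00E9 and checks the result against the built-in codec.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes (hypothetical helper)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("not in the two-byte range U+0080..U+07FF")
    lead = 0b110_00000 | (code_point >> 6)           # 110xxxxx: top 5 of the 11 bits
    cont = 0b10_000000 | (code_point & 0b0011_1111)  # 10xxxxxx: low 6 bits
    return bytes([lead, cont])


e_acute = 0xE9                                # decimal 233, binary 11101001
encoded = encode_two_byte(e_acute)
print(f"{encoded[0]:08b} {encoded[1]:08b}")   # 11000011 10101001
print(encoded.hex().upper())                  # C3A9
assert encoded == "\u00e9".encode("utf-8")    # agrees with Python's UTF-8 codec
```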
>
> The code points of the other characters (My Resum) are all less than
> U+0080, so each of those characters should be encoded in UTF-8 as a
> single byte.
>
> So, I believe the bytes should be:
>
>  M y    R  e s  u m   é
> 4D79 2052 6573 756D C3A9
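The expected byte sequence can be confirmed by letting Python's UTF-8 codec encode the whole string. This is a small sketch; bytes.hex with a separator argument needs Python 3.8 or later.

```python
text = "My Resum\u00e9"               # "My Resumé": 9 characters
utf8_bytes = text.encode("utf-8")

# Whole string, spaced per byte: 4D 79 20 52 65 73 75 6D C3 A9
print(utf8_bytes.hex(" ").upper())

# Per character: the ASCII letters map to one byte each, é maps to two.
for ch in text:
    print(repr(ch), ch.encode("utf-8").hex().upper())
```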
>
> Do you agree?
>
> However, when I view the bytes in my hex editor I get this:
>
>  M y    R  e s  u m  é
> 4D79 2052 6573 756D E9
>
> Notice that é uses only one byte.
>
> Something is wrong. Here's what I think may be wrong:
> - The hex editor may be displaying code points rather than the actual
>   bytes. However, I have now tried two hex editors, and both display the
>   same thing (E9), so perhaps the editor isn't the problem. Perhaps I'm
>   the problem and am misunderstanding something. Help!
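The observed bytes can be tested directly. One plausible explanation, which is an assumption because the post does not say how the file was saved, is that the editor wrote the file as ISO-8859-1 (Latin-1) or Windows-1252 rather than UTF-8; both of those single-byte encodings store é as the byte E9. In UTF-8, E9 (11101001) is the lead byte of a three-byte sequence, so a lone E9 at the end of the data is not valid UTF-8. The Python sketch below (is_valid_utf8 is a hypothetical helper) checks this.

```python
# Bytes as shown in the hex editor in the original post.
observed = bytes.fromhex("4D79 2052 6573 756D E9")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(observed))      # False: E9 needs two continuation bytes after it
print(observed.decode("latin-1"))   # My Resumé: the bytes read correctly as Latin-1

# Same text in both encodings, for comparison (bytes.hex(" ") needs Python 3.8+).
print("My Resum\u00e9".encode("latin-1").hex(" ").upper())  # 4D 79 20 52 65 73 75 6D E9
print("My Resum\u00e9".encode("utf-8").hex(" ").upper())    # 4D 79 20 52 65 73 75 6D C3 A9
```

If the check fails on a real file, re-saving it as UTF-8 should produce the C3 A9 sequence in place of E9.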
>
> /Roger
>
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>

Copyright 1993-2007 XML.org. This site is hosted by OASIS