XML.org
OASIS Mailing List Archives

RE: [xml-dev] UTF-8 Question: e with acute accent should require two bytes, right?

Not sure what editors you are using, but I used Eclipse to create a UTF-8
file, then viewed it using "od -t x1".

My file says "My résumé". It is output by dump as follows:

0000000 4d 79 20 72 c3 a9 73 75 6d c3 a9

The é is encoded as c3a9.
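
The same bytes can be reproduced without an editor or od. Here is a minimal sketch, assuming a Python 3 interpreter is available. The variable names are illustrative only, and the text is written with \u escapes so the result does not depend on how the script file itself was saved:

```python
# "My résumé", spelled with escapes so the script's own encoding is irrelevant.
text = "My r\u00e9sum\u00e9"

# Encode the string as UTF-8 and print one hex pair per byte, like "od -t x1".
encoded = text.encode("utf-8")
hex_dump = " ".join(f"{b:02x}" for b in encoded)
print(hex_dump)  # 4d 79 20 72 c3 a9 73 75 6d c3 a9
```

Nine characters become eleven bytes, because each é contributes two.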

Which editor did you use to originally create the file?

-Erik


-----Original Message-----
From: Costello, Roger L. [mailto:costello@mitre.org] 
Sent: September 28, 2007 11:13 AM
To: xml-dev@lists.xml.org
Subject: [xml-dev] UTF-8 Question: e with acute accent should require two
bytes, right?

Hi Folks,
 
Consider this element:
 
<title>My Resumé</title>

Notice: é (the character "e" with an acute accent). It is U+00E9.

Since its code point is greater than U+007F, it requires more than one
byte in UTF-8.

Hex E9 = Decimal 233.  This has the binary: 11101001

I believe that it is encoded in UTF-8 as two bytes:

  11000011 10101001

These bytes correspond to hex C3 and hex A9.

Thus, é should be encoded in UTF-8 as:

  C3A9
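
The bit arithmetic above can be checked by hand in code. This is only a sketch of the two-byte UTF-8 form (110xxxxx 10xxxxxx), which covers U+0080 through U+07FF. The function name utf8_two_byte is made up for illustration, and Python 3 is assumed:

```python
def utf8_two_byte(code_point):
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("the two-byte form covers U+0080..U+07FF only")
    # Lead byte: marker bits 110, then the top 5 bits of the code point.
    lead = 0b11000000 | (code_point >> 6)
    # Trail byte: marker bits 10, then the low 6 bits of the code point.
    trail = 0b10000000 | (code_point & 0b00111111)
    return bytes([lead, trail])

manual = utf8_two_byte(0xE9)
print(f"{manual[0]:08b} {manual[1]:08b}")  # 11000011 10101001
print(manual.hex().upper())                # C3A9

# Cross-check against Python's built-in encoder.
assert manual == "\u00e9".encode("utf-8")
```

The built-in encoder agrees with the hand calculation, so the C3A9 expectation is sound.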

The code points of the other characters (My Resum) are all less than
U+0080, and so the UTF-8 encoding of each of those characters should be
only one byte.

So, this is what I believe should be the bytes:

 M y    R  e s  u m   é
4D79 2052 6573 756D C3A9

Do you agree?

However, when I view the bytes in my hex editor I get this:

 M y    R  e s  u m  é
4D79 2052 6573 756D E9

Notice that é uses only one byte.

Something is wrong.  Here's what I think may be wrong:
- the editor that I am using to display the hex values is displaying
the code points and not the encoded bytes. However, I have now tried two
editors, and they both display the same thing (E9).  So perhaps the
editor isn't the problem.  Perhaps I'm the problem, and am
misunderstanding something.  Help!
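
One possibility worth ruling out, offered purely as an assumption since nothing in this thread states how the file was actually saved: a single E9 byte is exactly what a single-byte encoding such as ISO-8859-1 (Latin-1) produces for é. A small Python 3 sketch comparing the two encodings (variable names are illustrative):

```python
ch = "\u00e9"  # é, U+00E9

# The same character under two different encodings.
latin1 = ch.encode("latin-1")  # ISO-8859-1: one byte per character
utf8 = ch.encode("utf-8")      # UTF-8: two bytes for U+0080..U+07FF
print(latin1.hex().upper())    # E9
print(utf8.hex().upper())      # C3A9

# The whole title, assuming (hypothetically) it was saved as ISO-8859-1.
title = "My Resum\u00e9"
title_latin1 = title.encode("latin-1")
print(title_latin1.hex().upper())  # 4D7920526573756DE9
```

If the hex editor shows 4D79 2052 6573 756D E9, the bytes on disk match the Latin-1 column here, which would point at the encoding chosen when saving rather than at the hex viewer.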

/Roger


_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

Copyright 1993-2007 XML.org. This site is hosted by OASIS