RE: [xml-dev] UTF-8 Question: e with acute accent should require two bytes, right?

> Notice: é (the character "e" with an acute accent). It is U+00E9.
> Since its code point is greater than U+0080, it requires more than one
> byte. 

It depends on the encoding the file was saved in. In ISO 8859-1 (Latin-1) and Windows-1252 (the default for many editors), only 1 byte is required: 0xE9.
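
As a quick check, here is a small Python sketch that encodes the same character under each of the three encodings. Python and the helper name encoded_hex are my own choices for illustration, not something from the original question.

```python
def encoded_hex(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as uppercase hex."""
    return text.encode(encoding).hex().upper()


ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Same character, three encodings: compare the byte sequences.
for name in ("iso-8859-1", "cp1252", "utf-8"):
    print(f"{name:<10} -> {encoded_hex(ch, name)}")
```

The two single-byte encodings both give E9, and only UTF-8 gives C3A9, so seeing E9 in a hex view suggests the file is not stored as UTF-8.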

> Thus, é should be encoded in UTF-8 as:
>   C3A9
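
C3 A9 is the correct UTF-8 form. As a rough sketch of where the two bytes come from, the code point's bits are split across a lead byte (110xxxxx) and a trailing byte (10xxxxxx). The helper name utf8_two_byte is hypothetical, and it only covers the two-byte range U+0080 to U+07FF.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("this sketch only handles U+0080..U+07FF")
    lead = 0xC0 | (code_point >> 6)     # 110xxxxx: the upper 5 bits
    trail = 0x80 | (code_point & 0x3F)  # 10xxxxxx: the lower 6 bits
    return bytes([lead, trail])


encoded = utf8_two_byte(0xE9)
# Cross-check the hand-rolled result against Python's built-in encoder.
assert encoded == "\u00e9".encode("utf-8")
print(encoded.hex().upper())
```

For U+00E9 (binary 11101001), the upper bits 00011 go into the lead byte and the lower bits 101001 into the trailing byte.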


> Something is wrong.  Here's what I think may be wrong:
> - the editor that I am using to display the hex values is displaying
> the code points and not the hex values. However, I have now tried two
> editors, and they both display the same thing (E9).

PSPad has 2 methods to invoke a hex view of a file, giving somewhat different results:

1. Open the file in the default Text Editor mode, then switch to View/Hex Edit Mode. Here, encoding conversions come into play when you switch views, so what you see is the "bytes in memory."

2. Open the file directly in the Hex Editor, by selecting File/Open in Hex Editor. In this mode you get a better view of the "bytes on disk" without encoding conversions. When I come across encoding problems, this is the view that I use.

Perhaps the editors you've tried don't have the second type of hex view, which I think is what you want.
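
If no such editor is at hand, the "bytes on disk" can also be checked with a few lines of Python. This is only a minimal sketch: the function name hex_dump and its output layout are my own assumptions, and it has nothing to do with how PSPad works internally. The point is that the file is read in binary mode, so no encoding conversion happens.

```python
from pathlib import Path


def hex_dump(path: str, width: int = 16) -> str:
    """Return a hex listing of the raw bytes stored in the file at `path`."""
    data = Path(path).read_bytes()  # binary read: no decoding is applied
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"{offset:08X}  {hex_part}")
    return "\n".join(lines)
```

Calling print(hex_dump("sample.xml")), with sample.xml standing in for your own file, should show C3 A9 for the character if the file is really UTF-8, and a lone E9 if it was saved as Latin-1 or Windows-1252.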

Mike Waters

