RE: [xml-dev] UTF-8 Question: e with acute accent should require two bytes, right?
- From: "Waters, Michael, Springer US" <Mike.Waters@springer.com>
- To: "Costello, Roger L." <costello@mitre.org>, xml-dev@lists.xml.org
- Date: Fri, 28 Sep 2007 12:51:56 -0400
> Notice: é (the character "e" with an acute accent). It is U+00E9
>
> Since its code point is greater than U+0080, it requires more than one
> byte.
It depends on the encoding of the file. In ISO 8859-1 (Latin-1) and Windows-1252 (the default in many editors), é takes only one byte: 0xE9.
> Thus, é should be encoded in UTF-8 as:
>
> C3A9
Yes.
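As a quick cross-check outside of any editor, here is a minimal Python sketch that encodes U+00E9 in the three encodings mentioned above and also works out the two UTF-8 bytes by hand. The helper names (hex_bytes, utf8_two_byte) are made up for illustration.

```python
# Encode U+00E9 (é) in several encodings and show the bytes as hex.

def hex_bytes(text, encoding):
    """Return the encoded bytes of `text` as an uppercase hex string."""
    return text.encode(encoding).hex().upper()


def utf8_two_byte(code_point):
    """Hand-build the UTF-8 form for a code point in U+0080..U+07FF.

    Two-byte pattern: 110xxxxx 10xxxxxx
    """
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte range is handled in this sketch")
    lead = 0xC0 | (code_point >> 6)     # top 5 bits go into the lead byte
    trail = 0x80 | (code_point & 0x3F)  # low 6 bits go into the trail byte
    return bytes([lead, trail])


if __name__ == "__main__":
    ch = "\u00e9"
    for enc in ("latin-1", "cp1252", "utf-8"):
        print(enc, hex_bytes(ch, enc))
    print("by hand:", utf8_two_byte(0xE9).hex().upper())
```

If this prints E9 for the first two and C3A9 for UTF-8, then a file that shows E9 on disk is most likely being saved as Latin-1 or Windows-1252 rather than UTF-8.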
> Something is wrong. Here's what I think may be wrong:
> - the editor that I am using to display the hex values is displaying
> the code points and not the hex values. However, I have now tried two
> editors, and they both display the same thing (E9).
PSPad has two ways to open a hex view of a file, and they can give different results:
1. Open the file in the default Text Editor mode, then switch to View/Hex Edit Mode. This view shows the "bytes in memory," so any encoding conversion the editor applied when it loaded the file is reflected in what you see.
2. Open the file directly in the Hex Editor by selecting File/Open in Hex Editor. This view shows the "bytes on disk," with no encoding conversion. When I run into encoding problems, this is the view I use.
Perhaps the editors you've tried don't have the second type of hex view, which I think is what you want.
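
If neither editor offers a true "bytes on disk" view, the same check can be done with a few lines of Python. This is only a sketch: hex_dump is a made-up helper, and the file name in the usage line is a placeholder for your own file. The point is that the file is read in binary, so no decoding happens before the bytes are printed.

```python
# Print the raw bytes of a file as hex, with no text decoding involved.
from pathlib import Path


def hex_dump(path, width=16):
    """Return one line per `width` bytes: an offset followed by hex pairs."""
    data = Path(path).read_bytes()  # binary read: these are the bytes on disk
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"{offset:08X}  {hex_part}")
    return lines


if __name__ == "__main__":
    import sys
    # Usage (placeholder file name): python hexdump.py yourfile.xml
    for line in hex_dump(sys.argv[1]):
        print(line)
```

A file containing only é would show C3 A9 if it was saved as UTF-8 and a single E9 if it was saved as Latin-1 or Windows-1252.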
Mike Waters