- To: firstname.lastname@example.org
- Subject: baffling encoding problem
- From: Bryan Rasmussen <email@example.com>
- Date: Wed, 9 Feb 2005 13:45:44 +0100
- User-agent: Internet Messaging Program (IMP) 3.2.2
I need some insight on an encoding problem. I have an XML instance that has passed
through several parties before it got to me.
I'm on a Windows system. The instance has a declared encoding of UTF-8 and a
BOM of UTF-8 (EF BB BF).
In this XML there is a text node that should contain the word Børnehave.
When I open it in various text editors (Notepad, WordPad), it shows Brnhave.
When I open it in various XML editors, some crash saying there is an encoding
error, as does IE. Some editors, however, treat it as correct, for
example Crimson and XML SPY, although both of them display it without the ø
character. When I open it in 010 Editor, the hex editor I use, it shows 42F8 726E
6568 6176 6, which is right, isn't it? If I copy that text, I get Børnehave.
When I try to load the instance into a DOM using MSXML I get an encoding error, and
when I run it through XSV to validate I get
Input error: Illegal UTF-8 byte 2 <0x72> at file offset 1070
in unnamed entity at line 18 char 27
0x72 is the r; the error is of course at the position where the ø should be.
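
The hex dump and the XSV message can be checked with a few lines of Python. This is only a sketch: it uses the eight complete bytes quoted above (the truncated trailing digit is left out), and the helper name first_invalid_utf8_offset is made up for illustration. It assumes the single byte F8 is meant as ISO-8859-1, where it is the letter ø; in UTF-8 that byte can never appear on its own.

```python
# The eight complete bytes quoted from the hex editor: 42 F8 72 6E 65 68 61 76.
raw = bytes.fromhex("42F8726E65686176")


def first_invalid_utf8_offset(data: bytes):
    """Return the offset where strict UTF-8 decoding fails, or None if it is valid.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Strict UTF-8 decoding rejects the sequence at the F8 byte.
offset = first_invalid_utf8_offset(raw)

# Assumption: the byte was written as ISO-8859-1, where 0xF8 is the letter "ø".
latin1_text = raw.decode("iso-8859-1")

print(offset, latin1_text)
```

Python reports the offset of the F8 byte itself, whereas XSV apparently treats F8 as a lead byte and complains about the byte after it (0x72). Either way the bytes are not UTF-8, whatever the declaration and the BOM say.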
If I save it from any editor that accepts it as correct, I then have a file that
opens in all the other editors and in IE, loads into a DOM, and so on.
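
What the tolerant editors do on save is presumably something like the following: read the bytes in a single-byte encoding and write them back out as real UTF-8. This is a guess at their behaviour, not a description of Crimson or XML SPY. The function name reencode_as_utf8 and the ISO-8859-1 fallback are assumptions for illustration.

```python
import codecs


def reencode_as_utf8(data: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return data as valid UTF-8, keeping a leading UTF-8 BOM if one is present.

    Hypothetical helper: if the body is not valid UTF-8, the whole body is
    assumed to be in the fallback encoding (ISO-8859-1 by default).
    """
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        text = body.decode("utf-8")   # already valid: nothing to repair
    except UnicodeDecodeError:
        text = body.decode(fallback)  # assumption about the real encoding
    out = text.encode("utf-8")
    return codecs.BOM_UTF8 + out if has_bom else out


# A small stand-in for the real instance: UTF-8 BOM followed by a Latin-1 "ø".
sample = codecs.BOM_UTF8 + b"<a>B\xf8rnehave</a>"
fixed = reencode_as_utf8(sample)
print(fixed)
```

After the conversion the ø is stored as the two bytes C3 B8. Note that the whole-file fallback would mangle a file that mixes genuine UTF-8 multi-byte sequences with stray single-byte characters, so it only suits a case like this one.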