- To: xml-dev@lists.xml.org
- Subject: baffling encoding problem
- From: Bryan Rasmussen <bry@itnisk.com>
- Date: Wed, 9 Feb 2005 13:45:44 +0100
- User-agent: Internet Messaging Program (IMP) 3.2.2
Hi,
I need some insight on an encoding issue. I have an XML instance that passed
through several parties before it got to me.
I'm on a Windows system. The instance has a declared encoding of UTF-8 and a
UTF-8 BOM (EF BB BF).
In this XML there is a text node that should contain the word Børnehave.
When I open it in various text editors (Notepad, WordPad), it shows Brnhave.
When I open it in various XML editors, some crash saying there is an encoding
problem, and the same happens when I open it in IE. Some editors, however,
accept it as correct, for example Crimson and XML SPY, although both of them
display it without the ø character. When I open it in 010 Editor, the hex
editor I use, it shows 42F8 726E 6568 6176 6, which is right, isn't it? When
I copy that, I get Børnehave.
When I try to load the instance into a DOM using MSXML, I get an encoding
error; when I run it through XSV to validate, I get
Input error: Illegal UTF-8 byte 2 <0x72> at file offset 1070
in unnamed entity at line 18 char 27
That byte is the r, which of course is at the position where the ø should be.
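
For reference, the byte-level situation can be reproduced with a minimal Python sketch. It rests on two assumptions that the dump itself does not confirm: that the trailing "6" in the hex dump is a cut-off 65 (the final "e"), and that the lone F8 byte is an ISO-8859-1 (Latin-1) encoded ø. The helper name try_decode is made up for illustration.

```python
# Bytes from the hex dump: "B", 0xF8, then "rnehave".
# Assumption: the dump's trailing "6" is a truncated 0x65 ("e").
raw = bytes.fromhex("42f8726e6568617665")

def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# A single 0xF8 byte is not a valid UTF-8 sequence, so strict decoding fails.
utf8_result = try_decode(raw, "utf-8")

# Assumption: the producer wrote Latin-1, where 0xF8 is U+00F8 (the letter ø).
latin1_result = try_decode(raw, "iso-8859-1")

# What a real UTF-8 file would contain: U+00F8 encodes as the two bytes C3 B8.
correct_utf8 = "B\u00f8rnehave".encode("utf-8")

print(utf8_result)
print(latin1_result)
print(correct_utf8.hex())
```

If those assumptions hold, the file is Latin-1 content carrying a UTF-8 label and BOM, which would explain why strict parsers reject it while lenient editors load it.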
If I save it in any editor that accepts it as correct, I then have a file that
opens in all the other editors and in IE, loads into a DOM, etc.
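
What those editors do on save is presumably something like the following sketch: fall back to a single-byte encoding when strict UTF-8 decoding fails, then write real UTF-8. This is a guess at the behaviour, not a description of Crimson or XML SPY; the function name repair_to_utf8 and the ISO-8859-1 fallback are assumptions.

```python
import codecs

def repair_to_utf8(data: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Re-encode a mislabeled document as UTF-8, keeping a leading BOM if present.

    Hypothetical helper: 'fallback' is a guess at the real source encoding.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        # Already valid UTF-8: leave the text unchanged.
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid UTF-8: reinterpret every byte using the fallback encoding.
        text = body.decode(fallback)
    encoded = text.encode("utf-8")
    return codecs.BOM_UTF8 + encoded if has_bom else encoded

# Usage with a made-up fragment that has a UTF-8 BOM but a Latin-1 ø (0xF8).
sample = codecs.BOM_UTF8 + b"<a>B\xf8rnehave</a>"
repaired = repair_to_utf8(sample)
print(repaired.hex())
```

One caveat with this approach: if a file mixes correctly encoded UTF-8 sequences with stray Latin-1 bytes, decoding the whole body with the fallback would garble the correct parts, so it only suits files that are Latin-1 throughout.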
--
Bryan Rasmussen