OASIS Mailing List Archives

   baffling encoding problem

  • To: xml-dev@lists.xml.org
  • Subject: baffling encoding problem
  • From: Bryan Rasmussen <bry@itnisk.com>
  • Date: Wed, 9 Feb 2005 13:45:44 +0100
  • User-agent: Internet Messaging Program (IMP) 3.2.2


Hi,
I need some insight on encoding here. I have an XML instance that passed
through several parties before it got to me.

I'm on a Windows system. The instance declares its encoding as UTF-8 and
starts with a UTF-8 BOM (EF BB BF).
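Both the BOM and the encoding declaration are only labels: a parser reads them first, but neither proves that the bytes after them really are UTF-8. A minimal sketch of what that first look involves, assuming a Python 3 environment; `sniff_xml_prolog` and the sample bytes are hypothetical, made up for illustration:

```python
import re
from typing import Optional, Tuple

UTF8_BOM = b"\xef\xbb\xbf"

# Matches the encoding pseudo-attribute of an XML declaration at the very start.
_DECL = re.compile(rb'<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_xml_prolog(data: bytes) -> Tuple[bool, Optional[str]]:
    """Return (has_utf8_bom, declared_encoding) without decoding the body."""
    has_bom = data.startswith(UTF8_BOM)
    body = data[len(UTF8_BOM):] if has_bom else data
    m = _DECL.match(body)
    return has_bom, (m.group(1).decode("ascii") if m else None)


# Hypothetical sample: labelled UTF-8, but the body holds a single 0xF8 byte.
sample = UTF8_BOM + b'<?xml version="1.0" encoding="UTF-8"?>\n<t>B\xf8rnehave</t>'
print(sniff_xml_prolog(sample))
```

The sample reports a UTF-8 BOM and a UTF-8 declaration even though its body is not valid UTF-8, which is the situation described here.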

In this XML there is a text node that should contain the word Børnehave.
When I open it in various text editors (Notepad, WordPad), it shows Brnhave.
When I open it in various XML editors, some crash saying there is an encoding
problem, and IE does the same. Some editors, however, treat it as correct, for
example Crimson and XML SPY, although both of them display it without the ø.
When I open it in 010 Editor, the hex editor I use, it shows 42F8 726E
6568 6176 6, which is right, isn't it? If I copy that, I get Børnehave.
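For comparison, ø is U+00F8. In ISO-8859-1 and Windows-1252 it is the single byte F8, while in UTF-8 it is the two bytes C3 B8, so a lone F8 after the B is what a Latin-1 encoded ø looks like. A small Python 3 sketch (3.8+ for `bytes.hex` with a separator) that prints both encodings and shows a strict UTF-8 decode rejecting the Latin-1 form:

```python
word = "Børnehave"

# ø is U+00F8: one byte (0xF8) in ISO-8859-1/Windows-1252, two bytes in UTF-8.
latin1 = word.encode("latin-1")
utf8 = word.encode("utf-8")
print(latin1.hex(" "))  # 42 f8 72 6e 65 68 61 76 65
print(utf8.hex(" "))    # 42 c3 b8 72 6e 65 68 61 76 65

# A strict UTF-8 decoder cannot accept 0xF8 as the start of a character.
try:
    latin1.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.start, exc.reason)  # 1 invalid start byte
```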

When I try to load the instance into a DOM using MSXML, I get an encoding
error, and when I run it through XSV to validate it, I get
 Input error: Illegal UTF-8 byte 2 <0x72> at file offset 1070
 in unnamed entity at line 18 char 27 
0x72 is the r, so of course the error is at the position where the ø should be.
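A strict decoder such as the one XSV reports from can be imitated to find the first byte that is not valid UTF-8 anywhere in a file. A minimal sketch, assuming Python 3; `first_utf8_error` is a hypothetical helper. Python reports the offending lead byte (0xF8, "invalid start byte"), whereas XSV's message names 0x72 as "byte 2" of the sequence, so the offsets may differ by one; the column here is also counted in bytes, not characters:

```python
from typing import Optional, Tuple


def first_utf8_error(data: bytes) -> Optional[Tuple[int, int, int, str]]:
    """Return (byte_offset, line, byte_column, reason) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Lines are 1-based: count the newlines before the bad byte.
        line = data.count(b"\n", 0, exc.start) + 1
        # Column is 1-based and counted in bytes from the start of that line.
        col = exc.start - (data.rfind(b"\n", 0, exc.start) + 1) + 1
        return exc.start, line, col, exc.reason
    return None


sample = b"<a>\n<b>B\xf8rnehave</b></a>"
print(first_utf8_error(sample))  # (8, 2, 5, 'invalid start byte')
```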

If I save it from any of the editors that accept it, the resulting file opens
in all the other editors and in IE, loads into a DOM, etc.
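That re-saving helps is consistent with the accepting editors reading the stray byte as a single-byte character and writing it back out as real UTF-8. The same repair can be sketched in Python 3, assuming the bad bytes are ISO-8859-1; the handler name `"latin1-fallback"` and `repair_to_utf8` are hypothetical. If the bytes came from Windows-1252 instead, characters in the 0x80-0x9F range would need `"cp1252"`, which however raises on that codec's five undefined bytes. A run of Latin-1 bytes that happens to form valid UTF-8 (for example `Ã¸`) would also pass through unrepaired.

```python
import codecs


def _latin1_fallback(exc):
    # Hypothetical error handler: decode bytes that are not valid UTF-8 as
    # ISO-8859-1 (Latin-1) instead of failing. Latin-1 maps every byte to a
    # character, so this branch never raises.
    if isinstance(exc, UnicodeDecodeError):
        return exc.object[exc.start:exc.end].decode("latin-1"), exc.end
    raise exc


codecs.register_error("latin1-fallback", _latin1_fallback)


def repair_to_utf8(data: bytes) -> bytes:
    """Re-encode mixed UTF-8/Latin-1 bytes as clean UTF-8 (BOM dropped)."""
    # "utf-8-sig" strips a leading BOM if present; valid UTF-8 sequences
    # decode normally and only the invalid ones go through the fallback.
    text = data.decode("utf-8-sig", errors="latin1-fallback")
    return text.encode("utf-8")


broken = b'\xef\xbb\xbf<?xml version="1.0" encoding="UTF-8"?><t>B\xf8rnehave</t>'
fixed = repair_to_utf8(broken)
print(fixed.decode("utf-8"))
```

Dropping the BOM is a choice, not a requirement; it is optional for UTF-8 XML, and encoding with `"utf-8-sig"` instead would write it back.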

-- 
Bryan Rasmussen

Copyright 2001 XML.org. This site is hosted by OASIS