   Re: [xml-dev] 5-byte UTF-8 encoding not supported


At 22:53 16-12-2001, Neeraja Divakaruni wrote:
>We are getting an exception while parsing the XML document using the Oracle
>parser's "parseCLOB" procedure. The exception is "5-byte UTF-8 encoding
>not supported".
>
>One more observation: for other foreign characters like æ, Æ, and Ø (these
>are also Danish characters), we are getting the exception "Invalid UTF8
>encoding".
>What could be causing these two exceptions? Please do respond.

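It sounds like the parser is trying to parse the CLOB as UTF-8 despite its 
actual encoding.  The character "ø" is 0xF8 (11111000) in ISO 8859-1, which 
would be interpreted as the start of a 5-byte UTF-8 sequence.  The other 
characters you mention (æ = 0xE6, Æ = 0xC6, Ø = 0xD8) are valid UTF-8 
sequence starters, but in ISO 8859-1 text they are usually followed by 
ordinary ASCII letters rather than continuation bytes (10xxxxxx), so the 
parser reports an invalid UTF-8 sequence.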
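To make the byte-level reasoning concrete, here is a minimal Python sketch (Python is chosen only for illustration; it says nothing about how Oracle's parser is implemented). It classifies each ISO 8859-1 byte the way a UTF-8 decoder following RFC 2279, which still allowed 5- and 6-byte sequences, would treat it as a sequence starter. The helper names `utf8_role` and `is_continuation` and the sample word "æble" are hypothetical.

```python
# Sketch: how an RFC 2279-era UTF-8 decoder would classify a byte that is
# expected to start a new character.
def utf8_role(byte: int) -> str:
    if byte < 0x80:
        return "single byte (ASCII)"
    if byte < 0xC0:
        return "continuation byte, not a valid starter"
    if byte < 0xE0:
        return "starts a 2-byte sequence"
    if byte < 0xF0:
        return "starts a 3-byte sequence"
    if byte < 0xF8:
        return "starts a 4-byte sequence"
    if byte < 0xFC:
        return "starts a 5-byte sequence"
    if byte < 0xFE:
        return "starts a 6-byte sequence"
    return "never valid in UTF-8"


def is_continuation(byte: int) -> bool:
    # Continuation bytes have the bit pattern 10xxxxxx.
    return (byte & 0xC0) == 0x80


# The Danish letters from the question, each a single ISO 8859-1 byte.
for ch in "øæÆØ":
    b = ch.encode("iso-8859-1")[0]
    print(f"{ch} = 0x{b:02X} ({b:08b}): {utf8_role(b)}")

# In Latin-1 text a starter such as 0xE6 ("æ") is normally followed by an
# ASCII letter, which is not a continuation byte, so the sequence is invalid.
sample = "æble".encode("iso-8859-1")  # hypothetical sample word
print(is_continuation(sample[1]))  # False: 0x62 ("b") is 01100010
```

A hand-written classifier is used because Python's own UTF-8 codec follows the later, stricter rules and rejects 0xF8 outright instead of treating it as a 5-byte starter.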

What code are you using to parse the CLOB and to set the encoding?  I 
suspect that, rather than simply inserting an XML declaration in the CLOB, 
you need to actually instruct the parser what encoding to use for reading 
the input.
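The general point, telling the parser the input encoding instead of letting it fall back to UTF-8, can be sketched with Python's standard `xml.etree.ElementTree` module. This is only an analogy: it does not show the Oracle `parseCLOB` API, and the sample document is hypothetical Latin-1 data with no XML declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: Latin-1 bytes with no XML declaration, standing in
# for content whose real encoding the parser is not told about.
raw = "<navn>Søren Ærø</navn>".encode("iso-8859-1")

# With no declaration and no override, the parser assumes UTF-8 and
# rejects the 0xF8 byte for "ø".
try:
    ET.fromstring(raw)
except ET.ParseError as err:
    print("parsed as UTF-8:", err)

# Passing the actual encoding to the parser overrides the UTF-8 default.
parser = ET.XMLParser(encoding="iso-8859-1")
root = ET.fromstring(raw, parser=parser)
print(root.text)  # Søren Ærø
```

An alternative with the same effect is to decode the bytes yourself (`raw.decode("iso-8859-1")`) and pass the resulting `str` to `ET.fromstring`; either way, the encoding is supplied explicitly rather than guessed.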

~Chris
-- 
Christopher R. Maden, Principal Consultant, HMM Consulting Int'l, Inc.
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://www.hmmci.com/ > <URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4  5DFC AC52 F825 AFEC 58DA






 


Copyright 2001 XML.org. This site is hosted by OASIS