   Re: [xml-dev] 5-byte UTF-8 encoding not supported

At 22:53 16-12-2001, Neeraja Divakaruni wrote:
>We are getting an exception while parsing the XML document using Oracle
>parser "parseCLOB" procedure. The exception is "5-byte UTF-8 encoding
>not supported".
>One more observation is the other foreign characters like  ,  ,  (
>these are also Danish characters) etc. we are getting an exception "
>Invalid UTF8 encoding".
>What can be the possible causes for these two exceptions ?? Please do

It sounds like the parser is trying to parse the CLOB as UTF-8 despite its 
actual encoding.  The character "ø" is 0xF8 (11111000) in ISO 8859-1, which 
would be interpreted as the start of a 5-byte UTF-8 sequence; the other 
characters you mention are not valid UTF-8 sequence starters.
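
To make the byte arithmetic concrete, here is a minimal Python sketch (not 
the Oracle parser's code; the helper name utf8_lead_length is made up for 
illustration).  It classifies a lead byte under the original UTF-8 scheme, 
which allowed sequences of up to six bytes, and shows that the single 
ISO 8859-1 byte for "ø" looks like the start of a 5-byte sequence.

```python
def utf8_lead_length(byte: int) -> int:
    """Return the sequence length a lead byte announces under the original
    UTF-8 scheme (up to 6 bytes), or 0 if the byte cannot start a sequence."""
    if byte < 0x80:
        return 1                      # 0xxxxxxx: plain ASCII
    if byte >> 5 == 0b110:
        return 2                      # 110xxxxx
    if byte >> 4 == 0b1110:
        return 3                      # 1110xxxx
    if byte >> 3 == 0b11110:
        return 4                      # 11110xxx
    if byte >> 2 == 0b111110:
        return 5                      # 111110xx
    if byte >> 1 == 0b1111110:
        return 6                      # 1111110x
    return 0                          # continuation byte (10xxxxxx), 0xFE or 0xFF


# "ø" stored as ISO 8859-1 is the single byte 0xF8 (binary 11111000).
raw = "ø".encode("iso-8859-1")
lead = raw[0]
print(hex(lead), format(lead, "08b"), utf8_lead_length(lead))

# A strict modern UTF-8 decoder rejects that byte outright.
try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print("decodes as UTF-8:", decoded_ok)
```

A parser written against the older six-byte definition would recognise 0xF8 
as a 5-byte lead and then refuse it, which matches the first message you 
are seeing.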

What code are you using to parse the CLOB and to set the encoding?  I 
suspect that, rather than simply inserting an XML declaration in the CLOB, 
you need to actually instruct the parser what encoding to use for reading 
the input.
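
As an illustration of the difference, here is a small sketch using Python's 
standard xml.etree.ElementTree rather than the Oracle parser (whose API I 
am not reproducing here).  The sample document and the choice of 
ISO 8859-1 are assumptions made for the example.  The same bytes fail when 
the parser falls back to UTF-8 and parse cleanly once the parser itself is 
told which encoding to read.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: ISO 8859-1 bytes with no XML declaration, so the
# parser has nothing in the document telling it the encoding.
data = "<name>Søren</name>".encode("iso-8859-1")

# With no other information the parser assumes UTF-8 and chokes on 0xF8.
try:
    ET.fromstring(data)
    default_parse_ok = True
except ET.ParseError:
    default_parse_ok = False
print("parsed with default encoding:", default_parse_ok)

# Telling the parser the encoding explicitly overrides its default guess.
parser = ET.XMLParser(encoding="iso-8859-1")
root = ET.fromstring(data, parser=parser)
print(root.text)
```

Whatever the equivalent setting is in your environment, the point is the 
same: the encoding has to reach the component that turns bytes into 
characters, not just sit in the text of the document.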

Christopher R. Maden, Principal Consultant, HMM Consulting Int'l, Inc.
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://www.hmmci.com/ > <URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4  5DFC AC52 F825 AFEC 58DA

