At 22:53 16-12-2001, Neeraja Divakaruni wrote:
>We are getting an exception while parsing the XML document using the Oracle
>parser "parseCLOB" procedure. The exception is "5-byte UTF-8 encoding
>One more observation: with other foreign characters such as æ, Æ, Ø
>(these are also Danish characters), we get the exception
>"Invalid UTF8 encoding".
>What can be the possible causes of these two exceptions? Please do
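It sounds like the parser is trying to parse the CLOB as UTF-8 despite its
actual encoding. The character "ø" is 0xF8 (11111000) in ISO 8859-1, which
would be interpreted as the start of a 5-byte UTF-8 sequence. The other
characters you mention (æ = 0xE6, Æ = 0xC6, Ø = 0xD8) would be read as the
lead bytes of 3- and 2-byte sequences, but the bytes that follow them in
ISO 8859-1 text are rarely valid continuation bytes, hence "Invalid UTF8
encoding".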
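
A quick way to see how a UTF-8 decoder views each of these ISO 8859-1 bytes
is to classify them by their leading bits. The sketch below is only an
illustration in Python, not the Oracle parser's logic; the helper name
utf8_lead_length is made up, and it follows the original UTF-8 bit patterns,
which allowed 5- and 6-byte sequences.

```python
def utf8_lead_length(byte):
    """Sequence length implied by a lead byte under the original UTF-8
    bit patterns; 0 means the byte cannot start a sequence."""
    if byte < 0x80:
        return 1      # 0xxxxxxx: plain ASCII
    if byte < 0xC0:
        return 0      # 10xxxxxx: continuation byte, not a starter
    if byte < 0xE0:
        return 2      # 110xxxxx
    if byte < 0xF0:
        return 3      # 1110xxxx
    if byte < 0xF8:
        return 4      # 11110xxx
    if byte < 0xFC:
        return 5      # 111110xx
    if byte < 0xFE:
        return 6      # 1111110x
    return 0          # 0xFE and 0xFF never appear in UTF-8


def is_valid_utf8(data):
    """True if the bytes decode cleanly as (modern) UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for ch in "øæÆØ":
        b = ch.encode("iso-8859-1")[0]
        print(ch, hex(b), format(b, "08b"), utf8_lead_length(b))
    # ISO 8859-1 text containing these bytes is rejected as UTF-8.
    print(is_valid_utf8("søn".encode("iso-8859-1")))
```

Whether a given byte produces a "5-byte" complaint or a generic "invalid
encoding" complaint depends on the lead byte and on what follows it.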
What code are you using to parse the CLOB and to set the encoding? I
suspect that, rather than simply inserting an XML declaration in the CLOB,
you need to actually instruct the parser what encoding to use for reading
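
As a rough illustration of the difference between letting a parser assume
UTF-8 and telling it the encoding explicitly, here is a small Python sketch
using the standard xml.etree.ElementTree module. It does not use the Oracle
API; the sample document and the helper names are hypothetical, and the
sample deliberately carries no XML declaration.

```python
import xml.etree.ElementTree as ET


def fails_as_utf8(raw):
    """True if the parser rejects the raw bytes when left to assume UTF-8."""
    try:
        ET.fromstring(raw)
        return False
    except ET.ParseError:
        return True


def parse_with_encoding(raw, encoding):
    """Decode the bytes with the known encoding first, then parse the text."""
    return ET.fromstring(raw.decode(encoding))


# Hypothetical sample: ISO 8859-1 bytes with no XML declaration.
SAMPLE = "<navn>Søren</navn>".encode("iso-8859-1")

if __name__ == "__main__":
    print(fails_as_utf8(SAMPLE))
    print(parse_with_encoding(SAMPLE, "iso-8859-1").text)
```

The point is only that the encoding has to reach the layer that turns bytes
into characters; how that is done for a CLOB depends on the parser and the
database character set.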
Christopher R. Maden, Principal Consultant, HMM Consulting Int'l, Inc.
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://www.hmmci.com/ > <URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4 5DFC AC52 F825 AFEC 58DA