At 22:53 16-12-2001, Neeraja Divakaruni wrote:
>We are getting an exception while parsing the XML document using the Oracle
>parser's "parseCLOB" procedure. The exception is "5-byte UTF-8 encoding
>not supported".
>
>One more observation: for other foreign characters such as æ, Æ and Ø
>(these are also Danish characters) we are getting the exception
>"Invalid UTF8 encoding".
>What could be causing these two exceptions? Please do
>respond.
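It sounds like the parser is reading the CLOB as UTF-8 regardless of its
actual encoding. The character "ø" is 0xF8 (11111000) in ISO 8859-1, which
would be interpreted as the start of a 5-byte UTF-8 sequence. The other
characters you mention (æ 0xE6, Æ 0xC6, Ø 0xD8) look like lead bytes of
2- or 3-byte sequences, but in ordinary text the bytes that follow them are
not UTF-8 continuation bytes (10xxxxxx), so the sequences are invalid.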
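To make the byte-level argument concrete, here is a small Python sketch (not Oracle code) that shows how a UTF-8 decoder classifies each Latin-1 byte by its high-order bits, and that Latin-1 encoded Danish words fail to decode as UTF-8. The helper names `utf8_role` and `is_valid_utf8` and the sample words are hypothetical, chosen only for illustration. The classification follows the original RFC 2279 scheme, which allowed sequences up to 6 bytes; that is why 0xF8 is described as a 5-byte lead.

```python
def utf8_role(b: int) -> str:
    """Describe how a UTF-8 decoder reads byte b, based on its high-order bits.

    Uses the original RFC 2279 layout (up to 6-byte sequences), which is why
    0xF8 shows up as a 5-byte lead. RFC 3629 later limited UTF-8 to 4 bytes,
    so 0xF8 and above are never valid in modern UTF-8.
    """
    if b < 0x80:
        return "ASCII (1-byte)"          # 0xxxxxxx
    if b < 0xC0:
        return "continuation byte"       # 10xxxxxx
    if b < 0xE0:
        return "lead of 2-byte sequence" # 110xxxxx
    if b < 0xF0:
        return "lead of 3-byte sequence" # 1110xxxx
    if b < 0xF8:
        return "lead of 4-byte sequence" # 11110xxx
    if b < 0xFC:
        return "lead of 5-byte sequence" # 111110xx
    if b < 0xFE:
        return "lead of 6-byte sequence" # 1111110x
    return "never valid"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 (Python follows RFC 3629)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Show the single ISO 8859-1 byte for each character and its UTF-8 role.
for ch in "øæÆØ":
    b = ch.encode("iso-8859-1")[0]
    print(f"{ch}  0x{b:02X}  {b:08b}  -> {utf8_role(b)}")

# Hypothetical Danish words stored as Latin-1 bytes, then checked as UTF-8.
for word in ["Søren", "Ærø", "blåbær"]:
    print(word, is_valid_utf8(word.encode("iso-8859-1")))
```

In "blåbær", for example, å (0xE5) announces a 3-byte sequence, but the next byte is "b" (0x62), which is not a continuation byte, so decoding stops there.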
What code are you using to parse the CLOB and to set the encoding? I
suspect that, rather than simply inserting an XML declaration in the CLOB,
you need to explicitly tell the parser which encoding to use when reading
the input.
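As an illustration of telling the parser the input encoding rather than relying on the document, here is a sketch using Python's expat binding, whose `ParserCreate(encoding=...)` argument overrides the document's implicit or explicit encoding. This is only an analogue: it is not the Oracle `parseCLOB` API, and how Oracle's parser is told the input encoding is not shown here. The helpers `parse_text` and `try_parse` and the sample document are hypothetical.

```python
import xml.parsers.expat

# Hypothetical CLOB contents: Latin-1 bytes with no XML declaration,
# so the parser falls back to its default of UTF-8.
latin1_doc = "<navn>Søren Ærø</navn>".encode("iso-8859-1")


def parse_text(data: bytes, encoding=None) -> str:
    """Parse data with expat and return its character data.

    If encoding is given, expat uses it instead of whatever the document
    declares (or the UTF-8 default when nothing is declared).
    """
    parser = xml.parsers.expat.ParserCreate(encoding=encoding)
    chunks = []
    parser.CharacterDataHandler = chunks.append  # text may arrive in pieces
    parser.Parse(data, True)                     # True: this is the final chunk
    return "".join(chunks)


def try_parse(data: bytes, encoding=None):
    """Return the parsed text, or None if expat rejects the input."""
    try:
        return parse_text(data, encoding)
    except xml.parsers.expat.ExpatError as err:
        print("rejected:", err)
        return None


print("UTF-8 assumed:", try_parse(latin1_doc))
print("ISO-8859-1 given:", try_parse(latin1_doc, encoding="ISO-8859-1"))
```

The bytes are identical in both calls; only the encoding the parser is told to use changes, which is the distinction between labelling the data and instructing the reader.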
~Chris
--
Christopher R. Maden, Principal Consultant, HMM Consulting Int'l, Inc.
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://www.hmmci.com/ > <URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4 5DFC AC52 F825 AFEC 58DA