- From: Rick JELLIFFE <ricko@geotempo.com>
- Date: Sun, 09 Jul 2000 23:50:17 +0800
Lucio Piccoli wrote:
>
> thanks for your response Rick,
>
> > If you need to be able to pin down the specific encoding problem, some
> > extra info would be helpful:
> > - Can you tell us what the particular UTF-8 encoding error is?
>
> org.xml.sax.SAXParseException: Character conversion error: "Unconvertible
> UTF-8 character beginning with 0x96" (line number may be too low).
In UTF-8, no legitimate sequence begins with that byte: 0x96 is a
continuation byte, which can only follow a lead byte.
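
As a rough illustration of pinning down such an error, here is a minimal
Python sketch that reports the offset and value of the first byte a strict
UTF-8 decoder rejects. The function name first_invalid_utf8 is hypothetical,
and the sample bytes are the 20 96 20 sequence quoted further down in this
thread.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte the decoder rejected
        return err.start, data[err.start]
    return None


# The byte sequence reported in the thread: space, 0x96, space.
sample = bytes([0x20, 0x96, 0x20])
print(first_invalid_utf8(sample))
```

Running the same check over the supplier's whole file should give a byte
offset to quote back to them.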
> Nope. The HTTP headers do not contain the UTF-8 charset attribute, but why
> is that a problem? Does the parser read the HTTP headers to determine the
> encoding? Shouldn't it simply stream the content body?
If the document is sent over HTTP as text/xml, the encoding given in the
HTTP header takes priority over the encoding declared in the document
itself. If the HTTP header does not specify an encoding, the default of
ISO 8859-1 may be used. Ask them to send application/xml and/or to
explicitly set the encoding to UTF-8 in the HTTP header.
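
The precedence described above could be sketched roughly as follows in
Python. This is an illustration under assumptions, not how any particular
parser behaves: the function name effective_encoding, the text_default
parameter (set to the ISO 8859-1 fallback mentioned above; other
specifications may name a different default), and the UTF-8 fallback for
application/xml without a declaration are all choices made for this example.

```python
from email.message import Message


def effective_encoding(content_type, xml_decl_encoding=None,
                       text_default="iso-8859-1"):
    """Pick the encoding a receiver might use for an XML body sent over HTTP.

    content_type      -- value of the HTTP Content-Type header
    xml_decl_encoding -- encoding named in the XML declaration, if any
    text_default      -- assumed fallback for text/xml without a charset
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased charset, or None
    if charset:
        # An explicit charset in the HTTP header wins.
        return charset
    if msg.get_content_type() == "text/xml":
        # No charset given: the document's own declaration is not consulted.
        return text_default
    # Assumed here: for application/xml the document declaration is used,
    # with UTF-8 as the fallback when there is none.
    return (xml_decl_encoding or "utf-8").lower()


print(effective_encoding("text/xml", "UTF-8"))
print(effective_encoding("text/xml; charset=UTF-8"))
print(effective_encoding("application/xml", "UTF-8"))
```

Under these assumptions, a document that declares UTF-8 but arrives as
text/xml with no charset is not read as UTF-8.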
> my hex editor does the same.
> sequence 20 96 20.
> So from the UTF-8 spec 0x96 is way beyond 0x7F.
Yes. That is definitely not UTF-8. It is not a printable ISO 8859-n
character either (hex 80 to 9F are control codes there). No UTF-8
character is represented by a single byte of hex 80 or above that is
both preceded and followed by bytes below hex 80.
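
The rule above can be checked mechanically. The following Python sketch
lists the offsets of isolated high bytes, then shows how such a byte would
read under a couple of legacy encodings. The names lone_high_bytes and
candidate_readings are hypothetical, the choice of candidate encodings is
only a guess (nothing in this thread confirms what the supplier's system
uses), and treating the start and end of the data as "below hex 80"
neighbours is an assumption of the sketch.

```python
def lone_high_bytes(data: bytes):
    """Offsets of bytes >= 0x80 whose neighbours are both below 0x80.

    The start and end of the data count as low neighbours (an assumption).
    """
    hits = []
    for i, b in enumerate(data):
        if b < 0x80:
            continue
        prev_low = i == 0 or data[i - 1] < 0x80
        next_low = i == len(data) - 1 or data[i + 1] < 0x80
        if prev_low and next_low:
            hits.append(i)
    return hits


def candidate_readings(byte_value, encodings=("cp1252", "iso-8859-1")):
    """Decode one byte under each guessed encoding; None if undefined there."""
    readings = {}
    for enc in encodings:
        try:
            readings[enc] = bytes([byte_value]).decode(enc)
        except UnicodeDecodeError:
            readings[enc] = None
    return readings


sample = bytes([0x20, 0x96, 0x20])  # the sequence from the hex editor
for offset in lone_high_bytes(sample):
    print(offset, hex(sample[offset]), candidate_readings(sample[offset]))
```

The candidate readings are only material for the question to the supplier,
not an answer to it.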
> So do i have a case for telling my supplier to correctly validate his
> documents?
Definitely. Try a conciliatory approach: ask them what the character is
supposed to be.
Rick Jelliffe
***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************