"Michael Kay" <mike@saxonica.com> wrote on 03/18/2010 03:01:53 PM:
>> It's not well-formed.
>>
>> From the XML 1.0 spec [1]:
>> "It is a fatal error if an XML entity is determined (via default,
>> encoding declaration, or higher-level protocol) to be in a certain
>> encoding but contains byte sequences that are not legal in that encoding."
>
>
> Unless of course there is a "higher-level protocol" that tells you
> it's really a different encoding. (The term higher-level protocol is
> not really defined. I think they had in mind the media-type from the
> HTTP Content-Type header.
Right, or through other methods such as InputSource.setEncoding() in the
SAX API. When I gave my overly simplistic answer, I was assuming (for
Roger) that the encoding was being determined by the encoding declaration.
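
A minimal sketch of the InputSource.setEncoding() route, assuming the
parser bundled with the JDK is used through JAXP. The class name
EncodingOverride, its helper methods and the sample document are made up
for illustration. The sample has no encoding declaration, so UTF-8 applies
by default, but its bytes are really ISO-8859-1. Without outside help the
parse should fail; with the encoding supplied externally it should succeed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingOverride {
    // <a>é</a> stored as ISO-8859-1 bytes; the lone 0xE9 byte is not legal UTF-8.
    // (Hypothetical sample document, no encoding declaration.)
    static final byte[] DOC =
        "<?xml version=\"1.0\"?><a>\u00e9</a>".getBytes(StandardCharsets.ISO_8859_1);

    /** Parses DOC and returns its character data; externalEncoding may be null. */
    static String parse(String externalEncoding)
            throws SAXException, IOException, ParserConfigurationException {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler); // DefaultHandler rethrows fatal errors
        InputSource in = new InputSource(new ByteArrayInputStream(DOC));
        if (externalEncoding != null) {
            // The "higher-level protocol": tell the parser what the bytes really are.
            in.setEncoding(externalEncoding);
        }
        reader.parse(in);
        return text.toString();
    }

    /** True if DOC is rejected when no external encoding is supplied. */
    static boolean failsWithoutOverride() {
        try {
            parse(null);
            return false;
        } catch (SAXException | IOException e) {
            // Depending on the parser, the bad bytes surface as either type.
            return true;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("rejected without override: " + failsWithoutOverride());
        System.out.println("parsed with override: " + parse("ISO-8859-1").equals("\u00e9"));
    }
}
```

Both exception types are caught because a parser may report the illegal
byte sequence either as a SAXException or as an IOException.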
> In terms of the protocol stack, that of course
> is a lower-level protocol. But it's sufficiently woolly that a phone
> call from the sender to say "Oops, I meant EBCDIC" would be enough
> to make the document well-formed.)
>
> Regards,
>
> Michael Kay
> http://www.saxonica.com/
> http://twitter.com/michaelhkay
Thanks.
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org