RE: [xml-dev] Is it a well-formedness error to use a character not in the encoding specified by the XML declaration?

"Michael Kay" <mike@saxonica.com> wrote on 03/18/2010 03:01:53 PM:

>> It's not well-formed.
>> From the XML 1.0 spec [1]:
>> "It is a fatal error if an XML entity is determined (via default,
>> encoding declaration, or higher-level protocol) to be in a certain
>> encoding but contains byte sequences that are not legal in that encoding."
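
As a rough illustration of the rule quoted above, here is a minimal sketch using Python's standard xml.etree.ElementTree (which sits on expat). The helper name is_well_formed and the two sample documents are made up for this example. The byte 0xE9 followed by "<" is not a legal UTF-8 sequence, so the document that declares UTF-8 should be rejected, while the same bytes declared as ISO-8859-1 should parse.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if the parser accepts the bytes, False on a parse error."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents: identical content bytes, different declarations.
# 0xE9 is "e acute" in ISO-8859-1, but an incomplete multi-byte lead in UTF-8.
DECLARED_UTF8 = b'<?xml version="1.0" encoding="UTF-8"?><a>caf\xe9</a>'
DECLARED_LATIN1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>caf\xe9</a>'

if __name__ == "__main__":
    print(is_well_formed(DECLARED_UTF8))
    print(is_well_formed(DECLARED_LATIN1))
```

Other parsers may word the error differently, but a conforming XML processor has to treat the first case as fatal.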

> Unless of course there is a "higher-level protocol" that tells you
> it's really a different encoding. (The term higher-level protocol is
> not really defined. I think they had in mind the media-type from the
> HTTP content header.

Right, or through other methods such as InputSource.setEncoding() in the SAX API. When I gave my overly simplistic answer, I was assuming (for Roger) that the encoding was being determined by the encoding declaration.
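
To make the override concrete, here is a hedged sketch using Python's xml.sax module, whose InputSource mirrors the SAX API's setEncoding(). It is not the Java API itself, and the class TextCollector, the function parse_with_override, and the sample bytes are invented for illustration. It assumes the default expat-based reader, which passes the InputSource encoding to the parser so that it takes precedence over the encoding declaration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class TextCollector(ContentHandler):
    """Collects all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_override(data: bytes, encoding: str) -> str:
    """Parse bytes, telling the parser externally which encoding to use."""
    source = InputSource()
    source.setByteStream(io.BytesIO(data))
    # The "higher-level protocol": an encoding supplied outside the document.
    source.setEncoding(encoding)

    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return "".join(handler.chunks)


# Hypothetical sample: declares UTF-8 but actually contains an ISO-8859-1 byte.
MISLABELED = b'<?xml version="1.0" encoding="UTF-8"?><a>caf\xe9</a>'

if __name__ == "__main__":
    print(parse_with_override(MISLABELED, "ISO-8859-1"))
```

Without the setEncoding() call, the same bytes would be read according to the UTF-8 declaration and rejected.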

> In terms of the protocol stack, that of course
> is a lower-level protocol. But it's sufficiently woolly that a phone
> call from the sender to say "Oops, I meant EBCDIC" would be enough
> to make the document well-formed.

> Regards,
> Michael Kay
> http://www.saxonica.com/
> http://twitter.com/michaelhkay


Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com

E-mail: mrglavas@apache.org
