"Costello, Roger L." <costello@mitre.org> wrote on 03/18/2010 02:13:00 PM:
> Hi Folks,
>
> Consider this simple XML document. The XML declaration specifies an
> encoding of US-ASCII. The document contains an œ ligature, which is
> not a US-ASCII character:
>
> <?xml version="1.0" encoding="US-ASCII"?>
> <Family-Name>Lecœur</Family-Name>
>
> Should I get an error when I check this document for
> well-formedness? I checked and didn't get an error. I was surprised,
> as I thought that this would be a well-formedness error.
It's not well-formed.
From the XML 1.0 spec [1]:
"It is a fatal error if an XML entity is determined (via default, encoding declaration, or higher-level protocol) to be in a certain encoding but contains byte sequences that are not legal in that encoding."
If you tried this with Xerces-J, it would report a fatal error.
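To see the quoted rule in isolation, here is a rough Python sketch. It is my own illustration, not Xerces code and not a real parser API; the function name check_declared_encoding is hypothetical. It reads the encoding pseudo-attribute from the XML declaration and checks whether every byte sequence in the entity is legal in that encoding. It assumes the declaration itself is ASCII-compatible (no UTF-16, no EBCDIC), ignores byte order marks, and falls back to UTF-8 when there is no declaration. It does not check any other well-formedness constraint.

```python
import re
from typing import Optional

# Simplified pattern for the encoding pseudo-attribute of an XML declaration
# at the very start of the byte stream. Assumption: the declaration is written
# in an ASCII-compatible encoding; BOMs and UTF-16/EBCDIC are not handled.
_ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def check_declared_encoding(data: bytes) -> Optional[str]:
    """Hypothetical helper: return None if every byte sequence in `data` is
    legal in the declared encoding, else a message naming the first bad bytes.

    This is NOT a full well-formedness check; it only tests the one rule
    about byte sequences that are not legal in the entity's encoding.
    """
    match = _ENCODING_DECL.match(data)
    # Simplification: with no declaration (and ignoring BOMs), assume UTF-8.
    encoding = match.group(1).decode("ascii") if match else "utf-8"
    try:
        # Raises LookupError if Python has no codec for the declared name.
        data.decode(encoding)
    except UnicodeDecodeError as err:
        bad = data[err.start:err.end]
        return (f"fatal error: bytes {bad!r} at offset {err.start} "
                f"are not legal in {encoding}")
    return None


if __name__ == "__main__":
    # The document from the question, saved as UTF-8 (U+0153 "œ" is 0xC5 0x93).
    doc = ('<?xml version="1.0" encoding="US-ASCII"?>\n'
           '<Family-Name>Lec\u0153ur</Family-Name>\n').encode("utf-8")
    print(check_declared_encoding(doc))
```

Run against the document from the question, this reports the byte 0xC5 (the first byte of the UTF-8 encoding of œ) as illegal in US-ASCII. The result would be the same if the file had been saved as windows-1252, where œ is the single byte 0x9C, since no byte above 0x7F is legal in US-ASCII.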
> /Roger
[1] http://www.w3.org/TR/2006/REC-xml-20060816/#charencoding
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org