An XML document is not well-formed if encoding="..." does not match the actual encoding of the characters in the document, right?

Thanks, Chris, for pointing us to that article: "XML on the Web has Failed"

I am making my way through it. 

This statement in the article piqued my interest:

    ... determining the actual character encoding of an 
    XML document is a prerequisite for determining its 
    well-formedness ...

I decided to do an experiment. 

I created this XML document, saved it with every character encoded as iso-8859-1, and declared encoding="iso-8859-1" in the XML declaration:

<?xml version="1.0" encoding="iso-8859-1"?>
<Name>López</Name>

I checked the document for well-formedness and the XML parser said it is well-formed.

Good.

Then I changed encoding="iso-8859-1" to encoding="utf-8":

<?xml version="1.0" encoding="utf-8"?>
<Name>López</Name>

I checked it for well-formedness and the parser said it is still well-formed.

Huh? 

Shouldn't I have gotten a well-formedness error?
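
One way to take the editor out of the picture is to build the bytes by hand and give them straight to a parser. Below is a minimal sketch of that, assuming Python's standard-library xml.etree.ElementTree (which sits on expat, not Xerces, so it says nothing directly about Xerces). The helper names build and is_well_formed are made up for illustration. The bytes are always iso-8859-1, so "ó" is the single byte 0xF3, and only the declaration changes. An editor might re-encode the file on save, which would hide the mismatch, but that is an assumption and has not been verified for Oxygen.

```python
import xml.etree.ElementTree as ET

# "\u00f3" is the letter o-acute, written as an escape so this source file's
# own encoding cannot affect the experiment.
BODY = "<Name>L\u00f3pez</Name>"


def build(declared: str, actual: str) -> bytes:
    """Return document bytes whose declaration names `declared`
    but whose characters are really encoded as `actual`."""
    text = f'<?xml version="1.0" encoding="{declared}"?>\n{BODY}'
    return text.encode(actual)


def is_well_formed(data: bytes) -> bool:
    """True if the (expat-based) parser accepts the raw bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Same iso-8859-1 bytes both times; only the declared encoding differs.
    for declared in ("iso-8859-1", "utf-8"):
        data = build(declared, "iso-8859-1")
        print(declared, "->", is_well_formed(data))
```

With the bytes held fixed like this, the utf-8 declaration is rejected by this parser, because 0xF3 followed by "p" is not a valid UTF-8 sequence. If Oxygen still reports the second document as well-formed, comparing the bytes on disk before and after the edit would show whether the file was re-encoded.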

I did my experiment using the latest version of Oxygen XML. I think that it uses the Xerces XML Parser, right?

Is this a bug in Xerces?

/Roger





Copyright 1993-2007 XML.org. This site is hosted by OASIS