OASIS Mailing List Archives (XML.org)

RE: [xml-dev] Heed this warning about Postel's Prescription

Nonono! 

Way 1, the BOM, is allowed by XML. It arose because of a gap in the Unicode specifications, and is therefore an early ambiguity that XML inherited.
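
For concreteness, way 1 might be sketched roughly as follows in Python. The function names `read_utf8_xml` and `write_utf8_xml` are hypothetical, made up for this illustration, and the sketch assumes the service has already established that the payload is meant to be UTF-8.

```python
import codecs


def read_utf8_xml(data: bytes) -> str:
    """Decode UTF-8 XML bytes, tolerating a single leading BOM (EF BB BF)."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Strict decoding: any invalid byte sequence raises UnicodeDecodeError.
    return data.decode("utf-8")


def write_utf8_xml(text: str) -> bytes:
    """Encode XML text as UTF-8; Python's "utf-8" codec never emits a BOM."""
    return text.encode("utf-8")


# Hypothetical incoming document that starts with a BOM.
incoming = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><a/>'
text = read_utf8_xml(incoming)
outgoing = write_utf8_xml(text)
```

The liberal part is limited to the one thing XML permits, the optional BOM; everything after it is still decoded strictly.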

But way 2 goes completely against XML's draconian well-formedness (WF) error rules, and is the kind of muddle-headed hacking that has made i18n too difficult for most developers to understand or ever get right, with systems acting differently from one another. Developers are, in general, fantastically willing to come up with the wrong theory about what is causing an encoding error, matched only by their determination to avoid looking at the actual bytes directly in a hex editor. The most common cause of 'decoding errors' is that the XML is being read using an encoding that does not match the actual encoding of the resource (i.e. the XML header was generated wrongly at write time, and/or is not being used at read time). Allowing silent 'resynchronizing' corrupts the data, delays problem detection, and allows the developer to defraud their bosses by claiming to have implemented XML when all they have done is disable error detection.
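
A small illustration of the mismatch case, as a sketch only: the sample document and the function names `decode_strict` and `decode_lenient` are invented for the example. The bytes are written as Latin-1 but read as UTF-8, which is the typical situation described above.

```python
# Text written out as Latin-1 (ISO-8859-1) but later read as if it were UTF-8.
# "\u00e9" is e-acute; in Latin-1 it is the single byte 0xE9.
raw = "<name>caf\u00e9</name>".encode("latin-1")

# Look at the actual bytes instead of theorizing (needs Python 3.8+ for the separator).
print(raw.hex(" "))


def decode_strict(data: bytes) -> str:
    """Draconian: the first bad byte sequence is a fatal error."""
    return data.decode("utf-8", errors="strict")


def decode_lenient(data: bytes) -> str:
    """Way 2: substitute U+FFFD for bad bytes and carry on."""
    return data.decode("utf-8", errors="replace")


try:
    decode_strict(raw)
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print(f"fatal decoding error: {exc}")

# The lenient reader "succeeds", but the original character is gone for good.
lenient = decode_lenient(raw)
print(lenient)
```

The strict reader reports the problem at the point where it can still be diagnosed; the lenient one hands downstream code a document in which the accented letter has been replaced by U+FFFD, with nothing left to show what it used to be.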

Rick

On 29/06/2015 5:59 AM, "Costello, Roger L." <costello@mitre.org> wrote:
How might Postel's Law be applied to web services that receive XML and send out XML?

Here are two ways:

1. The web service is willing to receive UTF-8 XML documents containing a pseudo-BOM. The web service sends out UTF-8 XML documents without a pseudo-BOM. [1]

2. The web service is willing to receive XML character streams with Unicode decoding errors: it processes the character stream by replacing the offending bytes by the Unicode replacement character U+FFFD until it manages to resynchronize the UTF-{8,16} byte stream. The web service sends out XML documents without character decoding errors. [2]

/Roger

[1] See Rick Jelliffe's post on the xml-dev list: http://lists.xml.org/archives/xml-dev/201506/msg00065.html

[2] See Daniel Bunzli's post on the unicode list: http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0247.html

