RE: [xml-dev] Heed this warning about Postel's Prescription


From: Rick Jelliffe <rjelliffe@allette.com.au>
To: "Roger L. Costello" <costello@mitre.org>
Date: Tue, 30 Jun 2015 11:41:07 +1000

Nonono!

Way 1 (the BOM) is allowed by XML: it arose from a gap in the Unicode specifications, and is therefore an early ambiguity that XML inherited.
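For context, XML 1.0's non-normative Appendix F describes how a processor can guess an entity's encoding from its first few bytes, and a BOM is the easiest case. Below is a minimal Python sketch of only the BOM part. UTF-32 BOMs and the "<?xml"-pattern cases are deliberately left out, and `sniff_bom` is a hypothetical helper name, not part of any XML library.

```python
import codecs

# BOM signatures, checked longest first so the 3-byte UTF-8 BOM wins
# over the 2-byte UTF-16 ones. UTF-32 BOMs are omitted in this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


print(sniff_bom(b"\xef\xbb\xbf<?xml version='1.0'?><a/>"))
print(sniff_bom(b"<a/>"))
```

A BOM is a signature the spec already anticipates, which is why tolerating it (way 1) stays inside the rules.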

But way 2 goes completely against XML's draconian well-formedness (WF) error rules, and it is the kind of muddle-headed hacking that has made i18n too difficult for most developers to understand or ever get right, with different systems behaving differently. Developers are, in general, fantastically willing to come up with the wrong theory about what is causing an encoding error, a willingness matched only by their determination to avoid looking directly at the actual bytes with a hex editor.

The most common cause of 'decoding errors' is that the XML is being read using an encoding that does not match the actual encoding of the resource (i.e. the XML header was generated wrong at write time, and/or is not being used at read time). Allowing silent 'resynchronizing' corrupts the data, delays problem detection, and lets developers defraud their bosses by claiming to have implemented XML when all they have done is disable error detection.
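To make the mismatch case concrete, here is a small Python sketch that assumes a hypothetical payload written as ISO-8859-1 but read as UTF-8. It dumps the raw bytes, as a hex editor would show them, then compares strict decoding with U+FFFD replacement (way 2). XML 1.0 (section 4.3.3) treats byte sequences that are illegal in the entity's encoding as a fatal error, which corresponds to the strict path. The helper names are illustrative only.

```python
def decode_strict(data: bytes, encoding: str) -> str:
    """Decode, raising UnicodeDecodeError on any malformed byte sequence."""
    return data.decode(encoding)  # errors="strict" is Python's default


def decode_lenient(data: bytes, encoding: str) -> str:
    """Decode, silently substituting U+FFFD for malformed sequences (way 2)."""
    return data.decode(encoding, errors="replace")


# Hypothetical payload: written as ISO-8859-1, but about to be read as UTF-8.
payload = "<name>caf\u00e9</name>".encode("iso-8859-1")

# Look at the actual bytes first: the e-acute is the lone byte e9,
# which is not valid UTF-8 when followed by '<'.
print(payload.hex(" "))

# Way 2: no error is reported, but the original character is lost for good.
print(ascii(decode_lenient(payload, "utf-8")))

# Strict decoding stops at the first bad byte and says where it is.
try:
    decode_strict(payload, "utf-8")
except UnicodeDecodeError as err:
    print("rejected:", err.reason, "at byte offset", err.start)
```

The lenient result is a plausible-looking string, which is why the underlying encoding mismatch goes unnoticed until someone inspects the data.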

Rick

On 29/06/2015 5:59 AM, "Costello, Roger L." <costello@mitre.org> wrote:

How might Postel's Law be applied to web services that receive XML and send out XML?

Here are two ways:

1. The web service is willing to receive UTF-8 XML documents containing a pseudo-BOM. The web service sends out UTF-8 XML documents without a pseudo-BOM. [1]

2. The web service is willing to receive XML character streams with Unicode decoding errors: it processes the character stream by replacing the offending bytes with the Unicode replacement character U+FFFD until it manages to resynchronize with the UTF-{8,16} byte stream. The web service sends out XML documents without character decoding errors. [2]
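Way 1 can be sketched in a few lines of Python. This assumes the service only ever deals in UTF-8, and `receive`/`send` are hypothetical names for its inbound and outbound steps. Note that the inbound side still decodes strictly: it tolerates the pseudo-BOM (the bytes EF BB BF) but does not patch over malformed UTF-8 the way way 2 would.

```python
import codecs


def receive(data: bytes) -> str:
    """Way 1, inbound: tolerate an optional UTF-8 pseudo-BOM, but decode strictly."""
    # "utf-8-sig" drops one leading EF BB BF if present. Errors stay strict,
    # so malformed UTF-8 still raises UnicodeDecodeError instead of being replaced.
    return data.decode("utf-8-sig")


def send(text: str) -> bytes:
    """Way 1, outbound: emit plain UTF-8 with no pseudo-BOM."""
    # Design choice in this sketch: drop any U+FEFF that slipped into the text
    # so the output never starts with a BOM.
    return text.lstrip("\ufeff").encode("utf-8")


incoming = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><a/>'
outgoing = send(receive(incoming))
print(outgoing.startswith(codecs.BOM_UTF8))
```

Being liberal here costs nothing in error detection, because the only extra input accepted is a signature that XML already allows.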

/Roger

[1] See Rick Jelliffe's post on the xml-dev list: http://lists.xml.org/archives/xml-dev/201506/msg00065.html

[2] See Daniel Bunzli's post on the unicode list: http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0247.html

