Re: [xml-dev] Should XML applications follow

On Thu, Apr 19, 2018 at 11:55 PM, Rick Jelliffe <rjelliffe@allette.com.au> wrote:

* I think Postel's principle speaks about the risk of rarely-implemented things, in particular, optional things. So I don't believe that there is any danger of a lot of XML processors that don't handle > or hex characters or CDATA sections.

I agree with that as far as it goes. The reason for escaping all > characters is that although the rule that specifies when they MUST be escaped is well-embodied in software, it isn't well-known to document authors; it rarely comes up and so is easily missed. Similarly, it's easy to get the 9-character token "<![CDATA[" subtly wrong, in which case it will not do what the author expects.
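
A minimal sketch of that conservative policy in Python, assuming the standard library's xml.sax.saxutils.escape (which replaces &, < and > unconditionally). The helper names escape_text and wrap are hypothetical, chosen only for illustration:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_text(text):
    """Escape &, < and > everywhere (hypothetical helper name)."""
    # saxutils.escape replaces all three characters unconditionally, so the
    # author never has to remember that ">" is only mandatory after "]]".
    return escape(text)


def wrap(tag, text):
    """Build a one-element document around conservatively escaped text."""
    return f"<{tag}>{escape_text(text)}</{tag}>"


if __name__ == "__main__":
    # The sample text happens to contain "]]>", the one case where
    # escaping ">" is required rather than optional.
    doc = wrap("code", "if (a[b[0]]> 1 && x < y)")
    print(doc)
    # Parsing the document recovers the original text.
    print(ET.fromstring(doc).text)
```

Escaping every > costs a few bytes, but it means the rarely remembered "]]>" case never has to be recognised by whoever, or whatever, is writing the document.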

* I was assuming that Postel's principle applies to implementation: minimising the use of PIs, namespaces, characters in data, are all authorial decisions. If an implementer refuses to transfer them as part of the data, their code is not "robust", it is "corrupting".

If you are writing a generic XMLWriter component, that's true. But it's commonplace for the "author" of a document nowadays to be another piece of software driven by higher-level concerns. If you are using XML to communicate, the details of how you use it are part of the implementation of that communication.

* I don't really concur with several of your other points: indeed, I think there is a case that to get robustness you really need to only use ASCII repertoire for direct characters, and you should use HNCR for everything else. Or at least, characters that are not in the same Unicode ranges as the names in markup should be HNCR.

That would make a hash of non-English character content, violating Goal 6 ("XML documents should be human-legible and reasonably clear."). Writing French or Greek or Hindi text with HNCRs is a non-starter, and even if you are allowed to write name characters directly, there are also the script-specific punctuation marks that aren't allowed in names. You really don't want them to be HNCRs either.
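
To make the legibility cost concrete, here is a rough Python sketch of the ASCII-plus-HNCR policy. The function name to_hncr is hypothetical, and treating everything above U+007F as "not direct" is an assumption about what the proposal means:

```python
import xml.etree.ElementTree as ET


def to_hncr(text):
    """Replace every non-ASCII character with a hex numeric character
    reference (hypothetical helper; ASCII passes through unchanged)."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )


if __name__ == "__main__":
    greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"  # Greek for "Greek"
    encoded = to_hncr(greek)
    # Every letter becomes a reference, so the source is no longer readable.
    print(encoded)
    # A parser still recovers the original text from the references.
    print(ET.fromstring(f"<p>{encoded}</p>").text == greek)
```

The round trip is lossless for a parser, but a human reading or editing the source sees only a run of references, which is the Goal 6 problem.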

In any case, my examples were just that, examples.

* That being said, I think you do get a different set of "conservative" issues as soon as your XML has to be some other format at the same time as being XML: for example, that your XHTML must be text in some encoding, *AND* XML, *AND* HTML. Or that your line-oriented XML for AWK processing must be text in some encoding *AND* simple lines *AND* XML.

Certainly. Although I note that there is an XML plugin for gawk, which supplements the BEGIN and END patterns with things like XMLSTARTELEMENT. Pretty neat. There are similar plugins for JSON and Postgres.

John Cowan http://vrici.lojban.org/~cowan cowan@ccil.org

weirdo: When is R7RS coming out?

Riastradh: As soon as the top is a beautiful golden brown and if you

stick a toothpick in it, the toothpick comes out dry.