Re: [xml-dev] Should XML applications follow

Three comments:

* I think Postel's principle speaks to the risk of rarely implemented things, in particular optional things. So I don't believe there is any danger of a lot of XML processors that can't handle a literal >, hex character references, or CDATA sections. The DPH (Desperate Perl Hacker) died in around 1998, when Larry Wall integrated expat with Perl. So I don't think those suggestions are actually conservative.

* I was assuming that Postel's principle applies to implementations: minimising the use of PIs, namespaces, or particular characters in data is an authorial decision. If an implementation refuses to pass them through as part of the data, it is not "robust", it is "corrupting".

* I don't really concur with several of your other points: indeed, I think there is a case that, to get robustness, you really need to use only the ASCII repertoire for direct characters, and hex numeric character references (HNCRs) for everything else. To avoid being over-limiting or ASCII-centric, I suppose you could say that any data character that is not in the same Unicode ranges as the names in the markup should be an HNCR.
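A minimal Python sketch of the ASCII-plus-HNCR rule described above, for character content only. The function name `escape_ascii_hncr` is hypothetical, and rejecting C0 control characters (which XML 1.0 forbids even as character references) is an added assumption rather than part of the suggestion.

```python
def escape_ascii_hncr(text: str) -> str:
    """Serialize character data so that only ASCII characters appear directly.

    Markup-significant characters get the named escapes; every
    non-ASCII character becomes a hex numeric character reference.
    """
    named = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in named:
            out.append(named[ch])
        elif cp < 0x20 and ch not in "\t\n\r":
            # C0 controls other than tab, LF and CR are not allowed in
            # XML 1.0 at all, not even as character references.
            raise ValueError(f"U+{cp:04X} cannot appear in XML 1.0")
        elif cp < 0x80:
            out.append(ch)               # ASCII: written directly
        else:
            out.append(f"&#x{cp:X};")    # everything else: HNCR


    return "".join(out)


print(escape_ascii_hncr("café < naïve"))  # caf&#xE9; &lt; na&#xEF;ve
```

For the less ASCII-centric variant, the `cp < 0x80` test could be swapped for a check against whatever Unicode ranges the document's element and attribute names use.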

That being said, I think you do get a different set of "conservative" issues as soon as your XML has to be in some other format at the same time as being XML: for example, that your XHTML must be text in some encoding, *AND* XML, *AND* HTML. Or that your line-oriented XML for AWK processing must be text in some encoding *AND* simple lines *AND* XML.

I think of Postel's principle as a kind of 80/20 rule: if you can, avoid requiring that someone downstream has implemented more than the easy 80%. But I don't see that the minor syntax differences John suggests are really in anyone's "hard" 20%, whereas things like validation or support for Astral Plane characters in markup may well be.

Even for namespaces, maybe there is a case to be made that avoiding default namespaces, with no prefix remapping and only well-known prefixes, is "conservative": then in vanilla DOMs and the like you only need the simple DOM and literal element names, and so you avoid namespace lookup.
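To illustrate the DOM point, here is a sketch using Python's `xml.dom.minidom`; the namespace URI `urn:example:invoice`, the `inv` prefix, and `count_lines` are made up for the example. With a fixed, well-known prefix and no default namespace, a plain `getElementsByTagName` lookup on the literal qualified name finds the elements. The same data written with a default namespace is found only by the namespace-aware `getElementsByTagNameNS`.

```python
from xml.dom import minidom


def count_lines(doc: str) -> tuple:
    """Count line elements two ways: by literal qualified name and by namespace."""
    dom = minidom.parseString(doc)
    # Simple DOM lookup on the literal tag name; no namespace processing needed.
    by_qname = len(dom.getElementsByTagName("inv:line"))
    # Namespace-aware lookup by (URI, local name); indifferent to the prefix.
    by_ns = len(dom.getElementsByTagNameNS("urn:example:invoice", "line"))
    return by_qname, by_ns


# Conservative style: no default namespace, well-known prefix, declared once.
CONSERVATIVE = """<inv:invoice xmlns:inv="urn:example:invoice">
  <inv:line>Widget</inv:line><inv:line>Gadget</inv:line>
</inv:invoice>"""

# Same elements in the same namespace, but via a default namespace.
DEFAULTED = """<invoice xmlns="urn:example:invoice">
  <line>Widget</line><line>Gadget</line>
</invoice>"""

print(count_lines(CONSERVATIVE))  # (2, 2)
print(count_lines(DEFAULTED))     # (0, 2)
```

The qualified-name lookup only stays reliable as long as every producer honours the prefix convention, which is exactly the "conservative" authoring discipline being proposed.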

Regards

Rick

On Fri, Apr 20, 2018 at 12:13 PM, John Cowan <johnwcowan@gmail.com> wrote:

On Thu, Apr 12, 2018 at 8:34 PM, Rick Jelliffe <rjelliffe@allette.com.au> wrote:

So this only relates to optional parts of a spec. In XML there are only three optional things: version, standalone, character encoding.

While I agree with this post, there is rather more optionality in XML than that. Consider the following Postel-like suggestions:

Make sure that namespaces and prefixes are mapped 1:1 (sane documents, in Joe English's sense)

If possible, put all namespace declarations in the root element (namespace-normal documents).

Think twice before using namespaces at all.

If possible, keep Unicode noncharacters and control characters out of character content and attribute values.

Use hex (not decimal) character references only if required by non-Unicode-aware editing tools. Do not use .

Minimize the use of PIs.

Don't use CDATA sections, except in documents that are about markup.

Use the five named escapes, not their hex equivalents.

Always escape > characters.
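A sketch of how a serializer might follow the character-level suggestions above, built on Python's `xml.sax.saxutils.escape`, which already uses the named escapes and always escapes `>`. The helper names, and the exact set of characters treated as discouraged (C0/C1 controls other than tab, LF and CR, plus Unicode noncharacters), are assumptions for illustration. Because text is escaped rather than wrapped, no CDATA sections are produced.

```python
from xml.sax.saxutils import escape


def is_discouraged(ch: str) -> bool:
    """True for control characters (other than tab, LF, CR) and Unicode noncharacters."""
    cp = ord(ch)
    if cp < 0x20:
        return ch not in "\t\n\r"      # C0 controls: forbidden in XML 1.0 anyway
    if 0x7F <= cp <= 0x9F:
        return True                    # DEL and C1 controls
    if 0xFDD0 <= cp <= 0xFDEF:
        return True                    # noncharacter block in the BMP
    return (cp & 0xFFFE) == 0xFFFE     # U+xxFFFE and U+xxFFFF in every plane


def conservative_escape(data: str, in_attribute: bool = False) -> str:
    """Escape character data using only named escapes, never CDATA sections."""
    bad = sorted({f"U+{ord(ch):04X}" for ch in data if is_discouraged(ch)})
    if bad:
        raise ValueError("discouraged characters: " + ", ".join(bad))
    # escape() always replaces &, < and >, so > is escaped even where XML
    # would allow it literally; &quot; is added for double-quoted attributes.
    return escape(data, {'"': "&quot;"} if in_attribute else {})


print(conservative_escape("if a > b && c < d"))
# if a &gt; b &amp;&amp; c &lt; d
print(conservative_escape('say "hi"', in_attribute=True))
# say &quot;hi&quot;
```

Only `&quot;` is added for attributes, on the assumption that attribute values are always written in double quotes; `&apos;` would be needed only for single-quoted values.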
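The two namespace suggestions at the top of the list can also be checked mechanically. Below is a rough sketch using Python's `xml.parsers.expat`, whose `StartNamespaceDeclHandler` is called for each declaration before the `StartElementHandler` of the element carrying it. The tests are this sketch's own reading of "sane" (each prefix bound to one URI and each URI to one prefix, across the whole document) and "namespace-normal" (all declarations on the root element), not a transcription of Joe English's definitions; `namespace_report` is a hypothetical name.

```python
import xml.parsers.expat


def namespace_report(doc: str) -> dict:
    """Report whether a document is 'sane' and 'namespace-normal' (as approximated here)."""
    prefix_to_uri = {}
    uri_to_prefix = {}
    state = {"sane": True, "namespace_normal": True, "depth": 0, "pending": 0}

    def start_ns(prefix, uri):
        # prefix is None for a default-namespace declaration; record it as "".
        key = prefix or ""
        if prefix_to_uri.setdefault(key, uri) != uri:
            state["sane"] = False      # prefix remapped to a different URI
        if uri_to_prefix.setdefault(uri, key) != key:
            state["sane"] = False      # URI bound to a second prefix
        state["pending"] += 1

    def start_el(name, attrs):
        # Declarations reported since the last start tag belong to this element.
        if state["pending"] and state["depth"] > 0:
            state["namespace_normal"] = False
        state["pending"] = 0
        state["depth"] += 1

    def end_el(name):
        state["depth"] -= 1

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartNamespaceDeclHandler = start_ns
    parser.StartElementHandler = start_el
    parser.EndElementHandler = end_el
    parser.Parse(doc, True)
    return {"sane": state["sane"], "namespace_normal": state["namespace_normal"]}


print(namespace_report('<a:r xmlns:a="urn:a" xmlns:b="urn:b"><b:x/></a:r>'))
# {'sane': True, 'namespace_normal': True}
print(namespace_report('<a:r xmlns:a="urn:a"><a:x xmlns:a="urn:other"/></a:r>'))
# {'sane': False, 'namespace_normal': False}
```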

--
John Cowan http://vrici.lojban.org/~cowan cowan@ccil.org
The Penguin shall hunt and devour all that is crufty, gnarly and
bogacious; all code which wriggles like spaghetti, or is infested with
blighting creatures, or is bound by grave and perilous Licences shall it
capture. And in capturing shall it replicate, and in replicating shall
it document, and in documentation shall it bring freedom, serenity and
most cool froodiness to the earth and all who code therein. --Gospel of Tux