Re: [xml-dev] Should XML applications follow Postel’s Law?



On Thu, Apr 19, 2018 at 11:55 PM, Rick Jelliffe <rjelliffe@allette.com.au> wrote:

* I think Postel's principle speaks about the risk of rarely-implemented things, in particular, optional things. So I don't believe that there is any danger of a lot of XML processors that don't handle > or hex characters or CDATA sections.

I agree with that as far as it goes. The reason for escaping all > characters is that although the rule that specifies when they MUST be escaped is well-embodied in software, it isn't well-known to document authors; it rarely comes up and so is easily missed.  Similarly, it's easy to get the 9-character token "<![CDATA[" subtly wrong, in which case it will not do what the author expects.
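
As a rough illustration of the "escape every >" policy, here is a minimal Python sketch. It assumes the standard-library `xml.sax.saxutils.escape` and `xml.etree.ElementTree`; the helper names (`escape_text`, `is_well_formed`, `round_trip`) and the sample string are made up for this example and are not from the thread.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_text(text):
    """Escape &, < and > everywhere, not only where XML requires it."""
    # saxutils.escape() replaces all three characters unconditionally.
    return escape(text)


def is_well_formed(document):
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


def round_trip(text):
    """Write text into a <p> element, parse it back, return the content."""
    return ET.fromstring("<p>" + escape_text(text) + "</p>").text


# Hypothetical content that happens to contain the sequence "]]>",
# the one place where a literal ">" is forbidden in character data.
sample = "if (a[b[0]]> 1) then ..."
print(is_well_formed("<p>" + sample + "</p>"))  # raw text, no escaping
print(round_trip(sample) == sample)             # escaped text survives intact
```

With a conforming parser the raw document should be rejected while the escaped one round-trips unchanged, which is the point of escaping unconditionally: the author never has to remember the "]]>" rule.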
 
* I was assuming that Postel's principle applies to implementation: minimising the use of PIs, namespaces, characters in data, are all authorial decisions. If an implementer refuses to transfer them as part of the data, their code is not "robust" it is "corrupting".  

If you are writing a generic XMLWriter component, that's true.  But it's commonplace for the "author" of a document nowadays to be another piece of software driven by higher-level concerns.  If you are using XML to communicate, the details of how you use it are part of the implementation of that communication.
 
* I don't really concur with several of your other points: indeed, I think there is a case that to get robustness you really need to only use ASCII repertoire for direct characters, and you should use HNCR for everything else. Or at least, characters that are not in the same Unicode ranges as the names in markup should be HNCR.

That would make a hash of non-English character content, violating Goal 6 ("XML documents should be human-legible and reasonably clear.").  Writing French or Greek or Hindi text with HNCRs is a non-starter, and even if you are allowed to write name characters directly, there are also the script-specific punctuation marks that aren't allowed in names.  You really don't want them to be HNCRs either.
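
To make the legibility cost concrete, here is a small Python sketch of what an "ASCII plus HNCR" writer would emit, taking HNCR to mean a hexadecimal numeric character reference of the form `&#xHHHH;`. The function name `to_hncr` and the sample word are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_hncr(text):
    """Escape markup characters, then write every non-ASCII character as &#xHHHH;."""
    escaped = escape(text)
    return "".join(
        ch if ord(ch) < 128 else "&#x{:X};".format(ord(ch))
        for ch in escaped
    )


greek = "λόγος"  # a five-letter Greek word as sample content
encoded = to_hncr(greek)
print(encoded)  # five character references, none readable as Greek

# A parser recovers the original text, so nothing is lost to software;
# the cost is borne entirely by the human reading the source.
decoded = ET.fromstring("<w>" + encoded + "</w>").text
print(decoded == greek)
```

The encoded form is equivalent as far as a parser is concerned, but a five-letter word turns into forty characters of references, which is the human-legibility problem described above.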

In any case, my examples were just that, examples.

That being said, I think you do get a different set of "conservative" issues as soon as your XML has to be some other format at the same time as being XML: for example, that your XHTML must be text in some encoding, *AND* XML, *AND* HTML.  Or that your line-oriented XML for AWK processing must be text in some encoding *AND* simple lines *AND* XML.

Certainly.  Although I note that there is an XML plugin for gawk, which supplements the BEGIN and END patterns with things like XMLSTARTELEM.  Pretty neat.  There are similar plugins for JSON and Postgres.

-- 
John Cowan          http://vrici.lojban.org/~cowan        cowan@ccil.org
weirdo:    When is R7RS coming out?
Riastradh: As soon as the top is a beautiful golden brown and if you
stick a toothpick in it, the toothpick comes out dry.




Copyright 1993-2007 XML.org. This site is hosted by OASIS