I nominate Michael Kay's message for the xml-dev Hall of Fame

From: "Costello, Roger L." <costello@mitre.org>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Sun, 8 Dec 2013 10:35:43 +0000

Hi Folks,

Michael Kay's most recent message deserves careful reading. And then read it again.

He talks about language paternalism versus language orthogonality.

He gave concrete examples of zero/emptiness rearing its head in the XML suite of technologies:

XSD: what is the meaning of a union type
with no member types?

XSD: what is the meaning of a type defined
with an empty enumeration?

XSD: a choice is allowed to have no alternatives.

XSD: what is the meaning of a type where the
maxInclusive facet is less than the
minInclusive facet?

XQuery: expressions are disallowed whose value
can be statically determined to be an empty
sequence, unless it is explicitly written as
the literal "()"

XPath: certain functions, such as tokenize(), ban regular
expressions that allow the empty string.

Here is Michael's message:

During the WG meetings on XSD 1.1 I recall quite a few debates on topics related to this theme. Perhaps they arose there because the people on that group were particularly deep thinkers, perhaps because they were aware that decisions in XSD 1.0 had often been made inconsistently when such things came up.

For example, should you allow a union type with no member types? In fact XSD 1.1 defines such a type as a built-in type (called xs:error), but it does not allow users to define such a type. Similarly, it does not allow a type to be defined with an empty enumeration. It does, however, allow a complex type to be defined as a choice with no alternatives. The WG found these decisions troubling, which is perhaps why they were not always made consistently.
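To make the emptiness in these cases concrete, here is a minimal sketch in Python rather than XSD. It assumes a deliberately simplified model in which a "type" is just a predicate over strings; the names `union_accepts`, `enumeration_accepts`, `is_digits` and `is_boolean` are hypothetical and do not come from any XSD API. Under that assumption, a union with no member types and an enumeration with no values are both well defined: they simply accept nothing.

```python
from typing import Callable, Iterable

# Assumption: a "type" is modelled as a predicate over lexical values.
TypePredicate = Callable[[str], bool]


def union_accepts(members: Iterable[TypePredicate], value: str) -> bool:
    """A union accepts a value if at least one member type accepts it."""
    return any(member(value) for member in members)


def enumeration_accepts(allowed: Iterable[str], value: str) -> bool:
    """An enumeration accepts a value if it is one of the listed values."""
    return value in set(allowed)


def is_digits(value: str) -> bool:
    """Toy stand-in for a numeric type: one or more digits."""
    return value.isdigit()


def is_boolean(value: str) -> bool:
    """Toy stand-in for a boolean type."""
    return value in ("true", "false")


print(union_accepts([is_digits, is_boolean], "42"))  # a member accepts it
print(union_accepts([], "42"))                       # no members: nothing is accepted
print(enumeration_accepts([], "red"))                # no values: nothing is accepted
```

In this model the empty cases need no special handling: `any()` over no members is false, and membership in an empty set is false.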

The debates on these decisions often went under the label "paternalism". A restriction in the language that prevents you doing something which the designers consider to be useless and perhaps harmful, where the restriction is not logically necessary, can be considered paternalistic - the language designers are trying to prevent the users from doing things which they think users should not be doing. By and large, orthogonality in language design is now valued above paternalism: we don't stop people writing (x+0), or iterating over a loop zero times.

The static typing rules in XQuery rather controversially disallow any expression whose value can be statically determined to be an empty sequence, unless it is explicitly written as the literal "()". This is very consciously a choice of paternalism over orthogonality; it's done because it enables mistyped element names in path expressions (such as documetn/sectino) to be reported as compile time errors. I think it would have been better to solve this problem in a different way, for example by disallowing the use of any element or attribute name in a schema-aware query unless the name is declared in the schema (with some explicit way of using dynamic names where necessary).
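The alternative suggested above, rejecting names that the schema does not declare, can be illustrated with a toy check in Python. This is only a sketch under strong assumptions: paths are plain slash-separated child steps, the schema is reduced to a set of declared names, and `undeclared_names` is a hypothetical helper, not how any XQuery processor works.

```python
from typing import List, Set


def undeclared_names(path: str, declared: Set[str]) -> List[str]:
    """Return the steps of a simple child-axis path that are not declared names."""
    return [step for step in path.split("/") if step and step not in declared]


# Assumption: these are the element names declared in the schema.
schema_names = {"document", "section", "title"}

print(undeclared_names("document/section", schema_names))  # no undeclared steps
print(undeclared_names("documetn/sectino", schema_names))  # both steps are misspelt
```

A check of this kind reports the misspelt names directly, without having to outlaw every expression that is statically known to be empty.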

Another example from XQuery is about whether to allow a FLWOR expression with no "for" or "let" clauses, for example

where 1 = 0 return 3

which is perfectly well-defined, just not very useful.

It hadn't occurred to me that the debate between paternalism and orthogonality often boils down to a decision about whether to allow emptiness. I've been trying to think of counter-examples. Another example from XSD was whether to allow the maxInclusive facet for a type to be less than the minInclusive facet. But again there's a link with emptiness: if maxInclusive is less than minInclusive, the type cannot have any instances, but it's perfectly well defined.
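The maxInclusive/minInclusive case can be sketched the same way. The Python class below is a hypothetical model of an integer type with two inclusive bounds, not an XSD implementation. It assumes that validity means lying between the bounds, and it shows that an inverted range needs no special rule: the ordinary membership test already rejects every value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegerRange:
    """Hypothetical model of an integer type with inclusive bounds."""
    min_inclusive: int
    max_inclusive: int

    def accepts(self, value: int) -> bool:
        # Ordinary membership test; nothing special for inverted bounds.
        return self.min_inclusive <= value <= self.max_inclusive

    def is_empty(self) -> bool:
        # The value space is empty exactly when the bounds are inverted.
        return self.max_inclusive < self.min_inclusive


percent = IntegerRange(0, 100)
inverted = IntegerRange(10, 5)

print(percent.accepts(50))   # inside the bounds
print(inverted.accepts(7))   # no value can satisfy 10 <= v <= 5
print(inverted.is_empty())
```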

In XPath we have rules for certain functions that ban regular expressions that match a zero-length string. I think there's a logical reason in this case; the semantics of something like tokenize() have to be special-cased for zero-length separators to ensure that the function terminates, and once you start special-casing you can't argue that you're doing something in the interest of orthogonality. I guess pure orthogonality would actually make the function run for ever and leave the user to suffer the inevitable consequences. Here paternalism seems to offer the user a better deal.
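The termination problem can be seen in a small Python sketch. This is not the XPath definition of tokenize(); `naive_tokenize` and `strict_tokenize` are hypothetical functions, and the `max_steps` cap exists only so that the example stops instead of looping for ever. The naive version applies one uniform rule (find the next separator, resume after it) and makes no progress when the separator matches a zero-length string; the strict version rejects such patterns up front, which is the paternalistic choice described above.

```python
import re
from typing import List


def naive_tokenize(text: str, pattern: str, max_steps: int = 1000) -> List[str]:
    """Split text at each separator match, with no special case for empty matches.

    max_steps is a safety cap added for this sketch: without it, a separator
    that matches a zero-length string would make the loop run for ever.
    """
    regex = re.compile(pattern)
    tokens: List[str] = []
    pos = 0
    for _ in range(max_steps):
        match = regex.search(text, pos)
        if match is None:
            tokens.append(text[pos:])
            return tokens
        tokens.append(text[pos:match.start()])
        pos = match.end()  # unchanged when the match is zero-length
    raise RuntimeError("no progress: separator matched a zero-length string")


def strict_tokenize(text: str, pattern: str) -> List[str]:
    """Reject any separator pattern that matches the empty string."""
    if re.fullmatch(pattern, "") is not None:
        raise ValueError("separator pattern matches a zero-length string")
    return naive_tokenize(text, pattern)


print(strict_tokenize("a, b,c", r",\s*"))
```

Calling `naive_tokenize("abc", "x*")` hits the cap and raises `RuntimeError`, while `strict_tokenize("abc", "x*")` fails immediately with `ValueError`.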

Computer science isn't alone in struggling with some of the implications of zero. Mathematicians still argue about whether zero to the power zero should be zero or one.

Michael Kay
Saxonica