What's wrong with namespaces? Some observations and suggestions


From: Amelia A Lewis <amyzing@talsever.com>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Fri, 3 Dec 2010 20:34:41 -0500

Heyo.

So, I've been (recently, publicly) critical of the Namespaces in XML 
specification.  My summary version: when I try to teach someone who 
doesn't care about XML about XML APIs, it's namespaces where they 
Boggle and Fall Down.  Michael Kay offered some corroboration, in 
noting that over fifty percent of a particular day's queries were 
related to namespaces, and that this was typical for queries that he 
handled.

So.  Herewith, some strongly opinionated judgments as to what's wrong 
with namespaces:

1) Namespaces have no need to use URIs.

2) Namespace URIs aren't URIs.

3) There is no standalone canonical form, representable in a non-XML 
context.

4) The distinction in semantics for the default (unnamed) prefix between 
elements and attributes is extremely difficult to explain, and the 
variation between default (unnamed) prefix and default (unnamed) 
namespace behavior promotes confusion.

First, that namespaces have no need to use URIs: they don't.  The 
problem is one of authority, not of identifying resources.  Notably, 
while there has long been a convention of using "locatable" or 
"retrievable" URI formats (especially http: scheme URIs), there is 
usually nothing at the specified location to retrieve.  Note the 
invention of RDDL, for instance, and even its name (Resource Directory 
Description Language).  RDDL exists to solve exactly this problem: there 
is no resource for this uniform identifier to identify.

The role played by URIs in XML namespaces is played by domain names in 
a number of languages that existed prior to the namespace specification 
(Java comes immediately to mind).  Other solutions exist as well 
(Perl uses colons! but the distribution of authority is not 
outstanding). The problem: a distributed authority for partition of a 
global namespace, with no centralized administration to resolve 
conflicts.  The adopted solution, URIs, carries with it radical 
verbosity, a pre-existing BNF incompatible with the "Name" production 
in XML, and a potentially heavy implementation price should a language 
choose to use a full "URI object" to represent namespaces.  Because the 
BNF for URIs is not a subset of the BNF for Name, it is impossible to 
specify a namespace without a surrounding context that binds namespaces 
to (Name BNF-compatible) prefixes (see item 3).
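
A rough illustration of the syntax mismatch, as a sketch only: the 
regular expression below is an ASCII-only approximation of the NCName 
production (the real production admits many more characters), and the 
helper name is_ncname_ascii is made up for this example.  A URI-style 
namespace name cannot pass as a Name; a Java-package-style dotted string 
can.

```python
import re

# ASCII-only approximation of the XML NCName production.
# Assumption: non-ASCII name characters are ignored to keep this short.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname_ascii(candidate: str) -> bool:
    """Return True if the whole string fits the approximate NCName pattern."""
    return NCNAME_ASCII.fullmatch(candidate) is not None


uri_style = "http://www.talsever.org/xml/namespaces/edml"
package_style = "org.talsever.xml.namespace.edml"

# The colon and slashes keep the URI out of the Name production.
print(is_ncname_ascii(uri_style))      # False
# Dots are legal name characters, so the dotted form fits.
print(is_ncname_ascii(package_style))  # True
```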

That leads to item 2: Namespaces in XML are not URIs.  They cannot be 
compared like URIs; they follow none of the semantic rules of URIs.  
There is no such thing as a "relative" namespace as opposed to an 
"absolute" namespace (but users might reasonably expect it, especially 
after encountering XML Base).  Comparison is for string equality.  The 
empty string is valid (though a sentinel value).  As already mentioned, 
even if the URI uses a well-known scheme with well-known handlers for 
retrieval of the uniformly-identified resource, there is no guarantee 
that a resource will be found there, or that if one is, it will be in a 
particular format, or for that matter will even be related to the 
namespace in question in any degree.

If URIs *had* to be chosen, then they should have been chosen "whole 
hog."  Comparison should work like URIs; they should be subject to 
resolution (with all the horrendous weight that that would entail); 
they should be full URIs, not just strings in Name BNF-incompatible 
string drag.
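
To make the string-equality point concrete, here is a small sketch 
using Python's standard xml.etree.ElementTree (my choice of tool, not 
anything the specification mandates).  ElementTree reports element names 
as "{namespace}localname".  The three namespace names below would be 
equivalent under ordinary URI normalization (host case, percent-encoding 
of an unreserved character), yet the parser treats them as three 
different namespaces.

```python
import xml.etree.ElementTree as ET

# Three documents whose namespace names differ only in ways that URI
# normalization would erase: host case and a percent-encoded "e".
a = ET.fromstring('<e xmlns="http://www.talsever.org/namespaces/edml"/>')
b = ET.fromstring('<e xmlns="http://WWW.TALSEVER.ORG/namespaces/edml"/>')
c = ET.fromstring('<e xmlns="http://www.talsever.org/namespaces/%65dml"/>')

# Namespace names are compared character by character, so none match.
print(a.tag == b.tag)  # False
print(a.tag == c.tag)  # False
```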

Now ... the fact that these URIs aren't URIs actually provides an 
opportunity.  With no change to parsers or anything else, it is 
possible, even now, to recommend "package/path" style namespaces: 
instead of http://www.talsever.org/xml/namespaces/edml, 
org.talsever.xml.namespace.edml.  A change incompatible with Namespaces 
in XML: org:talsever:xml:namespace:edml (but many existing processors 
would choke).  W3C could reserve the short prefix "xml" (similar to 
Snoracle's reservation of "java"), and could additionally operate a 
"short-namespace registry."  And this would potentially resolve item 3 
...

... which is that there is no standalone canonical form for 
fully-qualified XML names (namespace + localname) outside of XML.  
Because URIs, which are syntax-incompatible with XML Name-s, were 
chosen to facilitate namespace partitioning, we can't say 
"http://www.talsever.org/namespaces/edml:entity".  James Clark proposed 
a useful form: "{http://www.talsever.org/namespaces/edml}entity" ... 
but it can't be used as the name of an element, and because it was not 
incorporated as a standard canonical form in the Namespaces in XML 
specification, it wasn't widely adopted.  Instead, languages like 
XPath, where the construct would be useful, instead rely upon a context 
to externally define the namespace to prefix mappings (or prefix to 
namespace mappings, if you prefer).  That XPath doesn't acknowledge 
binding of the default prefix is just added misery.  The lack of a 
standard canonical form is the reason for QNames in content, which 
violate the layering of XML, forcing awareness of namespace/prefix 
mappings out of the parser level into the application.
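
For what it's worth, some APIs did adopt the braces form on their own.  
Python's xml.etree.ElementTree is one; the sketch below shows it 
reporting names that way, independent of whatever prefix the document 
happened to use.  The helper split_clark is hypothetical, written only 
for this example.

```python
import xml.etree.ElementTree as ET

NS = "http://www.talsever.org/namespaces/edml"

# Same namespace, two different prefixes.
doc = ET.fromstring(f'<e:entity xmlns:e="{NS}"/>')
other = ET.fromstring(f'<x:entity xmlns:x="{NS}"/>')


def split_clark(name: str) -> tuple[str, str]:
    """Split "{namespace}local" into (namespace, local)."""
    if name.startswith("{"):
        uri, _, local = name[1:].partition("}")
        return uri, local
    return "", name  # no namespace: the "global" one


print(doc.tag)               # {http://www.talsever.org/namespaces/edml}entity
print(doc.tag == other.tag)  # True: the prefix is not part of the name
print(split_clark(doc.tag))
```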

Again, since namespace URIs aren't really URIs, adoption of a 
convention that substitutes "fully-qualified names" instead might work 
around this, but this is likely to be incompatible with existing 
namespace-aware parsers and processors (and because of the layering 
violation, that's pretty much every XML application out there, 
probably).  Fully qualified names have drawbacks, too (verbosity ... if 
you have to say "org.w3.namespaces.xhtml:p" instead of "p", you'll be 
*seriously* annoyed in short order), but further refinements might be 
possible, there--so long as the ability to always transform to a 
fully-qualified name existed.
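
A sketch of what that might look like, under loud assumptions: the 
namespace name "org.w3.namespaces.xhtml" is the made-up one from the 
paragraph above (not the real XHTML namespace), and the function 
fully_qualified is hypothetical.  An existing namespace-aware parser 
(here, Python's xml.etree.ElementTree) accepts the dotted string as a 
namespace name without complaint, and the transform to a fully-qualified 
name is then trivial.

```python
import xml.etree.ElementTree as ET

# A package-style namespace name; the parser just sees a string.
doc = ET.fromstring('<p xmlns="org.w3.namespaces.xhtml">text</p>')
print(doc.tag)  # {org.w3.namespaces.xhtml}p


def fully_qualified(clark_name: str) -> str:
    """Render "{ns}local" as the hypothetical "ns:local" form."""
    if clark_name.startswith("{"):
        ns, _, local = clark_name[1:].partition("}")
        return f"{ns}:{local}"
    return clark_name  # not in any namespace


print(fully_qualified(doc.tag))  # org.w3.namespaces.xhtml:p
```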

The final issue I offer is the significant confusion surrounding 
default namespaces and prefixes, and the difference in syntax and 
semantics in their application to elements and attributes.  Again, I 
would suggest that this is in large part an artifact of the requirement 
to map the Name BNF-incompatible URI onto a compatible prefix, but the 
introduction of the empty string to represent the pre-namespaces global 
XML namespace complicates that analysis.

For elements, the default (unnamed, empty string, missing) namespace 
indicates the *global* namespace.  In that context, it is very rarely 
safe to combine one vocabulary with another, unless both vocabularies 
have been vetted for compatibility in advance.  The "stock" element of 
a recipe DTD has nothing to do with the "stock" element of an inventory 
DTD--a grocery might reasonably have both dialects in use.  For 
attributes, however, leaving them out of a namespace is the best thing 
to do.  It's simpler, shorter to write, and there is no danger of a 
'flag' attribute on a 'vessel' element becoming confused with the 
'flag' attribute of a 'note' element.  Attributes are implicitly 
"namespaced" by their container element.  Elements are not--but could 
be, certainly.  There is no actual reason to apply a namespace to an 
element that is not going to be used at the "top level" (but "top 
level" is open to interpretation--see HTML microformats, for example, 
and this is particularly true where namespaces are most needed, for 
embedding something in another vocabulary).  RELAX NG provides an 
explicit "export mechanism," elements that are allowed to be the 
starting point; that's a useful concept.  Only such elements really 
need to be in namespaces--for HTML, one might say "block-level 
elements", for instance.

Compare the default (unnamed, missing, empty string and no colon 
either) prefix.  It can be bound to a non-default namespace ... for 
elements.  Attributes with no prefix are not in the namespace bound to 
no-prefix; they're scoped by their containing element.  Elements with 
no prefix may be in the global (that is, default) namespace, or in some 
other, non-default namespace.  To find out, you have to have all the 
ancestors around.  In some contexts, like XPath, an element name with 
no prefix *must* be in the global namespace; you can't bind the default 
prefix for an XPath expression--even if the prefix is bound for the 
original document pointed to by the expression and by the document 
containing the expression.  I spent half an hour trying (surprisingly 
patiently) to explain to an astronomer (very *bright*--but not involved 
in XML) friend that even if she bound the default prefix of her 
stylesheet to the namespace URI which was bound to the default prefix 
of her incoming document (the XHTML namespace), "/html/body/p" still 
wasn't going to match anything.  She had to "redundantly" bind to "h" 
and use "/h:html/h:body/h:p" (in the end, she didn't do it that way ... 
she preprocessed the incoming XHTML with a quick script that removed 
the namespace declaration, so she could write XPath that made sense to 
her ... and that was largely because my explanation devolved into chaos 
when she wanted to match on "@class" attributes (no, not @h:class ... 
well, because attributes don't work like that ... no, don't take out 
the "h" binding! ... wait, what are you doing? ... oh .... fine, just 
strip the namespace decl and do it that way, then)).
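
The whole sorry episode, reduced to a sketch.  This uses Python's 
xml.etree.ElementTree, which implements only a small subset of XPath 
(and evaluates paths relative to the root element, hence "body/p" rather 
than "/html/body/p"), so treat it as an illustration of the behavior 
rather than a full XPath processor.  The "h" prefix and the sample 
document are invented for the example.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# The default prefix is bound to the XHTML namespace in the document.
doc = ET.fromstring(
    f'<html xmlns="{XHTML}"><body>'
    '<p class="intro">one</p><p>two</p>'
    '</body></html>'
)

# Elements pick up the default namespace ...
print(doc.tag)        # {http://www.w3.org/1999/xhtml}html
# ... but unprefixed attributes do not.
print(doc[0][0].attrib)  # {'class': 'intro'}

# Unprefixed steps in the path mean "no namespace": nothing matches.
print(len(doc.findall("body/p")))  # 0

# Bind a prefix for the path expression and use it on every element step.
ns = {"h": XHTML}
print(len(doc.findall("h:body/h:p", ns)))  # 2

# The attribute test stays unprefixed (@class, not @h:class).
print(len(doc.findall("h:body/h:p[@class='intro']", ns)))  # 1
```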

Summarizing: choosing URIs for namespaces was a mistake (in my 
opinion), because it meant no canonical form, and required "binding"; 
that these URIs aren't really URIs is both a source of confusion, and a 
potential opportunity.  Admittedly, it may not be possible to take 
advantage of the opportunity, in the current state of play.

My two cents (adjusted for inflation; I apologize for my well-known 
tendency to verbosity and sesquipedalian persiflage).

Amy!
-- 
Amelia A. Lewis                    amyzing {at} talsever.com
"Oh, fuck!  You did it just like I told you to!"  (The manager's lament)

Follow-Ups:
- Re: [xml-dev] What's wrong with namespaces? Some observations and suggestions
  - From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Re: [xml-dev] What's wrong with namespaces? Some observations and suggestions
  - From: Michael Kay <mike@saxonica.com>
- Re: [xml-dev] What's wrong with namespaces? Some observations and suggestions
  - From: Liam R E Quin <liam@w3.org>

References:
- XML as salvage yard (was RE: James Clark: XML versus the Web)
  - From: "Simon St.Laurent" <simonstl@simonstl.com>
- Re: [xml-dev] XML as salvage yard (was RE: James Clark: XML versus the Web)
  - From: Liam R E Quin <liam@w3.org>
