What's wrong with namespaces? Some observations and suggestions
- From: Amelia A Lewis <amyzing@talsever.com>
- To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
- Date: Fri, 3 Dec 2010 20:34:41 -0500
Heyo.
So, I've been (recently, publicly) critical of the Namespaces in XML
specification. My summary version: when I try to teach someone who
doesn't care about XML about XML APIs, it's namespaces where they
Boggle and Fall Down. Michael Kay offered some corroboration, in
noting that over fifty percent of a particular day's queries were
related to namespaces, and that this was typical for queries that he
handled.
So. Herewith, some strongly opinionated judgments as to what's wrong
with namespaces:
1) Namespaces have no need to use URIs.
2) Namespace URIs aren't URIs.
3) There is no standalone canonical form, representable in a non-XML
context.
4) The distinction in semantics for the default (unnamed) prefix between
elements and attributes is extremely difficult to explain, and the
variation between default (unnamed) prefix and default (unnamed)
namespace behavior promotes confusion.
First, that namespaces have no need to use URIs: they don't. The
problem is one of authority, not of identifying resources. Notably,
while there has long been a convention of using "locatable" or
"retrievable" URI formats (especially http: scheme URIs), there is
usually nothing at the specified location to retrieve. Note the
invention of RDDL, for instance, and even its name (Resource Directory
Description Language). RDDL solves the problem: there is no resource
for this uniform identifier to identify.
The role played by URIs in XML namespaces is played by domain names in
a number of languages that existed prior to the namespace specification
(Java comes immediately to mind). Other solutions exist as well
(Perl uses colons! but the distribution of authority is not
outstanding). The problem: a distributed authority for partition of a
global namespace, with no centralized administration to resolve
conflicts. The adopted solution, URIs, carries with it radical
verbosity, a pre-existing BNF incompatible with the "Name" production
in XML, and a potentially heavy implementation price should a language
choose to use a full "URI object" to represent namespaces. Because the
BNF for URIs is not a subset of the BNF for Name, it is impossible to
specify a namespace without a surrounding context that binds namespaces
to (Name BNF-compatible) prefixes (see item 3).
That leads to item 2: Namespaces in XML are not URIs. They cannot be
compared like URIs; they follow none of the semantic rules of URIs.
There is no such thing as a "relative" namespace as opposed to an
"absolute" namespace (but users might reasonably expect it, especially
after encountering XML Base). Comparison is by string equality. The
empty string is valid (though a sentinel value). As already mentioned,
even if the URI uses a well-known scheme with well-known handlers for
retrieval of the uniformly-identified resource, there is no guarantee
that a resource will be found there, or that if one is, it will be in a
particular format, or for that matter will even be related to the
namespace in question in any degree.
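A minimal sketch of the string-equality point, using Python's standard
xml.etree.ElementTree (my choice for illustration; any namespace-aware
parser should behave alike). The two documents differ only in the case
of the scheme and host, which URI comparison rules would normalize
away, yet the parser reports two different element names. The
upper-cased spelling is invented for the example.

```python
import xml.etree.ElementTree as ET

# Two spellings that URI normalization would treat as equivalent:
# scheme and host are case-insensitive under URI comparison rules.
doc_a = '<entity xmlns="http://www.talsever.org/namespaces/edml"/>'
doc_b = '<entity xmlns="HTTP://WWW.TALSEVER.ORG/namespaces/edml"/>'

tag_a = ET.fromstring(doc_a).tag
tag_b = ET.fromstring(doc_b).tag

# Namespace names are compared character by character, so these differ.
same_name = (tag_a == tag_b)

print(tag_a)
print(tag_b)
print(same_name)
```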
If URIs *had* to be chosen, then they should have been chosen "whole
hog." Comparison should work like URIs; they should be subject to
resolution (with all the horrendous weight that that would entail);
they should be full URIs, not just strings in Name BNF-incompatible
string drag.
Now ... the fact that these URIs aren't URIs actually provides an
opportunity. With no change to parsers or anything else, it is
possible, even now, to recommend "package/path" style namespaces:
instead of http://www.talsever.org/xml/namespaces/edml,
org.talsever.xml.namespace.edml. A change incompatible with Namespaces
in XML: org:talsever:xml:namespace:edml (but many existing processors
would choke). W3C could reserve the short prefix "xml" (similar to
Snoracle's reservation of "java"), and could additionally operate a
"short-namespace registry." And this would potentially resolve item 3
...
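As a quick check of the "no change to parsers" claim, here is a small
sketch with Python's standard xml.etree.ElementTree (an assumption on
my part that it is representative; other processors may be stricter).
The prefix "e" and the child element "name" are made up for the
example; the namespace string is the package-style one suggested above.

```python
import xml.etree.ElementTree as ET

# Package-style namespace name: not a URI with a scheme, just a dotted string.
PACKAGE_NS = "org.talsever.xml.namespace.edml"

doc = f'<e:entity xmlns:e="{PACKAGE_NS}"><e:name/></e:entity>'
root = ET.fromstring(doc)

# The parser accepts the string as-is and uses it to qualify names.
root_tag = root.tag
child_tag = root[0].tag

print(root_tag)
print(child_tag)
```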
... which is that there is no standalone canonical form for
fully-qualified XML names (namespace + localname) outside of XML.
Because URIs, which are syntax-incompatible with XML Name-s, were
chosen to facilitate namespace partitioning, we can't say
"http://www.talsever.org/namespaces/edml:entity". James Clark proposed
a useful form: "{http://www.talsever.org/namespaces/edml}entity" ...
but it can't be used as the name of an element, and because it was not
incorporated as a standard canonical form in the Namespaces in XML
specification, it wasn't widely adopted. Instead, languages like
XPath, where the construct would be useful, rely upon a context
to externally define the namespace to prefix mappings (or prefix to
namespace mappings, if you prefer). That XPath doesn't acknowledge
binding of the default prefix is just added misery. The lack of a
standard canonical form is the reason for QNames in content, which
violate the layering of XML, forcing awareness of namespace/prefix
mappings out of the parser level into the application.
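For what it's worth, some APIs did adopt the "{namespace}localname"
form internally. The sketch below uses Python's standard
xml.etree.ElementTree, which reports element names that way; the
split_clark helper is my own illustration, not part of any library.
It shows that the expanded name is the same whatever prefix (or no
prefix) the document happened to use.

```python
import xml.etree.ElementTree as ET

EDML = "http://www.talsever.org/namespaces/edml"


def split_clark(name):
    """Split '{namespace}local' into (namespace, local).

    Hypothetical helper: returns '' as the namespace for a bare name.
    """
    if name.startswith("{"):
        namespace, _, local = name[1:].partition("}")
        return namespace, local
    return "", name


# Same element name, written once with a prefix and once with a default binding.
with_prefix = ET.fromstring(f'<x:entity xmlns:x="{EDML}"/>')
with_default = ET.fromstring(f'<entity xmlns="{EDML}"/>')

# QName builds the same expanded form from its two parts.
expanded = ET.QName(EDML, "entity").text

print(with_prefix.tag == with_default.tag == expanded)
print(split_clark(expanded))
```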
Again, since namespace URIs aren't really URIs, adoption of a
convention that substitutes "fully-qualified names" instead might work
around this, but this is likely to be incompatible with existing
namespace-aware parsers and processors (and because of the layering
violation, that's pretty much every XML application out there,
probably). Fully qualified names have drawbacks, too (verbosity ... if
you have to say "org.w3.namespaces.xhtml:p" instead of "p", you'll be
*seriously* annoyed in short order), but further refinements might be
possible, there--so long as the ability to always transform to a
fully-qualified name existed.
The final issue I offer is the significant confusion surrounding
default namespaces and prefixes, and the difference in syntax and
semantics in their application to elements and attributes. Again, I
would suggest that this is in large part an artifact of the requirement
to map the Name BNF-incompatible URI onto a compatible prefix, but the
introduction of the empty string to represent the pre-namespaces global
XML namespace complicates that analysis.
For elements, the default (unnamed, empty string, missing) namespace
indicates the *global* namespace. In that context, it is very rarely
safe to combine one vocabulary with another, unless both vocabularies
have been vetted for compatibility in advance. The "stock" element of
a recipe DTD has nothing to do with the "stock" element of an inventory
DTD--a grocery might reasonably have both dialects in use. For
attributes, however, leaving them out of a namespace is the best thing
to do. It's simpler, shorter to write, and there is no danger of a
'flag' attribute on a 'vessel' element becoming confused with the
'flag' attribute of a 'note' element. Attributes are implicitly
"namespaced" by their container element. Elements are not--but could
be, certainly. There is no actual reason to apply a namespace to an
element that is not going to be used at the "top level" (but "top
level" is open to interpretation--see HTML microformats, for example,
and this is particularly true where namespaces are most needed, for
embedding something in another vocabulary). RELAX NG provides an
explicit "export mechanism," elements that are allowed to be the
starting point; that's a useful concept. Only such elements really
need to be in namespaces--for HTML, one might say "block-level
elements", for instance.
Compare the default (unnamed, missing, empty string and no colon
either) prefix. It can be bound to a non-default namespace ... for
elements. Attributes with no prefix are not in the namespace bound to
no-prefix; they're scoped by their containing element. Elements with
no prefix may be in the global (that is, default) namespace, or in some
other, non-default namespace. To find out, you have to have all the
ancestors around. In some contexts, like XPath, an element name with
no prefix *must* be in the global namespace; you can't bind the default
prefix for an XPath expression--even if the prefix is bound for the
original document pointed to by the expression and by the document
containing the expression. I spent half an hour trying (surprisingly
patiently) to explain to an astronomer (very *bright*--but not involved
in XML) friend that even if she bound the default prefix of her
stylesheet to the namespace URI which was bound to the default prefix
of her incoming document (the XHTML namespace), "/html/body/p" still
wasn't going to match anything. She had to "redundantly" bind to "h"
and use "/h:html/h:body/h:p" (in the end, she didn't do it that way ...
she preprocessed the incoming XHTML with a quick script that removed
the namespace declaration, so she could write XPath that made sense to
her ... and that was largely because my explanation devolved into chaos
when she wanted to match on "@class" attributes (no, not @h:class ...
well, because attributes don't work like that ... no, don't take out
the "h" binding! ... wait, what are you doing? ... oh .... fine, just
strip the namespace decl and do it that way, then)).
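The same trap can be reproduced in a few lines. This is only a sketch:
it uses the path support in Python's standard xml.etree.ElementTree,
which implements a small subset of XPath rather than XPath 1.0 itself,
and the sample document is invented. The pattern is the one described
above: unprefixed element steps find nothing, a "redundant" prefix
binding is needed for elements, and the attribute stays unprefixed.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
doc = f'<html xmlns="{XHTML}"><body><p class="note">hi</p></body></html>'
root = ET.fromstring(doc)

# Unprefixed steps look for elements in no namespace, so nothing matches,
# even though the document binds the default prefix to the XHTML namespace.
unprefixed = root.find("body/p")

# Binding a prefix ("h" is arbitrary) for the path expression does match.
ns = {"h": XHTML}
prefixed = root.find("h:body/h:p", ns)

# The attribute carries no prefix and is in no namespace.
by_attr = root.find("h:body/h:p[@class='note']", ns)
plain_class = prefixed.get("class")
qualified_class = prefixed.get(f"{{{XHTML}}}class")

print(unprefixed)
print(prefixed.tag)
print(plain_class, qualified_class)
```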
Summarizing: choosing URIs for namespaces was a mistake (in my
opinion), because it meant no canonical form, and required "binding";
that these URIs aren't really URIs is both a source of confusion, and a
potential opportunity. Admittedly, it may not be possible to take
advantage of the opportunity, in the current state of play.
My two cents (adjusted for inflation; I apologize for my well-known
tendency to verbosity and sesquipedalian persiflage).
Amy!
--
Amelia A. Lewis amyzing {at} talsever.com
"Oh, fuck! You did it just like I told you to!" (The manager's lament)