xml-dev - escaping QName interlopers

To: xml-dev@lists.xml.org
Subject: escaping QName interlopers
From: "Simon St.Laurent" <simonstl@simonstl.com>
Date: Sun, 19 Jan 2003 11:47:30 -0500

[Warning: half-baked Sunday AM musings follow.  There's definitely
something here, but what it is and whether it could sweep the world is
another set of questions.]

QNames appear to have emerged as a means of combining existing markup
practice with the W3C's fondness for URIs as the defining identifier for
the Web.  Nearly every aspect of QNames has been questioned in one form
or another on this list and elsewhere, but there are still probably some
pieces we haven't explored, some of which may in fact be interesting.

It struck me this morning that many of the problems with QNames aren't
exactly the fault of QNames.  QNames are kind of the result of a
collision between a URI truck and the Name compact car where we end up
driving the truck from the driver's seat of the car.

The most obvious fault line is the inability to express QNames as URIs.
For example, in some contexts, it might be very convenient to be able to
describe this spec element:

<piece xmlns="http://simonstl.com/ns/vellum" />

as:

http://simonstl.com/ns/vellum#piece

Unfortunately, there are a number of problems with that approach.  It
works for this particular namespace, but what do I do with:

<piece xmlns="http://simonstl.com/ns#vellum" />

Creating http://simonstl.com/ns#vellum#piece is an even uglier
collision.  While namespaces that contain fragment identifiers are
apparently rare, creations primarily of the W3C, they do exist and are
legal, and therefore offer a barrier to developers who want to describe
their vocabularies using URI-based rather than QName-based mechanisms.
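
To make the collision concrete, here is a minimal Python sketch.  It
assumes the naive rule "namespace URI, then '#', then local name"; the
helper name_to_uri is made up for illustration and is not part of any
spec or library.

```python
from urllib.parse import urlsplit

def name_to_uri(namespace_uri, local_name):
    """Naive mapping: append '#' and the local name to the namespace URI."""
    return namespace_uri + "#" + local_name

clean = name_to_uri("http://simonstl.com/ns/vellum", "piece")
ugly = name_to_uri("http://simonstl.com/ns#vellum", "piece")

# A generic URI parser splits at the first '#', so in the second case the
# namespace's own fragment and the local name end up fused together.
print(urlsplit(clean).fragment)  # piece
print(urlsplit(ugly).fragment)   # vellum#piece
```

In the second case a receiver has no reliable way to tell where the
namespace URI stops and the local name starts.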

A query string approach is another option, but it gets even wilder with
namespace URIs that use fragment identifiers:

http://simonstl.com/ns?name=piece#vellum
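
A rough Python sketch of that query-string mapping follows.  The
parameter name "name" comes from the example above; the function
name_to_query_uri is hypothetical, and the choice to keep the namespace's
fragment at the end is simply what generic URI syntax forces.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def name_to_query_uri(namespace_uri, local_name):
    """Carry the local name in a 'name' query parameter (illustrative only)."""
    parts = urlsplit(namespace_uri)
    query = urlencode({"name": local_name})
    if parts.query:
        # Keep any query string the namespace URI already had.
        query = parts.query + "&" + query
    # URI syntax puts the query before the fragment, so the namespace's
    # fragment gets pushed behind the local name.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )

plain = name_to_query_uri("http://simonstl.com/ns/vellum", "piece")
with_fragment = name_to_query_uri("http://simonstl.com/ns#vellum", "piece")
print(plain)
print(with_fragment)
```

The result is unambiguous to a parser, but the namespace URI is no longer
a visible prefix of the name's URI.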

Moving beyond the intricacies of URI syntax, there are other problems
with this kind of approach.  RDDL documents are presently designed to
provide resources about vocabularies as a set of names, with schemas and
other such niceties.  It's not clear that even my friendliest case of
"http://simonstl.com/ns/vellum#piece" is particularly compatible with
that approach, though perhaps RDDL could be extended that way.

Given all of these problems, why on earth would we want to be able to
treat element and attribute names as URIs?

The simplest reason for doing so is removing the mismatch between QName
processing, which is now context dependent in an ever-growing number of
ways, and URI processing.  For better or worse, URIs are less
context-dependent (or can be made so through absolutization) than
QNames.  This would let me get rid of a lot of annoying code and spend
less time thinking about issues like QNames in attributes.
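
As one illustration of that context dependence, here is a small Python
sketch using the standard library's ElementTree.  The v:kind attribute
and its value are invented for the example; they are not part of any
real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the attribute *value* is itself a QName.
doc = '<item xmlns:v="http://simonstl.com/ns/vellum" v:kind="v:piece"/>'
elem = ET.fromstring(doc)

# The parser expands the attribute's name to {namespace}local form, but
# the value is just a string to it, so the prefix "v" arrives unresolved.
for name, value in elem.attrib.items():
    print(name, "=", value)
```

Resolving "v:piece" after the fact requires the in-scope prefix
bindings, which this API no longer exposes once parsing is done.  A
plain URI in the same position would need no such context.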

A more interesting reason for doing this builds on the ever-growing use
of namespaces to mix and match vocabularies and the recurring problems
of modularization.  Pulling pieces out of various schemas or DTDs and
reassembling them to fit given projects is a nuisance because schemas
and DTDs also describe sets of resources rather than individual
components.  While the individual components do get described in the
end, they are described for use in a particular context.  There are few
mechanisms for describing which attributes of a given element, for
instance, are crucial to its use, and which may be safely pruned when it
is reused elsewhere.

There are widely-distributed mechanisms which are in fact designed to
answer these kinds of relationship questions, though I've admitted in
the past that I'm not particularly fond of them and don't find them
particularly accessible.  RDF and its surrounding toolkits, however, do
an excellent job of describing relationships between resources, at least
when those resources can be identified as URIs.  The current
namespace/QName approach only applies a URI to the namespace, making it
difficult to apply RDF to smaller pieces.

It struck me this morning that there's a way to apply URIs to individual
elements and attributes, though it requires an application of namespaces
that varies pretty dramatically from the typical style, and effectively
creates something like a DOCTYPE.

I'll start with a simple example:

<piece xmlns="http://simonstl.com/ns/vellum" >
  <connections>
   <traverse>
     <from href="http://www.w3.org/TR/REC-xml#sec-common-syn" />
     <to href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
   </traverse>
   <traverse>
     <from href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
     <to href="http://www.w3.org/TR/REC-xml#sec-common-syn" />
   </traverse>
  </connections>
</piece>

Using the approach I've been pondering, this could turn into something
like:

<piece:x
    xmlns:piece="http://simonstl.com/ns/vellum/piece"
    xmlns:connections="http://simonstl.com/ns/vellum/connections"
    xmlns:traverse="http://simonstl.com/ns/vellum/traverse"
    xmlns:from="http://simonstl.com/ns/vellum/from"
    xmlns:to="http://simonstl.com/ns/vellum/to"
    xmlns:href="http://simonstl.com/ns/vellum/href"
    >
  <connections:x>
   <traverse:x>
     <from:x href:x="http://www.w3.org/TR/REC-xml#sec-common-syn" />
     <to:x href:x="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
   </traverse:x>
   <traverse:x>
     <from:x href:x="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />
     <to:x href:x="http://www.w3.org/TR/REC-xml#sec-common-syn" />
   </traverse:x>
  </connections:x>
</piece:x>

The second form defines one namespace per element and attribute name,
creating a direct mapping between those names and a URI.  It then uses a
placeholder local name (x) for all the elements and attributes, since every
element and attribute already has a unique identifier.  Each element can
now have its own space defining different levels of processing,
mixability, etc., and modularization approaches can reference these
descriptions directly rather than having to harvest information from
tangled schemas.  It also makes it simpler for modularization approaches
to define their own sets of relationships between these components,
overriding the claims made by the creators of the original markup if
they so choose.
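
The mapping from the first form to the second is mechanical enough to
sketch in Python with the standard library's ElementTree.  This is a
sketch under assumptions, not a proposal for how tools should do it: the
function names are invented, unprefixed attributes are assumed to borrow
their element's namespace (as href does in the example), and documents
without a namespace are not handled sensibly.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER = "x"  # the throwaway local name used in the second form

def split_clark(name, default_ns=""):
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if name.startswith("{"):
        uri, local = name[1:].split("}", 1)
        return uri, local
    return default_ns, name

def per_name(ns_uri, local):
    """Build the '{namespace/local}x' name for one element or attribute."""
    return "{%s/%s}%s" % (ns_uri.rstrip("/"), local, PLACEHOLDER)

def one_namespace_per_name(elem):
    """Rewrite elem and its descendants in place to the per-name form."""
    ns_uri, local = split_clark(elem.tag)
    renamed = {}
    for name, value in elem.attrib.items():
        # Assumption: unprefixed attributes borrow the element's namespace.
        attr_ns, attr_local = split_clark(name, default_ns=ns_uri)
        renamed[per_name(attr_ns, attr_local)] = value
    elem.tag = per_name(ns_uri, local)
    elem.attrib.clear()
    elem.attrib.update(renamed)
    for child in elem:
        one_namespace_per_name(child)

source = (
    '<piece xmlns="http://simonstl.com/ns/vellum"><connections><traverse>'
    '<from href="http://www.w3.org/TR/REC-xml#sec-common-syn" />'
    '<to href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames" />'
    '</traverse></connections></piece>'
)
root = ET.fromstring(source)
one_namespace_per_name(root)
for elem in root.iter():
    print(elem.tag, sorted(elem.attrib))
```

After the rewrite, every element and attribute name carries its own URI
in the namespace part, and the local name is always the same placeholder.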

There are lots of problems with this, of course.  The root element
becomes pretty hefty, and there's no easy way to slap that content in an
entity since entities can't be containers.  The namespace declarations
could be distributed through the document, though that has the amusing
side effect of moving an element's real name to its attributes and the
real names of attributes to sibling attributes.
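
A quick Python check, again with the standard library's ElementTree,
suggests that a namespace-aware parser sees no difference between
hoisting the declarations to the root and distributing them.  The two
fragments below are cut-down versions of the example above, and
expanded_names is a made-up helper.

```python
import xml.etree.ElementTree as ET

NS = "http://simonstl.com/ns/vellum"
TARGET = "http://www.w3.org/TR/REC-xml#sec-common-syn"

# All declarations hoisted onto the outermost element.
hoisted = (
    '<traverse:x xmlns:traverse="%s/traverse" xmlns:from="%s/from" '
    'xmlns:href="%s/href"><from:x href:x="%s" /></traverse:x>'
    % (NS, NS, NS, TARGET)
)

# Declarations placed on the element that first needs them, so the
# "real" names of from and href sit in sibling xmlns attributes.
distributed = (
    '<traverse:x xmlns:traverse="%s/traverse"><from:x xmlns:from="%s/from" '
    'xmlns:href="%s/href" href:x="%s" /></traverse:x>'
    % (NS, NS, NS, TARGET)
)

def expanded_names(xml_text):
    """List (element name, attribute names) in expanded {uri}local form."""
    root = ET.fromstring(xml_text)
    return [(e.tag, sorted(e.attrib)) for e in root.iter()]

same = expanded_names(hoisted) == expanded_names(distributed)
print(same)
```

The difference between the two forms is one of readability for humans
rather than of meaning to a parser.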

Still, I think there's something here worth thinking about.  This
approach seems to bind XML much more tightly to the Web architecture,
exposing more information to Web-oriented tools and potentially removing
the layers of obfuscation that grow when namespace-mixing becomes
commonplace.  

Will it happen?  I'm not counting on it.  Is it worth a few minutes of
thought? I think it is.
-- 
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com -- http://monasticxml.org

Follow-Ups:
- Re: [xml-dev] escaping QName interlopers
  - From: "Jonathan Borden" <jonathan@openhealth.org>
- Re: [xml-dev] escaping QName interlopers
  - From: Norman Walsh <ndw@nwalsh.com>
- Re: [xml-dev] escaping QName interlopers
  - From: John Cowan <jcowan@reutershealth.com>
