xml-dev - namespaces and entities: a thought experiment

To: xml-dev@lists.xml.org
Subject: namespaces and entities: a thought experiment
From: adam souzis <adam@kinecta.com>
Date: Fri, 23 Aug 2002 14:53:49 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.1b) Gecko/20020721

I'm not sure there's much more to be said about namespaces, but I'd
like to share a thought experiment about an imaginary XML 2.0 syntax
mechanism that replaces both namespaces and entities. Inventing syntax
mechanisms may be fun in itself, and very occasionally interesting to
others, but I hope this one will help cut through the conceptual
overhead that contributes to the difficulties and controversy
surrounding XML namespaces.

We begin by noting that the entity and namespace mechanisms are similar
in that both let you substitute one string for another: entities for
general character replacement, namespaces for replacing a prefix in a
name with a URI.

So consider if entities were scoped and declared just like namespace 
declaration attributes:

<example:element xmlent:example="http://namespace.org#">

Here the xmlent prefix is special, like the xmlns prefix, except that it
declares a named entity with the name "example" and the replacement text
"http://namespace.org#". Within any character data contained by the
element you could use &example; just like any other named entity
reference.
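
A minimal Python sketch of how a parser might keep these hypothetical
xmlent declarations on a stack and expand &name; references in
character data. The xmlent prefix is this post's invention, not part of
any XML specification, and the EntityScopes class and its method names
are illustrative assumptions only.

```python
import re
from typing import Dict, List

# A named entity reference in character data, e.g. &example;
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")

# The five entities predefined by XML 1.0; in this sketch an xmlent
# declaration can never override them.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}


class EntityScopes:
    """Hypothetical stack of xmlent declarations, one frame per open element."""

    def __init__(self) -> None:
        self.frames: List[Dict[str, str]] = []

    def start_element(self, attrs: Dict[str, str]) -> None:
        # Collect xmlent:NAME="replacement" attributes, as xmlns:NAME is today.
        self.frames.append({name[len("xmlent:"):]: value
                            for name, value in attrs.items()
                            if name.startswith("xmlent:")})

    def end_element(self) -> None:
        # Declarations go out of scope when their element closes.
        self.frames.pop()

    def lookup(self, name: str) -> str:
        if name in PREDEFINED:
            return PREDEFINED[name]
        # The innermost declaration wins, just like namespace scoping.
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        raise ValueError(f"undeclared entity: {name!r}")

    def expand_text(self, text: str) -> str:
        # Replace every &name; in character data with its replacement text.
        return ENTITY_REF.sub(lambda m: self.lookup(m.group(1)), text)


scopes = EntityScopes()
scopes.start_element({"xmlent:example": "http://namespace.org#"})
print(scopes.expand_text("see &example; &amp; more"))
# prints: see http://namespace.org# & more
scopes.end_element()
```

The push/pop discipline is the same one a namespace-aware parser already
uses for xmlns declarations.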

But in addition to the standard entity reference syntax, we also specify
that ':' is not allowed in the element and attribute name productions,
and instead specify that ':' is the terminating character of an entity
reference that may begin an element or attribute name.

In other words, in the example above 'example:element' expands to 
http://namespace.org#element
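
The name rule can be sketched the same way. This is a hypothetical
Python helper, not anything a real parser provides: everything before
the first ':' is looked up as an entity name in a stack of declarations,
and its replacement text is prepended to the rest of the name.

```python
from typing import Dict, List


def expand_name(raw: str, frames: List[Dict[str, str]]) -> str:
    """Expand 'prefix:local' by treating 'prefix:' as an entity reference.

    frames is a stack of xmlent declarations, innermost last. A name with
    no ':' is returned unchanged; the unnamed (default) entity is not
    modelled here.
    """
    entity, colon, local = raw.partition(":")
    if not colon:
        return raw
    for frame in reversed(frames):
        if entity in frame:
            # ':' only terminates the reference, so it is not in the result.
            return frame[entity] + local
    raise ValueError(f"undeclared entity in name: {entity!r}")


frames = [{"example": "http://namespace.org#"}]
print(expand_name("example:element", frames))  # http://namespace.org#element
```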

The only tricky part of making this syntax behave just like the current
namespace syntax is that we need an unnamed entity equivalent to
xmlns="http://namespace#". We'd have to allow an
xmlent="http://namespace#" declaration with the special property that
the parser acts as if a reference to that entity began every unprefixed
element and attribute name inside its scope. (And in PCDATA we'd need
another built-in entity that returns that value; let's call it
&xmlen;.)
With this added, I'm pretty sure this mechanism would behave exactly
like the namespace spec (except that we also need the constraint that
the replacement text conform to URI syntax and end with a '#'), with one
exception: an unqualified attribute would be part of the default
namespace (the unnamed entity) instead of belonging to its parent
element's namespace (the "Per-Element-Type Partitions" in the namespace
spec). That is a behaviour I think most of us agree is better.
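
To make the unnamed entity concrete, here is a hypothetical Python
sketch; declare, expand_name and Frame are invented names, and only the
trailing-'#' part of the proposed constraint is checked (full URI
validation is left out). An unprefixed attribute picks up the unnamed
entity just as an element does, whereas under Namespaces 1.0 a default
namespace declaration does not apply to unprefixed attributes.

```python
from typing import Dict, List, Optional

# One frame per element; the key None stands for the unnamed entity.
Frame = Dict[Optional[str], str]


def declare(attrs: Dict[str, str]) -> Frame:
    """Read xmlent="..." and xmlent:NAME="..." from one element's attributes."""
    frame: Frame = {}
    key: Optional[str]
    for name, value in attrs.items():
        if name == "xmlent":
            key = None
        elif name.startswith("xmlent:"):
            key = name[len("xmlent:"):]
        else:
            continue
        # Proposed constraint: only the trailing '#' is checked here.
        if not value.endswith("#"):
            raise ValueError(f"replacement text must end with '#': {value!r}")
        frame[key] = value
    return frame


def expand_name(raw: str, frames: List[Frame]) -> str:
    """Expand an element or attribute name; unprefixed names use the unnamed entity."""
    key: Optional[str]
    if ":" in raw:
        key, local = raw.split(":", 1)
    else:
        key, local = None, raw
    for frame in reversed(frames):
        if key in frame:
            return frame[key] + local
    if key is None:
        return raw  # no unnamed entity in scope: the name stays unqualified
    raise ValueError(f"undeclared entity in name: {key!r}")


frames = [declare({"xmlent": "http://namespace#"})]
print(expand_name("element", frames))  # http://namespace#element
print(expand_name("attr", frames))     # http://namespace#attr
```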

Benefits of this approach:

* The problem of qnames in character data goes away: qnames just need
to be written as regular entity references, e.g. &adf;element, which
lets the parser recognize and expand them. That solves the XML
canonicalization problem of having to preserve namespace declarations
in order to resolve the prefixes of qnames embedded in character data.

* Simpler XML parser implementation. The same stack a parser needs for
namespaces today can be used here to support both namespaces and
entities. Of course, this entity mechanism doesn't allow external
entities or markup in the replacement text, but XInclude is a better
place for that functionality.

* Namespaces become conceptually simpler and so easier to use: they are
just a way to create a globally distinguishable set of element names and
a globally distinguishable set of attribute names. It also helps clear
up the confusion about the namespace URI, because it highlights that we
are talking about a constraint on a string of characters to ensure its
uniqueness, not a URL. The hardest concept is that elements and
attributes can now have names that are not legally expressible in the
syntax, but that very fact makes them easy to process: if an element's
or attribute's name contains a '#', you know it must be in a namespace,
and that namespace's name is the string to the left of the '#'.
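
Processing the expanded names is then mostly string handling. The sketch
below is a hypothetical Python helper with an invented example URI
(http://example.org/adf#); it follows the '#' rule from the last point
and shows how a qname written as &adf;element in character data would
reach an application as an already-expanded string, with no prefix
bindings left to preserve.

```python
from typing import Optional, Tuple


def split_name(expanded: str) -> Tuple[Optional[str], str]:
    """Split an expanded name into (namespace name, local name).

    A '#' means the name is in a namespace whose name is the string to
    the left of it. XML names never contain '#', so the last '#' marks
    the boundary.
    """
    namespace, hash_mark, local = expanded.rpartition("#")
    if not hash_mark:
        return None, expanded  # no '#': not in any namespace
    return namespace, local


# Hypothetical: <type>&adf;element</type> after parsing, with adf declared
# as "http://example.org/adf#". The content is now a plain string.
content = "http://example.org/adf#element"
print(split_name(content))  # ('http://example.org/adf', 'element')
print(split_name("title"))  # (None, 'title')
```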

Anyway, some food for thought, I hope.

regards,
adam
