Dealing with namespaces (Was Re: "Binary XML" proposals)
- From: Joe English <jenglish@flightlab.com>
- To: xml-dev@lists.xml.org
- Date: Wed, 11 Apr 2001 08:31:27 -0700

Ken MacLeod wrote:
> Joe English <jenglish@flightlab.com> writes:
> > Al Snell wrote:
> > > [using a string table for element and attribute names]
> >
> > That's the approach I used in Cost; it works well. [...]
> >
> > This starts to break down when you throw namespaces into the mix
> > though, since element and attribute names are no longer simple,
> > atomic values. [...]
> >
> > I haven't yet seen or thought up a fully satisfactory solution to
> > this problem...
>
> In Orchard[1], I use the tuple of (URI,LocalName) for element and
> attribute names, instead of the QName, and it works great.

Another approach I've been playing with is to normalize all
QNames on input so that the same prefix is always used for
the same URI. (If two prefixes are bound to the same
URI, only the first one is used internally; if a declaration
uses a prefix that's previously been bound to a different URI,
the normalizer generates a new prefix.)
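
A minimal Python sketch of the normalization rule just described, for illustration only. The class name `PrefixNormalizer`, its method names, and the generated `ns1`, `ns2`, ... prefixes are assumptions made up for this example; it also ignores scoping and the default namespace.

```python
class PrefixNormalizer:
    """Maps every namespace URI to exactly one internal prefix."""

    def __init__(self):
        self.prefix_for_uri = {}  # namespace URI -> canonical prefix
        self.uri_for_prefix = {}  # canonical prefix -> namespace URI
        self._counter = 0

    def declare(self, prefix, uri):
        """Record a declaration; return the canonical prefix for `uri`."""
        if uri in self.prefix_for_uri:
            # URI already seen: the first prefix bound to it wins.
            return self.prefix_for_uri[uri]
        if prefix in self.uri_for_prefix:
            # Prefix already bound to a different URI: generate a new one.
            prefix = self._fresh_prefix()
        self.prefix_for_uri[uri] = prefix
        self.uri_for_prefix[prefix] = uri
        return prefix

    def _fresh_prefix(self):
        # Hypothetical naming scheme: ns1, ns2, ... skipping any in use.
        while True:
            self._counter += 1
            candidate = f"ns{self._counter}"
            if candidate not in self.uri_for_prefix:
                return candidate


norm = PrefixNormalizer()
first = norm.declare("a", "http://example.org/one")   # new URI, keeps "a"
second = norm.declare("b", "http://example.org/one")  # same URI, reuses "a"
third = norm.declare("a", "http://example.org/two")   # clash, gets "ns1"
```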

Application programs can also declare prefix mappings.
This way, a program that wants to process elements in
the {http://www.foo.com} namespace can call

    xmlns::declare "foo" "http://www.foo.com"

at the beginning of the program. Subsequently, QNames with
prefixes bound to the {http://www.foo.com} namespace URI will
be rewritten to use the 'foo:' prefix no matter what prefix
was used in the input document. Then the application can treat
element and attribute names atomically as "foo:bar"
just like in the pre-namespace days.
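
To make the effect concrete, here is a rough Python sketch using the standard library's SAX parser in namespace mode. The `declare` function and `DECLARED` table are hypothetical stand-ins for the `xmlns::declare` call above, not a port of it; names in undeclared namespaces fall back to `{uri}local` notation, which is a choice made for this example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

DECLARED = {}  # namespace URI -> prefix chosen by the application


def declare(prefix, uri):
    """Hypothetical analogue of xmlns::declare."""
    DECLARED[uri] = prefix


class NormalizingHandler(ContentHandler):
    """Collects element names rewritten to the application's prefixes."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if uri is None:
            self.names.append(local)                        # no namespace
        elif uri in DECLARED:
            self.names.append(f"{DECLARED[uri]}:{local}")   # declared prefix
        else:
            self.names.append(f"{{{uri}}}{local}")          # undeclared namespace


def element_names(xml_text):
    handler = NormalizingHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.names


declare("foo", "http://www.foo.com")
doc = '<x:bar xmlns:x="http://www.foo.com"><x:baz/><plain/></x:bar>'
names = element_names(doc)  # the input's "x:" prefix is replaced by "foo:"
```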

This works OK for many (most?) applications, but troubles
arise with architectures that use QNames inside attribute
values and element content: XSD, for example, and anything
that uses XPath. To support architectures like this, it's
also necessary to make the QName normalization routine
available to the application. In a SAX-like interface
it's sufficient to provide access to the "current" namespace
environment; in a DOM-like interface, QName normalization
also depends on a context node.
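
For the SAX-like case, the "current" namespace environment can be tracked from the prefix-mapping events. The sketch below is one way to do that with Python's standard `xml.sax`; the handler class, the `resolve` method, and the idea of passing in a list of QName-valued attribute names are all assumptions for illustration. It resolves unprefixed QNames against the default namespace, which is what XSD does but not what XPath 1.0 does, so a real interface would need to make that configurable.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class QNameAttrHandler(ContentHandler):
    """Resolves QName-valued attributes against the in-scope bindings."""

    def __init__(self, qname_attrs):
        super().__init__()
        self.qname_attrs = set(qname_attrs)  # local names of QName-valued attributes
        self.env = {}        # prefix (None = default namespace) -> stack of URIs
        self.resolved = []   # (namespace URI, local name) pairs, in document order

    def startPrefixMapping(self, prefix, uri):
        self.env.setdefault(prefix, []).append(uri)

    def endPrefixMapping(self, prefix):
        self.env[prefix].pop()

    def resolve(self, qname):
        """Turn a QName found in content into a (URI, local name) pair."""
        prefix, local = qname.split(":", 1) if ":" in qname else (None, qname)
        stack = self.env.get(prefix)
        if stack:
            return (stack[-1], local)
        if prefix is None:
            return (None, local)  # no default namespace in scope
        raise ValueError(f"unbound prefix in {qname!r}")

    def startElementNS(self, name, qname, attrs):
        for (attr_uri, attr_local), value in attrs.items():
            if attr_uri is None and attr_local in self.qname_attrs:
                self.resolved.append(self.resolve(value))


def resolve_qname_attrs(xml_text, qname_attrs):
    handler = QNameAttrHandler(qname_attrs)
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.resolved
```

A DOM-like interface has no event stream to track, so the equivalent of `resolve` would have to walk up from a context node collecting declarations instead.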

Then there's the matter of (re)serialization. If a program
reads in a document, modifies it some, then writes it back
out again, is it necessary to preserve the original prefixes
or is it safe to use the "normalized" ones? In the general
case I think it's necessary to preserve the original prefixes --
the architecture might use QNames-in-content in places that
the application doesn't know about, or it may just be more trouble
than it's worth for the app to normalize all the QNames-in-content.
Preserving the namespace environment is subtle and difficult to
get right; the DOM and XSLT specs have to deal with this issue,
and the solution doesn't look pretty.

Then there's RDF, which interprets unprefixed attribute names
as if they "belong to" the namespace of their parent element,
whereas most other architectures -- including the Infoset and
"Namespaces in XML" TR -- interpret unprefixed attributes
as "global", i.e., they don't have a Namespace URI property
at all. If the architecture is mostly Infoset-compliant but
allows RDF data islands to be mixed in (which seems like a common
practice), how can you tell when to apply RDF semantics and when
not to? This one currently has me stumped. Maybe for RDF
it's best to translate all QNames into URIs and work with those
instead?
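
As a rough illustration of the "translate to URIs" idea: RDF/XML forms a property URI by concatenating the namespace URI and the local name, so the translation itself is trivial. The function names below are hypothetical, and the second function simply encodes the RDF-style reading of unprefixed attributes described above; it does not answer the question of when that reading should apply.

```python
def expanded_name_to_uri(namespace_uri, local_name):
    """RDF-style URI for a name: namespace URI + local name."""
    return (namespace_uri or "") + local_name


def rdf_attribute_uri(attr_ns, attr_local, parent_ns):
    """URI for an attribute, letting an unprefixed attribute (attr_ns is None)
    take the namespace of its parent element."""
    ns = attr_ns if attr_ns is not None else parent_ns
    return expanded_name_to_uri(ns, attr_local)


RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
about = rdf_attribute_uri(None, "about", RDF_NS)  # unprefixed, inside an rdf: element
```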

I'd like to have an API that makes namespace issues mostly transparent.
XSLT comes pretty close to achieving this. IMO the SAX and DOM APIs
miss the mark entirely; they don't help much at all.

In conclusion: namespaces are a pain in the ass.

--Joe English
jenglish@flightlab.com