Re: SAX Filters for Namespace Processing
- From: Eric Bohlman <email@example.com>
- To: Richard Tobin <firstname.lastname@example.org>
- Date: Sat, 04 Aug 2001 11:14:00 -0500
8/4/01 7:51:10 PM, Richard Tobin <email@example.com> wrote:
>>> As many threads on xml-dev have shown,
>>> text-based processing of XML is hazardous at best.
>>Do you understand how this statement completely contradicts the original
>>intent of XML?
>Hang on, consider the context of this. The thing that's "hazardous"
>is copying a fragment of XML from one place to another, and the
>sense in which it's hazardous is that it may become impossible to
>identify what vocabulary the elements belong to.
>Now this was always true, even before namespaces. In fact, it was
>*more* true before namespaces, because there was nothing you could do
>about a name clash short of renaming the elements. Namespaces have
>relieved this somewhat, in that you can bind prefixes locally to
>preserve their meaning. A simple cut-and-paste can't do this,
>because you have to insert namespace attributes, but XSLT does it,
>XInclude will do it, and a namespace-aware editor can do it.
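To make the quoted point concrete, here's a minimal sketch using Python's standard xml.etree.ElementTree (my choice of tool; nothing in the discussion depends on it). The fragment, the two host documents, and the namespace URIs urn:example:po and urn:example:catalog are all hypothetical. Pasting the fragment as text lets whichever default namespace is in scope claim it. Moving the parsed element instead keeps its expanded name, and the serializer writes out whatever declarations are needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: an unprefixed element whose namespace comes
# entirely from whatever default namespace declaration is in scope.
FRAGMENT = "<name>Widget</name>"

# Hypothetical host documents with different default namespaces.
PO_DOC = '<order xmlns="urn:example:po">{}</order>'
CATALOG_DOC = '<catalog xmlns="urn:example:catalog">{}</catalog>'


def expanded_name_after_text_paste(host_template: str) -> str:
    """Paste FRAGMENT into the host as raw text, then parse the result.

    ElementTree reports names as '{namespace-uri}local-name', so the
    return value shows which vocabulary the pasted element landed in.
    """
    root = ET.fromstring(host_template.format(FRAGMENT))
    return root[0].tag


def namespace_aware_copy() -> str:
    """Move the parsed element node instead of its text.

    The element keeps the expanded name it had in its original context,
    and the serializer emits the namespace declarations it needs.
    """
    source = ET.fromstring(PO_DOC.format(FRAGMENT))
    target = ET.fromstring(CATALOG_DOC.format(""))
    target.append(source[0])
    return ET.tostring(target, encoding="unicode")


if __name__ == "__main__":
    print(expanded_name_after_text_paste(PO_DOC))       # {urn:example:po}name
    print(expanded_name_after_text_paste(CATALOG_DOC))  # {urn:example:catalog}name
    print(namespace_aware_copy())
```

ElementTree makes up prefixes such as ns0 and ns1 on output unless you register your own with ET.register_namespace. The prefixes won't match the originals, but the namespace URIs, which are what actually carry the meaning, survive the move.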
In fact, this sort of thing doesn't just happen in XML; it happens in programming languages, and
even natural languages. In a programming language, if I have a loop that increments the value of a
locally scoped variable called i, and I cut and paste the loop code into another block where
there's no variable named i in scope, I'll get an undeclared-variable error. If there *is* a variable named i in
scope in the new context, and it's being used for some other purpose, the new code will trash it.
If I'm writing a story and I've got a paragraph describing what Richard did, and I cut and paste it
into another story with a completely different character named Richard, the reader will assume that
the paragraph was about *that* Richard, not the Richard in the original story.
The problem here really isn't "text-based processing." The "problem" is that, as much as many geeks
would rather it were otherwise, language, natural or artificial, is *always* context-dependent. The
meaning of a set of symbols isn't a property of the symbols themselves; it comes from an external
understanding of what they're supposed to mean. In _Sphere_ (the book; this part didn't make it
into the movie) Michael Crichton describes a group of scientists who composed a message intended to
announce our existence to any extraterrestrials who might be listening. The message was based on
the values of various physical constants. A skeptic among them asked them to do a little thought
experiment: imagine that they received the same message from space, without knowing what it was
supposed to be, and try to figure out what it meant. None of them could. The point is that
language can't carry all of its own meaning.
In this case, if you're going to rely on scope (i.e., a default namespace declaration) to determine
the namespace URI for an element, then you have to accept that if you treat the element as pure
text, moving it to another scope will change its meaning unless you also adjust the namespace
declarations in the new context. And that's not necessarily a bad thing. It's the price you pay for
the flexibility of scoped declarations.
If you have a vocabulary for purchase orders, you have to accept that meaning is going to be lost
if you move an element giving a product's name into the "billing address" section. If you randomly
permute the contents of a document, you're not going to be able to preserve meaning. If you've got
some text content that includes a reference to a parsed general entity and you cut-and-paste it
into a document that doesn't define that entity, or defines it differently than the original
document, you're going to get a change of meaning. I don't really see that as a trap, just as a
matter of needing to understand what you're doing.