
Re: SAX Filters for Namespace Processing



8/4/01 7:51:10 PM, Richard Tobin <richard@cogsci.ed.ac.uk> wrote:

>>> As many threads on xml-dev have shown,
>>> text-based processing of XML is hazardous at best.
>
>>Do you understand how this statement completely contradicts the original
>>intent of XML?  
>
>Hang on, consider the context of this.  The thing that's "hazardous"
>is copying a fragment of XML from one place to another, and the
>sense in which it's hazardous is that it may become impossible to
>identify what vocabulary the elements belong to.
>
>Now this was always true, even before namespaces.  In fact, it was
>*more* true before namespaces, because there was nothing you could do
>about a name clash short of renaming the elements.  Namespaces have
>relieved this somewhat, in that you can bind prefixes locally to
>preserve their meaning.  A simple cut-and-paste can't do this,
>because you have to insert namespace attributes, but XSLT does it,
>XInclude will do it, and a namespace-aware editor can do it.

In fact, this sort of thing doesn't just happen in XML; it happens in programming languages, and 
even natural languages.  In a programming language, if I have a loop that increments the value of a 
locally-scoped variable called i, and I cut-and-paste the loop code into another block where 
there's no variable named i in scope, I'll get an undefined-variable error.  If there *is* a variable named i in 
scope in the new context, and it's being used for some other purpose, the new code will trash it.  
If I'm writing a story and I've got a paragraph describing what Richard did, and I cut-and-paste it 
into another story with a completely different character named Richard, the reader will assume that 
the paragraph was about *that* Richard, not the Richard in the original story.
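
The loop-variable case can be sketched in a few lines. This is only an illustration under stated assumptions: Python is my choice of language, the helper name paste_and_run is hypothetical, and exec stands in for "pasting" a line of source text into a different scope.

```python
SNIPPET = "i = i + 1"  # the line being cut and pasted, treated as plain text


def paste_and_run(scope):
    """Run SNIPPET with `scope` as its local variables and return the scope."""
    exec(SNIPPET, {}, scope)
    return scope


# Original context: i is a loop counter.
original = paste_and_run({"i": 0})

# New context where i already means something else (say, an invoice number).
clobbered = paste_and_run({"i": 1000})

# New context with no i in scope at all.
try:
    paste_and_run({})
    missing = None
except NameError as exc:
    missing = type(exc).__name__

print(original["i"], clobbered["i"], missing)
```

The text of the snippet is identical in all three runs; only the surrounding scope differs, and that alone decides whether it works, silently changes someone else's value, or fails.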

The problem here really isn't "text-based processing."  The "problem" is that as much as many geeks 
would rather it were otherwise, language, natural or otherwise, is *always* context-dependent.  The 
meaning of a set of symbols isn't a property of the symbols themselves; it comes from an external 
understanding of what they're supposed to mean.  In _Sphere_ (the book; this part didn't make it 
into the movie) Michael Crichton describes a group of scientists who composed a message intended to 
announce our existence to any extraterrestrials who might be listening.  The message was based on 
the values of various physical constants.  A skeptic among them asked them to do a little thought 
experiment; imagine that they received the same message from space, without knowing what it was 
supposed to be, and try to figure out what it meant.  None of them could.  The point is that 
language can't carry all of its own meaning.

In this case, if you're going to rely on scope (i.e. a default namespace declaration) to determine 
the namespace URI for an element, then you're going to have to accept that if you treat the element 
as pure text, moving it to another scope will change its meaning unless you also make changes to 
the new context.  And that's not necessarily a bad thing.  It's the price you pay for flexibility.  
If you have a vocabulary for purchase orders, you have to accept that meaning is going to be lost 
if you move an element giving a product's name into the "billing address" section.  If you randomly 
permute the contents of a document, you're not going to be able to preserve meaning.  If you've got 
some text content that includes a reference to a parsed general entity and you cut-and-paste it 
into a document that doesn't define that entity, or defines it differently than the original 
document, you're going to get a change of meaning.  I don't really see that as a trap, just as a 
matter of needing to understand what you're doing.
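
The default-namespace case can be shown the same way. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the urn:example:... namespace URIs, the element names, and the helper paste_into are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# A fragment that relies on whatever default namespace is in scope.
FRAGMENT = "<name>Widget</name>"


def paste_into(wrapper_open, wrapper_close, fragment=FRAGMENT):
    """Paste `fragment` as text into a wrapper element and return the
    pasted element's name as the parser resolves it."""
    root = ET.fromstring(wrapper_open + fragment + wrapper_close)
    return root[0].tag


# The same text pasted into three different scopes.
product_tag = paste_into('<product xmlns="urn:example:catalog">', "</product>")
billing_tag = paste_into('<billTo xmlns="urn:example:billing">', "</billTo>")
bare_tag = paste_into("<doc>", "</doc>")

# Declaring the namespace on the fragment itself keeps its meaning
# wherever it lands.
self_contained = '<name xmlns="urn:example:catalog">Widget</name>'
preserved_tag = paste_into(
    '<billTo xmlns="urn:example:billing">', "</billTo>", self_contained
)

print(product_tag)
print(billing_tag)
print(bare_tag)
print(preserved_tag)
```

ElementTree reports names as {uri}local, so the first three results differ even though the pasted text is byte-for-byte the same. The last case is the "make changes to the new context" option: the paste carries its own declaration, which is what a namespace-aware tool would insert for you.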