Joe,
Great post.
Jonathan
----- Original Message -----
From: "Joe English" <jenglish@flightlab.com>
To: <xml-dev@lists.xml.org>
Sent: Friday, April 05, 2002 11:43 AM
Subject: [xml-dev] A plea for Sanity
>
> [ Also sent to xml-names-editor@w3.org ]
>
> "Namespaces in XML 1.1 Requirements" cites the ability to "undeclare"
> a namespace as the principal (only?) new needed feature, because
> of the case where:
>
> | information items [...] from another document [...] may
> | have fewer in-scope namespaces than their parent. There is
> | no mechanism for accurately serializing this situation. If
> | the infoset is naively serialized and reparsed, the children
> | will end up with additional namespace information items which
> | serve no useful purpose.
>
> I believe that this requirement is ill-considered.
>
> Under SGML and XML 1.0, applications can treat generic
> identifiers as atomic strings; with XML 1.0 + Namespaces,
> element and attribute names become compound objects consisting
> of a URI and a local name. This complicates applications a bit,
> but by itself is not an onerous burden: toolkits like SAX can
> provide namespace processors that keep track of the namespace
> environment, map GIs to {URI+localname} pairs, and throw away
> the original namespace declarations.
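
A minimal sketch of the namespace processing described above, using Python's standard xml.sax module with namespace processing switched on. The class name NameCollector, the function expanded_names, and the urn:example:one URI are illustrative assumptions, not anything from the original post. The handler sees only {URI+localname} pairs; the prefixes and declarations are gone by the time it is called.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameCollector(ContentHandler):
    """Records each element name as a (URI, localname) pair."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, `name` is a (uri, localname) tuple.
        self.names.append(name)


def expanded_names(xml_bytes):
    """Parse xml_bytes and return the element names in document order."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names


# Two documents with different declarations but the same expanded names
# (hypothetical example URI).
doc_a = b'<a:doc xmlns:a="urn:example:one"><a:item/></a:doc>'
doc_b = b'<doc xmlns="urn:example:one"><x:item xmlns:x="urn:example:one"/></doc>'

print(expanded_names(doc_a))
print(expanded_names(doc_b))
```

Both documents yield the same list of pairs, which is all an application working at this level can observe.
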
>
> The real complexity starts to show up in applications which
> themselves need to keep track of the namespace environment
> (e.g., XSLT). This is usually required for applications that
> need to reserialize an Infoset as XML and wish to retain
> the original namespace prefixes on output. (It gets hairier
> for markup vocabularies that include QNames in content, but that's
> a different issue.)
>
> But the new requirement implies that the *exact set of in-scope
> namespaces at each node* is an essential part of the Infoset.
> This is the part that I think is ill-considered. This property
> should be deemed inessential, just as whitespace in tags and the
> order of attribute value specifications are deemed inessential.
> XML-related specifications should not expect or demand that it be
> preserved; any set of namespace declarations that produces the same
> {URI+localname} pairs after namespace processing should be considered
> equivalent.
>
> In particular, "additional namespace information items which
> serve no useful purpose" -- and hence do not affect the interpretation
> of QNames in markup or content -- should not matter. Applications
> should be free to insert or discard them as they see fit without
> changing the meaning of the Infoset.
>
> * * *
>
> Now a plea for sanity.
>
> (This is for people who design XML vocabularies and applications;
> xml-names-editor, I know you're busy, so you can stop reading here.)
>
> There are certain practices which, if avoided, can make life
> simpler for application and toolkit developers. These are
> all legal according to the Namespaces REC, and I don't suggest
> that they be disallowed in XML 1.1, but it may be beneficial
> for individual applications to disallow them.
>
> Some definitions:
>
> Let's say that an XML document is _neurotic_ if it maps the same
> namespace prefix to two different namespace URIs at different
> points. Neurosis makes it necessary for XML processors to
> work with {URI+localname} pairs instead of GIs, and to keep
> track of the namespace environment at each point in the tree
> if there are QNames-in-content. If it weren't for neurosis,
> applications could use a single namespace map that applied to
> the entire document.
>
> Conversely, a document is _borderline_ if it maps two different
> namespace prefixes to the same namespace URI. Borderline documents
> complicate reserialization: the choice of which prefix to
> use for a particular {URI+localname} pair depends on its
> position in the tree.
>
> A document is _psychotic_ if it maps two different namespace prefixes
> to the same URI _in the same scope_. Psychosis presents an even
> bigger difficulty for reserialization: now applications must keep
> track of the original prefix as well as the {URI+localname} pair.
>
> A document is _normal_ (or _in namespace-normal form_) if all
> namespace declarations appear on the root element and it is
> not psychotic. (A borderline document with all namespace
> declarations in the same place is automatically psychotic;
> a neurotic document with this property would be illegal according
> to the Namespaces REC.)
>
> Normal documents are the easiest to process: the application can
> determine the global namespace environment at the beginning of the
> parse, and can use it throughout processing.
>
> It's not always possible to produce normal documents -- the producer
> might not know all of the relevant namespaces at the time it emits
> the root element start-tag -- so a weaker definition is useful:
> A document is _sane_ if it is neither neurotic nor borderline.
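
The definitions above can be checked mechanically. The following is a rough sketch, assuming Python's xml.etree.ElementTree and its start-ns/end-ns parse events; the function name classify and the returned dictionary are my own invention, and xmlns="" is treated as "no binding" rather than as a mapping, which is one possible reading of the definitions.

```python
import io
import xml.etree.ElementTree as ET


def classify(xml_text):
    """Report which of the namespace properties defined above hold."""
    prefix_uris = {}    # prefix -> set of URIs it was ever bound to
    uri_prefixes = {}   # URI -> set of prefixes ever bound to it
    scope = []          # stack of (prefix, uri) bindings currently in scope
    depth = 0           # number of currently open elements
    all_on_root = True
    psychotic = False

    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ("start", "end", "start-ns", "end-ns")
    for event, data in ET.iterparse(source, events=events):
        if event == "start-ns":
            # Reported before the "start" event of the declaring element.
            prefix, uri = data
            scope.append((prefix, uri))
            if not uri:
                continue            # xmlns="" is not counted as a binding
            if depth > 0:
                all_on_root = False
            prefix_uris.setdefault(prefix, set()).add(uri)
            uri_prefixes.setdefault(uri, set()).add(prefix)
            # Inner bindings shadow outer ones for the same prefix.
            live = [u for u in dict(scope).values() if u]
            if len(live) != len(set(live)):
                psychotic = True    # two prefixes, one URI, same scope
        elif event == "end-ns":
            scope.pop()
        elif event == "start":
            depth += 1
        else:                       # "end"
            depth -= 1

    neurotic = any(len(uris) > 1 for uris in prefix_uris.values())
    borderline = any(len(ps) > 1 for ps in uri_prefixes.values())
    return {
        "neurotic": neurotic,
        "borderline": borderline,
        "psychotic": psychotic,
        "normal": all_on_root and not psychotic,
        "sane": not neurotic and not borderline,
    }


print(classify('<r><p:x xmlns:p="urn:a"/><p:y xmlns:p="urn:b"/></r>'))
```

The sketch ignores QNames in content; it looks only at the declarations themselves.
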
>
> Document producers should be designed to emit sane documents.
>
> This is not hard to do -- the serializer just needs to maintain
> a monotonic, bijective URI/prefix map and reuse the same prefix
> whenever a namespace URI leaves and comes back into scope.
> ("Bijective": there is precisely one URI for each prefix and
> one prefix for each URI; by "monotonic" I mean that prefix+URI
> pairs may be added to the map but not removed.)
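
One way such a map might look, as a sketch only: the class name PrefixMap, the method prefix_for, and the "nsN" fallback naming are assumptions of mine, not part of the proposal. Entries are only ever added (monotonic), and each URI and each prefix appears at most once (bijective).

```python
class PrefixMap:
    """A monotonic, bijective URI/prefix map for a serializer."""

    def __init__(self):
        self._by_uri = {}      # URI -> prefix
        self._by_prefix = {}   # prefix -> URI

    def prefix_for(self, uri, preferred=None):
        """Return the one prefix for uri, allocating it on first use."""
        if uri in self._by_uri:
            # The URI has been seen before: reuse its prefix, even if the
            # caller now suggests a different one.
            return self._by_uri[uri]
        prefix = preferred
        n = 0
        # Fall back to generated names when no prefix is suggested or the
        # suggested one is already bound to another URI.
        while prefix is None or prefix in self._by_prefix:
            prefix = f"ns{n}"
            n += 1
        self._by_uri[uri] = prefix
        self._by_prefix[prefix] = uri
        return prefix


m = PrefixMap()
print(m.prefix_for("urn:a", "a"))   # first use of urn:a
print(m.prefix_for("urn:b", "a"))   # "a" is taken, so a new name is generated
print(m.prefix_for("urn:a", "z"))   # urn:a keeps its original prefix
```

Because nothing is ever removed, a namespace that goes out of scope and comes back gets the same prefix it had before.
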
>
> A sane document can be transformed into a normal document simply
> by moving all namespace declarations to the root element and
> filtering out duplicates. (This can't be done in streaming
> mode, but it might be an appropriate technique for XML databases.)
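
The "move and filter" step could be sketched as below: collect every declaration in the document, drop duplicates, and hand back the single set that would go on the root element. This assumes xml.etree.ElementTree; the function name hoisted_declarations is hypothetical, and the sketch refuses documents that are not sane (including, conservatively, any that use xmlns="" alongside a default namespace) rather than trying to repair them.

```python
import io
import xml.etree.ElementTree as ET


def hoisted_declarations(xml_text):
    """Return {prefix: URI} for the root element of the normalized document.

    Raises ValueError if the document is not sane, since then no single
    document-wide map exists.
    """
    by_prefix = {}
    by_uri = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        # setdefault keeps the first binding; a later mismatch means the
        # same prefix or URI was paired differently elsewhere.
        if by_prefix.setdefault(prefix, uri) != uri:
            raise ValueError(f"neurotic: prefix {prefix!r} has two URIs")
        if by_uri.setdefault(uri, prefix) != prefix:
            raise ValueError(f"borderline: URI {uri!r} has two prefixes")
    return by_prefix


doc = ('<r><a:x xmlns:a="urn:a"><a:y xmlns:a="urn:a"/></a:x>'
       '<b:z xmlns:b="urn:b"/></r>')
print(hoisted_declarations(doc))
```

The whole document has to be read before the root's declarations are known, which is why this does not work in streaming mode.
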
>
> Now, general-purpose XML consumers cannot expect to receive sane
> documents. However, *special-purpose* consumers, designed to work
> with specific markup vocabularies, can be a lot simpler if the
> markup vocabulary includes namespace sanity as a requirement.
>
> As an application developer, I'd prefer not to have to worry
> about namespace nodes or {URI+localname} pairs. I'd rather be
> able to give the parser an internal namespace map describing
> all the namespace URIs I'm interested in, and have the parser
> translate QNames in markup to use my prefixes. Then the application
> can work with GIs instead of {URI+localname} pairs. If the source
> document is sane, then it's possible to preserve the original prefixes
> on reserialization simply by remembering the original namespace map;
> it's not necessary to keep track of namespace nodes during processing.
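
To make the idea concrete, here is a sketch of that translation done after the fact with xml.etree.ElementTree (a parser that did it natively would be nicer). The function name gis_with_my_prefixes, the MY_PREFIXES table, and the urn:example:* URIs are all invented for illustration. Names in namespaces the application has not listed are left in ElementTree's "{URI}localname" form.

```python
import xml.etree.ElementTree as ET

# The application's own, fixed namespace map (hypothetical URIs).
MY_PREFIXES = {
    "urn:example:order": "ord",
    "urn:example:line": "ln",
}


def gis_with_my_prefixes(xml_text, my_prefixes):
    """Return element names rewritten to use the application's prefixes."""
    gis = []
    for elem in ET.fromstring(xml_text).iter():
        tag = elem.tag                     # "{URI}localname" or "localname"
        if tag.startswith("{"):
            uri, local = tag[1:].split("}", 1)
            prefix = my_prefixes.get(uri)
            if prefix is not None:
                tag = f"{prefix}:{local}" if prefix else local
        gis.append(tag)
    return gis


# The source document uses its own prefixes, x and y.
doc = ('<x:order xmlns:x="urn:example:order">'
       '<y:line xmlns:y="urn:example:line"/><note/></x:order>')
print(gis_with_my_prefixes(doc, MY_PREFIXES))
```

From here on the application can compare plain strings such as "ord:order", whatever prefixes the source document chose.
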
>
> QNames in content are a lot easier to process in a sane document.
> Sanity guarantees that a given QName means the same thing wherever
> it appears. Any future markup vocabulary which uses QNames in content
> should include sanity as an application requirement.
>
> A requirement for sanity shifts part of the burden onto document
> producers, where it's easy to handle. The alternative is maddening
> complexity for document consumers.
>
>
> --Joe English
>
> jenglish@flightlab.com