Hi James,
You're proposing to enhance the processing that goes on in XML parsers,
correct? My main question is: how does the parser know which colons in
element content and attribute values indicate QNames, and which are
literal colons? For example, given:
<my_document xml:qnames="resolve">
<para type="foo:bar">
Here: a colon that doesn't indicate a QName.
And a literal string that looks like a QName, but isn't:
<xmp>my:type</xmp>.
</para>
</my_document>
How does the parser know that the QName in the type attribute should
be resolved (and constitutes a namespace error since its prefix isn't
declared), but that the first colon in the content of the para element
isn't a malformed QName, and that 'my:type', which looks like a QName,
isn't intended to be resolved?
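For comparison, here is a minimal sketch of what an ordinary
namespace-aware parser does with that document today, using Python's
standard xml.etree.ElementTree purely as an illustration (the choice of
parser is my assumption, not part of the proposal). Every colon in an
attribute value or in character data comes through as plain text, and
the undeclared 'foo' prefix raises no error:

```python
import xml.etree.ElementTree as ET

# The example document from above, unchanged.
doc = """<my_document xml:qnames="resolve">
<para type="foo:bar">
Here: a colon that doesn't indicate a QName.
And a literal string that looks like a QName, but isn't:
<xmp>my:type</xmp>.
</para>
</my_document>"""

root = ET.fromstring(doc)
para = root.find("para")

# The attribute value is an uninterpreted string: no resolution, no error.
print(para.get("type"))        # foo:bar
# The same goes for character data that happens to look like a QName.
print(para.find("xmp").text)   # my:type
```

Only element and attribute *names* go through namespace resolution
here; the values are left entirely to the application.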
The same problem arises even within a single value: given an XPath, it
is not enough to go through and turn everything that looks like a
QName into a resolved qualified name. For example:
<xsl:value-of select="foo:bar/*[name() = 'fred:barney']" />
In the select attribute, there are two things that look like QNames:
"foo:bar" and "fred:barney". However, "fred:barney" is a literal
string, and therefore shouldn't be resolved.
To be able to spot that it shouldn't resolve "fred:barney", a
QName-in-content-aware parser would have to know that an attribute was
an XPath attribute, *and* be able to parse that XPath attribute so
that it could recognise that "fred:barney" was in a literal string and
that therefore no resolution should be attempted.
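To give a feel for the extra work involved, here is a rough sketch in
Python of the minimum such a parser would need: a scanner that matches
XPath string literals first, so that anything inside quotes is never
reported as a QName. The function name and the simplified QName pattern
are my own assumptions; a real implementation would need the full XPath
grammar (wildcards such as foo:*, axes, and so on).

```python
import re

# Hypothetical, simplified scanner. String literals are tried first, so a
# QName-like run inside quotes is consumed as part of the literal.
_TOKEN = re.compile(
    r"""(?P<literal>"[^"]*"|'[^']*')                    # XPath string literal
      | (?P<qname>[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*)    # prefix:local-name
    """,
    re.VERBOSE,
)

def qnames_in_xpath(xpath):
    """Return QName-like tokens in xpath that sit outside string literals."""
    return [m.group("qname") for m in _TOKEN.finditer(xpath) if m.group("qname")]

print(qnames_in_xpath("foo:bar/*[name() = 'fred:barney']"))  # ['foo:bar']
```

Even this toy version only works because it has been told that the
value is an XPath in the first place.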
XML Schema deals with this partially. Elements and attributes that are
of the type xs:QName are resolved during parsing by schema validators,
and the resolved qualified name is passed through to the application
in the PSVI.
What XML Schema doesn't help with is dealing with XPaths or attributes
that hold namespace prefixes (such as 'extension-element-prefixes' in
XSLT). XML Schema can say that a value is a QName or a space-separated
list of QNames, but it can't say that a value is an XPath, a prefix or
a list of prefixes, or for that matter a comma-separated list of
QNames. Nor can it cover other uses of prefixes or QNames in values
that we haven't thought of yet.
Still, there might be a solution around here somewhere. Certainly,
standardising the lexical representation "{namespace-uri}local-name"
for QNames would help a whole lot in other areas, and might facilitate
incremental resolution of QNames in content and attribute values.
(Although perhaps "{namespace-uri}prefix:local-name" would be better
-- the prefix might be meaningless to a processor, but it's almost
always meaningful to people and therefore important to keep around.)
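As a small illustration of the two lexical forms, here is a sketch in
Python that expands a prefixed name against a prefix-to-URI mapping.
The function name, the keep_prefix flag and the example namespace URI
are all hypothetical, and unprefixed names are simply left alone.

```python
def expand_qname(qname, nsmap, keep_prefix=False):
    """Expand 'prefix:local' using nsmap, a dict of prefix -> namespace URI."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        return qname  # no prefix; left untouched in this sketch
    if prefix not in nsmap:
        raise KeyError(f"undeclared namespace prefix: {prefix!r}")
    uri = nsmap[prefix]
    if keep_prefix:
        return f"{{{uri}}}{prefix}:{local}"   # {namespace-uri}prefix:local-name
    return f"{{{uri}}}{local}"                # {namespace-uri}local-name

nsmap = {"foo": "http://example.com/foo"}     # made-up URI for illustration
print(expand_qname("foo:bar", nsmap))
# {http://example.com/foo}bar
print(expand_qname("foo:bar", nsmap, keep_prefix=True))
# {http://example.com/foo}foo:bar
```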
Another possibility (which I expect to get shot down immediately)
would be to treat all colons that are not preceded or followed by
whitespace as indications of QNames, which would mean adding a sixth
built-in entity, &cln;, say, to escape literal colons. The examples
above would then be:
<my_document>
<para type="foo:bar">
Here: a colon that doesn't indicate a QName.
And a literal string that looks like a QName, but isn't:
<xmp>my&cln;type</xmp>.
</para>
</my_document>
<xsl:value-of select="foo:bar/*[name() = 'fred&cln;barney']" />
This does have the advantage that a basic namespace-aware parser, with
no knowledge of schemas or the particular markup language, would be
able to know which QNames to resolve and which to leave alone. It
still wouldn't, however, be able to deal with attributes holding
namespace prefixes not involved in QNames.
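A rough sketch of that rule in Python, working on plain strings rather
than inside a real parser (today's parsers would reject the undefined
&cln; entity). The function name, the simplified name pattern and the
example namespace URI are my assumptions:

```python
import re

# A colon with name characters directly on both sides, i.e. not preceded
# or followed by whitespace. Simplified name syntax, for illustration only.
_QNAME = re.compile(r"([A-Za-z_][\w.-]*):([A-Za-z_][\w.-]*)")

def resolve_colons(text, nsmap):
    """Resolve every prefix:local in text, then unescape &cln; to ':'."""
    def resolve(match):
        prefix, local = match.group(1), match.group(2)
        if prefix not in nsmap:
            raise KeyError(f"undeclared namespace prefix: {prefix!r}")
        return f"{{{nsmap[prefix]}}}{local}"
    # Resolve first, unescape second, so escaped colons are never resolved.
    return _QNAME.sub(resolve, text).replace("&cln;", ":")

nsmap = {"foo": "urn:example:foo"}            # made-up URI for illustration
print(resolve_colons("foo:bar/*[name() = 'fred&cln;barney']", nsmap))
# {urn:example:foo}bar/*[name() = 'fred:barney']
print(resolve_colons("Here: a colon that doesn't indicate a QName.", nsmap))
# Here: a colon that doesn't indicate a QName.
```

No knowledge of XPath or of the vocabulary is needed, which is the
whole attraction, but the cost is pushed onto authors, who have to
escape every literal colon.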
Cheers,
Jeni
---
Jeni Tennison
http://www.jenitennison.com/