> Given the following XML in a DOM document
>
> <foo>
> bar
> <![CDATA[
> baz
> ]]>
> quux
> </foo>
>
> and the following XPath
>
> //text()
>
> what should be the resulting DOM nodes and why? I can think of two answers but they both have problems.
>
> PS: Why is http://www.w3.org/TR/2002/WD-DOM-Level-3-XPath-20020712/ returning a 404 when it is linked from http://www.w3.org/DOM/ ?
>
XPath is defined against a certain model of an XML document. The section that
answers your question is 5.7:
"Character data is grouped into text nodes. As much character data as possible
is grouped into each text node: a text node never has an immediately following
or preceding sibling that is a text node. The string-value of a text node is
the character data. A text node always has at least one character of data.
"Each character within a CDATA section is treated as character data. Thus,
<![CDATA[<]]> in the source document will treated the same as &lt;. Both will
result in a single < character in a text node in the tree. Thus, a CDATA
section is treated as if the <![CDATA[ and ]]> were removed and every
occurrence of < and & were replaced by &lt; and &amp; respectively."
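The equivalence the spec describes can be checked with any parser that exposes plain character data. What follows is only a minimal sketch using Python's standard-library ElementTree (not 4Suite); the element name "a" and the sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET

# The same character data, written once as a CDATA section and once
# with entity references (hypothetical sample documents).
cdata_form = ET.fromstring("<a><![CDATA[< & >]]></a>")
escaped_form = ET.fromstring("<a>&lt; &amp; &gt;</a>")

# Both spellings end up as identical character data in the tree.
print(repr(cdata_form.text))
print(repr(escaped_form.text))
print(cdata_form.text == escaped_form.text)
```

ElementTree never exposes CDATA sections as a separate node type, so it behaves like the XPath model here by construction.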
Therefore, to a conforming XPath processor,
<foo>
bar
<![CDATA[
baz
]]>
quux
</foo>
is precisely the same as
<foo>
bar
baz
quux
</foo>
i.e. one element node with one text node child.
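To see where the DOM view and the XPath view part ways, here is a rough sketch using Python's standard-library xml.dom.minidom (an assumption for illustration; it is not 4Suite and has no XPath engine). It lists the DOM children of foo and then joins their character data by hand, which is what the XPath data model would present as the single text node:

```python
from xml.dom import minidom, Node

DOC = """<foo>
bar
<![CDATA[
baz
]]>
quux
</foo>"""

foo = minidom.parseString(DOC).documentElement

# DOM view: minidom keeps the CDATA section as its own node,
# sitting between two ordinary Text nodes.
dom_children = [(child.nodeType, child.data) for child in foo.childNodes]
for node_type, data in dom_children:
    print(node_type, repr(data))

# XPath view: adjacent character data is grouped into one text node,
# so its string-value is the concatenation of the DOM pieces.
xpath_text = "".join(data for _, data in dom_children)
print(repr(xpath_text))
```

An XPath implementation running over a DOM like this has to do that merging itself, which is where the two models can leak into each other.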
There is actually an open bug against 4XPath right now: it leaks a bit in
this respect. For example, in some cases it can return a text node child of an
attribute when operating on a DOM (attributes have text node children in DOM
but not in XPath). Your post is a handy reminder for me to fix this bug.
As an illustration, here's what a session with 4XPath does (interactive Python
prompt):
>>> DOC = """<foo>
... bar
... <![CDATA[
... baz
... ]]>
... quux
... </foo>"""
>>> from Ft.Xml.Domlette import NonvalidatingReader
>>> doc = NonvalidatingReader.parseString(DOC, "http://dummybaseuri.com")
>>> from Ft.Xml.XPath import Evaluate
>>> result = Evaluate("//text()", contextNode=doc)
>>> print result
[<cText at 0x81ae434>]
>>> print result[0].data
bar
baz
quux
>>>
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Track chair, XML/Web Services One Boston: http://www.xmlconference.com/
The many heads of XML modeling - http://adtmag.com/article.asp?id=6393
Will XML live up to its promise? - http://www-106.ibm.com/developerworks/xml/library/x-think11.html