xml-dev - Re: [xml-dev] Question for the XPath and DOM folks



To: "Dare Obasanjo" <dareo@microsoft.com>
Subject: Re: [xml-dev] Question for the XPath and DOM folks
From: Uche Ogbuji <uche.ogbuji@fourthought.com>
Date: Sat, 20 Jul 2002 13:42:16 -0600
Cc: xml-dev@lists.xml.org
In-reply-to: Message from "Dare Obasanjo" <dareo@microsoft.com> of "Sat, 20 Jul 2002 11:19:25 PDT." <8BD7226E07DDFF49AF5EF4030ACE0B7E06621DFF@red-msg-06.redmond.corp.microsoft.com>
Sender: uche.ogbuji@fourthought.com

> Given the following XML in a DOM document
>  
> <foo>
> bar 
> <![CDATA[
> baz 
> ]]>
> quux
> </foo>
> 
> and the following XPath 
>  
> //text() 
>  
> what should be the resulting DOM nodes and why? I can think of two answers but they both have problems. 
>  
>  PS: Why is http://www.w3.org/TR/2002/WD-DOM-Level-3-XPath-20020712/ returning a 404 when it is linked from  http://www.w3.org/DOM/ ?
> 

XPath is defined against a certain model of an XML document.  The section that 
answers your question is 5.7:

"Character data is grouped into text nodes. As much character data as possible 
is grouped into each text node: a text node never has an immediately following 
or preceding sibling that is a text node. The string-value of a text node is 
the character data. A text node always has at least one character of data.

"Each character within a CDATA section is treated as character data. Thus, 
<![CDATA[<]]> in the source document will treated the same as &lt;. Both will 
result in a single < character in a text node in the tree. Thus, a CDATA 
section is treated as if the <![CDATA[ and ]]> were removed and every 
occurrence of < and & were replaced by &lt; and &amp; respectively."

Therefore to a conforming XPath processor,

<foo>
bar 
<![CDATA[
baz 
]]>
quux
</foo>

is, apart from the extra line breaks around the CDATA markers, precisely the 
same as

<foo>
bar 
baz 
quux
</foo>

i.e. one element node with one text node child.
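
The rule quoted from section 5.7 can be checked with a small sketch.  It 
assumes Python's standard xml.etree.ElementTree rather than an XPath 
processor; ElementTree simply reports character data the same way, with no 
trace of whether it came from a CDATA section or from escaped characters.  
The element name and sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same character data: a CDATA section and
# the equivalent escaped form (illustrative sample documents).
with_cdata = "<a><![CDATA[x < y & z]]></a>"
with_escapes = "<a>x &lt; y &amp; z</a>"

# ElementTree exposes only the resulting character data.
text_from_cdata = ET.fromstring(with_cdata).text
text_from_escapes = ET.fromstring(with_escapes).text

print(repr(text_from_cdata))
print(text_from_cdata == text_from_escapes)
```

Both spellings should come back as the same string, which is the sense in 
which the CDATA markup does not exist in the model XPath works against.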

There is actually an open bug against 4XPath right now: it leaks a bit in 
this respect.  For example, in some cases it can return a text node child of 
an attribute when operating on a DOM (attributes have text node children in 
DOM, but not in the XPath data model).  Your post is a handy reminder for me 
to fix this bug.

As an illustration, here's what a session with 4XPath does (interactive 
Python prompt):

>>> DOC = """<foo>
... bar
... <![CDATA[
... baz
... ]]>
... quux
... </foo>"""
>>> from Ft.Xml.Domlette import NonvalidatingReader
>>> doc = NonvalidatingReader.parseString(DOC, "http://dummybaseuri.com")
>>> from Ft.Xml.XPath import Evaluate
>>> result = Evaluate("//text()", contextNode=doc)
>>> print result
[<cText at 0x81ae434>]
>>> print result[0].data

bar

baz

quux

>>>
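
For contrast, here is a rough sketch of the DOM side of the question.  It 
assumes Python's standard xml.dom.minidom rather than 4Suite, and the helper 
merged_text_runs is a hypothetical name of my own, not part of any library.  
It shows how a DOM may keep the CDATA section as a separate node, and how 
grouping adjacent character data gives the single text node XPath expects.

```python
from xml.dom import minidom, Node

DOC = "<foo>\nbar\n<![CDATA[\nbaz\n]]>\nquux\n</foo>"

def merged_text_runs(element):
    """Group adjacent Text/CDATASection children into single strings,
    mimicking how the XPath data model groups character data.
    (Hypothetical helper, for illustration only.)"""
    runs, current = [], []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            current.append(child.data)
        elif current:
            # A non-text child ends the current run of character data.
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs

foo = minidom.parseString(DOC).documentElement

# DOM view: node types of the children of <foo>.
dom_types = [child.nodeType for child in foo.childNodes]
# XPath-style view: adjacent character data merged together.
runs = merged_text_runs(foo)

print(dom_types)
print(runs)
```

With minidom the children should be a text node, a CDATA section node and 
another text node, while the merged view is one string, matching the single 
text node in the 4XPath session.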


-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Track chair, XML/Web Services One Boston: http://www.xmlconference.com/
The many heads of XML modeling - http://adtmag.com/article.asp?id=6393
Will XML live up to its promise? - http://www-106.ibm.com/developerworks/xml/library/x-think11.html
