xml-dev - RE: [xml-dev] Question for the XPath and DOM folks


To: "Uche Ogbuji" <uche.ogbuji@fourthought.com>
Subject: RE: [xml-dev] Question for the XPath and DOM folks
From: "Dare Obasanjo" <dareo@microsoft.com>
Date: Sat, 20 Jul 2002 12:50:06 -0700
Cc: <xml-dev@lists.xml.org>
Thread-index: AcIwJVN+yGQoKxD5RieUBf0hvPdihAAATeoN
Thread-topic: [xml-dev] Question for the XPath and DOM folks

You missed the point of the question, which is about the difference between the DOM and XPath data models. I already know what the answer is when only the XPath data model is considered.

	-----Original Message----- 
	From: Uche Ogbuji [mailto:uche.ogbuji@fourthought.com] 
	Sent: Sat 7/20/2002 12:42 PM 
	To: Dare Obasanjo 
	Cc: xml-dev@lists.xml.org 
	Subject: Re: [xml-dev] Question for the XPath and DOM folks 
	
	

	> Given the following XML in a DOM document
	> 
	> <foo>
	> bar
	> <![CDATA[
	> baz
	> ]]>
	> quux
	> </foo>
	>
	> and the following XPath
	> 
	> //text()
	> 
	> what should be the resulting DOM nodes and why? I can think of two answers but they both have problems.
	> 
	>  PS: Why is http://www.w3.org/TR/2002/WD-DOM-Level-3-XPath-20020712/ returning a 404 when it is linked from  http://www.w3.org/DOM/ ?
	>
	
	XPath is defined against a certain model of an XML document.  The section that
	answers your question is 5.7:
	
	"Character data is grouped into text nodes. As much character data as possible
	is grouped into each text node: a text node never has an immediately following
	or preceding sibling that is a text node. The string-value of a text node is
	the character data. A text node always has at least one character of data.
	
	"Each character within a CDATA section is treated as character data. Thus,
	<![CDATA[<]]> in the source document will treated the same as &lt;. Both will
	result in a single < character in a text node in the tree. Thus, a CDATA
	section is treated as if the <![CDATA[ and ]]> were removed and every
	occurrence of < and & were replaced by &lt; and &amp; respectively."
	
	Therefore, to a conforming XPath processor,
	
	<foo>
	bar
	<![CDATA[
	baz
	]]>
	quux
	</foo>
	
	is precisely the same as
	
	<foo>
	bar
	baz
	quux
	</foo>
	
	i.e. one element node with one text node child.
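
To make the contrast between the two data models concrete, here is a minimal sketch that uses Python's standard-library `xml.dom.minidom` rather than 4XPath. The helper names `dom_view` and `xpath_view` are hypothetical, and `xpath_view` is only an approximation of the rule in section 5.7 (it merges runs of adjacent Text and CDATASection children); it is not an XPath engine.

```python
from xml.dom import Node, minidom

DOC = """<foo>
bar
<![CDATA[
baz
]]>
quux
</foo>"""

# DOM node types that XPath treats uniformly as character data.
CHARACTER_DATA = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def dom_view(xml_text):
    """Return (nodeType, nodeValue) for each child of the root element,
    exactly as the DOM tree stores them."""
    root = minidom.parseString(xml_text).documentElement
    return [(child.nodeType, child.nodeValue) for child in root.childNodes]


def xpath_view(xml_text):
    """Approximate the XPath text-node rule (hypothetical helper):
    merge each run of adjacent Text/CDATASection children into one string."""
    root = minidom.parseString(xml_text).documentElement
    merged, run = [], []
    for child in root.childNodes:
        if child.nodeType in CHARACTER_DATA:
            run.append(child.nodeValue)
            continue
        # A non-character-data child ends the current run.
        if run:
            merged.append("".join(run))
            run = []
    if run:
        merged.append("".join(run))
    # XPath text nodes always hold at least one character.
    return [text for text in merged if text]


if __name__ == "__main__":
    print(dom_view(DOC))
    print(xpath_view(DOC))
```

With minidom's default parser settings, `dom_view` should report separate Text and CDATASection children, while `xpath_view` collapses them into a single string, which is the gap the original question is about.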
	
	There is actually an open bug against 4XPath right now: it leaks a bit in
	this respect.  E.g., in some cases it can return a text node child of an
	attribute when operating on a DOM (attributes have text node children in DOM
	but not in XPath).  Your post is a handy reminder for me to fix this bug.
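
The attribute case can be seen in a DOM implementation directly. The sketch below again assumes Python's standard-library `xml.dom.minidom` (not the Domlette that 4XPath operates on), and `attribute_children` is a hypothetical helper name. It lists the child nodes that the DOM hangs under an attribute node, which have no counterpart in the XPath model, where an attribute node has only a string-value.

```python
from xml.dom import Node, minidom


def attribute_children(xml_text, attr_name):
    """Return (nodeType, nodeValue) for each DOM child of the named
    attribute on the root element (hypothetical helper)."""
    root = minidom.parseString(xml_text).documentElement
    attr = root.getAttributeNode(attr_name)
    return [(child.nodeType, child.nodeValue) for child in attr.childNodes]


if __name__ == "__main__":
    # In the DOM, the attribute value is carried by a Text child node.
    print(attribute_children('<foo id="x"/>', "id"))
```

An XPath implementation layered on such a DOM has to take care not to expose these children through `text()` or the child axis.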
	
	As an illustration, here's what a session with 4XPath does (interactive Python
	prompt):
	
	>>> DOC = """<foo>
	... bar
	... <![CDATA[
	... baz
	... ]]>
	... quux
	... </foo>"""
	>>> from Ft.Xml.Domlette import NonvalidatingReader
	>>> doc = NonvalidatingReader.parseString(DOC, "http://dummybaseuri.com")
	>>> from Ft.Xml.XPath import Evaluate
	>>> result = Evaluate("//text()", contextNode=doc)
	>>> print result
	[<cText at 0x81ae434>]
	>>> print result[0].data
	
	bar
	
	baz
	
	quux
	
	>>>
	
	
	--
	Uche Ogbuji                                    Fourthought, Inc.
	http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
	Track chair, XML/Web Services One Boston: http://www.xmlconference.com/
	The many heads of XML modeling - http://adtmag.com/article.asp?id=6393
	Will XML live up to its promise? - http://www-106.ibm.com/developerworks/xml/library/x-think11.html
