7/20/2002 4:28:44 PM, Uche Ogbuji <email@example.com> wrote:
>> "The XPath model relies on the XML Information Set [XML Information set]
>> and represents Character Information Items in a single logical text node
>> where DOM may have multiple fragmented Text nodes due to cdata sections,
>> entity references, etc. Instead of returning multiple nodes where XPath sees
>> a single logical text node, only the first non-empty DOM Text or
>> CDATASection node of any logical XPath text will be returned in the node
>Yikes! This is a *very* *very* bad job. Luckily that spec is still a WD and
>I hope they'll fix it before release. If they can't do better than that then
>they should just leave DOM/XPath interaction to application specifics.
Well, it was a VERY VERY VERY bad job for the "W3C" (if one can think of it
as a unified entity rather than a collection of working groups made up
of competitors, loosely coordinated by the staff and director)
to have created the situation where there are multiple,
inconsistent data models defined by various XML-related Recommendations. I'm
not sure why it happened, what should have been done differently (in 20/20
hindsight, of course), or how it will be resolved, but this is a really
nasty situation. There is a lot of guilt to share, and yes Tom Bradford if you're
still out there, it is mostly my fault :~)
More importantly, the W3C has learned from this mistake, and I don't
think it would happen under the current organization and process.
In the long run my personal (and official corporate, FWIW) position is that the
data models MUST be reconciled, even at the cost of some backwards incompatibility.
("Re-breaking the bone so that it can heal cleanly" is my favorite metaphor here).
In the short run, it's not at all clear what is to be done. I/we do not want
to hold DOM Level 3 hostage to this, however, because it could take a while ....
DOM Level 3 provides basically two ways to deal with this: load-time options to
create an "InfoSet" view with no CDATA sections or unexpanded entity references,
and the XPath interfaces to allow one to essentially translate between the XPath
view of a document and the DOM view of a document. The key point is that an
XPathResult doesn't return "a" node, it returns a way of iterating across the
DOM view of the nodes corresponding to the XPath view of the nodes.
That's what the "manually gather" bit means here:
>> Applications using XPath in an environment with fragmented text nodes
>> must manually gather the text of a single logical text node possibly from
>> multiple nodes beginning with the first Text node or CDATASection node
>> returned by the implementation."
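To make the "manually gather" step concrete, here is a minimal sketch in Python using the standard library's xml.dom.minidom, not the DOM Level 3 XPath interfaces themselves. The helper name gather_logical_text is made up for illustration, and the sketch assumes the logical text node consists only of adjacent Text and CDATASection siblings (it does not handle EntityReference nodes).

```python
from xml.dom import Node, minidom

# Node types that can make up one logical XPath text node in this sketch.
TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def gather_logical_text(first):
    """Concatenate `first` and the Text/CDATASection siblings directly after it.

    `first` stands in for the node an implementation would hand back:
    the first Text or CDATASection node of the logical text node.
    """
    parts = []
    node = first
    while node is not None and node.nodeType in TEXT_TYPES:
        parts.append(node.data)
        node = node.nextSibling
    return "".join(parts)


# A CDATA section in the middle of character data fragments the DOM text.
doc = minidom.parseString("<p>one <![CDATA[<two>]]> three</p>")
first = doc.documentElement.firstChild

print(len(doc.documentElement.childNodes))  # number of DOM nodes for the text
print(gather_logical_text(first))           # the single logical text value
```

The walk stops at the first sibling that is not character data, which is where the XPath view would also end the text node.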
Just to make life more interesting, there are a couple more issues to wrestle with:
how to map the XPath "nodes have a namespace property" view onto the DOM "namespace
declaration nodes upwards in the tree define the namespace a node is in" view; and
how to deal with the fact that the XPath data model is in flux. I'm sure that the
DOM and XPath groups would appreciate any constructive suggestions on how to
reconcile all this for the benefit of users ... or whether to just fuggitaboudit
because it's essentially impossible to reconcile all the irreconcilable factors.
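For what it's worth, the namespace half of that mismatch can be sketched in a few lines of Python with xml.dom.minidom. The function lookup_namespace and the URI urn:example:ns are hypothetical, and the sketch ignores the implicitly bound "xml" prefix. It walks upwards through the declaration attributes (the DOM-style view) and compares the answer with the namespaceURI the parser already stored on the node (closer to the XPath-style "property" view).

```python
from xml.dom import Node, minidom


def lookup_namespace(element, prefix):
    """Resolve `prefix` by searching xmlns declarations on ancestors.

    Pass None or "" to look up the default namespace.
    Returns None if no declaration is in scope.
    """
    attr = "xmlns:" + prefix if prefix else "xmlns"
    node = element
    # Stop when we climb past the root element to the Document node.
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        if node.hasAttribute(attr):
            return node.getAttribute(attr)
        node = node.parentNode
    return None


doc = minidom.parseString(
    '<root xmlns:p="urn:example:ns"><wrap><p:item/></wrap></root>'
)
item = doc.getElementsByTagName("p:item")[0]

print(lookup_namespace(item, item.prefix))  # found by walking up the tree
print(item.namespaceURI)                    # stored directly on the node
```

The two answers agree for a freshly parsed document; the trouble starts once nodes are moved or declarations are edited, because nothing keeps the stored property and the declarations in sync.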