xml-dev - Re: [xml-dev] XPath 2.0 - What is a "node"? What is an "item"?

To: AndrewWatt2000@aol.com
Subject: Re: [xml-dev] XPath 2.0 - What is a "node"? What is an "item"?
From: Jeni Tennison <jeni@jenitennison.com>
Date: Mon, 13 May 2002 16:40:31 +0100
Cc: xml-dev@lists.xml.org
In-reply-to: <16f.d912288.2a11237b@aol.com>
Organization: Jeni Tennison Consulting Ltd
References: <16f.d912288.2a11237b@aol.com>
Reply-to: Jeni Tennison <jeni@jenitennison.com>

Hi Andrew,

> In 1. Introduction it is stated that "The data model is based on the
> Information Set". I took that to indicate that XPath 2.0 Data Model
> incorporates all of the Infoset REC.

I think that would be overstating the relationship. The XML Infoset
includes a number of information items that aren't included as nodes
in the XPath 1.0 or 2.0 data model, for example unexpanded entity
reference information items, unparsed entity information items,
document type declaration information items. The XPath 1.0 and 2.0
models are *based on* the Infoset, but they are not equivalent to the
Infoset (either the XML Infoset or the PSVInfoset).

> However, in 4.1 the description of a document node omits several
> properties of the document information item as described in the
> Infoset REC.

Yes. Specifically a document node:

  - doesn't have a document element, since XPath 1.0 and 2.0 document
    (root) nodes can have more than one element child

  - doesn't have notations, although arguably since XML Schema has an
    xs:NOTATION data type it perhaps should

  - doesn't have unparsed entities, although arguably since XSLT needs
    to be able to get hold of the URIs of unparsed entities (with the
    unparsed-entity-uri() function), it should

  - doesn't have a character encoding scheme, since XPath doesn't
    care what encoding a document uses (and a "document" in XPath
    terms might not have an encoding at all, if it is never
    serialized)

  - doesn't have a standalone flag, since XPath resolves any
    references, so a document by definition must be standalone

  - doesn't have a version, since XPath is built on top of XML 1.0
    so assumes XML 1.0

  - doesn't have an "all declarations processed" flag, since XPath
    doesn't care, I think.
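
To make the filtering in the list above concrete, here is a minimal
Python sketch. The property names are those of the document
information item in the Infoset REC; the function name
surviving_properties is hypothetical, and what survives is derived
purely by elimination from the list above, so treat it as an
illustration rather than a statement of what the WD defines.

```python
# Properties of the document information item, as named in the
# XML Infoset Recommendation (section 2.1).
DOCUMENT_INFO_ITEM_PROPERTIES = [
    "children",
    "document element",
    "notations",
    "unparsed entities",
    "base URI",
    "character encoding scheme",
    "standalone",
    "version",
    "all declarations processed",
]

# Properties that, per the list above, a document node does not carry.
NOT_ON_DOCUMENT_NODE = {
    "document element",
    "notations",
    "unparsed entities",
    "character encoding scheme",
    "standalone",
    "version",
    "all declarations processed",
}


def surviving_properties(info_item_properties, dropped):
    """Return the properties left once the dropped ones are filtered out.

    Hypothetical helper: it only subtracts one list from the other and
    says nothing about how the surviving properties are represented.
    """
    return [name for name in info_item_properties if name not in dropped]


remaining = surviving_properties(DOCUMENT_INFO_ITEM_PROPERTIES, NOT_ON_DOCUMENT_NODE)
print(remaining)
```

By elimination this leaves only the children and base URI properties,
which is roughly what you'd expect a root node to need.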

> Further, it is stated in 1. that "An item is either a node or an
> atomic value.". I read that to refer to an "information item".

I think that at that point, the WD is talking about "an item" as in
"an item in a sequence". I agree that it's a little confusing, but the
context is fairly clear:

  Every value handled by the data model is either a sequence of zero
  or more items, or an error. An item is either a node or an atomic
  value.

Perhaps the second sentence should be changed to read "An item in a
sequence is either a node or an atomic value."
  
> If that is the case then an "information item" is essentially
> identical to a "node".
>
> However an Infoset "information item" has a number of properties
> which a "node" at least as described in XPath 1.0 does not possess.
> So, it seems that an XPath 2.0 node is fundamentally different from
> an XPath 1.0 node in that it now possesses a full set of Infoset
> properties.

Yes, information items have properties that nodes don't. Nodes are not
synonymous with information items.

> Yet in 4.1 it is stated "Document nodes and XPath 1.0 root nodes are
> essentially identical.".

Yes -- that's saying that root nodes in XPath 1.0 and document nodes
in XPath 2.0 are the same thing. It isn't drawing a comparison between
document information items and document nodes.

> Yet if an XPath 2.0 document node "is" (as quoted from 1. above) an
> "item" and if an "item" is intended to be the same as an Infoset
> "information item" it is not possible for an XPath 2.0 document node
> (which must possess Infoset properties) to be "essentially
> identical" to an XPath 1.0 root node (which possessed no Infoset
> properties).

Nodes have properties and Infoset information items have properties.
The relationship between a node's properties and an information
item's properties is sometimes 1:1; in other cases a node's properties
are derived in some other way from the information item's properties.

The XQuery/XPath data model tries to explain how you get from the
Infoset (specifically the PSVI) to the data model that's used by
XQuery/XPath. This is an attempt to tie it in with the other specs;
otherwise you have the situation as in XPath 1.0 where XPath has its
own data model, slightly different from the DOM data model, slightly
different again from the Infoset.

> I hope I have conveyed something coherent of what I perceive as the 
> inconsistency of descriptions.

I think the description's improved no end from the description in the
last draft, but that doesn't mean it's perfect yet. Do you think it
would make it clearer if the description of the XPath data model was
given separately from the descriptions that link the XPath data model
into the Infoset? I think it's important for the linkages to be there,
but I agree that as a user trying to get a handle on what the XPath
data model looks like, you need to have a fair amount of background
knowledge.

> My questions now include, What did the WG intend to say about the
> relationships of an XPath 2.0 "node" and an XPath 2.0 "item"?

An XPath 2.0 "item" is roughly the same as an XPath 1.0 "object".
Objects in XPath 1.0 were broken down into four sorts: node sets,
strings, numbers and booleans. Items in XPath 2.0 are broken down into
nodes and atomic values. But what you're really manipulating all the
time are sequences of items (a sequence containing one item is treated
just the same as the item on its own).
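
As a rough illustration of that last point, here is a minimal Python
sketch (not XPath, and not taken from the WD) that models a value as a
flat sequence of items. The names Node, as_sequence and concat are
hypothetical, and using plain Python scalars for atomic values is an
assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Node:
    """Stand-in for a node (hypothetical: just a kind and a name)."""
    kind: str
    name: str


# Assumption: atomic values are modelled with plain Python scalars.
Atomic = Union[str, int, float, bool]
Item = Union[Node, Atomic]


def as_sequence(value):
    """Treat a lone item as a sequence of length one."""
    if isinstance(value, tuple):
        return value
    return (value,)


def concat(*values):
    """Concatenate items and sequences into one flat sequence.

    Sequences do not nest, so a sequence passed in is spliced
    into the result rather than kept as a single member.
    """
    result = ()
    for value in values:
        result += as_sequence(value)
    return result


title = Node("element", "title")
print(as_sequence(title))
print(concat(title, (1, 2), ()))
```

The point of as_sequence is that code downstream never has to ask
whether it was handed an item or a one-item sequence.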

> Does an XPath 2.0 node have (or not have) a full complement of
> Infoset properties?

No, it has a filtered set (actually of PSVInfoset properties)
applicable to XPath 2.0 processing.

> How, precisely, does an XPath 2.0 node differ from an XPath 1.0
> node?

Not a huge amount. The main differences are:

- Nodes have typed values as well as string values. The typed value
  of an element or attribute is the sequence of atomic values that it
  contains. So for example if you have <date>2002-05-13</date> then
  the typed value of the date element node might be the date
  2002-05-13 (only *might* be, because it depends on what validation
  has taken place)

- Nodes have a type; again, this is only relevant for element and
  attribute nodes. The type of a node is the name of the complex or
  simple type against which it's validated according to the schema or
  DTD.

- Nodes have a unique ID, which is only relevant for element nodes;
  I don't see the point of it myself, but I guess someone thought it
  was worthwhile.
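
The string value / typed value distinction for the date example can be
sketched in Python like this. It's only a sketch: the declared_type
argument is a hypothetical stand-in for whatever validation has taken
place, and falling back to the plain string when nothing is declared is
an assumption for illustration, not the WD's rule.

```python
import xml.etree.ElementTree as ET
from datetime import date


def string_value(element):
    """String value: the concatenation of all descendant text."""
    return "".join(element.itertext())


def typed_value(element, declared_type=None):
    """Typed value as a sequence (tuple) of atomic values.

    declared_type is a hypothetical stand-in for the outcome of
    validation. Only "xs:date" is handled; anything else falls
    back to the untyped string (an assumption for this sketch).
    """
    text = string_value(element)
    if declared_type == "xs:date":
        return (date.fromisoformat(text),)
    return (text,)


elem = ET.fromstring("<date>2002-05-13</date>")
print(string_value(elem))
print(typed_value(elem, "xs:date"))
print(typed_value(elem))
```

The same element gives a date when it has been validated as one, and
just a string otherwise, which is the "only *might* be" above.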

The other major difference is that namespace nodes are now shared by
element nodes rather than being associated with particular element
nodes in the way that they were in XPath 1.0. This leads to some
backwards compatibility problems (some of my stylesheets use namespace
nodes to resolve qualified names) but with the extra QName support
from XPath 2.0, that shouldn't be a problem.
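
For anyone wondering what "resolving a qualified name" amounts to,
here is a minimal Python sketch of the lookup. The function name
resolve_qname is hypothetical, the in-scope namespaces are passed in as
a plain dictionary of prefix to URI, and leaving unprefixed names in no
namespace is an assumption for this sketch.

```python
def resolve_qname(qname, in_scope):
    """Expand a qualified name to a (namespace URI, local name) pair.

    in_scope maps prefixes to namespace URIs, standing in for the
    namespace nodes in scope on some element. Unprefixed names are
    left in no namespace (an assumption for this sketch).
    """
    prefix, separator, local = qname.partition(":")
    if not separator:
        return (None, qname)
    if prefix not in in_scope:
        raise ValueError(f"undeclared prefix: {prefix!r}")
    return (in_scope[prefix], local)


bindings = {"xs": "http://www.w3.org/2001/XMLSchema"}
print(resolve_qname("xs:date", bindings))
print(resolve_qname("date", bindings))
```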

Does that clarify things?

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
