Re: [xml-dev] XPath and a continuous, uniform information space

How very interesting, David! Your posting helped me very much to get a picture of the direction in which you are looking.

This direction seems to me closely related to the concept of maps, as introduced by XSLT 3.0. The fundamental difference between node+children vs. map+entries is that a node can be an entry in multiple maps - this corresponds to your sentence:
"A node *itself* can be shared as well."
which liberates us, enabling the agile generation of new structure without destroying old structure.

I regard maps as an extremely important extension of the XDM and I wait impatiently for the XQuery Working Group to accept them. Maps allow for the flexibility of combining and recombining nodes into structures (from flat to deep), which is really missing when we regard structure as *only* representable by a (conventional) node tree, with its single-parent principle. So I emphatically agree that much is to be gained by introducing maps into the XDM, and consequently also into the info space model implied by it.

My understanding of the "purpose" of XSLT maps is to provide (a) intermediaries and (b) a logical representation of JSON (and similar) resources. Full stop. This implies an asymmetry in the relationship between maps and nodes: a node can be a map entry (so important!), but a node cannot "have" a map - the model of node properties does not provide for maps. In XSLT 3.0, JSON resources *are* (nested) maps and *have* no nodes; XML resources are nodes and have no maps. A pity! Both maps and nodes offer so much, and the question is whether we can overcome this kind of "choice", and whether our node model itself might not become richer and more inclusive.

Your posting sounds to me as if you do not think of *replacing* the conventional node model with a map model, but of complementing it, right? Replacement would be simple to put into practice - and awfully disappointing. Just start to model your data without node trees (documents), and use nested XSLT maps for any kind of structure. Or use JSONiq and model all data in JSON. I prefer to skip such an experiment...

Is there no way to lift the map model of XDM *into* the node model? Starting point: the set of node properties defined by the XDM. Approach: add a property pulling map structure into the information model of a node. (And my suggestion would be to let syntax enter the picture only later - to start thinking in terms of a pure model.) See below, experiments #1 and #2, to get the idea.

And then there is also an alternative to consider (see experiment #3): not to pull maps into the node model (in other words: let the nodes themselves be "unaware" of their participation in nested map structures), but to introduce map structure "from without", perhaps introducing a new kind of resource which defines a tree or graph in terms of nested maps, "reusing" the nodes provided by the primary node forest. (Again as you wrote, David: "A node itself can be shared.")

Experiment #1: An element node is all that it used to be up to now, and additionally it *is* a map which may or may not contain entries (which may in turn be nodes). Translated into properties: an element node has - besides [children], [parent] and [attributes] - a new property [entries], holding the entries of the map which the node *is*. Imagine - any node forest (set of documents) would implicitly be a set of graphs to be obtained by "following" the [entries] property.
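
A minimal sketch of experiment #1, written in Python only because the proposal deliberately has no syntax yet. The class name `Element`, the field name `entries` and the helper `follow_entries` are hypothetical; they mirror the [children], [parent], [attributes] and proposed [entries] properties and are not part of the XDM or of any XSLT API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by value
class Element:
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    parent: Optional["Element"] = None
    # Hypothetical [entries] property: the map which the node *is*.
    entries: dict = field(default_factory=dict)

    def append(self, child: "Element") -> "Element":
        # Conventional tree structure: a child has exactly one parent.
        child.parent = self
        self.children.append(child)
        return child


def follow_entries(start: Element) -> list:
    """Collect the names of all nodes reachable through [entries] alone."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        order.append(node.name)
        # Only entries that are themselves nodes continue the walk.
        stack.extend(v for v in node.entries.values() if isinstance(v, Element))
    return order


# Two separate documents (trees) ...
doc1 = Element("order")
item = doc1.append(Element("item"))
doc2 = Element("catalog")
product = doc2.append(Element("product"))

# ... connected by map entries, without touching [parent] or [children].
item.entries["product"] = product
product.entries["catalog"] = doc2
```

Following [entries] from `item` reaches `product` and then `catalog`, which crosses a document boundary while both trees keep their single-parent structure.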

Experiment #2: As it seems to be rather arbitrary to let elements have (or be) exactly one map, one may try a new node model in which the node is not a single map in the sense of experiment #1, but can have any number of maps, each one identified by a URI. Each URI-associated map could be interpreted as a graph node in a graph identified by that URI. Translated into properties: an element may have an additional property [graphs], which is a nested map, mapping URIs to XSLT 3.0 maps. A particular graph (or tree) is identified by a URI U and can be discovered by following the entries of the map identified by the graph URI U.
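
Experiment #2 might be sketched along the same lines. Again this is only an illustration under assumptions: `Element`, `graphs`, `link` and `discover` are hypothetical names, the two graph URIs are made up, and plain Python dictionaries stand in for XSLT 3.0 maps.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # nodes are compared by identity
class Element:
    name: str
    children: list = field(default_factory=list)
    # Hypothetical [graphs] property: graph URI -> map of entries.
    graphs: dict = field(default_factory=dict)

    def link(self, graph_uri: str, key: str, target: "Element") -> None:
        # Add an entry to this node's map for the graph named by graph_uri.
        self.graphs.setdefault(graph_uri, {})[key] = target


def discover(start: Element, graph_uri: str) -> list:
    """Names of the nodes reachable from start within the graph graph_uri."""
    seen, order, queue = set(), [], [start]
    while queue:
        node = queue.pop(0)
        if id(node) in seen:
            continue
        seen.add(id(node))
        order.append(node.name)
        # Follow only the entries of the map identified by graph_uri.
        entries = node.graphs.get(graph_uri, {})
        queue.extend(v for v in entries.values() if isinstance(v, Element))
    return order


# Made-up graph URIs, for illustration only.
SHIPPING = "http://example.org/graphs/shipping"
BILLING = "http://example.org/graphs/billing"

purchase, depot, invoice = Element("purchase"), Element("depot"), Element("invoice")
purchase.link(SHIPPING, "from", depot)
purchase.link(BILLING, "billed-by", invoice)
```

The same node `purchase` takes part in two graphs, and which neighbours are found depends only on the URI passed to `discover`.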

Experiment #3: In experiments #1 and #2 the single map - or the set of maps - was part of the node data; the node itself "decided" in which graphs to participate. There is something limiting about that. We may want to define new structure after the fact, without disturbing the nodes pulled into the graph, just as a map does not disturb its entries. So an alternative would be to impose such nested maps purely from without - our info space would be a node forest *and* a collection of graphs connecting (reconnecting) nodes of that forest. But I have no idea how to define such graphs. I feel it would be a new kind of resource whose information content would be a graph definition interconnecting conventional nodes.
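
How such a graph resource should be defined is left open above, so the following is no more than one conceivable shape, sketched in Python under stated assumptions: the `Graph` class, its `link` and `entries` methods and the graph URI are all hypothetical. The point it illustrates is only that the nodes carry no graph-related property at all.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compared and hashed by identity
class Element:
    # A conventional node: it has no property that refers to any graph.
    name: str
    children: list = field(default_factory=list)


class Graph:
    """Hypothetical graph resource: maps imposed on nodes from without."""

    def __init__(self, uri: str):
        self.uri = uri
        self._maps = {}  # node -> {key: target node}

    def link(self, source: Element, key: str, target: Element) -> None:
        # The structure lives in the graph, not in the node.
        self._maps.setdefault(source, {})[key] = target

    def entries(self, node: Element) -> dict:
        # The map this graph associates with the node (empty if none).
        return dict(self._maps.get(node, {}))


chapter, figure = Element("chapter"), Element("figure")
before = vars(chapter).copy()  # snapshot of the node's own properties

g = Graph("http://example.org/graphs/cross-references")  # made-up URI
g.link(chapter, "illustrated-by", figure)
```

After the call to `link`, `vars(chapter)` still equals `before`: the graph was defined after the fact and the node was not disturbed.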

Do you feel the spirit of the whole approach? Not to replace our node model, but to add to it - I am convinced that this is the way to go. And I am excited about how you and the other participants in this discussion gradually expose a new potential which might be tapped by introducing additional structure into the node forest, which is and remains (if only for me) the ground layer from which to unfold the space.

Cheers,
Hans

From: David Lee <dlee@calldei.com>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Sent: Saturday, 17 August 2013, 16:09
Subject: RE: [xml-dev] XPath and a continuous, uniform information space - Recap

=== Hans

David,

I have difficulty understanding this direction of the discussion; let me explain why.

A motto of mine is: "A node is both content and location" - and I believe that this is a miraculous formula; we have hardly scratched the surface of its potential. The association of information with location is a principle as fundamental as the association of data and methods (object orientation).

I do not think you want to get rid of "location". But what is this, location? Some models: a register; a memory address; a variable name; a combination of database/table/column name; an RDF triple subject URI; a NoSQL object and a data path; a node.

----------

Here is what I am contemplating. Note that my recent ramblings on this list are a "mental experiment" to see for myself, and hopefully get feedback on, what our systems might look like if we changed a few things. Whether that means a new markup or a change to an existing one, I am not suggesting either. It is rather a step back, inspired by Hans's "Info Space" and Mike's "FtanML" and a few other thoughts that have been building in me.

Now to Hans's point.
I am suggesting that *perhaps* a better model/motto of a "Node" is "A node is content and possibly accessible by location".
First, even in XML as it stands, nodes do not necessarily have a location, so your current motto doesn't reflect current reality.
That is, in both XSLT and XQuery (and, as was shown by Dimitre Novatchev, even in pure XPath), let alone any programming language that can make XML out of nothing, nodes don't always have a location. So we have already broken this motto.
What I am suggesting is that this is not bad, it's vital!
I agree that it's vastly useful that you *can* get to a node via a "location" (either an explicit URL, a query, or an axis from another node), but I am willing to concede, if it provides value, that we don't have to permanently assign a location to a node, *nor* do we have to impose the constraint that a node has only one location or identity.

What would this buy us?
Nodes that are fetched, created, or extracted are of the same kind and have the same properties. A more consistent node model.
You can use a URI to *get* or *create* a node, but you can't then assume that the node is associated with the URI after that (it may be dynamic, or may have multiple URIs).
So in a sense nodes can be retrieved/factored by "location" but then they become free of that association.
Nodes can be shared. Imagine a representation of a typical binary tree where each node is simply (left, right, value), or another kind of tree where a node is (value, list of children) ...
If we don't enforce or expose or even implement node identity in the model, and don't have parents, then a node's value can be shared in different trees. A node *itself* can be shared as well.
This has tremendous performance advantages both in-process and in a distributed system.
In a distributed system it means we can construct single trees out of collections of child nodes of any size and location, and their contents can be other nodes of any size and location (permanent or dynamic), with no regard at all to enforcing parentage or uniqueness. This is in fact the model of resources on the web. Does a web page have a parent? Can it be shared?
So this means we can not only access nodes from anywhere (a file, constructed, the web, out of thin air), but that any tree can be created by composing any of these nodes into another node, hierarchically and lazily.
No distinction/difference between nodes that came from different kinds of constructors (documents, URIs, other nodes, queried from DBs, dynamically created) - all the same and all equal in the model.
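
The sharing described above can be sketched with a parentless (value, list of children) node. This is an illustration in Python under assumptions, not a proposal for an API: `Node` and `values` are hypothetical names, and lazy or distributed fetching is not shown.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    # No parent pointer and no exposed identity: frozen dataclasses
    # compare and hash by value.
    value: str
    children: Tuple["Node", ...] = ()


def values(node: Node) -> list:
    """Values in the order found by walking children from this root."""
    out = [node.value]
    for child in node.children:
        out.extend(values(child))
    return out


shared = Node("address", (Node("street"), Node("city")))

# The same subtree takes part in two trees; nothing is copied or re-parented.
customer = Node("customer", (Node("name"), shared))
supplier = Node("supplier", (shared, Node("vat-id")))
```

Because `shared` holds no reference to a parent, it can sit under `customer` and `supplier` at once, and each root sees it at a different position when walking its children.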

What we lose
No parent axis
No node identity (It is useless or misleading to ask whether two nodes are the same, only whether they are value-equal)
No sibling access. (Without parent access you can't get siblings. You can, however, ask for siblings within the context of a parent.)
No document order. (Order is only implied by walking the children of a node; a node itself, since it can be shared, may have many orderings)
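
A small Python sketch of what these losses look like in practice, under the same assumptions as a parentless, value-compared node; `Node` and `following_siblings` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    # Value-compared node without a parent pointer.
    value: str
    children: Tuple["Node", ...] = ()


def following_siblings(parent: Node, index: int) -> Tuple[Node, ...]:
    # Siblings exist only relative to a parent the caller already holds:
    # the child is addressed by its position under that parent.
    return parent.children[index + 1:]


note = Node("note")
a = Node("a", (note, Node("x")))
b = Node("b", (Node("y"), note, Node("z")))
```

The shared `note` has the following sibling `x` in the context of `a` and `z` in the context of `b`, so there is no single document order for it, and a separately constructed `Node("note")` is indistinguishable from it by value.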

I actually think this fits the concept of Hans' "Info Space" extremely well. Better even ...

Is it useful? Are the benefits enough to accept the losses?

-David