OASIS Mailing List Archives


Re: [xml-dev] YPath, ZPath?

Thank you Michael for the explanation!

As I presented at XML Amsterdam last November (http://www.agencexml.com/amsterdam2015/Processing_non-XML_sources_as_XML.pdf), my implementation adds three node types: one for arrays, one for maps, and one for entries.

Actually, I am more concerned about big CSV files than about JSON. Without a header, CSV can be interpreted as an array of arrays; with a header, as an array of maps; and with a header and a key, as a map of maps... The W3C CSV group has also specified how to associate a type with a column!
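To make the three readings of a CSV file concrete, here is a minimal Python sketch. It is an illustration only, not my implementation: the sample data and the function names (as_array_of_arrays, as_array_of_maps, as_map_of_maps) are invented for the example, and all values are left as untyped strings.

```python
import csv
import io

# Hypothetical sample data, invented for this illustration.
SAMPLE = "id,name,city\n1,Ada,London\n2,Linus,Helsinki\n"


def rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text)))


def as_array_of_arrays(text):
    """No header: every line, including the first, is a data row."""
    return rows(text)


def as_array_of_maps(text):
    """Header: the first line names the columns of every later row."""
    header, *body = rows(text)
    return [dict(zip(header, row)) for row in body]


def as_map_of_maps(text, key):
    """Header and key: rows are indexed by the value of the key column."""
    return {row[key]: row for row in as_array_of_maps(text)}
```

With this sketch, as_map_of_maps(SAMPLE, "id")["2"]["name"] looks up a single cell by key and column name, where the array-of-arrays reading would need two positional indexes.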

I have not implemented a different node type for each atomic type. Instead, the type info is just added to the text node.

It was then simpler to start implementing an XPath/XQuery-like engine that manipulates only nodes, even for constant values in expressions and for result values (such nodes have neither an owner document nor a parent node). I also added a node type for sequences (always with at least two child nodes), with a null value representing the empty sequence.
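A rough Python sketch of this "everything is a node" idea follows. The class, field, and function names are hypothetical and do not come from my engine; the type name "xs:integer" is only an example, and returning a single item as itself is my reading of "at least 2 child nodes", stated here as an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, not by content
class Node:
    kind: str                          # "text", "array", "map", "entry", "sequence", ...
    value: Optional[str] = None        # lexical value, used by text nodes
    type_name: Optional[str] = None    # type info attached to the text node
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None    # None for constants and results
    owner: Optional["Node"] = None     # owner document; None for constants and results


def constant(value, type_name):
    """A constant in an expression: a text node with no parent and no owner."""
    return Node("text", value=value, type_name=type_name)


def make_sequence(items):
    """Normalise a list of nodes into the engine's representation.

    Empty -> None (the null value); one item -> the item itself (assumption);
    two or more -> a sequence node holding the items as children.
    """
    items = list(items)
    if not items:
        return None
    if len(items) == 1:
        return items[0]
    return Node("sequence", children=items)
```

The point of the normalisation is that a sequence node never has to be checked for the zero-item or one-item cases elsewhere in the engine.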

About mixing XML nodes with other kinds, I have experimented with an XForms page that splits text nodes into arrays and then treats the items separately. Should it be useful, it was also easy to define an XML notation for serializing such mixed trees, typically with one element per node type, for example exml:document, exml:map,...

About mutability, I can only say that, in an XForms context, data is there to be edited and might not always be valid during the process!

So I know that I am not writing an engine that will conform to the future recommendation, but it will give me more power, probably at the cost of some performance.

Best regards,

Alain Couthures

On 12 April 2016 at 10:41, Michael Kay <mike@saxonica.com> wrote:

(c) The node sub model of XDM was designed for processing XML, but must not be extended because it was designed for processing XML [your argument?]

We spent a long time weighing up all the alternatives on this, and ended up deciding that maps and arrays were sufficiently different from anything in the 2.0 model that they needed to be handled differently.

Firstly, they aren't like any of the existing kinds of XML node, so if we had made them nodes, we would probably have needed to introduce two new node kinds (maps and arrays); or possibly three, since there's an argument for treating a (key, value) pair within a map as a separate node in the tree.

Secondly, the leaves in a JSON tree are booleans, strings, and numbers, so you have to consider whether these should be nodes as well. And if booleans, strings, and numbers become nodes, what about the other 16 atomic data types? And what about sequences?

Nodes in the XDM model have creation-based identity; do you really want ("foo" is "foo") to return false? And closely related to this, nodes in the XDM model have parents: do you really want the leaf nodes (strings and booleans) to have parents, so that you can ask what the parent of a number is?
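The difference between value equality and creation-based identity can be shown with a small Python analogy. This is only an illustration: the IdNode class is hypothetical, and Python's "is" operator stands in for the XPath node-identity comparison.

```python
class IdNode:
    """Hypothetical wrapper: every construction creates a new identity."""

    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent


# Two nodes built from the same string are equal in value...
a = IdNode("foo")
b = IdNode("foo")
same_value = a.value == b.value

# ...but they are distinct nodes, because identity comes from creation.
same_node = a is b

# A node can also be asked for its parent; a bare string value cannot,
# since the same value may appear in any number of containers.
container = IdNode(None)
child = IdNode("foo", parent=container)
child_has_parent = child.parent is container
```

Here same_value is true while same_node is false, which is the behaviour that wrapping every string in a node would impose on a comparison such as ("foo" is "foo").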

Now, we could have made the question of upwards navigation orthogonal to other questions, allowing any tree to be built with or without parent pointers. And that is one of the very many ideas that were considered and discarded.

If maps, arrays, and possibly KV pairs are modelled as nodes, that immediately raises the question whether these nodes can be mixed in the same tree as the traditional XML-based nodes. If they can't be mixed, then you essentially have Xtrees consisting of Xnodes and Jtrees consisting of Jnodes, and the model is not very different from what it is today except in terminology. If they can be mixed, then a whole new set of questions opens up. Can an attribute value be an array containing nested arrays? If so, how do you create such an animal, and how do you process it once you have done so?

But I think the critical questions are not terminology, but the question of whether maps and arrays have creation-based identity, have parents, and are mutable or immutable. When we first introduced maps in the XSL WG, it was not to support JSON, but rather to provide a lightweight data structure for retaining information during streamed processing of large XML source files. It was quite clear from the use cases we studied that XML trees were not suitable for this purpose, because the costs of modification are too high. We made a policy decision to retain the principle that all data structures are immutable, so we needed a data structure where small changes can be made efficient without introducing mutability, and maps (without parent pointers) met that need.
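To illustrate how a small change can be cheap without mutation, here is a minimal Python sketch of a persistent map built as an unbalanced binary search tree with path copying. It is not a description of how any XSLT or XQuery processor implements maps; the names Tree, put, and get are invented for the example, and keys are assumed to be mutually comparable.

```python
from typing import Any, NamedTuple, Optional


class Tree(NamedTuple):
    """An immutable tree node; None stands for the empty map."""
    key: Any
    value: Any
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def put(tree, key, value):
    """Return a new map with key bound to value; the old map is unchanged.

    Only the nodes on the path from the root to the key are copied;
    every other subtree is shared between the old and the new map.
    """
    if tree is None:
        return Tree(key, value)
    if key < tree.key:
        return tree._replace(left=put(tree.left, key, value))
    if key > tree.key:
        return tree._replace(right=put(tree.right, key, value))
    return tree._replace(value=value)


def get(tree, key, default=None):
    """Look up key, returning default when it is absent."""
    while tree is not None:
        if key == tree.key:
            return tree.value
        tree = tree.left if key < tree.key else tree.right
    return default


# Build a first version, then derive a second one from it.
v1 = None
for k in ["m", "c", "t"]:
    v1 = put(v1, k, k.upper())
v2 = put(v1, "a", "A")
```

After this, v1 still has no entry for "a", v2 has one, and the untouched right-hand subtree is the same object in both versions. Because entries have no parent pointers, a subtree can belong to several maps at once, which is what makes the sharing possible.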

When the design was later debated in the XQuery WG, the questions of mutability and identity came up again, mainly because XQuery WG folks are interested in databases. Unfortunately there haven't been enough people in the WG interested in XQuery Update to pursue the question of making maps and arrays persistent objects that can be updated in a database (though of course the JSONiq implementors each have their own solutions in this area). Personally I think the right direction for database technology is immutability: you only add data, you never delete anything. But I haven't had the time or energy to pursue that area.

It has to be said that the current collection of data structures available in XDM is a bit of a ragbag. But I don't think I know of any programming environment that has managed to avoid that. It's certainly no worse than SQL. If we started again we certainly wouldn't reinvent XML in its current form, and we could do something much cleaner. But it's rare to have that luxury.

Michael Kay
