RE: [xml-dev] List of differences between XML and JSON?

Hi Folks,

Hans-Juergen's message is brilliant! See my thoughts in red.

------------------------------

XML and JSON are data formats defined in terms of a syntax. The syntax does not define the information content.

The following is merely a string of characters:

The XML specification says that applications should interpret the pointy brackets as things to be called "tags." Inside the pointy brackets is a substring that is to be called a "tag name." Sandwiched between the tags is something called "content." And so forth.

There is nothing in the XML specification about parent axis, child axis, or any other kind of axis. There is nothing about hierarchy, containment, or parent/child relationship. There is nothing about nodes or trees. The XML specification says that the above is just a string of characters containing symbols that software programs can be pre-programmed to interpret in the way described by the XML specification.
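To make the "just a string of characters" point concrete, here is a minimal sketch of a purely lexical pass over some markup. The sample string and the `tokens` helper are hypothetical (the example string from the original message is not reproduced here), and the regular expression is a simplification that ignores comments, CDATA, and attributes containing ">".

```python
import re

# Hypothetical sample string, invented for illustration only.
text = "<Book><Title>Dune</Title></Book>"

# Either a run of characters enclosed in pointy brackets (a "tag"),
# or a run of characters containing no "<" (the "content").
TOKEN = re.compile(r"<[^>]+>|[^<]+")

def tokens(s):
    """Split a string into tags and the character runs between them.

    This is a flat, left-to-right scan: it produces no nodes,
    no tree, and no parent/child links.
    """
    return TOKEN.findall(s)

print(tokens(text))
# ['<Book>', '<Title>', 'Dune', '</Title>', '</Book>']
```

The result is a flat list of substrings. Anything resembling a hierarchy has to be supplied by whatever consumes that list.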

The syntax does not define the information content. This is left to an information model to be applied to the syntax.

Beautifully stated!

XML standards offer two information models: the information set and the XDM. For practical purposes, I think one should only consider the XDM.

The XML specification defines a term called "parent", and a term called "child". But those are merely terms of convenience for discussing nested tags. That nesting terminology is purely context-free grammar stuff. Nothing exciting here – no "information" specification here.

A specification completely separate from the XML specification, called the XQuery and XPath Data Model (XDM), says that the above string can be treated as more than just a bunch of self-organized pointy bracket clusters. The XDM says that applications should treat the tags as "element nodes" with "child nodes." Thus emerge the notions of trees and hierarchies and containment.

But, but, but … the XDM is, as far as the XML specification is concerned, a fiction. That is, as far as the XML specification is concerned, all this XDM verbiage about nodes and trees and axes is pure fiction.
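As a rough illustration of a model being layered on top of the syntax, the sketch below uses Python's standard `xml.etree.ElementTree`. Note the assumption: ElementTree is not the XDM, it is merely one library's object model standing in for "an information model" here, and the sample string is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample string, invented for illustration only.
text = "<Book><Title>Dune</Title></Book>"

# The library applies its own object model to the characters.
root = ET.fromstring(text)

# "Children" exist only in the objects the library built,
# not in the string itself.
children = [child.tag for child in root]
title = root[0]

print(root.tag)    # Book
print(children)    # ['Title']
print(title.text)  # Dune
```

A different library, or the XDM itself, would expose different properties for the same characters.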

So we need to compare apples to apples: XML versus JSON (not XML+XDM versus JSON). And when we make this proper comparison, there is very little difference between XML and JSON. They both apply to text. Applications that are pre-coded to the XML specification will treat pointy brackets as meaningful. Applications that are pre-coded to the JSON specification will treat curly braces and square brackets as meaningful. Pretty insignificant differences, I think.

So an XML document is a tree of nodes, where each node is an entity with a precisely defined list of properties. The information content of a document *is* this tree.

Concerning JSON - is there any standardized information model? The XQuery 3.1 approach of maps and arrays is just one possible information model, not *the* information model.
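One way to see that the JSON syntax does not fix a single information model: the same text can be read under different models. The sketch below uses Python's standard `json` module. The sample text is hypothetical, and it deliberately repeats a member name, which the JSON syntax permits even though RFC 8259 says names SHOULD be unique.

```python
import json

# Hypothetical sample text with a repeated member name.
text = '{"a": 1, "a": 2, "b": [1, 2]}'

# Model 1: a map with unique keys (Python dict; the last "a" wins).
as_map = json.loads(text)

# Model 2: an ordered list of name/value pairs; nothing is lost.
as_pairs = json.loads(text, object_pairs_hook=list)

print(as_map)    # {'a': 2, 'b': [1, 2]}
print(as_pairs)  # [('a', 1), ('a', 2), ('b', [1, 2])]
```

Both results come from the same string of characters. Which one counts as "the information content" depends on the model chosen.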

Why would one want to apply XQuery concepts to JSON?

Hence Dimitre's statement that in JSON subtrees can be shared makes sense (to me) only when equating JSON to a particular information model. (Which one? XDM maps/arrays? JavaScript objects?) This can be quite misleading. There is no basis for such a statement in the syntax, just as there is no basis in the XML syntax for excluding the sharing of subtrees. XML might be modelled without nodes, JSON can definitely be modelled with nodes.
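A small sketch of why "shared subtrees" is a statement about a model rather than about the syntax, again using Python's standard `json` module as one possible model (an assumption; the sample data is hypothetical). Sharing exists among in-memory objects, disappears on serialization, and does not come back on parsing.

```python
import json

# In this in-memory model, one object is referenced from two places.
shared = {"x": 1}
model = {"left": shared, "right": shared}
print(model["left"] is model["right"])  # True

# Serializing writes the subtree out twice; the text carries
# no trace of the sharing.
text = json.dumps(model)
print(text)  # {"left": {"x": 1}, "right": {"x": 1}}

# Parsing the text back builds two separate objects.
parsed = json.loads(text)
print(parsed["left"] is parsed["right"])  # False
print(parsed == model)                    # True
```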

I do not believe that there is any substantial difference as to what XML and JSON "are". Both are about instantiating information. More specifically, both are languages for the hierarchical organization of named data. There are different uses for both formats, and it is downright wrong to equate either of the formats with a particular use. So, for example, XML and JSON can be used for information exchange. Both can be used for persisting object data. Etcetera.

My conclusion: in order to avoid confusion, it is better to (a) distinguish clearly between language syntax and the information models used to define the information content of syntax instances, and (b) distinguish clearly between a data format and the uses it can be put to.

Fantastic message Hans-Juergen - thank you!

Cheers,

Hans-Juergen