Re: [xml-dev] XML Redux

Kurt Cagle
Invited Expert, Forms Working Group, W3C

443-837-8725

On Tue, Feb 15, 2011 at 1:01 PM, Kurt Cagle <kurt.cagle@gmail.com> wrote:

I like Dave Pawson's use of <> as formal markup delimiters, but I'd still point to the XQuery XDM and ask whether, with a few syntactic shortcuts, you couldn't get something that still satisfies the XDM while at the same time giving you a JSON-esque notation. Consider the following:

("This is a test",<foo>This is <bar>an element</bar> inside an element</foo>,12,25,<bin bat="term">More text</bin>)

Rewrite this in XQuery constructor notation:

("This is a test", element foo {('This is ', element bar {'an element'}, ' inside an element')}, 12, 25, element bin {(attribute bat {"term"}, "More text")})

Replace element foo with *foo: (), attribute bat with @bat: ():

("This is a test",*foo: ("This is ",*bar: ('an element'),' inside an element'),12,25,*bin: (@bat: "term","More text"))
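As a rough illustration of that rewrite step, here is a minimal Python sketch that serializes a small in-memory sequence into the starred, quoted form. The Elem and Attr classes and the function names are hypothetical, invented for this example; they are not part of XQuery or any library, and the sketch assumes strings contain no embedded double quotes.

```python
from dataclasses import dataclass, field


@dataclass
class Attr:
    """Hypothetical stand-in for an XDM attribute node."""
    name: str
    value: str


@dataclass
class Elem:
    """Hypothetical stand-in for an XDM element node."""
    name: str
    items: list = field(default_factory=list)


def to_notation(item):
    """Serialize one item (element, attribute, string or number)."""
    if isinstance(item, Elem):
        inner = ",".join(to_notation(i) for i in item.items)
        return f"*{item.name}: ({inner})"
    if isinstance(item, Attr):
        return f'@{item.name}: "{item.value}"'
    if isinstance(item, str):
        # Assumption: no escaping is needed inside the string.
        return f'"{item}"'
    return str(item)


def sequence(items):
    """Serialize a whole sequence, wrapped in parentheses."""
    return "(" + ",".join(to_notation(i) for i in items) + ")"


seq = [
    "This is a test",
    Elem("foo", ["This is ", Elem("bar", ["an element"]), " inside an element"]),
    12,
    25,
    Elem("bin", [Attr("bat", "term"), "More text"]),
]
print(sequence(seq))
```

The point is only that the notation falls out of a straightforward walk over a sequence of items; nothing document-shaped is required at the top level.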

You could even go a step further by assuming that the constructs *foo: () automatically "escapes out" of text. Additionally sequence items that need to be separated could be placed in a [] structure:

(This is a test *foo: (This is *bar: (an element) inside an element),[12,25],*bin: (@bat: (term) More text))

HTML would be encoded as *html: (*head: (*title: (This is the top title) *link: (@rel: (stylesheet) @href: (my.css))) *body: (*h1: (This is the page title) *p: (This is a *b: (test).)))

Finally, it may be possible to eliminate the * notation altogether:

html: (head: (title: (This is the top title) link: (@rel: (stylesheet),@href: (my.css))) body: (h1: (This is the page title) p: (This is a b: (test).)))

This doesn't break XML, beyond the document vs. grove issue (which has always been one of the more questionable characteristics of the XML spec), is compact, more or less readable, and can be readily mapped to JSON. For instance, a list of strings could be differentiated with []:

colors: (["red","green","blue","yellow"])

JSON would interpret this as:

{colors: ["red","green","blue","yellow"]}

while XML would interpret it as

<colors xsi:type="xs:NMTOKENS">red green blue yellow</colors>

or, worst case:

<colors>
   <xml:null>red</xml:null>
   <xml:null>green</xml:null>
   <xml:null>blue</xml:null>
   <xml:null>yellow</xml:null>
</colors>

(The case of a list of strings is one where the XDM is superior to the serialization model, since the angle bracket serialization has no notion of a list.)
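To make the three renderings of the colors list concrete, here is a small Python sketch that produces each one from the same name and list of strings. The function names are made up for this example, and the wrapper element name is passed in as a parameter (xml:null here only because that is the placeholder used above); the xsi:type annotation is left out for brevity.

```python
import json
from xml.sax.saxutils import escape


def to_json(name, values):
    """JSON view: the list maps directly onto a JSON array."""
    return json.dumps({name: values})


def to_xml_tokens(name, values):
    """XML view by convention: whitespace-separated tokens.

    This only works when no value is empty or contains whitespace,
    which is exactly the weakness of the convention.
    """
    if any(len(v.split()) != 1 for v in values):
        raise ValueError("values must be single whitespace-free tokens")
    return f"<{name}>{escape(' '.join(values))}</{name}>"


def to_xml_wrapped(name, values, item_name):
    """XML worst case: one wrapper element per list member."""
    body = "".join(f"<{item_name}>{escape(v)}</{item_name}>" for v in values)
    return f"<{name}>{body}</{name}>"


colors = ["red", "green", "blue", "yellow"]
print(to_json("colors", colors))
print(to_xml_tokens("colors", colors))
print(to_xml_wrapped("colors", colors, "xml:null"))
```

Note that the token form raises an error for a value such as "light blue", while the JSON form and the wrapped form carry it without trouble.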

This is a declarative description, not a functional one, but that doesn't mean that you couldn't take advantage of XQuery-like constructs:

let $title1 := "This is the top title"
let $title2 := "This is the page title"
let $page := html: (head: (title: ({$title1}) link: (@rel: (stylesheet) @href: (my.css))) body: (h1: ({$title2}) p: (This is a b: (test).)))
return $page

and as white space isn't that much of a concern:

let $page :=
   html: (
      head: (
         title: ({$title1})
         link: (
            @rel: (stylesheet)
            @href: (my.css)
         )
      )
      body: (
         h1: ({$title2})
         p: (This is a b: (test).)
      )
   )

Note that this is the primary reason why I haven't used the curly brace for this particular notation; it's become too thoroughly established as an escape mechanism for the underlying scripting environment.
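As a very rough approximation of how the {$name} escapes could behave, the following Python sketch expands them by plain textual substitution. This is an assumption made for illustration only: real XQuery enclosed expressions evaluate arbitrary expressions rather than substituting text, and the expand function and its bindings dictionary are hypothetical.

```python
import re

# Matches escapes of the form {$name}, capturing the variable name.
ESCAPE = re.compile(r"\{\$([A-Za-z_]\w*)\}")


def expand(template, bindings):
    """Replace each {$name} with its bound value; unknown names raise KeyError."""
    return ESCAPE.sub(lambda m: bindings[m.group(1)], template)


bindings = {
    "title1": "This is the top title",
    "title2": "This is the page title",
}
template = "html: (head: (title: ({$title1})) body: (h1: ({$title2})))"
print(expand(template, bindings))
```

Because the parentheses carry the structure, the braces stay free to mark the places where the host language steps in.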

Finally, taking Michael Kay's example:

{  authors: [
      {name: "Michael Kay", affiliation: "Saxonica"},
      {name: "Liam Quin", affiliation: "W3C"}
   ]
   abstract:<para { style : "bold" }>Here be some dragons</para>
   content:<section { numbers : [1,1,2] }><para>...</para></section>
}

remap that in the above notation:

(authors: (
   [null: (name: (Michael Kay) affiliation: (Saxonica)),
    null: (name: (Liam Quin) affiliation: (W3C))])
 abstract: (para: (@style: (bold) Here be some dragons))
 content: (section: (numbers: ([1,1,2]) para: (...)))
)

or, if you use the notation : ( by itself to indicate an "anonymous" class:

(authors: (
   [ : (name: (Michael Kay) affiliation: (Saxonica)),
     : (name: (Liam Quin) affiliation: (W3C))])
 abstract: (para: (@style: (bold) Here be some dragons))
 content: (section: (numbers: ([1,1,2]) para: (...)))
)

Seems pretty straightforward to me, should be fairly easily parseable, and has the advantage of being trivial to wrap within a string. Additionally, "foo : (bar)" is not exactly a common construct lexically, even without whitespace, and escaping it could simply involve the use of a construct such as `foo: (bar)`, with the "`" character indicating that the string should be interpreted literally.
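To give a feel for the parseability claim, here is a minimal recursive-descent sketch in Python. It handles only the starred, quoted variant (names prefixed with * or @, strings in single or double quotes, numbers, comma-separated items in parentheses); that grammar is my own assumption for this example, the function names are hypothetical, and the unquoted variants and the backtick escape are not covered. Malformed input may surface as an IndexError rather than a tidy error message.

```python
import re

TOKEN = re.compile(r"""\s*(?:
    (?P<str>"[^"]*"|'[^']*')            # quoted string, no embedded quotes
  | (?P<num>-?\d+(?:\.\d+)?)            # integer or decimal
  | (?P<name>[*@][A-Za-z_][\w.-]*)\s*:  # *element: or @attribute:
  | (?P<punct>[(),])
)""", re.VERBOSE)


def tokenize(text):
    """Split the input into (kind, text) pairs."""
    text, pos, out = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        out.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return out


def parse_item(tokens, i):
    """Parse one item starting at index i; return (value, next index)."""
    kind, val = tokens[i]
    if kind == "str":
        return val[1:-1], i + 1
    if kind == "num":
        return (float(val) if "." in val else int(val)), i + 1
    if kind == "name":
        content, i = parse_item(tokens, i + 1)
        if not isinstance(content, list):
            content = [content]
        return (val, content), i  # e.g. ("*foo", [...]) or ("@bat", [...])
    if (kind, val) == ("punct", "("):
        items, i = [], i + 1
        while tokens[i] != ("punct", ")"):
            item, i = parse_item(tokens, i)
            items.append(item)
            if tokens[i] == ("punct", ","):
                i += 1
        return items, i + 1
    raise SyntaxError(f"unexpected token {val!r}")


def parse(text):
    """Parse a complete expression into nested lists and (name, items) pairs."""
    tokens = tokenize(text)
    value, end = parse_item(tokens, 0)
    if end != len(tokens):
        raise SyntaxError("trailing input")
    return value


print(parse("""("This is a test",*foo: ("This is ",*bar: ('an element')),12,*bin: (@bat: "term","More text"))"""))
```

Sequences come back as Python lists, and named constructs as (name, items) pairs, so the result can be walked or re-serialized without any document wrapper.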

The exact nature of the notation can be argued, but I think the important point to consider is that while the serialization model of XML is not fully congruent with JSON, XDM is. That means any discussion about a MicroXML needs to be looking at XDM, rather than the XML 1.0 serialization model, as the basis for that simplification.

This is something that I think has been missing in all of the discussions thus far. This is not a notational issue, it's a data modeling one. There are constructs that cannot be modeled readily in JSON but are easily rendered in XML angle bracket notation (ABN), and vice versa: ABN has no mechanism for defining arrays that doesn't rely upon a convention (such as whitespace-separated NMTOKENS), while JSON notation for handling semi-repeating XML structures (such as <a>1</a><a>2</a><b>3</b><a>4</a>) can get hideously complex fast. Yet an XDM notation could represent both cases trivially.
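The semi-repeating case can be seen in a few lines of Python. In this sketch the wrapper element <r> is an assumption added so the fragment parses as a document, and the two helper functions are hypothetical names; it contrasts the obvious JSON mapping, which groups children by name and loses their interleaving, with an ordered mapping that keeps it at the cost of a clumsier shape.

```python
import json
import xml.etree.ElementTree as ET

# The <r> wrapper is added only so the fragment is a well-formed document.
doc = ET.fromstring("<r><a>1</a><a>2</a><b>3</b><a>4</a></r>")


def group_by_name(parent):
    """Natural JSON mapping: one key per element name. Document order is lost."""
    grouped = {}
    for child in parent:
        grouped.setdefault(child.tag, []).append(child.text)
    return grouped


def ordered_pairs(parent):
    """Order-preserving mapping: a list of [name, value] pairs."""
    return [[child.tag, child.text] for child in parent]


print(json.dumps(group_by_name(doc)))
print(json.dumps(ordered_pairs(doc)))
```

The grouped form can no longer say that the b came between the second and third a, and the ordered form gives up the convenience of looking a value up by name, which is the complexity being described.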

Kurt Cagle
Invited Expert, Forms Working Group, W3C
kurt.cagle@gmail.com
443-837-8725

On Tue, Feb 15, 2011 at 10:49 AM, Michael Kay <mike@saxonica.com> wrote:

But then looking at Mike's

{  authors: [
      {name: "Michael Kay", affiliation: "Saxonica"},
      {name: "Liam Quin", affiliation: "W3C"}
   ]
   abstract:<para { style : "bold" }>Here be some dragons</para>
   content:<section { numbers : [1,1,2] }><para>...</para></section>
}

I'm not sure if content: is markup? I can see authors as a list.
Is content: wrapping <section/>?

No. Just as

affiliation : "Saxonica"

is a name-value pair (within a map) where the name is affiliation and the value is a string, so

content: <section><para>...</para></section>

is a name-value pair (within a map) where the name is content and the value is a (textual) element.

This is what I mean about composability between structured data and marked-up text, without being forced to represent the structured data using syntax that was designed for textual markup. (Not dissimilar from putting XML in a column of an RDB, except that (a) the structured data part is more powerful than rows-and-columns, and (b) you can have structured data inside the text content as well as vice versa.)

Michael Kay
