XML's expressive gap: what do we do about the handwaving parts?

RDF, JSON and DITA (and the "Bolognese" school of document analysis) all show, in their different ways, that XML has an expressive gap, where we resort to handwaving and claims of it-is-obvious-from-the-element-name.

I don't think this gap is "semantics", "datatyping" or "rhetoric". I'd say it is algebraic, or generic, structure.

Let's imagine a universal architecture (this is rough): an attribute set we could imply onto every element:

<!ATTLIST ALL
    ur-type     ( div | p | span | field | marker )  #IMPLIED
    role        ( data | metadata | comment | pi )   #IMPLIED
    collection  CDATA                                #IMPLIED
    parallel    ( yes | no )                         #IMPLIED
    formal      ( yes | no )                         #IMPLIED
    class       CDATA                                #IMPLIED >

where @ur-type selects, respectively, element content, mixed content, inline mixed content, data content, or empty content;

and @formal says that the element should carry identifying information such as a title, name, number or ID. For example:

html:p -> ur-type=p formal=no role=data collection=unordered parallel=no class=p

html:table -> ur-type=div formal=yes role=data collection=array parallel=yes class=table
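One way to picture the declaration as something a tool could consume: a minimal Python sketch, assuming the enumerated values from the ATTLIST above. `UrType`, `Role`, `Hints` and `parse_hints` are hypothetical names; `collection` and `class` have no enumerated values in the post, so they stay free-form strings. Because the other value sets are closed, a value outside them (for example role=div, which is an ur-type value rather than a role) is rejected.

```python
from dataclasses import dataclass
from enum import Enum


class UrType(Enum):
    DIV = "div"        # element content
    P = "p"            # mixed content
    SPAN = "span"      # inline mixed content
    FIELD = "field"    # data content
    MARKER = "marker"  # empty content


class Role(Enum):
    DATA = "data"
    METADATA = "metadata"
    COMMENT = "comment"
    PI = "pi"


@dataclass(frozen=True)
class Hints:
    ur_type: UrType
    role: Role
    parallel: bool
    formal: bool
    collection: str = ""  # no values enumerated in the post; free-form
    cls: str = ""         # the "class" attribute ("class" is a Python keyword)


def _yes_no(value: str) -> bool:
    """Convert a (yes | no) attribute value, rejecting anything else."""
    if value not in ("yes", "no"):
        raise ValueError(f"expected yes|no, got {value!r}")
    return value == "yes"


def parse_hints(attrs: dict) -> Hints:
    """Build Hints from attribute strings.

    Enum lookup by value raises ValueError for any value outside the
    declared enumeration, so inconsistent annotations fail early.
    """
    return Hints(
        ur_type=UrType(attrs["ur-type"]),
        role=Role(attrs["role"]),
        parallel=_yes_no(attrs["parallel"]),
        formal=_yes_no(attrs["formal"]),
        collection=attrs.get("collection", ""),
        cls=attrs.get("class", ""),
    )
```

For example, `parse_hints({"ur-type": "p", "role": "data", "parallel": "no", "formal": "no", "class": "p"})` yields hints with `ur_type` of `UrType.P` and `formal` false.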

What would this give us?

It gives an application better hints for data binding, i.e. for efficient storage and access: for example, whether the contents of an element are likely to be accessed sequentially, by index, or randomly.
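A minimal sketch of how a data binder might act on such a hint, assuming the hint is carried in the `collection` attribute used in the mappings above and using Python's `xml.etree.ElementTree`. The policy (array -> list for index access, unordered -> dict keyed by `@id` for random access, otherwise a one-pass iterator for sequential access) and the name `bind_children` are hypothetical choices, not part of the proposal.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Union

Bound = Union[List[ET.Element], Dict[str, ET.Element], Iterator[ET.Element]]


def bind_children(elem: ET.Element) -> Bound:
    """Pick a container for elem's children from its (hypothetical) collection hint."""
    hint = elem.get("collection")
    if hint == "array":
        # Positional access expected: a list gives constant-time indexing.
        return list(elem)
    if hint == "unordered":
        # Order carries no meaning: key children by @id for random access,
        # falling back to the child's position when it has no id.
        return {child.get("id", str(i)): child for i, child in enumerate(elem)}
    # No hint: assume sequential, streaming-style access.
    return iter(elem)
```

For example, `bind_children(ET.fromstring('<table collection="array"><tr/><tr/><tr/></table>'))` returns a list of the three `tr` elements.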

It provides more hints that can be used to stub out, fill out and validate transformations and stylesheets: for example, an element with ur-type p in the input cannot become an element with ur-type marker in the output. I have seen many XSLTs where an element of one ur-type was erroneously mapped to an element of another ur-type.
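A sketch of the kind of check this would enable, under the strictest reading (a transformation must preserve ur-type). The two vocabularies and their ur-type assignments (`INPUT_UR_TYPES`, `OUTPUT_UR_TYPES`) are hypothetical examples; a real tool could read them from the declared attributes and extract the element-to-element mapping from an XSLT's templates.

```python
from typing import Dict, List, Tuple

# Hypothetical ur-type declarations for an input and an output vocabulary.
INPUT_UR_TYPES: Dict[str, str] = {
    "para": "p", "emphasis": "span", "section": "div", "anchor": "marker",
}
OUTPUT_UR_TYPES: Dict[str, str] = {
    "p": "p", "em": "span", "div": "div", "br": "marker",
}


def ur_type_mismatches(
    mapping: Dict[str, str],
    src_types: Dict[str, str],
    dst_types: Dict[str, str],
) -> List[Tuple[str, str, str, str]]:
    """Return (source, target, source ur-type, target ur-type) for every
    mapping that changes ur-type, e.g. a p element turned into a marker."""
    problems = []
    for src, dst in mapping.items():
        src_ur, dst_ur = src_types[src], dst_types[dst]
        if src_ur != dst_ur:
            problems.append((src, dst, src_ur, dst_ur))
    return problems
```

With the mapping `{"para": "br", "emphasis": "em", "section": "div"}`, only `para` -> `br` is reported, since it turns ur-type p into ur-type marker.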

It can also hint that an element is a cohesive, addressable unit (@formal).
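A minimal sketch of a check built on @formal, assuming `formal="yes"` appears as a literal attribute and that "identifying information" means an `id`, `name` or `number` attribute or a `title`/`caption` child. Both lists and the function name are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical: what counts as identifying information on a formal element.
ID_ATTRS = ("id", "name", "number")
ID_CHILDREN = ("title", "caption")


def unidentified_formal_elements(root: ET.Element) -> List[ET.Element]:
    """Return every element marked formal="yes" that has no identifying
    attribute and no identifying child element."""
    missing = []
    for elem in root.iter():
        if elem.get("formal") != "yes":
            continue
        has_attr = any(elem.get(name) for name in ID_ATTRS)
        has_child = any(elem.find(tag) is not None for tag in ID_CHILDREN)
        if not (has_attr or has_child):
            missing.append(elem)
    return missing
```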

Doesn't XML Schema provide this? Perhaps the ur-type can be inferred from a content model, but nothing else can. Doesn't RDF provide much of this, with rdf:Bag and so on? Certainly, but RDF is overkill, as it is not limited to universal, generic properties of elements.

So we have properties such as ordering, or whether an element holds data or metadata, and we currently have no standard vocabulary or language to declare them. (Or do we?)

Cheers

Rick

On Thu, 3 Feb. 2022, 03:31 Chet Ensign, <chet.ensign@oasis-open.org> wrote:

I am posting this on behalf of Mr. Hans-Jürgen Rennau while we debug a problem with his emails being posted to the list.

---

Assume four datasets: an XML document, a JSON document, a CSV file and an HTML document (authored near the North Pole, in the rain forest, in Athens and in the Antarctic, respectively).

Imagine a standard which enables you to define the mapping of a document node to a set of RDF triples.

Remember that all of these documents (XML, JSON, CSV, HTML) can be parsed into document nodes (see, for example, [1]).
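To make the imagined standard slightly more concrete, here is a sketch in Python (standard library only) where each function plays the role of one mapping definition from a parsed document to triples. The `foo:` prefix, the document shapes and the rule that a name field identifies the subject are all assumptions; the triples are plain string tuples, not real RDF terms. HTML is omitted; well-formed XHTML could be handled like the XML case.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


def xml_to_triples(text: str) -> Set[Triple]:
    """Hypothetical rule: @name on the root is the subject, each child
    element name is a predicate and its text is the object."""
    root = ET.fromstring(text)
    subject = "foo:" + root.get("name")
    return {(subject, "foo:" + child.tag, child.text) for child in root}


def json_to_triples(text: str) -> Set[Triple]:
    """Hypothetical rule: the "element" key is the subject, every other key
    is a predicate and its value (as a string) is the object."""
    obj = json.loads(text)
    subject = "foo:" + obj.pop("element")
    return {(subject, "foo:" + key, str(value)) for key, value in obj.items()}


def csv_to_triples(text: str) -> Set[Triple]:
    """Hypothetical rule: the "element" column is the subject, every other
    column header is a predicate and the cell is the object."""
    triples: Set[Triple] = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = "foo:" + row.pop("element")
        triples |= {(subject, "foo:" + key, value) for key, value in row.items()}
    return triples
```

For example, `csv_to_triples("element,atomicMass\noxygen,16\n")` gives `{("foo:oxygen", "foo:atomicMass", "16")}`.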

Assume that the RDF graphs obtained from our documents contain the following triples:
foo:oxygen foo:symbol "O" .
foo:oxygen foo:numberOfElectrons "8" .
foo:oxygen foo:atomicMass "16" .
foo:oxygen foo:electronegativity "3.5" .

each one found in a different one of the four RDF graphs.

Then we have integrated information: we now know four things about oxygen, contributed by different data sources, each using a different data format. Of course, it would be easy to serialize the integrated information into XML, JSON, CSV, HTML or any other format (employing Inuit or any other natural language).
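A sketch of the integration and re-serialization step, assuming the four graphs have already been produced as sets of `(subject, predicate, object)` string tuples. `integrate`, `to_json` and `to_xml` are hypothetical helpers, and the `<resources>`/`<resource about="...">` output shape is an arbitrary choice. If two sources disagree on the same predicate, this sketch keeps an arbitrary one of the values; real integration would need a policy for that.

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]


def integrate(*graphs: Iterable[Triple]) -> Dict[str, Dict[str, str]]:
    """Union the graphs and group by subject: {subject: {predicate: object}}."""
    merged: Dict[str, Dict[str, str]] = defaultdict(dict)
    for graph in graphs:
        for subject, predicate, obj in graph:
            merged[subject][predicate] = obj
    return dict(merged)


def to_json(integrated: Dict[str, Dict[str, str]]) -> str:
    """Serialize the integrated view as JSON (keys sorted for stable output)."""
    return json.dumps(integrated, sort_keys=True)


def to_xml(integrated: Dict[str, Dict[str, str]]) -> str:
    """Serialize the integrated view as a simple, hypothetical XML shape."""
    root = ET.Element("resources")
    for subject, props in sorted(integrated.items()):
        resource = ET.SubElement(root, "resource", about=subject)
        for predicate, obj in sorted(props.items()):
            prop = ET.SubElement(resource, "property", name=predicate)
            prop.text = obj
    return ET.tostring(root, encoding="unicode")
```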

+ + + - - -

But I suppose you think this is an idle dream. Perhaps you think that the imagined standard would not be feasible to create or to use, or you question the practicality of leveraging RDF IRIs to identify resources and properties in more than a few specific cases.

Unfortunately, I agree that it is an idle dream, but for a different reason. I am convinced that the imagined standard is not too difficult to create and to use, and I do not question the practicality of using RDF IRIs in many fields, including natural science, pharmacology, health care, finance, many verticals and economic interaction. The reason I see is that it seems impossible to find minds with a deep interest in both XML technology and semantic technology. If - then - but.

With kind regards,
Hans-Jürgen Rennau

[1]
https://www.w3.org/TR/xpath-functions-31/#func-doc
https://docs.basex.org/wiki/JSON_Module#json:doc
https://docs.basex.org/wiki/CSV_Module#csv:doc
https://docs.basex.org/wiki/HTML_Module#html:doc

--
Chet Ensign
Chief Technical Community Steward
OASIS Open

+1 201-341-1393
chet.ensign@oasis-open.org
www.oasis-open.org