Re: [xml-dev] Imagine

On Monday, 14 February 2022, 23:13:21 CET, Webb Roberts <webb@webbroberts.com> wrote:

During my time working on NIEM (the National Information Exchange Model), we kept integration of XML and RDF as a core tenet. The goal was to ensure that XML data and schemas in the NIEM ecosystem represented RDF data. There were several major pieces to this:

- We defined RDF resource identifiers for each XML qualified name. This gave us RDF names for types, elements, and attributes in XML schemas and data. (see https://niem.github.io/NIEM-NDR/v5.0/niem-ndr.html#section_5.6.1)

- We defined a mapping from data that uses NIEM to RDF. Instance documents are RDF datasets. Element and attribute occurrences are RDF properties. Most elements are subject-predicate-object triples. Some elements are RDF quads. Attribute and element values are RDF literals. (see https://niem.github.io/NIEM-NDR/v5.0/niem-ndr.html#section_5.6.3)

- We defined a mapping from XML schemas using NIEM to RDF Schema. Complex types are RDF types. XSD type derivation reflects rdfs:subClassOf. Element and attribute declarations are RDF properties. Instance data has corresponding RDF types. (see https://niem.github.io/NIEM-NDR/v5.0/niem-ndr.html#section_5.6.4 and section 5.6.5 of the same document)

- We maintained rules about how XML and XML Schema were used that let us maintain the relationship between XML and RDF.
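As a rough illustration of the first bullet, here is a minimal Python sketch of turning an XML qualified name into a single RDF resource identifier. It assumes a rule along the lines of NDR section 5.6.1: append the local name to the namespace name, inserting "#" when the namespace does not already end with one. The function name and the example namespace are hypothetical; the NDR text is the normative source.

```python
def qname_to_iri(namespace: str, local_name: str) -> str:
    """Join a namespace name and a local name into one resource IRI."""
    # Assumed rule: add '#' unless the namespace name already ends with one.
    separator = "" if namespace.endswith("#") else "#"
    return f"{namespace}{separator}{local_name}"


# Hypothetical namespace and element name, for illustration only.
print(qname_to_iri("https://foo.example.org/ns", "Person"))
```

The same function would apply to type, element, and attribute names alike, since all of them are qualified names.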

One consequence of the XML+XSD mapping to RDF is that it was very straightforward to use JSON-LD as a standard JSON representation for NIEM data, rather than construct a new mapping between XML and JSON. (see https://reference.niem.gov/niem/specification/json/5.0/niem-json-spec-5.0.html)

Webb Roberts

webb@webbroberts.com

On 2022-02-13, at 22:48, Dan Brickley <danbri@danbri.org> wrote:

As has already been pointed out - this might well seem dreamy on XML-DEV but in the RDF world it's pretty much what drew most of us to the technology.

Most RDF toolkits try to make it easy to consolidate information from various sources and formats into a common graph model. They will usually handle some subset of the more explicitly RDF-flavoured formats, e.g. RDF/XML, Turtle, RDFa, JSON-LD, N-Triples, TriG, etc. But there will also be an API that can be called programmatically, to create triples from anything you have programmatic access to. You'll find XML adaptors of various kinds (XSLT being the most obvious). Back in the day there were angsty debates about schema annotation for mapping to triples. For example see https://www.w3.org/2003/02/schema-annotation
http://cmsmcq.com/2002/schema-annotation.html#ab2b3b3b7
https://www.w3.org/2000/08/w3c-synd/
https://www.w3.org/TR/schema-arch/
https://www.w3.org/1999/04/WebData etc

... although those things never turned out to be as important and central as folks thought.

RDF folk spend much of their time moving all kinds of data into RDF graphs/triples. But so much of this grungy data cleaning work is necessarily custom, per-dataset, per-application, ... limiting the value of generic conversion tools.

At W3C we did have something called GRDDL that's also close to the picture outlined here - https://en.wikipedia.org/wiki/GRDDL - but in my experience it is rarely used.

Finally - there is a ton of non-imaginary RDF data out there. You might look at Schema.org - widely used for in-page markup, e.g. see http://webdatacommons.org/structureddata/schemaorgtables/ or for what we've been up to at Google, https://developers.google.com/search/docs/advanced/structured-data/search-gallery

Or at Wikidata, whose data is available in RDF dumps https://www.wikidata.org/wiki/Wikidata:RDF or at query.wikidata.org via SPARQL.

For the Oxygen entity in Wikidata, or rather their page about it, see https://www.wikidata.org/wiki/Q629

- chemical symbol (https://www.wikidata.org/wiki/Property:P246) = O
- atomic number (https://www.wikidata.org/wiki/Property:P1086) = 8
- mass (https://www.wikidata.org/wiki/Property:P2067) = 15.999 dalton
- electronegativity (https://www.wikidata.org/wiki/Property:P1108) = 3.44

If you look around at Wikidata you'll see that some of these factual claims are sourced.

So we can look into 15.999 being different to the 16 value in Hans-Jürgen's sketch. The sourcing given is:

Pure and Applied Chemistry
retrieved
19 October 2020
title
Atomic weights of the elements 2013 (IUPAC Technical Report) (English)
DOI
10.1515/PAC-2015-0305

Here is an example query that uses the Wikidata SPARQL query service to pull out answers - i.e. Oxygen - with an atomic number of 8.

SELECT ?chem ?chemLabel ?atomicNumber ?mass ?electronegativity ?anyprop ?anyval
WHERE {
  ?chem wdt:P246 ?chemSymbol ;
        wdt:P1086 ?atomicNumber ;
        wdt:P2067 ?mass ;
        wdt:P1108 ?electronegativity ;
        ?anyprop ?anyval .
  FILTER(?atomicNumber = 8)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  # Helps get the label in your language, if not, then en language
}

You can run it here: https://w.wiki/4q52
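For anyone who would rather call the query service from a program than from the web form, the following Python sketch builds the GET request URL. It assumes the public endpoint at https://query.wikidata.org/sparql with the usual `query` and `format` parameters; the query is an abbreviated version of the one above, and the function name is made up for this example. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public Wikidata query service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Abbreviated form of the SELECT query above. The wdt:, wikibase: and bd:
# prefixes are predeclared by the service, so none are declared here.
QUERY = """
SELECT ?chem ?chemLabel ?atomicNumber WHERE {
  ?chem wdt:P246 ?chemSymbol ; wdt:P1086 ?atomicNumber .
  FILTER(?atomicNumber = 8)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Percent-encode the query and ask for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


url = build_request_url(QUERY)
print(url)
```

The resulting URL can be fetched with any HTTP client; it is good practice to send a descriptive User-Agent header when doing so.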

And back on the original theme about mapping, it is also worth knowing about the CONSTRUCT mechanism in SPARQL.

We can take the above query and write CONSTRUCT queries that emit triples in a different shape or vocabulary.

This is a SPARQL query that takes what's in Wikidata for all chemical elements and emits triples along the lines sketched initially:

PREFIX foo: <https://foo.example.org/>
CONSTRUCT {
  ?chem foo:symbol ?chemLabel .
  ?chem foo:numberOfElectrons ?atomicNumber .
  ?chem foo:atomicMass ?mass .
  ?chem foo:electronegativity ?electronegativity .
  ?chem foo:discoTime ?discoTime .
  # other properties from https://www.wikidata.org/wiki/Q629 here
} WHERE {
  ?chem wdt:P246 ?chemSymbol ;
        wdt:P1086 ?atomicNumber ;
        wdt:P2067 ?mass ;
        wdt:P1108 ?electronegativity .
  OPTIONAL { ?chem wdt:P575 ?discoTime . } .
  # commented out to get everything FILTER(?atomicNumber=8)

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?discoTime # this might be pointless for CONSTRUCT queries

Try it here: https://w.wiki/4q57
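To make the CONSTRUCT step less magical, here is a small Python sketch of what template instantiation amounts to: each solution row from the WHERE clause is substituted into each triple template, and a template that mentions an unbound variable (as happens when the OPTIONAL part finds nothing) simply yields no triple for that row. This is an illustration of the idea, not how any particular SPARQL engine is implemented, and the function and variable names are invented for the example.

```python
# Triple templates mirroring part of the CONSTRUCT clause.
# Terms starting with "?" are variables; everything else is a constant.
TEMPLATE = [
    ("?chem", "foo:numberOfElectrons", "?atomicNumber"),
    ("?chem", "foo:atomicMass", "?mass"),
    ("?chem", "foo:discoTime", "?discoTime"),
]


def instantiate(template, solutions):
    """Substitute each solution row into each template; skip unbound ones."""
    triples = set()
    for row in solutions:
        for pattern in template:
            try:
                triple = tuple(
                    row[term] if term.startswith("?") else term
                    for term in pattern
                )
            except KeyError:
                # A variable in this template is unbound for this row,
                # so this one triple is left out of the output graph.
                continue
            triples.add(triple)
    return triples


# One solution row using the oxygen values quoted earlier; ?discoTime is
# deliberately left unbound to show the skipping behaviour.
solutions = [{"?chem": "wd:Q629", "?atomicNumber": "8", "?mass": "15.999"}]

result = instantiate(TEMPLATE, solutions)
for triple in sorted(result):
    print(triple)
```

Because the output is a set of triples, ordering the solutions beforehand has no effect on the resulting graph, which is why the ORDER BY above is likely to be pointless.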

Hope this helps...

Dan

On Wed, 2 Feb 2022 at 11:31, Chet Ensign <chet.ensign@oasis-open.org> wrote:
I am posting this on behalf of Mr. Hans-Jürgen Rennau while we debug a problem with his emails being posted to the list.

---

Assume four datasets: an XML document, a JSON document, a CSV file and an HTML document (authored near the north pole, in the rain forest, in Athens and in the Antarctic, respectively).

Imagine a standard which enables you to define the mapping of a document node to a set of RDF triples.

Remember that all documents (XML, JSON, CSV, HTML) can be parsed into document nodes (for example see [1]).

Assume that the RDF graphs obtained from our documents contain the following triples:
foo:oxygen foo:symbol "O"
foo:oxygen foo:numberOfElectrons "8"
foo:oxygen foo:atomicMass "16"
foo:oxygen foo:electronegativity "3.5"

each one found in a different one of the four RDF graphs.

Then we have integrated information, as we now know four things about oxygen, contributed by different data sources using a different data format. Of course it would be easy to serialize the integrated information into XML, or JSON, or CSV, or HTML or any other format (employing Inuit or any other natural language).
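The integration step described here can be sketched in a few lines of Python, under the simplifying assumption that each graph is just a set of (subject, predicate, object) tuples. Merging is then plain set union, and no knowledge of the original formats is needed. The variable and function names are invented for this illustration, and the prefixed names are kept as plain strings rather than expanded IRIs.

```python
from collections import defaultdict

# One tiny "graph" per source document, as obtained from the imagined mapping.
from_xml = {("foo:oxygen", "foo:symbol", "O")}
from_json = {("foo:oxygen", "foo:numberOfElectrons", "8")}
from_csv = {("foo:oxygen", "foo:atomicMass", "16")}
from_html = {("foo:oxygen", "foo:electronegativity", "3.5")}


def merge(*graphs):
    """Union of all triples; duplicate triples collapse automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect everything the graph says about one subject."""
    properties = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            properties[p].append(o)
    return dict(properties)


merged = merge(from_xml, from_json, from_csv, from_html)
oxygen = describe(merged, "foo:oxygen")
for predicate in sorted(oxygen):
    print(predicate, oxygen[predicate])
```

The merge only works this simply because all four sources use the same identifier for oxygen and for each property, which is exactly the role the IRIs play.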

+ + + - - -

But I suppose you think this is an idle dream. Perhaps you think that the imagined standard would not be feasible to create or to use, or you question the practicality of leveraging RDF IRIs for identifying resources and properties in more than a few specific cases.

Unfortunately I agree that it is an idle dream. Only the reason I see is a different one: I am convinced that the imagined standard is not too difficult to create and to use, and I do not question the practicality of using RDF IRIs in many fields, including natural science, pharmacology, health care, finance, many verticals and economic interaction. The reason I see is that it seems impossible to find minds with a deep interest in both XML technology and semantic technology. if - then - but.

With kind regards,
Hans-Jürgen Rennau

[1]
https://www.w3.org/TR/xpath-functions-31/#func-doc
https://docs.basex.org/wiki/JSON_Module#json:doc
https://docs.basex.org/wiki/CSV_Module#csv:doc
https://docs.basex.org/wiki/HTML_Module#html:doc

--
Chet Ensign
Chief Technical Community Steward
OASIS Open

+1 201-341-1393
chet.ensign@oasis-open.org
www.oasis-open.org