Re: [xml-dev] Imagine

On Tuesday, February 15, 2022, 04:23:27 PM GMT+11, Kurt Cagle <kurt.cagle@gmail.com> wrote:

Sergio,

Thanks for the links (and I will definitely be joining the construction group - I'm part of several of the others at this point, with my interest primarily in evangelizing a lot of what's going on in this space to others).

The SPARQL-Anything project looks especially promising and seems to be heading in a direction similar to both my own studies and those I've seen from other developers in the knowledge graph arena. I wrote a (since abandoned) knowledge graph editor called Kaleidoscope, and one thing I learned in the process is that at some point you have to reduce everything to a primitive layer and avoid getting too committed to any specific ontology.

Another especially useful realization is that, in developing an ontology, you need to differentiate between an archival graph, in which predicates are bound to reified bindings in order to map an evolving world, and a secondary now() graph that maps what are often complex third normal form constructs into simpler, derived relationships. The canonical example is determining who the CEO of a given company is. In most cases, you are likely to start with a record of the form:

Job:_Job123 a Class:_Job ;
    job:hasCompany Organization:_BigCo ;
    job:hasEmployee Person:_JamesTBigg ;
    job:hasId "JTB12523"^^Identifier:_BigCoBadgeIdentifier ;
    job:hasStartDate "2021-03-15"^^xsd:date .

However, in a typical now() knowledge graph, this information is reduced down to a single assertion

<<Organization:_BigCo Organization:hasCEO Person:_JamesTBigg>> :hasBinding Job:_Job123 .

within a separate generated graph that links back to the descriptor.

This derived graph is regenerated, typically daily, whenever the third normal form construct changes (usually when an object expires because something obsoletes it). The now() graph reflects the state of the graph at regular intervals for simplicity, while the data coming in from external sources keeps its archival character in a temporal graph.
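A minimal Python sketch of the archival-to-now() derivation described above, under stated assumptions: the `Job` record shape is hypothetical, and its `role` and `end` fields are not in the Turtle record (they stand in for whatever marks a job as the CEO position and whatever obsoletes it). The output only mimics the quoted-triple-plus-binding assertion; it is not an RDF library.

```python
from datetime import date
from typing import List, NamedTuple, Optional, Tuple

class Job(NamedTuple):
    """An archival job record (hypothetical shape; `role` and `end` are assumptions)."""
    job_id: str
    company: str
    employee: str
    role: str
    start: date
    end: Optional[date] = None  # set when a later record obsoletes this one

Triple = Tuple[str, str, str]

def now_graph(jobs: List[Job], as_of: date) -> List[Tuple[Triple, str, str]]:
    """Collapse archival job records into simple hasCEO facts valid on `as_of`,
    each paired with a :hasBinding link back to the job record that justifies it."""
    derived = []
    for job in jobs:
        active = job.start <= as_of and (job.end is None or as_of < job.end)
        if active and job.role == "CEO":
            fact = (job.company, "Organization:hasCEO", job.employee)
            derived.append((fact, ":hasBinding", job.job_id))
    return derived
```

Re-running `now_graph` on a schedule regenerates the derived view, while the list of `Job` records stays append-only and keeps the history.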

Anyway, I have more to digest on the SPARQL Anything work. Thanks for the links, and I'll see you on the lists.

Kurt Cagle
Community/Managing Editor
Data Science Central, A TechTarget Property
kcagle@techtarget.com or kurt.cagle@gmail.com
443-837-8725

On Mon, Feb 14, 2022 at 7:48 PM Sergio Rodriguez <srodriguez142857@yahoo.com> wrote:

Another interesting and fairly recent development toward fast [from-any-source]-to-RDF mappings is "SPARQL Anything". See some resources below, including a talk given yesterday.

GitHub repo:
https://github.com/SPARQL-Anything/sparql.anything (SPARQL Anything is a system for Semantic Web re-engineering that allows users to ... query anything with SPARQL.)

Slides:
https://www.slideshare.net/enricodaga/the-sparql-anything-project
https://www.slideshare.net/enricodaga/knowledge-graph-construction-with-a-faade-the-sparql-anything-project

Talk:
W3C Community Group on Knowledge Graph Construction - SPARQL Anything Presentation

/$¡rm
Kind regards,
Sergio Rodríguez.
Sergio José Rodríguez Méndez | Web ID

On Tuesday, February 15, 2022, 02:07:49 PM GMT+11, Kurt Cagle <kurt.cagle@gmail.com> wrote:

Webb,

I THOUGHT there was some RDF thinking going on there. I recently completed a NIEM conversion project for the FBI and, while I've worked with NIEM before, I was surprised at how readily it translated back into RDF.

Kurt Cagle

On Mon, Feb 14, 2022 at 2:13 PM Webb Roberts <webb@webbroberts.com> wrote:
During my time working on NIEM (the National Information Exchange Model), we kept integration of XML and RDF as a core tenet. The goal was to ensure that XML data and schemas in the NIEM ecosystem represented RDF data. There were several major pieces to this:

- We defined RDF resource identifiers for each XML qualified name. This gave us RDF names for types, elements, and attributes in XML schemas and data. (see https://niem.github.io/NIEM-NDR/v5.0/niem-ndr.html#section_5.6.1)
- We defined a mapping from data that uses NIEM to RDF. Instance documents are RDF datasets. Element and attribute occurrences are RDF properties. Most elements are subject-predicate-object triples. Some elements are RDF quads. Attribute and element values are RDF literals. (see https://niem.github.io/NIEM-NDR/v5.0/niem-ndr.html#section_5.6.3)
- We defined a mapping from XML schemas using NIEM to RDF schema. Complex types are RDF types. XSD type derivation reflects rdfs:subClassOf. Element and attribute declarations are RDF properties. Instance data has corresponding RDF types. (see https://niem.github.io/NIEM-NDR/v5.0/niem-ndr.html#section_5.6.4 & 5.6.5)
- We maintained rules about how XML and XML Schema were used that let us maintain the relationship between XML and RDF.
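A rough Python sketch of the flavour of the qualified-name and instance mappings listed above, using only the standard library. It assumes the IRI for a qualified name is the namespace name, a `#`, and the local name (section 5.6.1 of the NDR has the normative rule), treats every element with children as a blank node and every leaf element as a plain literal, and ignores attributes, references, types and quads, so it is much simpler than the real NIEM mapping. The `http://example.org/nc/` namespace in the usage example is made up.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]

def qname_to_iri(tag: str) -> str:
    """ElementTree spells a qualified name as "{namespace}local"; join the parts
    with '#' (assumed convention)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace + "#" + local
    return tag  # no namespace: leave the local name as-is

def element_to_triples(elem: ET.Element, subject: str, ids: Iterator[int]) -> List[Triple]:
    triples: List[Triple] = []
    for child in elem:
        prop = qname_to_iri(child.tag)
        if len(child) == 0:
            # leaf element: its text becomes a plain literal value
            triples.append((subject, prop, (child.text or "").strip()))
        else:
            # element with children: a fresh blank node, then recurse into it
            node = f"_:b{next(ids)}"
            triples.append((subject, prop, node))
            triples.extend(element_to_triples(child, node, ids))
    return triples

def instance_to_triples(xml_text: str) -> List[Triple]:
    """The document's root element becomes blank node _:b0."""
    root = ET.fromstring(xml_text)
    return element_to_triples(root, "_:b0", itertools.count(1))
```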

One consequence of the XML+XSD mapping to RDF is that it was very straightforward to use JSON-LD as a standard JSON representation for NIEM data, rather than construct a new mapping between XML and JSON. (see https://reference.niem.gov/niem/specification/json/5.0/niem-json-spec-5.0.html)
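As a rough illustration of why JSON-LD fits here: a NIEM-style fragment can be written as ordinary JSON whose keys are compact IRIs, resolved to full IRIs through an `@context`. The namespace below is hypothetical, and `expand_term` only handles the prefix:local case; real JSON-LD expansion (term definitions, `@vocab`, `@id`, and so on) is defined by the JSON-LD 1.1 processing algorithms and is considerably richer.

```python
import json

# Hypothetical namespace; a real NIEM release namespace would go here.
CONTEXT = {"nc": "http://example.org/nc/#"}

person = {
    "@context": CONTEXT,
    "nc:PersonName": {"nc:PersonGivenName": "Ada"},
}

def expand_term(term: str, context: dict) -> str:
    """Expand a compact IRI such as "nc:PersonGivenName" using a prefix map."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term  # not a known prefix: leave unchanged

# To a plain JSON consumer this is just JSON; to an RDF consumer the keys are IRIs.
serialized = json.dumps(person, indent=2)
```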

Webb Roberts
webb@webbroberts.com

On 2022-02-13, at 22:48, Dan Brickley <danbri@danbri.org> wrote:

As has already been pointed out - this might well seem dreamy on XML-DEV but in the RDF world it's pretty much what drew most of us to the technology.

Most RDF toolkits try to make it easy to consolidate information from various sources and formats into the common RDF graph model. They will usually support some subset of the more explicitly RDF-flavoured formats, e.g. RDF/XML, Turtle, RDFa, JSON-LD, N-Triples, TriG, ... etc. But there will also be an API that can be called programmatically to create triples from anything you have programmatic access to. You'll find XML adaptors of various kinds (XSLT being the most obvious). Back in the day there were angsty debates about schema annotation for mapping to triples. For example see https://www.w3.org/2003/02/schema-annotation
http://cmsmcq.com/2002/schema-annotation.html#ab2b3b3b7
https://www.w3.org/2000/08/w3c-synd/
https://www.w3.org/TR/schema-arch/
https://www.w3.org/1999/04/WebData etc

... although those things never turned out to be as important and central as folks thought.
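To make the "API to create triples from anything" point concrete, here is a toy Python stand-in for such an API: two marker types plus a serializer to N-Triples, one of the line-based formats mentioned above. It is not any toolkit's real API; IRIs are not validated, and literals are always plain strings (no datatypes or language tags).

```python
from typing import Iterable, Tuple, Union

class IRI(str):
    """Marks a string as an IRI."""

class BNode(str):
    """Marks a string as a blank node label."""

Term = Union[IRI, BNode, str]

def nt_term(term: Term) -> str:
    if isinstance(term, IRI):
        return f"<{term}>"
    if isinstance(term, BNode):
        return f"_:{term}"
    # Anything else is a plain string literal; escape the characters
    # N-Triples does not allow raw inside a quoted literal.
    escaped = (str(term).replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples: Iterable[Tuple[Term, Term, Term]]) -> str:
    """One triple per line, each terminated by ' .'."""
    return "".join(f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n" for s, p, o in triples)
```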

RDF folk spend much of their time moving all kinds of data into RDF graphs/triples. But so much of this grungy data cleaning work is necessarily custom, per-dataset, per-application, ... limiting the value of generic conversion tools.

At W3C we did have something called GRDDL that's also close to the picture outlined here - https://en.wikipedia.org/wiki/GRDDL - but in my experience it is rarely used.

Finally - there is a ton of non-imaginary RDF data out there. You might look at Schema.org, widely used for in-page markup (e.g. see http://webdatacommons.org/structureddata/schemaorgtables/) or, for what we've been up to at Google, https://developers.google.com/search/docs/advanced/structured-data/search-gallery

Or at Wikidata, whose data is available in RDF dumps https://www.wikidata.org/wiki/Wikidata:RDF or at query.wikidata.org via SPARQL.

For the Oxygen entity in Wikidata, or rather Wikidata's page about it, see https://www.wikidata.org/wiki/Q629

- chemical symbol (https://www.wikidata.org/wiki/Property:P246) = O
- atomic number (https://www.wikidata.org/wiki/Property:P1086) = 8
- mass (https://www.wikidata.org/wiki/Property:P2067) = 15.999 dalton
- electronegativity (https://www.wikidata.org/wiki/Property:P1108) = 3.44

If you look around at Wikidata you'll see that some of these factual claims are sourced.

So we can look into 15.999 being different to the 16 value in Hans-Jürgen's sketch. The sourcing given is:

  Pure and Applied Chemistry
  retrieved: 19 October 2020
  title: Atomic weights of the elements 2013 (IUPAC Technical Report) (English)
  DOI: 10.1515/PAC-2015-0305

Here is an example query that uses the Wikidata SPARQL query service to pull out answers for the element with atomic number 8, i.e. oxygen.

SELECT ?chem ?chemLabel ?atomicNumber ?mass ?electronegativity ?anyprop ?anyval
WHERE {
?chem wdt:P246 ?chemSymbol; wdt:P1086 ?atomicNumber; wdt:P2067 ?mass; wdt:P1108 ?electronegativity; ?anyprop ?anyval .
FILTER(?atomicNumber=8)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
# Helps get the label in your language, if not, then en language
}

You can run it here: https://w.wiki/4q52
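The same service can also be called programmatically. A minimal Python sketch, assuming the public endpoint `https://query.wikidata.org/sparql` accepts the query as a `query` URL parameter and returns the standard SPARQL 1.1 JSON results format when asked via `format=json` or the `Accept` header; the `User-Agent` string is a placeholder you should replace, and `run` needs network access.

```python
import json
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

# A trimmed-down version of the query above; wdt: is predefined by the service.
QUERY = """SELECT ?chem ?atomicNumber ?mass ?electronegativity WHERE {
  ?chem wdt:P1086 ?atomicNumber; wdt:P2067 ?mass; wdt:P1108 ?electronegativity .
  FILTER(?atomicNumber = 8)
}"""

def wdqs_url(query: str) -> str:
    """Build a GET URL asking the endpoint for SPARQL JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

def run(query: str, agent: str = "xml-dev-example/0.1 (placeholder contact)"):
    """Send the query and return solution rows as {variable: value} dicts."""
    req = urllib.request.Request(
        wdqs_url(query),
        headers={"User-Agent": agent, "Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # SPARQL 1.1 JSON results: each binding maps a variable to {"type": ..., "value": ...}
    return [{var: b["value"] for var, b in row.items()}
            for row in data["results"]["bindings"]]
```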

And back on the original theme about mapping, it is also worth knowing about the CONSTRUCT mechanism in SPARQL.

We can take the above query and write CONSTRUCT queries that emit triples in a different shape or vocabulary.

This is a SPARQL query that takes what's in Wikidata for all chemical elements and emits triples along the lines sketched initially:

PREFIX foo: <https://foo.example.org/>
CONSTRUCT {
?chem foo:symbol ?chemSymbol .
?chem foo:numberOfElectrons ?atomicNumber .
?chem foo:atomicMass ?mass .
?chem foo:electronegativity ?electronegativity .
?chem foo:discoTime ?discoTime .
# other properties from https://www.wikidata.org/wiki/Q629 here
} WHERE {
?chem wdt:P246 ?chemSymbol; wdt:P1086 ?atomicNumber; wdt:P2067 ?mass; wdt:P1108 ?electronegativity .
OPTIONAL { ?chem wdt:P575 ?discoTime . } .
# commented out to get everything FILTER(?atomicNumber=8)

SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?discoTime # this might be pointless for CONSTRUCT queries

Try it here: https://w.wiki/4q57
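To show what CONSTRUCT does with the rows matched by its WHERE clause, here is a toy Python re-implementation of template instantiation; it is not how any SPARQL engine is built. The template is written in the spirit of the query above (using the element symbol for foo:symbol), and the solution row in the usage example is hand-made from the Wikidata values quoted earlier. As SPARQL specifies, a template triple that would contain an unbound variable is simply left out.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# "?x" marks a variable; anything else is copied through as a constant.
TEMPLATE: List[Triple] = [
    ("?chem", "foo:symbol", "?chemSymbol"),
    ("?chem", "foo:numberOfElectrons", "?atomicNumber"),
    ("?chem", "foo:atomicMass", "?mass"),
    ("?chem", "foo:discoTime", "?discoTime"),
]

def construct(template: List[Triple], solutions: Iterable[Dict[str, str]]) -> Set[Triple]:
    graph: Set[Triple] = set()
    for row in solutions:
        for pattern in template:
            triple = tuple(row.get(t[1:]) if t.startswith("?") else t for t in pattern)
            # Skip triples with an unbound variable (e.g. a missing OPTIONAL value).
            if None not in triple:
                graph.add(triple)
    return graph
```

Because the result is a set of triples, there is no order left to preserve, which is why ORDER BY has little visible effect on a CONSTRUCT result.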

Hope this helps...

Dan

On Wed, 2 Feb 2022 at 11:31, Chet Ensign <chet.ensign@oasis-open.org> wrote:
I am posting this on behalf of Mr. Hans-Jürgen Rennau while we debug a problem with his emails being posted to the list.

---

Assume four datasets: an XML document, a JSON document, a CSV file and an HTML document (authored near the north pole, in the rain forest, in Athens and in the Antarctic, respectively).

Imagine a standard which enables you to define the mapping of a document node to a set of RDF triples.

Remember that all documents (XML, JSON, CSV, HTML) can be parsed into document nodes (for example see [1]).

Assume that the RDF graphs obtained from our documents contain the following triples:
foo:oxygen foo:symbol "O" .
foo:oxygen foo:numberOfElectrons "8" .
foo:oxygen foo:atomicMass "16" .
foo:oxygen foo:electronegativity "3.5" .

each one found in a different one of the four RDF graphs.

Then we have integrated information, as we now know four things about oxygen, each contributed by a different data source using a different data format. Of course it would be easy to serialize the integrated information into XML, or JSON, or CSV, or HTML or any other format (employing Inuit or any other natural language).
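A small Python sketch of this thought experiment, with one hand-written mapping per format standing in for the imagined standard. The four input shapes and the `from_*` functions are hypothetical, and the parsers are from the Python standard library rather than the XPath/BaseX functions cited in [1]. Once each source yields triples about the same IRI, integration is just the union of the graphs.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Set, Tuple

Triple = Tuple[str, str, str]
FOO = "foo:"  # prefix used in the triples above, kept as a plain string here

def from_xml(text: str) -> Set[Triple]:
    # assumed shape: <element name="oxygen"><symbol>O</symbol></element>
    root = ET.fromstring(text)
    return {(FOO + root.get("name"), FOO + "symbol", root.findtext("symbol"))}

def from_json(text: str) -> Set[Triple]:
    # assumed shape: {"element": "oxygen", "electrons": 8}
    data = json.loads(text)
    return {(FOO + data["element"], FOO + "numberOfElectrons", str(data["electrons"]))}

def from_csv(text: str) -> Set[Triple]:
    # assumed shape: header row "element,mass" followed by data rows
    return {(FOO + row["element"], FOO + "atomicMass", row["mass"])
            for row in csv.DictReader(io.StringIO(text))}

class _CellCollector(HTMLParser):
    """Collects the text of every <td> cell in document order."""
    def __init__(self) -> None:
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data

def from_html(text: str) -> Set[Triple]:
    # assumed shape: a table row <td>oxygen</td><td>3.5</td>
    parser = _CellCollector()
    parser.feed(text)
    parser.close()
    name, value = (cell.strip() for cell in parser.cells[:2])
    return {(FOO + name, FOO + "electronegativity", value)}

def integrate(*graphs: Set[Triple]) -> Set[Triple]:
    """Merging graphs that share IRIs is the set union of their triples."""
    return set().union(*graphs)
```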

+ + + - - -

But I suppose you think this is an idle dream. Perhaps you think that the imagined standard would not be feasible to create or to use, or you question the practicality of leveraging RDF IRIs for identifying resources and properties in more than a few specific cases.

Unfortunately I agree that it is an idle dream. But the reason I see is a different one: I am convinced that the imagined standard is not too difficult to create and to use, and I do not question the practicality of using RDF IRIs in many fields, including natural science, pharmacology, health care, finance, many verticals and economic interaction. The reason I see is that it seems impossible to find minds with a deep interest in both XML technology and semantic technology. If - then - but.

With kind regards,
Hans-Jürgen Rennau

[1]
https://www.w3.org/TR/xpath-functions-31/#func-doc
https://docs.basex.org/wiki/JSON_Module#json:doc
https://docs.basex.org/wiki/CSV_Module#csv:doc
https://docs.basex.org/wiki/HTML_Module#html:doc

--
Chet Ensign
Chief Technical Community Steward
OASIS Open

+1 201-341-1393
chet.ensign@oasis-open.org
www.oasis-open.org