Re: [xml-dev] The semantics of an XML document is …

Hiya,

This talk about semantics in XML (or any other format or medium) is
quite interesting to me. I was more involved in Topic Maps (over 10
years ago now... *sigh*), but as we had some of the same challenges,
I'll speak from that perspective. I think, probably wrongly, that the
answer to anything we're talking about here is to get right the core /
upper ontology of whatever-the-thing-is, whether it's XML/DTD, JSON,
morse code, eye winking, etc; a meta model of semantics that tries to
get a few epistemological constraints right before we dive into
implementations. I don't think we can claim that any existing
transmission format can successfully carry semantics well (be it RDF,
XTM, or otherwise); there's simply too much invested time and tech
that's brought us to the current state. This is why we're good at
data, but not so much at semantics. Some standards win by the smaller
scope alone, some lose because they are too small (and the whole gets
too fragmented). Back and forth we go.

For XML the concept of an upper ontology lies in the XML
specification, of course (and SGML/DTD by extension, and so forth down
the rabbit hole of history), and even though it tries to be semantically
neutral ("it's up to the author of the document"), there's of course
tons of stuff in there that is pinned to concepts and abstract
notions that we, for the most part, can agree on and use and that
carry semantic meaning. I think it's symptomatic of these discussions
that all of this talk of semantics *in* data formats evolved *out*
of data formats. That has shaped the conversation to be about "type" and
"document" and "content" and "data" and "URLs" and so on (a
data-centric meta model, if you will), and from those very
data-centric things we are trying to imply the semantics of a very wide
set of uses. The less data-centric we get, the less strongly typed
we get, and the less we feel that things are capturing "semantics". It's
an odd conundrum.

I don't have answers to this one, though. For years I've seen and used
various formats for semantic storage and retrieval, and while they all
have their pros and cons, I've never seen something that looks like
a good answer. I think part of it is that we can't grasp or agree on
how we do good epistemology in an online world. Our current best effort
is using things like Wikipedia URLs as subject identifiers (for
example), but given the fuzziness of the human mind I think it's a
fool's errand. It just might be that we need the machines to create
their own formats for this to work, but through re-interpreting human
knowledge into something else. Possibly. Maybe.
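
To make the subject-identifier idea concrete, here is a minimal Python sketch of merging topics that point at the same URL. It is an illustration only: the record layout, the function name `merge_by_subject_identifier` and the sample URLs are assumptions made up for this example, and real Topic Maps merging is richer (a topic can carry several identifiers, for one).

```python
from collections import defaultdict

def merge_by_subject_identifier(topics):
    """Merge topic records that share a subject identifier URL.

    Hypothetical record shape: {"id", "subject_identifier", "names"}.
    """
    merged = defaultdict(lambda: {"names": set(), "source_ids": []})
    for topic in topics:
        entry = merged[topic["subject_identifier"]]
        entry["names"].update(topic["names"])
        entry["source_ids"].append(topic["id"])
    return dict(merged)

# Illustrative data: two sources describe the same person, one a book.
topics = [
    {"id": "a1",
     "subject_identifier": "https://en.wikipedia.org/wiki/Mervyn_Peake",
     "names": ["Mervyn Peake"]},
    {"id": "b7",
     "subject_identifier": "https://en.wikipedia.org/wiki/Mervyn_Peake",
     "names": ["Peake, Mervyn"]},
    {"id": "c3",
     "subject_identifier": "https://en.wikipedia.org/wiki/Titus_Groan",
     "names": ["Titus Groan"]},
]

merged = merge_by_subject_identifier(topics)
```

The merge only works when both sources happen to pick the same URL for the same subject. If one of them points at a different page for the same person, nothing merges, which is exactly the fuzziness problem.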

I think, opinionatedly and somewhat tired, that it's safer to say that
if we have a shared understanding of an upper ontology that captures a
better means of human epistemology (did I mention epistemology?) in an
online and distributed world, we can get damn near whatever ideals
we're aiming for. But that requires us to think quite differently
about what we're trying to do, I think (including all of us agreeing on
and understanding what epistemology even is), and the industry that feeds
on data formats (easy to understand) won't let human semantics (much
harder) happen anytime soon. So for now it's inadequate piggyback rides on
data formats.

Aaaaaaand, I'm done. Dinner time. Feel free to ignore this pile of waffle.


Cheers,

Alex

On Tue, Jan 11, 2022 at 4:18 PM Liam R. E. Quin <liam@fromoldbooks.org> wrote:
>
> On Mon, 2022-01-10 at 08:04 +0000, Stephen D Green wrote:
> >  I was later a contributor to a paper about
> > how XML (or equivalent) standards might one day be designed using
> > formal ontologies
> > (
> > http://www.macs.hw.ac.uk/~yjc32/project/ref-BM%20ontology/ref%20onto%20for%20eBusiness/onto%20for%20eBuss%20standards.pdf
> > ).
> > Maybe someday it will happen. Till then it might be that ontological
> > analysis will only be a luxury afforded to the more ubiquitous XML
> > languages. I did imagine that a general analysis of common features
> > of XML
> > languages such as the semantic logic implied by containing elements,
> > sequences, etcetera, might happen eventually.
>
> It's easy to forget (i don't know if you did forget) that a single XML
> document might have multiple entirely different interpretations, as
> well as being processed in entirely different ways.
>
> For example, a database schema diagram might be represented in SVG, so
> that an SVG renderer can draw it without knowing anything about the
> database schema language; another program might look at the same file
> and make a swatch of colours found in it. And a human looking at it
> might say, Ah, this tells me a person, identified by person_id, can
> have zero, one or two socks. And a country has a capital city.  And yet
> the database schema editor knew nothing of this.
>
> David Megginson (are you here?) had an SGML file that had different
> interpretations depending on which SGML DTD (and SGML declaration) you
> used. The same input file produced different structures as a result.
> It's possible to do something similar in XML with entities, e.g.
> supplying a different DTD using an external catalogue file.
>
> "Semantics" are applied to XML files by external means, and these
> external sources depend on context. And these can be computer science
> Behavioural Semantics, but in that regard XML is data, it doesn't
> behave, neither well nor badly, so the behavioural semantics are more
> correctly (i feel) associated with the application processing the
> input.  But they don't have to be behavioural semantics.  Given
>     <book>
>         <author>Mervyn Peake</author>
>         <title>Titus Groan</title>
>     </book>
>
> we have identified a book (but not a manifestation of a work nor a
> particular instance such as the copy on the bookshelf behind me). We
> didn't need any URLs or IRIs to do that, although someone else could
> come along and infer RDF triples from this fragment.  Most such
> mappings either lose the information that there was a character "<"
> that was followed by a character "b" at the start of the input, in
> favour of capturing "author" and "title" as properties... or they do
> the opposite and capture the syntax but to get to author you have to
> find a < followed by an "a" followed by... which is tedious at the best
> of times. But they are both entirely reasonable RDFications in certain
> contexts, and they both capture complete semantics from some
> perspectives, and woefully incomplete semantics from other
> perspectives.
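
The two kinds of mapping described above could be sketched roughly like this in Python. This is an illustration under assumptions, not anyone's actual RDFication: the blank-node labels and the predicate names `char` and `next` are invented for the example, and the triples are plain tuples rather than output from an RDF library.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<book>
    <author>Mervyn Peake</author>
    <title>Titus Groan</title>
</book>"""

def property_triples(xml_text, subject="_:book1"):
    """Reading 1: child elements become properties of one subject.

    The original character stream is lost.
    """
    root = ET.fromstring(xml_text)
    triples = [(subject, "rdf:type", root.tag)]
    for child in root:
        triples.append((subject, child.tag, (child.text or "").strip()))
    return triples

def syntax_triples(xml_text):
    """Reading 2: keep the syntax; every character is a node in a chain.

    'char' and 'next' are hypothetical predicate names.
    """
    triples = []
    for i, ch in enumerate(xml_text):
        triples.append((f"_:c{i}", "char", ch))
        if i + 1 < len(xml_text):
            triples.append((f"_:c{i}", "next", f"_:c{i + 1}"))
    return triples
```

`property_triples(FRAGMENT)` gives three triples with "author" and "title" as properties, while `syntax_triples(FRAGMENT)` starts with a "<" node linked to a "b" node and never mentions an author at all. Same input, two defensible readings.
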
>
> Tim B-L used to say (maybe still does) that since XML lacks semantics
> it couldn't store RDF, and since it was tree-based it could not
> represent arbitrary graphs.  Stand aside, GraphML and RDF/XML :)  Since
> there are no semantics in XML, i sometimes speak of XML Phlogiston: you
> pour your RDF into XML, whereupon by definition it loses all semantics;
> you transport it, and someone else reconstitutes it, and the semantics
> magically reappear. So there's something invisible storing the
> semantics - phlogiston!
>
> In fact, of course, the semantics come from the applications, and XML
> has no trouble transporting representations of graphs. But a pure XML
> processor has no clue what it's transporting, rather like a container
> ship traversing the ocean :)
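
As a small illustration of a tree-shaped document carrying a graph, here is a Python sketch. The `graph`/`node`/`edge` vocabulary is made up for this example (it is not GraphML), and `load_graph` is a hypothetical helper. The parser only ever sees a flat tree of elements; the cycle exists only once the application decides what `from` and `to` mean.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: a flat list of nodes and edges.
GRAPH_XML = """<graph>
  <node id="a"/><node id="b"/><node id="c"/>
  <edge from="a" to="b"/>
  <edge from="b" to="c"/>
  <edge from="c" to="a"/>
</graph>"""

def load_graph(xml_text):
    """Application-level reading: build an adjacency list from the tree."""
    root = ET.fromstring(xml_text)
    adjacency = {node.get("id"): [] for node in root.iter("node")}
    for edge in root.iter("edge"):
        adjacency[edge.get("from")].append(edge.get("to"))
    return adjacency

graph = load_graph(GRAPH_XML)
```

The resulting adjacency list contains the cycle a -> b -> c -> a, something no element nesting in the document expresses.
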
>
> Oops, i wrote a treatise!
>
> >
>
> --
> Liam Quin, https://www.delightfulcomputing.com/
> Available for XML/Document/Information Architecture/XSLT/
> XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
> Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org
>
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>


-- 
 Information Alchemist, tone modulator, swords master
 thinkplot.org | linkedin.com/in/shelterit | sheltered-objections.blogspot.com

