xml-dev - Re: Syntax and semantics

From: "W. E. Perry" <wperry@fiduciary.com>
To: Paul Prescod <paul@prescod.net>, xml-dev@xml.org
Date: Wed, 17 May 2000 19:18:06 -0400
Paul Prescod wrote:

> I'll take that as "no"?

How so? I have provided two (let us call them expositions) from across a span
of 2300+ years. They are essentially equivalent:  both identify a concrete
instance manifestation (specifier, syntactic reading) upon which a *process*,
whose identification is intrinsic to the exposition, works to effect an outcome
(specified, meaning) which is the thing that our attempt at definition set out
to identify. That strikes me as an entirely adequate proposition of the terms
of debate.

> > >  * In what sense does XML not have semantics?
> >
> > The specification of XML 1.0 stays admirably focussed on
> > delineating the syntax.
>
> Actually, it mixes syntax and semantics shamelessly and inconsistently,
> stating here that "a processor must pass this to the application" and in
> other places nothing at all.

It seems to me that this identifies one syntactic possibility of an XML
instance which, because it is to be recognized as part of XML-conformant
syntax, must appear in XML instances to which it is relevant.

> > As an 'information set' (and most especially as the Infoset) it
> > most certainly is. Nothing in the XML 1.0 spec mandates a
> > canonical infoset or a tree structure.
>
> No, but that's just a bug. Some have proposed that the infoset,
> namespaces and XML 1.0 be rolled into XML 1.1.

Maybe, but the syntactic possibilities conformant to XML 1.0 are as there
described.

> > As I have argued at length elsewhere, imposing either the infoset
> > or the tree structure view upon XML syntax, as specified, greatly
> > and gratuitously curtails the expressive and functional
> > possibilities inherent in the syntax alone.
>
> Greatly, yes. Gratuitously, no. Similarly, imposing the XML syntax on
> Unicode greatly curtails the expressive functionality of Unicode alone.
> But in both cases *the whole point* is to reach into the universe of
> possibilities and pick out the ones we are interested in. XML is not
> interested in languages that cannot be expressed in XML syntax and is
> just as uninterested in concepts/data models that cannot be expressed as
> (perhaps shallow) trees.

Given that XML 1.0 is a specification of syntax, then it must necessarily limit
itself (at least) to what might be specified as syntactic (rather than simply
encoded) possibilities. Notice, too, that even if understood as a curtailment
of Unicode possibilities, the syntactic possibilities which might conform to
XML 1.0 remain infinite. As a practical matter, the number of possible
syntactic constructs to which pre-agreed semantics might be assigned is far
from limitless.

> Let's talk about what happens if we don't have the infoset, both in
> reality and in practice. In reality we have the problem that people do
> not know whether this:
>
> <![CDATA[]]><![CDATA[ ]]>
>
> is one text node or two.

'Node', of course, implies a tree-structured view which, bug or not, is not
mandated by XML 1.0. Effectively, you want to assign, via the Infoset, a
greater semantic load to this construction than its syntax, as syntax,
expresses. You will be able to impose that semantic expectation upon those who
by prior agreement accept your assignment of particular semantics to that
syntactic construct, which is a very different agreement than accepting XML 1.0
syntax as syntax. May I also point out that you have thereby extended (or at
least extended the specification of; you have in fact reduced the range of
possibilities in) the eXtensible Markup Language, not by markup but by
pre-arranged side agreements entirely separate from the syntax which the
specification delineates.
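
A minimal sketch of the distinction at issue, assuming Python's standard
xml.etree.ElementTree as the receiving processor (chosen here only for
illustration; no particular processor is named in this thread). It parses the
construct quoted above, wrapped in a hypothetical <e> element, alongside the
same character data written without CDATA markup, and shows what this one
tree-oriented API chooses to expose of each:

```python
import xml.etree.ElementTree as ET

# Two adjacent CDATA sections: the first empty, the second holding one space.
with_cdata = ET.fromstring("<e><![CDATA[]]><![CDATA[ ]]></e>")

# The same character data written without any CDATA markup.
without_cdata = ET.fromstring("<e> </e>")

# ElementTree exposes only the resulting character data, so the syntactic
# difference between the two instances is not visible through this API.
print(repr(with_cdata.text), repr(without_cdata.text))
```

Whether that loss is a convenience or a curtailment is exactly the
disagreement here; a processor working from the raw syntax could still tell
the two instances apart.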

> In theory, things could get much, much worse were it not for our SGML
> heritage keeping us sane. Some applications could start to treat
> attribute ordering as significant. Some applications could start to
> treat the distinction between ('') and ("") strings as significant. Some
> could start treating the whitespace between attributes as significant.
> (If you use a tab here, it means X. If you use a space here, it means Y.
> If you use two tabs here, it means Z).

In the peer-to-peer interaction between two autonomous, mostly anonymous nodes
of the Internet topology, any of these is possible as locally derived
semantics. If you are not a party to that transaction, who are you to specify
its terms?
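
To make the distinctions listed above concrete, here is a small sketch, again
assuming Python's xml.etree.ElementTree as one example of a processor that
applies pre-agreed semantics (the element and attribute names are
hypothetical). Two instances that differ in attribute order, quote style and
inter-attribute whitespace are reported as carrying identical attributes:

```python
import xml.etree.ElementTree as ET

# Same attributes, but different order, quote characters and whitespace.
first = ET.fromstring("<e x='1' y=\"2\"/>")
second = ET.fromstring('<e\ty="2"  x="1"/>')

# The attributes arrive as a plain mapping; comparing the mappings ignores
# every one of the syntactic differences between the two instances.
attributes_match = first.attrib == second.attrib
print(attributes_match)
```

Two nodes wishing to treat any of these differences as significant would have
to work from the character stream itself rather than from an API of this kind.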

> Remember that the goal is not to maximize expressive power but rather to
> maximize *interoperability* and code reuse.

Which, in my vision, is done by the ongoing interaction of nodes, two-by-two,
building upon their history with each other. In your vision this is
accomplished by the pre-assignment of semantics to identified syntactic
constructs, which with every new such assignment further limits the potential
universe of interacting nodes to those which, for each and every 'extension',
can, will, and even know how to accept the consequences of a new pre-arranged
semantic side agreement.

> > >  * if semantics are entirely local, then does Microsoft have the right
> > > to interpret the "a" element type in xhtml as meaning "archive" and the
> > > "b" as meaning "Beethoven"?
> >
> > Absolutely, if they can implement desirable behavior from a
> > process by so assuming.
>
> I see. So you feel that the XHTML specification could be just element
> types, with no description of either meaning or behavior. And then when
> we send an XHTML document to a browser, the client and server could
> somehow negotiate the meanings of the tags? I do not follow.

XHTML is not XML precisely because the browser, which is the intended processor
of XHTML instances, lacks the specific extensibility mechanism which is a
unique and defining characteristic of XML. That is extension by markup alone.
With that flexibility and power comes the responsibility of the receiving node
to process XML instances in a locally uniquely appropriate way. XHTML supplies
the convenience of standard processing in the browser and not very much in the
way of doing more than browser rendition.

> > That is where the value of the Semantic Web becomes apparent. One
> > pair at a time, autonomous nodes can build the semantic
> > context within which they come to understand one another. I will
> > not elaborate on this process now; I have done it at
> > length elsewhere.
>
> #1. Can you provide a reference?

http://www.gca.org/attend/2000_conferences/xtech_2000/proceedings/presentations/cover.html

http://www.fiduciar.demon.co.uk/writeup/Q&As.html
http://www.fiduciar.demon.co.uk/writeup/whitepaper.htm
(sorry:  the slides from last August in Montreal seem to have disappeared from
the GCA website--if you really want them I can post them for you)

This stuff is mostly 'slideware' and 'pageware' but it covers a first cut of my
basic points.

> #2. What do we do while we wait for the Semantic Web?

Build it. Right now. With processes which are based on just the sort of
two-at-a-time negotiation between nodes which I describe.

> #3. How can the Semantic Web ever be complete until computers are
> artificially intelligent?

This is not AI, just the gruntwork of building context and semantics from
syntax one data exchange at a time. And it is built only by using it.

> Consider: I have a purchase order database. The computer has no
> knowledge of what that means. It is just a database with fields. You
> send me a purchase order in a random vocabulary. I have no
> foreknowledge of the vocabulary or its semantics. The document must be
> sliced and diced so that it may go into my database. How can this be
> done without a human intervenor? How can the computer know that your
> company_name is what is in my database known as customer_name?

See the Xtech2000 slides, which address this precise problem. If they are
unsatisfying I can supplement them with more formal descriptions of the
processes that I am using in production jobs and with some code (which really
just does iterative donkey work, building up the semantic foundations for more
elaborate node-to-node interaction).

> My experience is that semantic mapping is a process that takes place in
> front of white boards, around meeting tables and occasionally in bars.

It takes place in negotiations of which the raw material is the stuff of actual
transactions, agreements, deals. Much of that negotiation is mechanical and
consists only of the exchange of instance data specific to a particular
transaction, until the point when the nodes (not necessarily simultaneously)
each has enough to consummate what it regards as its side of the deal. At that
point a new basis of negotiation is established, on which the nodes must
negotiate payment, delivery, regulatory documentation and other ancillary
requirements of their transaction. There is nothing here which requires a human
most of the time.

> > The only point to make now is that the negotiation must be based
> > on each node handing the other semantically neutral content.
>
> Isn't the phrase "semantically neutral content" oxymoronic? I cannot
> pick up Les Miserables and start reading. First I learn French. I need
> the semantics behind the individual words to get the meaning of the
> whole thing! And I need those semantics before I can read the book.

Actually, there is a great deal more work, and a great many more steps, before
what I describe as 'semantically neutral content' (my choice of words might
perhaps be better) becomes--for you, or any other autonomous node--semantically
interesting, which is to say semantically usable content. Before you go learn
French you have to realize that the stuff is in a natural language, then
identify which one. It is a tedious process of small steps which must be done
the first time for every new sort of content. My own algorithms for this (which
are not by any means the only possible approach), running, naturally, on each
receiving node, begin by looking at the provenance and the structure of each
newly-received message. They are looking ideally for a match on both provenance
and structure to a previously processed message, in order to use it as the
first candidate of a model for instantiating this new message into a
locally-comprehensible form. Even if a perfect match is found, it models only a
candidate instantiation, which must then be handled by locally-defined process
and successfully survive locally-defined sanity checks on the output. Where
less-than-perfect matches are found, the process attempts, if no better model
is available in the history of its interaction with the particular counterparty
node, to use them as templates for instantiating the received message content
into a form usable by local processes. With utterly no usable match to go by,
the node can attempt by brute force to instantiate the atomic items of the
received message into any of the data structures which it is capable of
processing--which is to say, any of the forms which it has any interest in or
understanding of. In the end the node may fail to find an instantiation of the
received message which it is equipped to process, in which case it can make
nothing of the message. Far better, though, to accept that failure will result
in some percentage of cases than to rule out, in advance, the possibility of
performing any useful transaction with a node which has not accepted, a priori,
a particular semantic understanding of a given syntactic construct.
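
The matching steps described above can be sketched roughly as follows. This is
an illustration under stated assumptions, not the production code referred to
earlier: the names leaf_items, Model, choose_model and process are
hypothetical, 'structure' is reduced to the set of leaf element paths, and the
final brute-force stage is left out. The sample data borrows the
company_name/customer_name case raised earlier in the thread.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Optional


def leaf_items(xml_text: str) -> dict:
    """Map each leaf element's path to its text (the 'atomic items')."""
    items = {}

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        if len(elem) == 0:
            items[path] = elem.text or ""
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return items


@dataclass
class Model:
    """A previously processed message, kept as a candidate template."""
    provenance: str       # which counterparty sent it
    structure: frozenset  # leaf paths seen in that message
    mapping: dict         # leaf path -> local field name


def choose_model(history, provenance, structure) -> Optional[Model]:
    """Prefer a perfect match on provenance and structure, then the closest
    partial structural match from the same counterparty."""
    same_sender = [m for m in history if m.provenance == provenance]
    for m in same_sender:
        if m.structure == structure:
            return m
    partial = [m for m in same_sender if m.structure & structure]
    return max(partial, key=lambda m: len(m.structure & structure), default=None)


def process(xml_text: str, provenance: str, history: list,
            sanity_check: Callable[[dict], bool]) -> Optional[dict]:
    """Return a locally usable record, or None if nothing can be made of it."""
    items = leaf_items(xml_text)
    model = choose_model(history, provenance, frozenset(items))
    if model is None:
        return None  # brute-force instantiation would be attempted here
    record = {field: items[path]
              for path, field in model.mapping.items() if path in items}
    # Even a perfect match yields only a candidate; local checks decide.
    return record if sanity_check(record) else None


history = [Model(
    provenance="acme.example",
    structure=frozenset({"/order/company_name", "/order/total"}),
    mapping={"/order/company_name": "customer_name", "/order/total": "amount"},
)]
message = ("<order><company_name>Acme</company_name>"
           "<total>10</total><note>rush</note></order>")

result = process(message, "acme.example", history,
                 lambda r: "customer_name" in r)
unknown = process(message, "unknown.example", history,
                  lambda r: "customer_name" in r)
print(result, unknown)
```

Here the new message has an extra element, so the earlier model serves only as
a less-than-perfect template; the same message from a counterparty with no
history yields nothing, which is the accepted failure case.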

Respectfully,

Walter Perry


