Re: [xml-dev] Making the Connection between Syntax and Semantics is Still

Hi Roger

Since I like lateral thinking, I would compare the problem you describe to what can be learned from kinesics (the study of body language). Kinesics holds that no body movement is completely devoid of meaning (see http://en.wikipedia.org/wiki/Kinesics , http://en.wikipedia.org/wiki/Ray_Birdwhistell ). I see a parallel with your effort to remove the semantic restrictions from the syntax in order to gain a contextless grammar, and I would postulate that it cannot be done. There will always be some meaning encapsulated in the context. Even if the grammar is defined so that no context is allowed, that still won't stop people reading meaning into the context, I suggest.

On a similar note: is there a science parallel to kinesics that could help analyse the semantic meanings implied, accidentally or intentionally, in the syntactic structure of XML? It would be the study of the 'body language' of XML grammars and of instances of those grammars.

Another note: isn't it true that a statement can only be absolutely true or false once its context is considered? I can say that the shortest path between two points is a straight line, but that is true in Euclidean geometry and not in all geometries. I can say the sky is blue, and that is true in some contexts, but in others the sky is green (when describing a particular painting, say). Even 1 + 1 = 2 is only true in some contexts (in arithmetic modulo 2, 1 + 1 = 0). To make a statement without also stating its context (or assuming one) is perhaps impossible, philosophically speaking.

Cheers

Steve

----

Stephen D Green

On 20 April 2015 at 23:05, Costello, Roger L. <costello@mitre.org> wrote:

Hi Folks,

Problem: define a grammar for this simple data format:

N item1 item2 ... itemN

The first value in the input is a number N which specifies the number of following items.

Ouch! That's not context-free. So basically you lose the entire context-free parsing toolkit.

Stated another way, that problem requires a connection between syntax and semantics.
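To make that connection concrete, here is a minimal hand-written sketch of a recognizer for the format. It assumes the values are separated by whitespace and that N is a non-negative decimal integer; the post does not specify either, and the function name parse_counted_list is made up for illustration. Splitting the input into tokens is purely syntactic, while the final length check depends on the value of the first token, which is where semantics leaks into recognition.

```python
def parse_counted_list(text):
    """Recognize 'N item1 item2 ... itemN' and return the items as a list.

    Assumption: tokens are whitespace-separated and N is a
    non-negative decimal integer.
    """
    tokens = text.split()  # syntactic step: break the input into tokens
    if not tokens:
        raise ValueError("empty input: expected a count N")

    try:
        n = int(tokens[0])  # semantic step: interpret the first token as a number
    except ValueError:
        raise ValueError(f"expected an integer count, got {tokens[0]!r}")
    if n < 0:
        raise ValueError(f"count must be non-negative, got {n}")

    items = tokens[1:]
    # The constraint that is not context-free: the number of items
    # must equal the value carried by the first token.
    if len(items) != n:
        raise ValueError(f"count says {n} items, found {len(items)}")
    return items
```

With this sketch, parse_counted_list("3 a b c") returns the three items, and parse_counted_list("3 a b") raises ValueError because the count and the number of items disagree.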

Alas, we enter the world of black art.

Today the syntax-semantics connection is made by writing code. Let's illustrate how the connection is made in three different grammar languages: XML Schema 1.1, ANTLR, and BISON.

XML Schema 1.1

Use the ordinary context-free capabilities of XSD to define N and each item. Then embed XPath code into the XSD to constrain the number of items to N. Here it is:

ANTLR

Create parser and lexer rules. Embed Java code within the parser rules to constrain the number of items to N. Here it is:

BISON

Proceed as with ANTLR, except instead of embedding Java code into the parser rules, embed C code.

We want this: Reusable and Retargetable Grammars

Reusable: multiple different applications can use the same parse tree.

Retargetable: multiple different languages can be generated from the same grammar.

Reusable and retargetable grammars require code-free grammars.
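One possible reading of "code-free", sketched here only as an illustration and not as a solution: the count constraint is declared as plain data next to the grammar, and a generic checker evaluates it against the parse tree. Every name below (CONSTRAINTS, parse, check, and the keys length_of and equals) is hypothetical and does not come from XML Schema, ANTLR, or BISON. The input is again assumed to be whitespace-separated.

```python
# Hypothetical declarative constraint: plain data, no host-language code.
# It reads: "the length of 'items' must equal the value of 'count'".
CONSTRAINTS = [{"length_of": "items", "equals": "count"}]


def parse(text):
    """Context-free pass: build a parse tree without checking the count."""
    tokens = text.split()
    if not tokens or not tokens[0].isdecimal():
        raise ValueError("expected a decimal count followed by items")
    return {"count": tokens[0], "items": tokens[1:]}


def check(tree, constraints):
    """Generic pass: evaluate declared constraints against a parse tree.

    Returns a list of error messages; an empty list means the tree is valid.
    """
    errors = []
    for c in constraints:
        actual = len(tree[c["length_of"]])
        expected = int(tree[c["equals"]])
        if actual != expected:
            errors.append(f"{c['length_of']}: expected {expected}, found {actual}")
    return errors
```

Here check(parse("2 x y"), CONSTRAINTS) returns an empty list, while check(parse("3 x y"), CONSTRAINTS) reports one error. Because the constraint is data, the same declaration could in principle be interpreted by tools written in other languages, though a real constraint vocabulary would need to cover far more than a length check.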

Want to Make an Impact on the World?

Today it is nearly impossible to solve any non-trivial language recognition problem without embedding code into the grammar. In the example above we saw that we needed to embed XPath code in XML Schema, Java code in ANTLR, and C code in BISON. And that was for a trivial problem.

Want to make a huge impact on the world? Fix this problem. No more embedded code in grammars.

Comments?

/Roger