Re: [xml-dev] defining correctness for an XML transformation

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]
Re: [xml-dev] defining correctness for an XML transformation - how?
From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
To: Hans-Juergen Rennau <hrennau@yahoo.de>
Date: Wed, 03 Jul 2024 16:16:06 -0600
Thank you.  I think you are right for some cases, but not all.

For cases where what we want is to translate some body of information
from one format to another (whether it's from one XML format to another
or between XML and some non-XML format), I think you are right that
the information of the data seems to provide a good handle.  I tried to
describe a systematic approach to that problem in

  C. M. Sperberg-McQueen, "What constitutes successful format
      conversion? Towards a formalization of 'intellectual content'"
      International Journal of Digital Curation (IJDC) 6.1 (2011):
      153-164.

      http://www.ijdc.net/article/view/170/238
      http://www.ijdc.net/article/download/170/238/0

A concrete but simple example is described in

  Sperberg-McQueen, C. M., Yves Marcoux and Claus Huitfeldt. "Document
      lattices: Equivalence, compatibility, and contradiction in
      document markup." Presented at Balisage: The Markup Conference
      2014, Washington, DC, August 5 - 8, 2014. In Proceedings of
      Balisage: The Markup Conference 2014. Balisage Series on Markup
      Technologies, vol. 13 (2014).

      https://doi.org/10.4242/BalisageVol13.Sperberg-McQueen01
      https://balisage.net/Proceedings/vol13/html/Sperberg-McQueen01/BalisageVol13-Sperberg-McQueen01.html

Two caveats apply:

(1) Those papers assume that what you want is to show that in converting
a particular resource from one form to another, no information you care
about has been corrupted, deleted, or added.  That's challenging in part
because it requires us to specify what information we care about and
what information we don't care about, which is not always easy.

But if the task is to show that the program performing the conversion is
correct, more is required than showing that the right thing happened in
a particular case.  We will need a way to specify correctness (this is
the question I was pondering yesterday), and then a way to prove that
the conversion program meets the specification.

(2) As Mike Kay has already pointed out, some programs we wish to write
to process XML are not aimed at preserving all and only the important
information in the input.  Can we apply to them the methods we develop
to check information preservation?

I would have said "no, clearly not", but your mail (thank you!) has made
me think again.  Take the common case of a stylesheet to display an XML
document in a web browser.  Some information in the input document is
almost certain to be lost in the HTML version of the document -- it is,
as people sometimes say, a down-hill conversion (at least usually).  But
perhaps we can specify that that information counts as information we
don't (in this case) care about.  I'm not quite sure what to do with
requirements like "Keywords should be typographically distinct, and
there must be a blank line between function definitions" (I'm thinking
about the HTML stylesheet I was writing the other day, for the Alloy
modeling language).  But perhaps we can treat those typographical points
as conveying information (and any failure to observe them as conveying
distinct and erroneous information).

So, thank you.  I'll have to think more about that direction.

But to take another case -- suppose we have a program (XQuery or XSLT or
something else) which takes (a) an invisible-XML grammar, in XML form,
and (b) a string of characters, and has the task of producing either a
correct parse tree for the input string, given the input grammar, or
else an XML document correctly signaling that the input string is not a
sentence in the language defined by the grammar.  There is, perhaps,
some sense in which all the information conveyed by the parse tree is
already present, implicitly, in the input string and input grammar taken
together.  But that feels to me a little too much like Wittgenstein's
claim that if we really understand any sentence S, then the logical
consequences cannot possibly surprise us.  (If that is so, then has
anyone ever really understood any sentence in any language?)  In
practice, the information we can readily extract from the grammar and
sentence is not the same as the information we can readily extract from
the parse tree -- that is why the program is potentially useful.  In
that and similar cases, I don't see a way for comparison of information
content to help us.

Or to take a simpler case: for many implementations of ixml, the first
thing they do with the input grammar is translate it into an equivalent
grammar in a BNF-like form.  The two grammars are (if the translation is
correct) equivalent, in the sense that L(G) = L(G').  But since the
relation between the form of the grammar and the language it describes
is complex, I don't see a practical way to check for correctness by
checking that the information content of G and of G' is the same. (If I
remember correctly, the identity of L(G) and L(G') is in the general
case undecidable.)

Thank you for making me think harder.

Michael


Hans-Juergen Rennau <hrennau@yahoo.de> writes:

> A most interesting question. My thoughts tend in a direction different
> from pre- and post-conditions. The focus should be on the comparison
> of structured information. As you are speaking of "translation from
> one format to another", I consider information content as a possible
> yardstick.  Imagine source and target formats can be mapped to their
> semantic information content - expressed e.g. by RDF or some other
> representation as recently presented at xmlprague 2024 [1]. The
> correctness of the transformation can then be assessed by comparing
> the information content of source and target. I think the possibility
> of mapping tree-structured information (XML, JSON, ...) to
> tree-independent information content does not receive the interest
> which it deserves.
>
> Kind regards,
> Hans-Jürgen
>
> [1] Kottmann, Renzo; Cedric Pauken;  Andreas Schmitz: Simple Semantic Data Modeling in XML (SeMoX), xmlprague 2024
> https://archive.xmlprague.cz/2024/files/xmlprague-2024-proceedings.pdf , p. 231 f
>
> Am Mittwoch, 3. Juli 2024 um 15:47:59 MESZ hat C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> Folgendes geschrieben: 
>
> Roger Costello's recent question about how to show the correctness of a
> translation from one XML format to another very similar one suggests a
> related question.  Forget *showing* that an XML transformation is
> correct -- how would you define correctness formally, if you wanted to
> be able in principle to provide a machine-checkable proof of
> correctness?
>
> For imperative languages, one way is to define a pre-condition which the
> caller of a program or function must guarantee, and a post-condition
> which describes what the program or function will achieve.  Written in a
> Hoare triple, pre-condition P, post-condition Q, and code S can be
> depicted as {P}S{Q}.
>
> But the logical world illustrated by typical descriptions of Hoare
> triples feels remarkably simple -- atomic values assigned to variables.
>
> What language would one need in order to formulate plausible pre- and
> post-conditions on XML transformations, or more generally on functions
> or procedures that operate on XDM instances?
>
> Asking for a friend.


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
References:
- defining correctness for an XML transformation - how?
  - From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Re: [xml-dev] defining correctness for an XML transformation - how?
  - From: Hans-Juergen Rennau <hrennau@yahoo.de>
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]