Re: [xml-dev] defining correctness for an XML transformation

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]

Re: [xml-dev] defining correctness for an XML transformation - how?

From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
To: Roger L Costello <costello@mitre.org>
Date: Thu, 04 Jul 2024 07:49:09 -0600

Roger L Costello <costello@mitre.org> writes:

> Michael Sperberg-McQueen made this fascinating statement:
>
>     ... my question [is] focused not on how to prove a transformation
>     correct, but how to specify what correctness is for that
>     transform.
>
> I would really like to understand that. What is the difference between
> “proving a transformation is correct” versus “specifying what
> correctness is for a transform”?

(Michael Kay has already answered this pretty well; this is partly to
say that I agree with what he said and partly to add a simple example.)

The second is (the formulation of) a statement; the first is (the
construction of) a proof that the statement is true.

For example: Suppose we specify as follows the correctness of a
transformation T taking input I and producing output O:

   pre-condition:  I is an XML document containing zero or more
   Airport_Name elements.

   post-condition: O is an XML document containing zero or more 'name'
   elements, such that

     (1) for every Airport_Name element $i in I, there is some name
         element $o in O such that $o has only one child node (a text
         node) and string($o) = normalize-space($i);

     (2) for every name element $o in O, there is some Airport_Name
         element $i in I such that string($o) = normalize-space($i).

Then for any specific input/output pair we can imagine checking that the
pre-condition and post-condition both hold.  If I have succeeded in
making the specification unambiguous, it should always be clear whether
a given pair of I and O do or do not satisfy the conditions.  From this
spec, for example, it's clear that any name in the input may occur any
non-zero number of times in the output, and any name in the output may
have occurred any non-zero number of times in the input, and that the
ordering of names in the output, and the structure of the output
document, are not constrained.

And to prove T correct, instead of just proving an individual I/O pair
correct, we can imagine proving that *whenever* the pre-condition is
satisfied, the result of running T will *always* satisfy the
post-condition.  Suppose T is written in XSLT.  Then we might want to
show that for every Airport_Name element in I, a particular template
will be evaluated, that that template produces zero or more suitable
'name' elements, and that no other template in the stylesheet will
produce any 'name' elements.  Or we might show that a particular
variable is assigned a sequence of strings containing every distinct
Airport_Name string in I, and that a given expression in the stylesheet
serializes them all in 'name' elements (and again that nothing else in
the program will ever produce a 'name' element).

[Note: the example pre- and post-conditions I've given are formulated in
       English; many people prefer a more formal language, because
       statements in more formal languages can be manipulated
       mechanically in ways that are helpful.  Proving that T satisfies
       its specification can similarly be done in prose or in a purely
       formal way.

       We get more confidence that the proof is correct if it can be
       checked mechanically, but plenty of anecdotal evidence suggests
       that even writing conditions and proofs in prose can reduce the
       defect rate for software.  (See for example the 'clean room
       engineering' approach developed by Harlan Mills at IBM, perhaps
       best described as 'semi-formal'.)  There are also anecdotal
       reports that just formulating the pre- and post-conditions helps,
       even if there is no effort to produce a proof.  I find this
       plausible, since formulating pre- and post-conditions even in
       prose makes me think about the program and possible edge cases in
       a way that I do not otherwise always do.

       I will be happiest if the answer to my question of the other day
       describes a formal language analogous to those used in formal
       proofs of correctness for imperative languages, but suitable for
       XML.  Even a coherent account of how to formulate clear, crisp,
       complete, and accurate specifications of correctness in prose,
       however, would look to me like progress.  End of note.]

I hope this helps.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Follow-Ups:
- Re: [xml-dev] defining correctness for an XML transformation - how?
  - From: Rick Jelliffe <rjelliffe@allette.com.au>
- Re: [xml-dev] defining correctness for an XML transformation - how?
  - From: "Liam R. E. Quin" <liam@fromoldbooks.org>
- Re: [xml-dev] defining correctness for an XML transformation - how?
  - From: Thomas Passin <list1@tompassin.net>

References:
- defining correctness for an XML transformation - how?
  - From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Re: [xml-dev] defining correctness for an XML transformation - how?
  - From: Tony Graham <tgraham@antenna.co.jp>
- Re: [xml-dev] defining correctness for an XML transformation - how?
  - From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Re: [xml-dev] defining correctness for an XML transformation - how?
  - From: Roger L Costello <costello@mitre.org>

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]