XML.orgXML.org
FOCUS AREAS |XML-DEV |XML.org DAILY NEWSLINK |REGISTRY |RESOURCES |ABOUT
OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]
Re: [xml-dev] defining correctness for an XML transformation - how?

On 7/4/2024 9:49 AM, C. M. Sperberg-McQueen wrote:
Roger L Costello <costello@mitre.org> writes:

Michael Sperberg-McQueen made this fascinating statement:

     ... my question [is] focused not on how to prove a transformation
     correct, but how to specify what correctness is for that
     transform.

I would really like to understand that. What is the difference between
“proving a transformation is correct” versus “specifying what
correctness is for a transform”?
By pure coincidence I just happened across this paper which deals in part with what software correctness is and how it can be measured -

"Cleanroom Software Engineering
Harlan D. Mills, M. Dyer, R. C. Linger"

https://trace.tennessee.edu/cgi/viewcontent.cgi?article=1017&context=utk_harlan

Harlan D. Mills was the IBM engineer behind "cleanroom software engineering. The following is a retrospective on his work -

https://ieeecs-media.computer.org/assets/pdf/millslegacy.pdf

One interesting twist is to measure software correctness by statistical measures instead of formally by mathematical deduction. You still need formal specifications, and they can and must be verified by humans, and that verification may involve mathematics, but it's different kind of thing.

As part of correctness testing, one has to include a statistical distribution of expected inputs. Then in testing those are applied and failures are noted. (It sounds a lot like fuzzing, doesn't it?). From them the mean time to failure can be reliably estimated, among other things.

(Michael Kay has already answered this pretty well; this is partly to
say that I agree with what he said and partly to add a simple example.)

The second is (the formulation of) a statement; the first is (the
construction of) a proof that the statement is true.

For example: Suppose we specify as follows the correctness of a
transformation T taking input I and producing output O:

    pre-condition:  I is an XML document containing zero or more
    Airport_Name elements.

    post-condition: O is an XML document containing zero or more 'name'
    elements, such that

      (1) for every Airport_Name element $i in I, there is some name
          element $o in O such that $o has only one child node (a text
          node) and string($o) = normalize-space($i);

      (2) for every name element $o in O, there is some Airport_Name
          element $i in I such that string($o) = normalize-space($i).

Then for any specific input/output pair we can imagine checking that the
pre-condition and post-condition both hold.  If I have succeeded in
making the specification unambiguous, it should always be clear whether
a given pair of I and O do or do not satisfy the conditions.  From this
spec, for example, it's clear that any name in the input may occur any
non-zero number of times in the output, and any name in the output may
have occurred any non-zero number of times in the input, and that the
ordering of names in the output, and the structure of the output
document, are not constrained.

And to prove T correct, instead of just proving an individual I/O pair
correct, we can imagine proving that *whenever* the pre-condition is
satisfied, the result of running T will *always* satisfy the
post-condition.  Suppose T is written in XSLT.  Then we might want to
show that for every Airport_Name element in I, a particular template
will be evaluated, that that template produces zero or more suitable
'name' elements, and that no other template in the stylesheet will
produce any 'name' elements.  Or we might show that a particular
variable is assigned a sequence of strings containing every distinct
Airport_Name string in I, and that a given expression in the stylesheet
serializes them all in 'name' elements (and again that nothing else in
the program will ever produce a 'name' element).

[Note: the example pre- and post-conditions I've given are formulated in
        English; many people prefer a more formal language, because
        statements in more formal languages can be manipulated
        mechanically in ways that are helpful.  Proving that T satisfies
        its specification can similarly be done in prose or in a purely
        formal way.

        We get more confidence that the proof is correct if it can be
        checked mechanically, but plenty of anecdotal evidence suggests
        that even writing conditions and proofs in prose can reduce the
        defect rate for software.  (See for example the 'clean room
        engineering' approach developed by Harlan Mills at IBM, perhaps
        best described as 'semi-formal'.)  There are also anecdotal
        reports that just formulating the pre- and post-conditions helps,
        even if there is no effort to produce a proof.  I find this
        plausible, since formulating pre- and post-conditions even in
        prose makes me think about the program and possible edge cases in
        a way that I do not otherwise always do.

        I will be happiest if the answer to my question of the other day
        describes a formal language analogous to those used in formal
        proofs of correctness for imperative languages, but suitable for
        XML.  Even a coherent account of how to formulate clear, crisp,
        complete, and accurate specifications of correctness in prose,
        however, would look to me like progress.  End of note.]

I hope this helps.

Michael








[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]


News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 1993-2007 XML.org. This site is hosted by OASIS