Re: [xml-dev] defining correctness for an XML transformation

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]

Re: [xml-dev] defining correctness for an XML transformation - how?

From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
To: "Liam R. E. Quin" <liam@fromoldbooks.org>
Date: Fri, 05 Jul 2024 07:15:00 -0600

Liam Quin has suggested additions to the toy specification of
correctness I offered as part of my explanation of the difference
between a specification of correctness and a proof of correctness.  The
additions illustrate an important and useful point: correctness in
general may involve properties other than functional correctness, many
of which are (as Liam observes) harder to prove or even define than
functional correctness.

Since the correctness conditions I am most interested in being able to
formulate and perhaps prove are conditions on software I write for my
own use, it's true that I focus most on functional correctness and not
on security or safety properties.  I already know that I am not trying
to steal information from myself, so I don't face the problem of proving
that the code I write for myself is secure against information theft.
In contrast, I do not already know that the transform I have written is
functionally correct, which is why I am interested in better ways of
defining, and maybe even proving, correctness (mostly functional
correctness).  I thank Liam for reminding us that not everyone has that
luxury.

I do have one concern.  Liam's additional provisions have the side
effect of making the specification of correctness longer and more
daunting, which will I suspect confirm some readers in a wrong belief
that correctness is really only relevant in security-critical and
safety-critical applications where cost is not an issue, and has no
interest or utility for the rest of us.  I'm not sure that's true.  In
the code I write and run, functional defects are far more common than
security issues.

A few comments on specific points are interspersed below.

"Liam R. E. Quin" <liam@fromoldbooks.org> writes:

> On Thu, 2024-07-04 at 07:49 -0600, C. M. Sperberg-McQueen wrote:
>> 
>> 
>>      (1) for every Airport_Name element $i in I, there is some name
>>          element $o in O such that $o has only one child node (a text
>>          node) and string($o) = normalize-space($i);
>> 
>>      (2) for every name element $o in O, there is some Airport_Name
>>          element $i in I such that string($o) = normalize-space($i).
>
> Often forgotten, and responsible for a great number of security
> problems:
>
>   (3) there is no other element in O than the wrapper element and
>       AirportName elements

If that is in fact a requirement (whether for security reasons or
other), it should of course be specified.

I deleted this constraint from my draft intentionally, because I wanted
a more minimal, less constraining definition of functional correctness.
The sample definition does not, for example, require a 1:1 relation
between elements in the input and output, or between the overall
structure of the input and output, or require that the output preseerve
the sequence in which names appear; if those are in fact requirements,
then the specification needs to state them.  But are they in fact
requirements for correctness, or are they only properties of the
implementation we have in mind?

>   (4) there is no use of external XML entities in O,
>       no internal subset, and no additional namespace declarations
>       (e.g. xsi: to alter where a schema is sought, and possibly
>       introduce default values)

I'll leave to others the discussion of when and where these constraints
are and are not required.

I'll observe only that anyone who wishes to validate untrusted data and
starts by following the hints in the xsi:schemaLocation attributes
present in the untrusted document is looking for trouble and may find
it.  (Sometimes I do trust myself to point to the right schema, even
when I do not trust myself to get the transform functionally correct; in
that case prohibiting schema location hints would only be a way to make
my own life harder.)

>   (5) the output O is well-formed XML

I think that follows from the statement that O is an XML document.

>   (6) no additional files or resources are consulted or created in
>       the transformation process
>
>   (7) the transformation must complete without using excess memory of
>       CPU time or other system resources.

I'm not sure I know how to define "excess" here.  The discussions of
pre- and post-conditions I have seen recently all seem to wish for
specifications to be unambiguous and provide crisp distinctions.  I do
not know how those who need to impose constraints like this make them
specific enough to be testable, or what kind of language is needed in
order to express them.

> So, there’s a pragmatic side to correctness often ignored in the
> textbooks, partly for simplicity as the closer you get to the edge of
> your system, the harder it gets to specify and measure things.

Excellent point, particularly for those who must persuade others that
the software they produce is trustworthy.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

References:
- defining correctness for an XML transformation - how?
  - From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Re: [xml-dev] defining correctness for an XML transformation - how?
  - From: Tony Graham <tgraham@antenna.co.jp>
- Re: [xml-dev] defining correctness for an XML transformation - how?
  - From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Re: [xml-dev] defining correctness for an XML transformation - how?
  - From: Roger L Costello <costello@mitre.org>
- Re: [xml-dev] defining correctness for an XML transformation - how?
  - From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Re: [xml-dev] defining correctness for an XML transformation - how?
  - From: "Liam R. E. Quin" <liam@fromoldbooks.org>

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]