XML.org
OASIS Mailing List Archives

 


Re: [xml-dev] RE: How to assess the correctness of a Format1 -->Format2 mapping?

On Wed, 2023-01-25 at 18:05 +0000, Roger L Costello wrote:
> 
> 1. Round-tripping does not prove correctness, right? 

Right. It can add confidence when used in conjunction with testing,
though.
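
A minimal Python sketch of why a round trip alone proves little. The
record layout and function names here are hypothetical, not taken from
any real format: the forward conversion swaps two fields, the inverse
swaps them back, and so the round trip succeeds even though the
converted record is wrong.

```python
# Hypothetical formats: Format1 is a (lat, lon) tuple, Format2 is a dict.

def f1_to_f2(rec):
    lat, lon = rec
    # Deliberate bug: the two fields are swapped.
    return {"lat": lon, "lon": lat}

def f2_to_f1(d):
    # The inverse undoes the swap, which hides the bug from a round trip.
    return (d["lon"], d["lat"])

original = (51.5, -0.1)
converted = f1_to_f2(original)

# The round trip reproduces the input exactly...
round_trip_ok = f2_to_f1(converted) == original

# ...but a direct check against the expected Format2 record fails.
conversion_ok = converted == {"lat": 51.5, "lon": -0.1}
```

Only the second check, which compares against an independently written
expected result, catches the mistake.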


> 
> 2. Suppose there are two applications, App1 and App2, which are
> designed to perform the same task. The input to App1 is instances of
> Format1 and the input to App2 is instances of Format2. One way of
> ascertaining the correctness of the Format1 --> Format2 conversion is
> to compare the outputs of App1 and App2.

Again, this can help give confidence, yes.

>  That is, for each instance of Format1, input the instance into App1
> and map the instance to Format2 to generate instance2; input
> instance2 into App2 and then compare the outputs of App1 and App2. Do
> you agree with the following assertions?
> 
> Assertion #1: Comparing the outputs of App1 and App2 is equivalent to
> comparing the behavior of App1 and App2.

Consider the file removal command, e.g. "rm" on Unix or Linux.
It does not produce any output in the case of success...

However, if by output one means detectable state change in the universe
:) then in terms of procedural semantics yes, it's equivalent.
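
As a rough sketch of that distinction, assuming two hypothetical
stand-in "applications" that each remove a file, a comparison can look
at the resulting state rather than at anything printed. Both functions
below are silent on success, so the harness observes the return value
and whether the file still exists.

```python
import os
import tempfile

def app1_remove(path):
    os.remove(path)   # produces no output on success

def app2_remove(path):
    os.unlink(path)   # a second, independent way to do the same job

def observe(app):
    """Run app on a fresh file; return (return value, file still exists?)."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "victim.txt")
        with open(target, "w") as fh:
            fh.write("x")
        result = app(target)
        return result, os.path.exists(target)

# Compare observable effects, not printed output.
same_behaviour = observe(app1_remove) == observe(app2_remove)
```

This only observes one kind of state change (the file's existence); a
real comparison would have to decide which effects count.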

> 
> Assertion #2: Comparing application behaviors is a very hard problem.

Right.

> Assertion #3: Comparing syntaxes is a relatively easy problem.

We have to decide on what is meant by comparing. There are subjective
and æsthetic comparisons, historical comparisons, or we can compare the
set of input languages accepted by two grammars...
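
For the last of those, here is a small sketch that compares the strings
accepted by two patterns up to a length bound. It uses regular
expressions as stand-in grammars; the patterns, alphabet and bound are
illustrative assumptions. A bounded enumeration like this can find
differences but cannot prove that two grammars accept the same
language, and for context-free grammars in general that question is
undecidable.

```python
import re
from itertools import product

def accepted(pattern, alphabet, max_len):
    """Return every string over alphabet, up to max_len, that pattern fully matches."""
    rx = re.compile(pattern)
    out = set()
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if rx.fullmatch(s):
                out.add(s)
    return out

g1 = accepted(r"(ab)*", "ab", 4)
g2 = accepted(r"(ab)+", "ab", 4)

# Strings accepted by exactly one of the two patterns.
difference = g1 ^ g2
```

Here the two patterns differ only on the empty string, which the first
accepts and the second rejects.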

> 
> Assertion #4: When trying to ascertain the correctness of a
> Format1 --> Format2 mapping, do everything you possibly can to turn
> the problem into one of comparing syntaxes, not one of comparing
> application behaviors.

Let's say, try to frame problems in ways such that solutions can be
tested and measured.

> 
> 3. Here are two approaches to generating XML from an instance of
> Format1:
> 
> Approach #1: Write a program that parses an instance of Format1 and
> then serializes the in-memory parse tree to the desired XML format.
> 
> Approach #2: Using a standard data format specification language
> (i.e., DFDL), write a specification of the Format1 data format. Then
> input that specification into a standard DFDL processor along with an
> instance. The processor parses the instance in accordance with the
> specification and automatically serializes the in-memory infoset to
> XML.
> 
> Which approach is more likely to correctly generate XML documents
> from Format1 instances?

It may be that the first approach requires a deeper understanding of
the format, and hence is more likely to be correct. I don't think
there's a universal answer here.

> 
> I assert that the DFDL specification approach is guaranteed to
> correctly generate XML documents from Format1 instances. 

I have a really nice bridge for sale. It's currently located in Ukraine
and has been previously owned, know what I mean, guv?


> The hand-crafted approach is much less likely to correctly generate
> XML documents from Format1 instances. (Well, guaranteed might be a
> bit strong, but I am pretty darn certain that correct XML is
> generated.)

Then you don't need to do any testing, do you? Wait, what? Pretty darn
certain?

> 
> So there is no uncertainty in whether the conversion from Format1 to
> XML is correct. Likewise, there is no uncertainty in whether the
> conversion from Format2 to XML is correct. The only uncertainty is
> whether the conversion from Format1 to Format2 is correct. This is
> huge.

I've yet to see a program of more than half a dozen lines that didn't
have a bug in it or didn't fail for some edge case. I found a security
bug in the Unix "cp" and "mv" commands. I've been handed interview
questions and seen bugs in the API design.

The problem of proving programs correct is unsolved. This is because
"correct" has several aspects, some of which appear to be intractable:
it requires a complete and precise statement of what the program should
do, and of the result. But that is equivalent to writing the program.

But that word "complete" is pretty difficult. This is why, for example,
the HTML 5 specification is so large: it attempts to say what a user
agent (e.g. browser) should do with both correct and incorrect input,
since users expect a browser to display Web pages regardless of whether
they conform to a specification or not.

So we look at the documentation for the input format. For example,
OpenType has statements like [1],

    For extended typographic families that includes fonts other than the
    four basic styles (regular, italic, bold, bold italic), it is strongly
    recommended that name IDs 16 and 17 be used in fonts to create an
    extended, typographic grouping.

The statement doesn't directly say how name IDs 16 and 17 create this
grouping, nor what goes wrong if they are not used in that way, nor in
what other ways they could be used. But a program translating an
OpenType font to some other format needs to do the right thing in such
cases.
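
To make that concrete, here is a minimal sketch of one such decision.
In the OpenType name table, name IDs 1 and 2 are the basic family and
subfamily names, and IDs 16 and 17 are the typographic family and
subfamily names; when 16 or 17 is absent, the corresponding basic name
is treated as the typographic one. The dict-of-strings input and the
font names are hypothetical, and a real converter would also have to
handle platform, encoding and language variants of each record.

```python
def typographic_names(names):
    """names: dict mapping name ID -> string (a hypothetical pre-parsed form).

    Prefer name IDs 16/17 and fall back to IDs 1/2 when they are absent.
    """
    family = names.get(16) or names.get(1)
    subfamily = names.get(17) or names.get(2)
    return family, subfamily

# A "Light" weight lies outside the four basic styles, so it carries IDs 16/17.
light = {1: "Example Sans Light", 2: "Regular", 16: "Example Sans", 17: "Light"}

# A basic style can get by with IDs 1/2 alone.
bold = {1: "Example Sans", 2: "Bold"}

light_names = typographic_names(light)
bold_names = typographic_names(bold)
```

A converter that read only IDs 1 and 2 would place the Light font in a
separate "Example Sans Light" family, which is the kind of quiet error
that neither the source nor the destination format would flag.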

So there are decisions to be made in conversion that require awareness
of both the source and destination contexts. This is not unusual. I'd
go so far as to say it's probably normal for any complex conversion
worth discussing :)

Sometimes there are no simple universal answers and you have to take
life on a case-by-case basis.

liam




[1] Strictly speaking, this is from an adjunct document,
https://learn.microsoft.com/en-us/typography/opentype/spec/name
Such documents are needed when the specification is too difficult to
read or, being an ISO spec, perhaps isn't freely available or can't be
annotated openly.

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org

