XML.orgXML.org
FOCUS AREAS |XML-DEV |XML.org DAILY NEWSLINK |REGISTRY |RESOURCES |ABOUT
OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]
Re: [xml-dev] defining correctness for an XML transformation - how?

Roger L Costello <costello@mitre.org> writes:

> As I mentioned previously, I need to convert <Airport_Name> to
> <name>. Here is an example conversion:
> ...
> The value of <name> is identical to the value of <Airport_Name>,
> except trailing spaces are removed.

> I have a file containing an <Airport_Name> for every airport in the
> world. The airport names are expressed using ASCII characters.
>
> Suppose I iterate over every <Airport_Name> element in the file,
> convert its value to the appropriate <name> value, and evaluate the
> following predicate:
>
>     normalize-space(Airport_Name) eq name
>
> Suppose the predicate evaluates to true for every conversion.
>
> Have I verified the correctness of my conversion?

Maybe.

Your specification of correctness seems to involve more hand-waving and
more silent assumptions than I think is healthy if you care about
verifying the correctness of the conversion.

Even if I take the observation that

    The value of <name> is identical to the value of <Airport_Name>,
    except trailing spaces are removed.

as a requirement and not just an observation about the one example, I
have the feeling that you have left some things out.

- Is it required that every 'Airport_Name' element in the input
  correspond to exactly one 'name' element in the output?

  Possible failures: omissions; duplications.

- Is it required that every 'name' element in the output? correspond to
  exactly one 'Airport_Name' element in the output?

  Possible failures:  injection of new data; merger of input elements.

- Is it required that the airport names in the input be distinct?

- Is it required that the airport names in the input be spelled with
  characters in the range U+0020 to U+007F?  You say that, but the
  schema fragment you give does not enforce it.

- Is the input an XML document or just a sequence of 'Airport_Name'
  elements?  If the former, what is required to happen to the root
  element?

- Is the output to be an XML document or just a sequence of 'name'
  elements?  

- Is the input allowed, required, or forbidden to have elements other
  than 'Airport_Name' elements and a root element?  What happens to
  those elements?

- Is the output allowed, required, or forbidden to have elements other
  than 'name' elements and a root element?  

- If the input does not satisfy the pre-condition(s), is the
  transformation obligated to recover silently (how?), recover with a
  warning (how?), or fail?

- Is the order of 'name' elements in the output required to match the
  order of 'Airport_Name' elements in the input?

  Note that unless the order of names in the input carries some
  information, re-ordering in the output will not change the information
  content of the dataset.  But if the presence of duplicates carries
  information, you won't be able to say "the set of names in the output
  is the same as the set of names produced by stripping whitespace from
  the names of the input"; you'll need to talk about bags or multi-sets
  instead.

> If yes, then the way to verify the correctness of an XML-to-XML conversion is:
>
> 1 Validate the source element against the XML Schema for the source element
>
> 2 Validate the converted target element against the XML Schema for the target element
>
> 3 Create a predicate that the source-target conversion must satisfy and evaluate the predicate 
>
> 4 If all predicates return true, then the correctness of the conversion is verified
>
> Do you agree?

Not in the general case, no.  For the task of information-preserving
translation from one notation into another, these points seem to me
likely to be necessary but not always sufficient.  (If I could say when
they won't be sufficient, I would be closer to a satisfactory
understanding of this problem than I now am.)

In other cases, I'm not sure they can be applied without more careful
formulation.

Consider an XML-to-XML conversion whose input is the standard XML
representation of a conforming ixml grammar and whose output is an
equivalent grammar in something like BNF form (no alternatives except at
the top level of the right-hand side, no expressions of the form E* or
E+ or E?, etc.).  There is not a one to one relation between all input
elements and output elements, and attempting to define a one-to-many or
many-to-many relation so that I can check a predicate on all pairs seems
likely to end badly. If G and G' are the input and output grammars
respectively, then correctness here requires something like:

  - G' has no 'alts', 'repeat0', 'repeat1', 'option' or 'sep' elements.
    
  - In G' every 'alt' element has a 'rule' element as its parent.  (This
    follows from the preceding plus schema validity, but just in case.)

  - L(G') = L(G).

  - For any input string s, the XML output produced by parsing s against
    G is identical to the XML output produced by parsing s against G'.

This requires, of course, that L(G) have a specified meaning.  (As Liam
Quin has pointed out, sometimes the domain-specific meaning of the XML
matters for correctness.)

Or consider an XML-to-XML transformation which takes a document in (an
XML form of) the Alloy modeling language and produces a document in the
same format in which a new column is added to a particular set of
relations.  Again the domain-specific meaning is central to any
specification of the requirements for the transformation.

In both case, there are going to be clear correspondences between some
of the input elements and some of the output elements, but there will
not be a 1:1 mapping between the elements of the input and those of the
output, and it may or may not be possible to define a predicate that
captures just how pairs of corresponding elements must relate.  For the
EBNF to BNF translation, it is not required by the specification given
above that the set of strings generated by a given nonterminal N in G be
generated by N in G'.  It is not even required that there *be* a
nonterminal N in G'.  Preserving all the nonterminals of G in G', and
preserving the sets of strings they generate, is certainly a good idea
because it makes proving the correctness of the transformation much
easier.  But that's an implementation choice, not a condition of
correctness.

Thank you for the concrete example.


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]


News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 1993-2007 XML.org. This site is hosted by OASIS