Re: [xml-dev] What techniques do you employ to ensure that your data is precisely and unambiguously specified?

From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Mon, 24 Sep 2012 12:41:36 -0400

At 2012-09-24 16:12 +0000, Costello, Roger L. wrote:
>How carefully do you specify your data? Is it free from ambiguity 
>and misinterpretation?
>
>Although all data should be specified precisely and without 
>ambiguity, the consequences of data being imprecise or ambiguous 
>range from minor to catastrophic.  There is a name for data in the 
>latter category: level 1 data.
>
>       Level 1 (critical) data: the data must be specified
>       precisely with absolutely no ambiguity. For if the data
>       is misinterpreted, then there is a real
>       possibility of a loss of human life or financial,
>       political, or personal calamity.
>
>When you create your XML Schemas do you identify the level of the 
>data? For example, do your XML documents contain a <level> element:
>
>     <Document>
>             <level>level 1 (critical) data</level>
>             ...
>     </Document>
>
>Has anyone created an XML Schema for the various levels?
>
>[Key Question:] If you have level 1 data what techniques do you 
>employ to ensure that there is virtually zero chance for the data to 
>be misunderstood or misinterpreted?

At a Balisage conference many years ago (if readers haven't heard of 
Balisage, it is *the* conference to go to!) I presented my use of 
what I called "reference numbers in an XPath file" in order to 
unambiguously specify XML data that belongs at a particular place on 
the printed page when writing a stylesheet.

The problem being solved is that fully-qualified absolute XPath 
addresses are too lengthy to write into small boxes on the printed 
page of an invoice.  In an "XPath file" each reference number is a 
compact cross reference to a fully-qualified absolute XPath 
address.  I mock up the end result on a piece of paper and ask the 
committee member to fill in the boxes of the page with the reference 
numbers guiding me unambiguously on how to write my stylesheet.  When 
working with clients, I ask them both to create the mockup and 
populate the page with a set of reference numbers.

One can produce these reference numbers either from the schema (I 
only supported UBL schemas because of the regular nature by which the 
schemas are expressed; my implementation does not work for arbitrary 
schemas), or from an arbitrary instance (doesn't have to be UBL, can 
be any XML document).

It turns out the "instance XPath file" has become very helpful in 
other projects and with clients.  Take any XML instance, create its 
XPath file, then cite the reference numbers to unambiguously talk 
about any element or attribute in the document.
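
As a rough illustration of the idea (this is not the Crane stylesheets, and not their actual output format), the following Python sketch walks an XML instance and assigns a sequential reference number to each element and attribute, pairing it with an absolute XPath address. The function name `xpath_file`, the sample invoice, the numbering order (element, then its attributes, then its children) and the output layout are all assumptions made for the example; namespaces are ignored.

```python
import xml.etree.ElementTree as ET


def xpath_file(xml_text):
    """Return (reference_number, absolute_xpath) pairs in document order.

    Hypothetical helper: the numbering scheme is an assumption for
    illustration, not the format produced by Crane-xml2xpath.xsl.
    """
    root = ET.fromstring(xml_text)
    entries = []

    def visit(elem, path):
        # Number the element itself, then each of its attributes.
        entries.append((len(entries) + 1, path))
        for name in elem.attrib:
            entries.append((len(entries) + 1, f"{path}/@{name}"))
        # Count same-named siblings so every child gets a positional predicate.
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            visit(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    visit(root, f"/{root.tag}")
    return entries


# Made-up sample instance (no namespaces, element content only).
SAMPLE = """<Invoice currency="CAD">
  <ID>42</ID>
  <Line><Amount>10.00</Amount></Line>
  <Line><Amount>5.50</Amount></Line>
</Invoice>"""

for number, path in xpath_file(SAMPLE):
    print(f"{number:>3}  {path}")
```

With a listing like this in hand, "put 5 in the top-right box" identifies `/Invoice/Line[1]/Amount[1]` without anyone having to write the full address on the mockup.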

But XPath files are fragile ... if you change anything about the 
document, the reference numbers break and you have to recreate the 
XPath file, which then might invalidate any work you've done already 
by citing the old reference numbers.  So I just tell my client to 
freeze the XML document before generating the XPath file for it, then 
use the reference numbers without changing the file.
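
The fragility is easy to see in a small sketch. Assuming reference numbers are handed out sequentially in document order (an assumption for this example; `numbered_paths` and both sample documents are hypothetical), inserting a single element shifts every number after it:

```python
import xml.etree.ElementTree as ET


def numbered_paths(xml_text):
    """Map reference number -> simple element path, in document order."""
    numbers = {}

    def visit(elem, path):
        numbers[len(numbers) + 1] = path
        for child in elem:
            visit(child, f"{path}/{child.tag}")

    root = ET.fromstring(xml_text)
    visit(root, f"/{root.tag}")
    return numbers


# The "frozen" document and the same document after one element is added.
frozen = "<Invoice><ID>42</ID><Total>15.50</Total></Invoice>"
edited = "<Invoice><ID>42</ID><Note>rush</Note><Total>15.50</Total></Invoice>"

before = numbered_paths(frozen)
after = numbered_paths(edited)

print(before[3])  # /Invoice/Total
print(after[3])   # /Invoice/Note
```

Any mockup annotated with "3" against the frozen document would silently point at the wrong element once the document changes.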

You can download the resources I have here:

   http://www.CraneSoftwrights.com/resources/ubl/index.htm#xpath

Consider the Crane-xml2xpath package.  Download the stylesheets and 
you'll find that you can feed any arbitrary XML document into 
Crane-xml2xpath.xsl to produce the XPath file in text.  You use 
Crane-xml2xpath-html.xsl to produce the XPath file in HTML.

Then, should you wish, you can convert the original file into a new 
XML document I call the "XPath Instance" document.  This document 
mirrors the structure of the input XML document but populates every 
element and every attribute with the corresponding reference 
number.  This is helpful when writing stylesheets because stylesheets 
do not (necessarily) do validation.  So when you've written your 
stylesheet and you put the XPath instance document in as input, what 
you end up getting in output is the set of reference numbers laid out 
on the page to use for a visual validation that your stylesheet 
algorithm is correct.

Now, since UBL stylesheets only do layout and no calculations, there 
was never a problem with the reference numbers in the XPath instance 
document messing up arithmetic.  And don't even try to validate an 
XPath instance document: the reference numbers are formatted as 
"!###!" (the bangs are there to distinguish two adjacent reference 
numbers in the formatted result), which generally will not satisfy 
the value constraints of the schema.
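
A minimal sketch of building such an XPath instance document, again in Python rather than XSLT and not taken from the Crane package: it keeps the structure of the input and replaces attribute values and leaf-element text with "!n!" markers. The name `xpath_instance`, the numbering order, and the choice to leave container elements without text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def xpath_instance(xml_text):
    """Return a copy of the document with values replaced by !n! markers.

    Hypothetical helper: numbers run in document order (element, then its
    attributes, then its children). Only leaf elements receive text, on the
    assumption that container elements hold element content only.
    """
    root = ET.fromstring(xml_text)
    counter = 0

    def visit(elem):
        nonlocal counter
        counter += 1
        if len(elem) == 0:  # leaf element: replace its text with the marker
            elem.text = f"!{counter}!"
        for name in list(elem.attrib):
            counter += 1
            elem.set(name, f"!{counter}!")
        for child in elem:
            visit(child)

    visit(root)
    return ET.tostring(root, encoding="unicode")


# Made-up sample instance (no namespaces, element content only).
source = (
    '<Invoice currency="CAD"><ID>42</ID>'
    "<Line><Amount>10.00</Amount></Line></Invoice>"
)
print(xpath_instance(source))
# <Invoice currency="!2!"><ID>!3!</ID><Line><Amount>!5!</Amount></Line></Invoice>
```

Running a stylesheet over output like this puts the markers where the real values would appear, so the rendered page can be compared box by box against the annotated mockup.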

Also, since UBL documents are exclusively element content, my 
stylesheets are not geared for mixed content.

So, given those limitations, I do not consider this environment to be 
general purpose.  But it is incredibly useful for UBL documents and 
will be useful for any element-content XML document.  It isn't a 
standard approach, but it is a useful approach.

I hope this is helpful.

. . . . . . . . . . . . Ken

--
Contact us for world-wide XML consulting and instructor-led training
Free 5-hour lecture: http://www.CraneSoftwrights.com/links/udemy.htm
Crane Softwrights Ltd.            http://www.CraneSoftwrights.com/x/
G. Ken Holman                   mailto:gkholman@CraneSoftwrights.com
Google+ profile: https://plus.google.com/116832879756988317389/about
Legal business disclaimers:    http://www.CraneSoftwrights.com/legal

