RE: [xml-dev] 3 approaches to structure lists, plus an analysis of each approach
From: "Cox, Bruce" <Bruce.Cox@USPTO.GOV>
To: "G. Ken Holman" <gkholman@CraneSoftwrights.com>,xml-dev@lists.xml.org
Date: Thu, 19 Feb 2009 17:02:25 -0500
Thanks for the detailed information, Ken.  It looks like the problem has
been thoroughly addressed and all I have to do is understand it.  :O)


Bruce B Cox
Manager, Standards Development Division
USPTO/OCIO/SDMG
571-272-9004


-----Original Message-----
From: G. Ken Holman [mailto:gkholman@CraneSoftwrights.com] 
Sent: Wednesday, February 18, 2009 8:33 PM
To: xml-dev@lists.xml.org
Subject: RE: [xml-dev] 3 approaches to structure lists, plus an analysis of each approach

At 2009-02-18 19:47 -0500, Cox, Bruce wrote:
>Ken, does the approach you describe below address version control?

Absolutely.  Each version of every list has unique list-level meta 
data in the XML genericode file expressing the values of the list.

>There is the unfortunate potential for country codes, for example, to be
>used differently at different times, as geopolitical boundaries change.

Indeed.  Each code in every list can have value-level meta data 
expressed in the genericode file, helping the reader understand the 
semantics represented by each code.

>For a patent publication, we feel we need to know when the country code
>in question was in force, so the version of the list used is important.

It is important to cite *in the XML document* the instance-level meta 
data identifying the list-level meta data of the list from which the 
value in the instance was obtained.  This gives the recipient the 
value-level meta data to interpret the intended semantics of the value.

For example, the UN/CEFACT Core Component Technical Specification 
(CCTS) 2.01 core component types define a number of facets of 
instance-level meta data, called "supplementary components", that are 
attributes attached to the element that contains the code as content 
or the code as an attribute.

The XML instance author can choose to leave the instance-level meta 
data empty, in which case the interpretation of the code is up to the 
receiver ... for example, a currency value of "USD" is probably US 
dollars.  But in your example, omitting instance-level meta data might 
make interpretation ambiguous and imprecise.  By specifying 
instance-level meta data with the country code, one conveys the 
unambiguous list-level meta data of the code list from which the code 
was derived.  That leads the recipient to inspect the value-level meta 
data associated with the code and so comprehend the semantics 
represented by the value used.

>Our situation is complicated by the fact that there are Offices issuing
>patent rights that are not associated with a country, but cover a larger
>region, such as the European Patent Office.

Not a problem with the use of context/value association files 
declaring that the values of a particular XML information item are 
governed by the union of two genericode lists:  one for the ISO 
country codes (and their semantics for the values), and one for the 
patent community's representation of regions (perhaps that's another 
list).

>These institutions are
>given two-letter codes in WIPO Standard ST.3, which also incorporates
>ISO codes for all the member states' Offices. Yes, it duplicates ISO
>country codes, but only because the UN does not always recognize the
>changes in political boundaries *at the same time* that the ISO
>standards are updated, so WIPO has to have its own "politically correct"
>list.

Oh, then I suppose you wouldn't accept the ISO semantics for the 
values, so you don't need that union.  You could simply point to the 
single WIPO country code list, citing a particular version of that 
list.  But this underscores the importance of the instance-level meta 
data to tell the recipient how to interpret a particular coded value 
... to expand on an example from my book you might have:

   <PatentFilingCountry listSchemeURI="urn:x-WIPO:Country Codes:1992"
      >CS</PatentFilingCountry>

... representing Czechoslovakia, and

   <PatentFilingCountry listSchemeURI="urn:x-WIPO:Country Codes:2007"
      >CS</PatentFilingCountry>

... representing Serbia and Montenegro.

Without the instance-level meta data, just having the value "CS" 
would be ambiguous.
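
To make the point concrete, here is a minimal sketch in Python (not 
genericode or CVA tooling) of how a recipient might resolve a code 
once the instance names its list version.  The CODE_LISTS dictionary 
and the meaning() function are hypothetical; only the two URIs and 
the two meanings of "CS" are taken from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for value-level meta data, keyed first by the
# list version URI and then by the code itself.
CODE_LISTS = {
    "urn:x-WIPO:Country Codes:1992": {"CS": "Czechoslovakia"},
    "urn:x-WIPO:Country Codes:2007": {"CS": "Serbia and Montenegro"},
}

def meaning(xml_text):
    """Return every meaning the coded value could have."""
    elem = ET.fromstring(xml_text)
    code = (elem.text or "").strip()
    uri = elem.get("listSchemeURI")
    if uri is None:
        # No instance-level meta data: every known list is a candidate.
        return sorted({codes[code] for codes in CODE_LISTS.values()
                       if code in codes})
    # Instance-level meta data selects exactly one list version.
    codes = CODE_LISTS.get(uri, {})
    return [codes[code]] if code in codes else []
```

With the listSchemeURI attribute present the function returns a single 
meaning; with the attribute removed it returns both candidates, which 
is exactly the ambiguity described above.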

So you want *both* lists (or as many lists as needed) to apply to the 
same element content, and this can be expressed in context/value 
association as the union of two (or more) versions of the WIPO 
list.  With the appropriate genericode files, the free Schematron 
implementation of CVA validation on our web site would successfully 
validate:

   <PatentFilingCountry listSchemeURI="urn:x-WIPO:Country Codes:1992"
      >FI</PatentFilingCountry>

... and:

   <PatentFilingCountry listSchemeURI="urn:x-WIPO:Country Codes:1996"
      >SF</PatentFilingCountry>

... while rejecting:

   <PatentFilingCountry listSchemeURI="urn:x-WIPO:Country Codes:1996"
      >FI</PatentFilingCountry>

... thus allowing only the single entry for Finland in each of two 
lists with two different values based on the list versions.
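
The accept/reject behaviour for the three instances above can be 
sketched in Python.  This is only an illustration of the rule, not 
the Schematron-based CVA implementation, which works from genericode 
files; the LISTS dictionary and the is_valid() function are 
hypothetical, and each list is reduced to its Finland entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical reduction of two list versions to the single entry for
# Finland that each one carries in the example.
LISTS = {
    "urn:x-WIPO:Country Codes:1992": {"FI"},
    "urn:x-WIPO:Country Codes:1996": {"SF"},
}

def is_valid(xml_text, lists=LISTS):
    """Check a coded element against the union of the given lists."""
    elem = ET.fromstring(xml_text)
    code = (elem.text or "").strip()
    uri = elem.get("listSchemeURI")
    if uri is not None:
        # The instance names its list, so only that list may supply the value.
        return code in lists.get(uri, set())
    # No instance-level meta data: any list in the union may supply it.
    return any(code in codes for codes in lists.values())
```

The instance-level meta data narrows the union to one list, which is 
why "FI" passes when it cites the 1992 list and fails when it cites 
the 1996 list.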

There are strategies for omitting instance-level meta data should you 
anticipate a value to be added to a list, say, six months from 
now:  create your instance without instance-level meta data, and 
validate with the union of the published list and your custom 
extension list with the future value.  Later on when the new list is 
published, the instance doesn't change but it will validate with the 
new list, not unioned with your temporary transition extension that 
has since evaporated.  But there is a risk that the committee ends up 
using a different value and the instance won't validate ... but at 
least there was a migration strategy when the future information was 
more certain.
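
A rough Python sketch of that transition strategy follows.  The codes 
"AA", "BB" and "CC" and all of the names are invented for 
illustration; the only idea taken from the text is that the same 
instance value is checked first against a union that includes a 
temporary extension list, and later against the newly published list 
alone.

```python
def allowed_values(*code_lists):
    """Union of the values of every code list passed in."""
    allowed = set()
    for code_list in code_lists:
        allowed |= set(code_list)
    return allowed

# Hypothetical lists: today's published list, a private extension holding
# the anticipated value, and the list as eventually published.
published_now = {"AA", "BB"}
temporary_extension = {"CC"}
published_later = {"AA", "BB", "CC"}

# The instance carries no instance-level meta data, only the value.
instance_value = "CC"

valid_today = instance_value in allowed_values(published_now,
                                               temporary_extension)
valid_without_extension = instance_value in allowed_values(published_now)
valid_after_publication = instance_value in allowed_values(published_later)
```

If the committee had published some other value in place of "CC", the 
last check would fail, which is the risk noted above.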

When designing your XML vocabulary for the use of code lists and 
identifiers, you have a responsibility to provide for instance-level 
meta data.  With a few minor exceptions, the UN/CEFACT supplementary 
components for codes are expressed in the attributes:

   listID=
   listAgencyID=
   listAgencyName=
   listName=
   listVersionID=
   listURI=
   listSchemeURI=

The UN/CEFACT supplementary components for identifiers are expressed 
in the attributes:

   schemeAgencyID=
   schemeAgencyName=
   schemeName=
   schemeVersionID=
   schemeDataURI=
   schemeURI=
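
As an illustration, a receiving application could collect whichever 
of these attributes an element actually carries.  The sketch below is 
Python, the function name supplementary_components() is hypothetical, 
and the attribute names are exactly the ones listed above.

```python
import xml.etree.ElementTree as ET

# Attribute names as listed above for codes and for identifiers.
CODE_ATTRS = ("listID", "listAgencyID", "listAgencyName", "listName",
              "listVersionID", "listURI", "listSchemeURI")
IDENTIFIER_ATTRS = ("schemeAgencyID", "schemeAgencyName", "schemeName",
                    "schemeVersionID", "schemeDataURI", "schemeURI")

def supplementary_components(xml_text, names=CODE_ATTRS):
    """Return the instance-level meta data present on one element."""
    elem = ET.fromstring(xml_text)
    return {name: elem.get(name) for name in names
            if elem.get(name) is not None}
```

An empty result means the author left the instance-level meta data 
out, and interpretation of the value is up to the receiver.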

For those archive readers with a copy of our "Practical Code List 
Implementation", this is detailed for UN/CEFACT core component types 
on pages 48/49.

I'll be talking more about these concepts at XML Prague 2009 
http://www.xmlprague.cz trying to convey the importance to designers 
of XML vocabularies.

I hope this helps, Bruce.

. . . . . . . . . Ken

--
Upcoming hands-on  XQuery, XSLT, UBL & code list training classes:
Brussels, BE 2009-03;  Prague, CZ 2009-03, http://www.xmlprague.cz
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson:    http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview:  http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/x/
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal