RE: [xml-dev] Data versioning strategy: address semantic, relationship, and syntactic changes?

In short, the attention to detail that versioning receives depends on the business(es) affected by a change and on the *tools* for propagating that change to both human and automated users, all weighed against the cost of the change.


This is one of the most daunting challenges to the medical community.   The impact of changes can be very positive but the costs are commensurate.  The complexity of the domain lists created over literally centuries of practice is incredible.


Hmmm... Roger, in complexity theory have you ever come across a term for "resistance" to versioning vs. patching?   We might use 'density' or 'arabesqueness' metaphorically, but there should be a mathematical construct that describes this.  Is a phase transition management system 'beyond' a versioning system?  Is there a type of criticality control parameter affecting phase transitions that could be associated with versioning?  Temperature/Energy/cost comes to mind but in that sense, a versioning system is a means of tuning external parameters and percolation is a suitable model.






-----Original Message-----
From: Cox, Bruce [mailto:Bruce.Cox@USPTO.GOV]
Sent: Monday, December 10, 2007 6:02 PM
To: Greg Hunt; xml-dev@lists.xml.org
Subject: RE: [xml-dev] Data versioning strategy: address semantic, relationship, and syntactic changes?


Greg, Roger, I hope you won't mind if I give some of your interesting ideas a bit of a reality test.


In summary:

For us, changes are usually business driven and decided on cost, and, no, it makes little or no difference what kind of change it is.


In exhausting detail:

At the USPTO, our versioning strategy for the DTDs and style sheets used for patent publications is driven almost entirely by cost.  When a change in a business process provokes a change in patent publications (about 10,000 documents per week), we look at the entire pipeline, including data source, storage, processing, validation, export to publishing contractor, publication, dissemination, consumption by internal search systems, consumption by international exchange partners, consumption by commercial value-added resellers, archives, and final disposition.  Changes to the governing DTD and style sheets are based on that entire analysis.  To the extent possible, changes are made no more frequently than annually and announced six months in advance, primarily so that everyone can get the funding in place in time, make changes, test changes, notify customers, test changes, retrain staff, test changes, update product descriptions, test changes, etc.  We like to test changes on at least two weeks of data (20 to 40 thousand documents), but sometimes do it across many months of data through parallel runs.


Granted, our universe is limited in scope.  There are only about 120 patent offices in the world, only a handful use our XML data, and there are fewer than 50 value-added resellers who use our XML data that we know of.  Nevertheless, we identify all changes to everyone we know to be using the data, since we cannot predict what will or won't break someone else's system.  Our business is such that we cannot even dream of placing any constraints on the consumers of the data.  If we miss some of the unknown users, and a change breaks their system, we usually hear about it, especially if it tends to put them out of business.  This has happened with the most innocuous or seemingly trivial of changes as well as the more dramatic changes.  Sometimes we can fix it, sometimes not; you can imagine the rest ... .


It has happened here more than once that some bright idea that seemed to solve a major problem received enough analysis for us to realize that the cost of implementation far outweighed the benefit.  All our changes are "strong" in the sense of being well-specified.  If they aren't well-specified, they become well-specified, or they don't survive analysis and don't get implemented.  Even the bright ideas that are ultimately abandoned have to be sufficiently well-specified to determine if they can be implemented.


Ontologies and such are usually indecipherable to those who don't know the business they describe, and superfluous to those who do.  Most major business changes in the patent system occur as a result of an act of Congress or as the outcome of some litigation.  In both cases, the Office writes rules that set the meaning of terms for better or worse (and sometimes get revised accordingly), usually based on the language used by Congress or the court.  I don't think there is any mechanical substitute for learning the business you want to engage with.  The world of commerce is far too dynamic for that.  In any case, all changes bite someone, hard or not, sooner or later, so we have little choice but to treat them all much the same: once a change is agreed, we don't categorize it in any way.


During analysis, we take into account the expected benefit as compared to cost, where it can sometimes be useful to understand a change as syntactical only (very low cost as a rule), or structural (more costly, depending on the scope).  Semantic changes are always very costly in the sense of having to retrain habitual users of the data in the new interpretations required.  However, this rarely impacts the DTD (unless there are corresponding changes in structure as well) and is therefore not usually funded from the IT budget.  Nevertheless, considerations for the cost of training can stop an inexpensive DTD change.
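
To make the last point concrete, here is a rough sketch in Python of how such a benefit-versus-cost check might be written down.  Everything in it is an assumption for illustration only: the class and field names, the single cost unit, and all of the figures are hypothetical, not how the Office actually records its analysis.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeKind(Enum):
    """The three kinds of change discussed in the thread."""
    SYNTACTIC = "syntactic"
    STRUCTURAL = "structural"
    SEMANTIC = "semantic"


@dataclass
class ProposedChange:
    name: str
    kind: ChangeKind
    dtd_cost: float       # hypothetical IT cost of changing DTD and style sheets
    training_cost: float  # hypothetical cost of retraining users of the data
    benefit: float        # hypothetical expected benefit, in the same units

    def total_cost(self) -> float:
        # Training is counted even though it is not paid from the IT budget.
        return self.dtd_cost + self.training_cost

    def survives_analysis(self) -> bool:
        # A change goes ahead only if the benefit outweighs the whole cost.
        return self.benefit > self.total_cost()


# Made-up figures: both changes are equally cheap on the DTD side.
rename = ProposedChange("rename an attribute", ChangeKind.SYNTACTIC,
                        dtd_cost=1.0, training_cost=0.0, benefit=3.0)
redefine = ProposedChange("redefine an element's meaning", ChangeKind.SEMANTIC,
                          dtd_cost=1.0, training_cost=10.0, benefit=5.0)

for change in (rename, redefine):
    print(change.name, change.kind.value, change.survives_analysis())
```

With these invented numbers the semantic change is rejected purely on training cost, even though its DTD cost is the same as the syntactic one.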


There are a number of WIPO Standards that document the meaning of industrial property terminology.  These formed the basis of the vocabulary used in WIPO Standard ST.36, which the USPTO implements as Red Book.  For the most part, for a given element name, all the member states of WIPO assign the same meaning.  However, the harmony is often somewhat superficial, hiding a multitude of variations in rules, traditions, and understanding, among the member states.  That there is as much agreement as there is might be considered an achievement worthy of note.  Without that, I dare say ST.36 could not exist.


And yes, the intellectual property community uses those two-letter ISO country codes for a number of purposes, including place of birth, primary residence, place of filing, mailing address, agent's address, states designated under the PCT, etc., etc.  WIPO Standard ST.3 incorporates, sometimes modifies, and even augments the list with codes for regional authorities that play the role of a patent office for more than one country.  WIPO member states frequently revisit the list as political boundaries change, since the scope of patents is generally limited to a political territory.  Countries usually enact legislation defining the changes in scope of the rights attached to a patent corresponding to the changes in political boundaries.
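
As a small illustration of what "augments the list" means for anyone validating the data, the following Python sketch accepts either a two-letter country code or a regional-office code.  It is only a sketch under stated assumptions: the country set is a tiny hand-picked sample rather than the full list, only two regional codes (EP and EA) are shown, and the function and variable names are hypothetical.

```python
# Tiny illustrative sample of two-letter ISO country codes (not the full list).
SAMPLE_COUNTRY_CODES = frozenset({"US", "JP", "DE", "FR", "GB"})

# Two of the additional codes for offices that act for more than one country.
REGIONAL_OFFICE_CODES = {
    "EP": "European Patent Office",
    "EA": "Eurasian Patent Organization",
}


def is_accepted_office_code(code: str) -> bool:
    """Return True if the code is a sample country code or a regional-office code."""
    normalized = code.strip().upper()
    return normalized in SAMPLE_COUNTRY_CODES or normalized in REGIONAL_OFFICE_CODES


def is_plain_country_code(code: str) -> bool:
    """Return True only for codes in the (sample) country list."""
    return code.strip().upper() in SAMPLE_COUNTRY_CODES


for candidate in ("US", "ep", "ZZ"):
    print(candidate, is_accepted_office_code(candidate), is_plain_country_code(candidate))
```

A consumer that validates only against the plain country list would reject "EP", which is one way a seemingly trivial revision of the code list can break someone's system.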


Bruce B Cox

Manager, Standards Development Division

U.S. Patent & Trademark Office



