It seems to me that there are some hidden assumptions in this discussion that should be exposed. For example, where is the boundary between the "business" and the technology that implements it? I'd say that a web service isn't the business, just one of the widgets used to conduct business. The "business" is an abstraction that is independent of the implementing technology. The "business" has layers, as does the implementing technology. Change arises from any of the layers. How that is managed varies with the culture of the layer and the organizations doing business.
In the case of the USPTO disseminating patent and trademark information, there are at least the following layers of activity to consider:
1. For the broad context, we look to the industrial property (IP) industry (patents and trademarks, but not copyright), to the information technology (IT) industry, and to US Federal regulations that set the tone and framework for information dissemination activities in general. Policy decisions are at this level.
2. There are parts of the Code of Federal Regulations that address patents and trademarks specifically, amplified by Federal Register notices, with exhaustive detail in the manuals for patent and trademark examining procedures that guide examiners and applicants. With the addition of the occasional court decision, this is usually the level from which "business rule changes" arise for patent data.
3. But there are lower-level business procedures as well that govern and manage file-wrapper creation, updating, retirement, and disposition; examiner behavior; dispute resolution; and a myriad of other details required to process the approximately 180,000 patents published in a year, as well as the approximately 400,000 applications received in a year. Many Federal regulations apply here (privacy, security, 508 compliance, etc.). These procedures can motivate data changes as well.
(The Patent Office "owns" the above business activities, while the Office of the CIO "owns" the following IT activities.)
4. There are also IT procedures and processes that are engaged to manage patent data at the lowest logical and physical levels, which is where the XML schemas reside. This has largely to do with creating the dissemination products for their primary use, which is to build search systems for patent examiners and the public. Federal regulations are implemented here following "best practices" and relevant IT standards. Relevant standards include the output of the W3C, OASIS, ISO, and the World Intellectual Property Organization (WIPO), as well as a few US Federal agencies. Internal IT best practices and procedures also provide constraints. US schemas for patent documents are a local implementation of a WIPO standard.
5. The dissemination activity itself. This activity manages subscriptions, provides limited customer support (documenting changes from version to version and maintaining an FAQ, for example), maintains a catalog, and maintains an archive of the products. Strictly a "come and get it" operation.
A change can be entirely IP business motivated, entirely IT motivated, imposed by Federal regulation, or some combination of the above. Some of these activities will change the meaning of data without altering the data or its syntax. Others will change syntax without altering meaning or the data. Others will change the data without altering meaning or syntax.
This is relevant because it determines whether there is any sense in which a Schema change could be forward or backward compatible. Those who set policy, pass laws, write regulations, or set the rules for patent examination, on the whole, do not concern themselves with forward compatibility for the corresponding XML schema (nor should they).
It is a paramount rule for our XML Schema creation, however, that the Schema be as faithful a reflection of the business as we can make it, generally without regard to the consequences for IT. This tight coupling of the Schema and the business has the consequence of making the XML very sensitive to change in the business. The agility needed to quickly adapt to those changes must therefore be achieved by loosely-coupled widgets in the implementation. (I mention this in case someone wants to decouple the schema from the business to solve the problem of change management; that would deflate the primary selling point for XML, in my judgment.)
On the other hand, if the motivation for a change is from the IT level, or non-IP regulatory level, there might be greater opportunity for achieving forward or backward compatibility. In actual practice, while backward compatibility is considered and incorporated where possible, we have never considered forward compatibility a desirable goal. I think this is largely because patent data is more document-centric, unlike the CDC example that Roger gave, which is more data-centric.
I’ve been reviewing our attempts at change management for XML-based patent publications over the past eight years, and I can think of no occasion when life was simple enough to fit Roger’s model. (There may be such situations, just not in my experience.) If we succeeded, it was largely because, as Ian Graham said he does, we assessed the impact of a change on the complete life-cycle of the data, within the context of all the layers I described above. Just last week, we stopped a change that had been planned for nearly a year, because of previously unconsidered consequences for downstream USPTO systems that had no funding to adapt and because the change in the data would have made it arbitrarily diverge from the relevant international standard.
I also want to thank Ian for his elegant phrase, “suitably qualified idiot” – much less pejorative than crackpot.
Bruce B Cox
Manager, Standards Development Division
US Patent & Trademark Office
The contents of this message are the personal opinions of the author and must not be construed as an official statement of the USPTO.
-----Original Message-----
From: Costello, Roger L. [mailto:costello@mitre.org]
Sent: 2007 December 26, Wednesday 11:09
To: xml-dev@lists.xml.org
Subject: Caution using XML Schema backward- or forward-compatibility as a versioning strategy for data exchange
Hi Folks,
Designing XML Schemas to be backward- or forward-compatible is a
popular approach to data versioning.
I think some cautions need to be raised with versioning strategies
based on XML Schema backward- or forward-compatibility.
Below I list the cautions. Do you agree with these cautions? Are
there cautions I have missed?
SCENARIO
Consider deploying a web service. Assume the web service has no
knowledge of who its clients are, or how clients use the data they
retrieve from the web service.
The web service uses an XML Schema to describe the syntax of the data
it exchanges with clients.
The web service uses the following data versioning strategy:
Each new version of the XML Schema is designed to be
forward-compatible.
Thus a client with an old XML Schema can validate an XML instance
document generated by the web service using a new XML Schema.
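For concreteness, a minimal sketch of that arrangement (the element names
are hypothetical, and the fragments assume the usual xs prefix bound to
the XML Schema namespace). Version 1 of the schema declares an optional
element; version 2 removes it; an instance produced under version 2
therefore still validates against the version 1 schema the client holds.

Version 1 schema fragment (held by the client):

  <xs:element name="report">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:string"/>
        <!-- optional in version 1; dropped entirely in version 2 -->
        <xs:element name="notes" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

Instance produced by the web service under version 2:

  <report>
    <id>R-1</id>
  </report>

The old schema accepts this instance, which is all that
"forward-compatible" guarantees.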
ISSUE
Given the scenario described, what cautions should be raised on the use
of forward-compatible XML Schemas as a versioning strategy for data
exchange?
NOTE
A versioning strategy based on backward-compatibility has the same
cautions. I will not explicitly mention backward-compatibility in the
rest of this message, but bear in mind that the comments apply to it as
well.
CAUTION #1: JUST BECAUSE A CLIENT CAN VALIDATE THE DATA IT RETRIEVES
DOESN'T MEAN IT CAN PROCESS THE DATA
Consider a client application implemented to process version 1 data
from the web service.
Suppose the web service changes its XML Schema, in a forward-compatible
fashion. Will the client application be able to process the new
(version 2) data?
Since the XML Schema is forward-compatible, the application will be able
to "validate" the new data.
But it is not necessarily the case that the application will be able to
"process" the new data.
Example #1: Suppose in the first version of the XML Schema this
element:
<distance>100</distance>
means "distance from center of town." Accordingly, the client's
application does calculations based on that meaning.
In the version 2 data the syntax is changed in a forward-compatible
fashion. In addition, the semantics of the <distance> element is
changed to "distance from town line."
The client application will be able to validate the version 2 data, but
the calculations will be incorrect.
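A hypothetical sketch of how invisible such a change can be at the schema
level (simplified so that the declaration itself does not change at all
between versions, only its documentation):

Version 1:

  <xs:element name="distance" type="xs:nonNegativeInteger">
    <xs:annotation>
      <xs:documentation>Distance from center of town</xs:documentation>
    </xs:annotation>
  </xs:element>

Version 2:

  <xs:element name="distance" type="xs:nonNegativeInteger">
    <xs:annotation>
      <xs:documentation>Distance from town line</xs:documentation>
    </xs:annotation>
  </xs:element>

The instance <distance>100</distance> is valid against both versions; no
validator can detect that its meaning has moved.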
Example #2: If the version 1 XML Schema defaults the <distance> units
to miles and the version 2 XML Schema defaults the <distance> units to
kilometers then the data will validate but the client's application
will make incorrect calculations.
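A hypothetical sketch of that default shift. A schema-supplied attribute
default applies only when the attribute is absent from the instance, so
the instance <distance>100</distance> validates unchanged against both
versions while its meaning silently moves from miles to kilometers:

Version 1:

  <xs:element name="distance">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:nonNegativeInteger">
          <!-- default supplied when the units attribute is omitted -->
          <xs:attribute name="units" type="xs:string" default="miles"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

Version 2 is identical except for default="kilometers".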
Lesson Learned #1: Data may change syntactically in such a way that
validation is not impacted, and yet applications break.
Lesson Learned #2: Just because an application can validate data
doesn't mean it can process the data.
Lesson Learned #3: Forward-compatible XML Schemas yield increased
validation but not necessarily increased application processing.
Lesson Learned #4: There is no necessary correlation between the
ability to validate data and the ability to process data.
Lesson Learned #5: A versioning strategy must take into account:
1. Syntactic changes
2. Relationship changes
3. Semantic changes
CAUTION #2: FORWARD-COMPATIBLE CHANGES ARE BASED ON TECHNOLOGY
LIMITATIONS RATHER THAN APPLICATION REQUIREMENTS
Designing a new version of an XML Schema to be forward-compatible with
an old version necessitates that the only changes made in the new
version are "subset" changes, such as:
- constrain an element's or attribute's datatype
- reduce the number of occurrences of an element
- eliminate an optional element or attribute
- remove an element from a choice
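For example, a hypothetical illustration of the first kind of subset
change: version 1 declares an element as an unconstrained string, and
version 2 restricts it to an enumeration. Everything version 2 can emit
is still a valid string, so version 2 instances validate against the
version 1 schema (though not the other way around):

Version 1:

  <xs:element name="status" type="xs:string"/>

Version 2:

  <xs:element name="status">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="pending"/>
        <xs:enumeration value="granted"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>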
This is very restrictive. And to what avail? Answer: to enable
validation of new XML instance documents against an old XML Schema.
But as described above, just because data can be validated doesn't mean
it can be processed.
Further, for the scenario we have been considering, the web service has
no idea about how its data will be processed by clients. Accordingly,
there is no evidence that the additional validation provided by
forward-compatible XML Schemas will help clients.
Lesson Learned #6: A versioning strategy based on forward-compatible
XML Schemas imposes limitations on the types of changes; those
limitations may not be consistent with the actual changes needed by an
application.
Lesson Learned #7: Version data based on data requirements rather than
technology limitations.
QUESTIONS
1. Do you agree with the cautions listed above?
2. Are there other cautions?
3. Do you agree with the Lessons Learned?
4. Given the scenario described above, is it wise to base a versioning
strategy on forward-compatible XML Schemas?
/Roger