It seems to me that there are some hidden assumptions in this discussion that should be exposed. For example, where is the boundary between the "business" and the technology that implements it? I'd say that a web service isn't the business, just one of the widgets used to conduct business. The "business" is an abstraction that is independent of the implementing technology. The "business" has layers, as does the implementing technology. Change arises from any of the layers. How that is managed varies with the culture of the layer and the organizations doing business.
In the case of the USPTO disseminating patent and trademark information, there are at least the following layers of activity to consider:
1. For the broad context, we look to the industrial property (IP) industry (patents and trademarks, but not copyright), to the information technology (IT) industry, and to US Federal regulations that set the tone and framework for information dissemination activities in general. Policy decisions are at this level.
2. There are parts of the Code of Federal Regulations that address patents and trademarks specifically, amplified by Federal Register notices, with exhaustive detail in the manuals for patent and trademark examining procedures that guide examiners and applicants. With the addition of the occasional court decision, this is usually the level from which "business rule changes" arise for patent data.
3. But there are lower-level business procedures as well that govern and manage file-wrapper creation, updating, retirement, and disposition; examiner behavior; dispute resolution; and a myriad of other details required to process the approximately 180,000 patents published in a year, as well as the approximately 400,000 applications received in a year. Many Federal regulations apply here (privacy, security, 508 compliance, etc.). These procedures can motivate data changes as well.
(The Patent Office "owns" the above business activities, while the Office of the CIO "owns" the following IT activities.)
4. There are also IT procedures and processes that are engaged to manage patent data at the lowest logical and physical levels, which is where the XML schemas reside. This has largely to do with creating the dissemination products for their primary use, which is to build search systems for patent examiners and the public. Federal regulations are implemented here following "best practices" and relevant IT standards. Relevant standards include the output of the W3C, OASIS, ISO, and the World Intellectual Property Organization (WIPO), as well as a few US Federal agencies. Internal IT best practices and procedures also provide constraints. US schemas for patent documents are a local implementation of a WIPO standard.
5. The dissemination activity itself. This activity manages subscriptions, provides limited customer support (documenting changes from version to version and maintaining an FAQ, for example), maintains a catalog, and maintains an archive of the products. Strictly a "come and get it" operation.
A change can be entirely IP business motivated, entirely IT motivated, imposed by Federal regulation, or some combination of the above. Some of these activities will change the meaning of data without altering the data or its syntax. Others will change syntax without altering meaning or the data. Others will change the data without altering meaning or syntax.
This is relevant because it determines whether there is any sense in which a Schema change could be forward or backward compatible. Those who set policy, pass laws, write regulations, or set the rules for patent examination, on the whole, do not concern themselves with forward compatibility for the corresponding XML schema (nor should they).
It is a paramount rule for our XML Schema creation, however, that the Schema be as faithful a reflection of the business as we can make it, generally without regard to the consequences for IT. This tight coupling of the Schema and the business has the consequence of making the XML very sensitive to change in the business. The agility needed to quickly adapt to those changes must therefore be achieved by loosely-coupled widgets in the implementation. (I mention this in case someone wants to decouple the schema from the business to solve the problem of change management; that would deflate the primary selling point for XML, in my judgment.)
On the other hand, if the motivation for a change is from the IT level, or non-IP regulatory level, there might be greater opportunity for achieving forward or backward compatibility. In actual practice, while backward compatibility is considered and incorporated where possible, we have never considered forward compatibility a desirable goal. I think this is largely because patent data is more document-centric, unlike the CDC example that Roger gave, which is more data-centric.
I’ve been reviewing our attempts at change management for XML-based patent publications over the past eight years, and I can think of no occasion when life was simple enough to fit Roger’s model. (There may be such situations, just not in my experience.) If we succeeded, it was largely because, as Ian Graham said he does, we assessed the impact of a change on the complete life-cycle of the data, within the context of all the layers I described above. Just last week, we stopped a change that had been planned for nearly a year, because of previously unconsidered consequences for downstream USPTO systems that had no funding to adapt and because the change in the data would have made it arbitrarily diverge from the relevant international standard.
I also want to thank Ian for his elegant phrase, “suitably qualified idiot” – much less pejorative than crackpot.
Bruce B Cox
Manager, Standards Development Division
US Patent & Trademark Office
The contents of this message are the personal opinions of the author and must not be construed as an official statement of the USPTO.
-----Original Message-----
From: Costello, Roger L. [mailto:costello@mitre.org]
Sent: 2007 December 26, Wednesday 11:09
To: xml-dev@lists.xml.org
Subject: Caution using XML Schema backward- or forward-compatibility as a versioning strategy for data exchange
Hi Folks,
Designing XML Schemas to be backward- or forward-compatible is a
popular approach to data versioning.
I think some cautions need to be raised with versioning strategies
based on XML Schema backward- or forward-compatibility.
Below I list the cautions. Do you agree with these cautions? Are
there cautions I have missed?
SCENARIO
Consider deploying a web service. Assume the web service has no
knowledge of who its clients are, or how clients use the data they
retrieve from the web service.
The web service uses an XML Schema to describe the syntax of the data
it exchanges with clients.
The web service uses the following data versioning strategy:
Each new version of the XML Schema is designed to be
forward-compatible.
Thus a client with an old XML Schema can validate an XML instance
document generated by the web service using a new XML Schema.
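For concreteness, a minimal sketch of that arrangement (the element names
are hypothetical, and the fragments assume the usual xs prefix bound to
the XML Schema namespace). Version 1 of the schema declares an optional
element; version 2 removes it; an instance produced under version 2
therefore still validates against the version 1 schema the client holds.

Version 1 schema fragment (held by the client):

  <xs:element name="report">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:string"/>
        <!-- optional in version 1; dropped entirely in version 2 -->
        <xs:element name="notes" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

Instance produced by the web service under version 2:

  <report>
    <id>R-1</id>
  </report>

The old schema accepts this instance, which is all that
"forward-compatible" guarantees.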
ISSUE
Given the scenario described, what cautions should be raised on the use
of forward-compatible XML Schemas as a versioning strategy for data
exchange?
NOTE
A versioning strategy based on backward-compatibility has the same
cautions. I will not explicitly mention backward-compatibility in the
rest of this message, but bear in mind that the comments apply to it as
well.
CAUTION #1: JUST BECAUSE A CLIENT CAN VALIDATE THE DATA IT RETRIEVES
DOESN'T MEAN IT CAN PROCESS THE DATA
Consider a client application implemented to process version 1 data
from the web service.
Suppose the web service changes its XML Schema, in a forward-compatible
fashion. Will the client application be able to process the new
(version 2) data?
Since the XML Schema is forward-compatible, the application will be able
to "validate" the new data.
But it is not necessarily the case that the application will be able to
"process" the new data.
Example #1: Suppose in the first version of the XML Schema this
element:
<distance>100</distance>
means "distance from center of town." Accordingly, the client's
application does calculations based on that meaning.
In the version 2 data the syntax is changed in a forward-compatible
fashion. In addition, the semantics of the <distance> element is
changed to "distance from town line."
The client application will be able to validate the version 2 data, but
the calculations will be incorrect.
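A hypothetical sketch of how invisible such a change can be at the schema
level (simplified so that the declaration itself does not change at all
between versions, only its documentation):

Version 1:

  <xs:element name="distance" type="xs:nonNegativeInteger">
    <xs:annotation>
      <xs:documentation>Distance from center of town</xs:documentation>
    </xs:annotation>
  </xs:element>

Version 2:

  <xs:element name="distance" type="xs:nonNegativeInteger">
    <xs:annotation>
      <xs:documentation>Distance from town line</xs:documentation>
    </xs:annotation>
  </xs:element>

The instance <distance>100</distance> is valid against both versions; no
validator can detect that its meaning has moved.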
Example #2: If the version 1 XML Schema defaults the <distance> units
to miles and the version 2 XML Schema defaults the <distance> units to
kilometers then the data will validate but the client's application
will make incorrect calculations.
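A hypothetical sketch of that default shift. A schema-supplied attribute
default applies only when the attribute is absent from the instance, so
the instance <distance>100</distance> validates unchanged against both
versions while its meaning silently moves from miles to kilometers:

Version 1:

  <xs:element name="distance">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:nonNegativeInteger">
          <!-- default supplied when the units attribute is omitted -->
          <xs:attribute name="units" type="xs:string" default="miles"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

Version 2 is identical except for default="kilometers".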
Lesson Learned #1: Data may change syntactically in such a way that
validation is not impacted, and yet applications break.
Lesson Learned #2: Just because an application can validate data
doesn't mean it can process the data.
Lesson Learned #3: Forward-compatible XML Schemas yield increased
validation but not necessarily increased application processing.
Lesson Learned #4: There is no necessary correlation between the
ability to validate data and the ability to process data.
Lesson Learned #5: A versioning strategy must take into account:
1. Syntactic changes
2. Relationship changes
3. Semantic changes
CAUTION #2: FORWARD-COMPATIBLE CHANGES ARE BASED ON TECHNOLOGY
LIMITATIONS RATHER THAN APPLICATION REQUIREMENTS
Designing a new version of an XML Schema to be forward-compatible with
an old version necessitates that the only changes made in the new
version are "subset" changes, such as:
- constrain an element's or attribute's datatype
- reduce the number of occurrences of an element
- eliminate an optional element or attribute
- remove an element from a choice
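For example, a hypothetical illustration of the first kind of subset
change: version 1 declares an element as an unconstrained string, and
version 2 restricts it to an enumeration. Everything version 2 can emit
is still a valid string, so version 2 instances validate against the
version 1 schema (though not the other way around):

Version 1:

  <xs:element name="status" type="xs:string"/>

Version 2:

  <xs:element name="status">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="pending"/>
        <xs:enumeration value="granted"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>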
This is very restrictive. And to what avail? Answer: to enable
validation of new XML instance documents against an old XML Schema.
But as described above, just because data can be validated doesn't mean
it can be processed.
Further, for the scenario we have been considering, the web service has
no idea about how its data will be processed by clients. Accordingly,
there is no evidence that the additional validation provided by
forward-compatible XML Schemas will help clients.
Lesson Learned #6: A versioning strategy based on forward-compatible
XML Schemas imposes limitations on the types of changes; those
limitations may not be consistent with the actual changes needed by an
application.
Lesson Learned #7: Version data based on data requirements rather than
technology limitations.
QUESTIONS
1. Do you agree with the cautions listed above?
2. Are there other cautions?
3. Do you agree with the Lessons Learned?
4. Given the scenario described above, is it wise to base a versioning
strategy on forward-compatible XML Schemas?
/Roger