It seems to me that there are some hidden assumptions in this discussion that should be exposed. For example, where is the boundary between the "business" and the technology that implements it? I'd say that a web service isn't the business, just one of the widgets used to conduct business. The "business" is an abstraction that is independent of the implementing technology. The "business" has layers, as does the implementing technology. Change arises from any of the layers. How that change is managed varies with the culture of the layer and of the organizations doing business.
In the case of the USPTO disseminating patent and trademark information, there are at least the following layers of activity to consider:
1. For the broad context, we look to the industrial property (IP) industry (patents and trademarks, but not copyright), to the information technology (IT) industry, and to US Federal regulations that set the tone and framework for information dissemination activities in general. Policy decisions are made at this level.
2. There are parts of the Code of Federal Regulations that address patents and trademarks specifically, amplified by Federal Register notices, with exhaustive detail in the manuals of patent and trademark examining procedure that guide examiners and applicants. With the addition of the occasional court decision, this is usually the level from which "business rule changes" arise for patent data.
3. But there are lower-level business procedures as well that govern and manage file-wrapper creation, updating, retirement, and disposition; examiner behavior; dispute resolution; and a myriad of other details required to process the approximately 180,000 patents published each year, as well as the approximately 400,000 applications received each year. Many Federal regulations apply here (privacy, security, Section 508 compliance, etc.). These procedures can motivate data changes as well.
(The Patent Office "owns" the above business activities, while the Office of the CIO "owns" the following IT activities.)
4. There are also IT procedures and processes that are engaged to manage patent data at the lowest logical and physical levels, which is where the XML schemas reside. This level is largely concerned with creating the dissemination products for their primary use, which is to build search systems for patent examiners and the public. Federal regulations are implemented here following "best practices" and relevant IT standards. Relevant standards include the output of the W3C, OASIS, ISO, and the World Intellectual Property Organization (WIPO), as well as a few US Federal agencies. Internal IT best practices and procedures also provide constraints. US schemas for patent documents are a local implementation of a WIPO standard.
5. The dissemination activity itself. This activity manages subscriptions, provides limited customer support (for example, documenting changes from version to version and maintaining an FAQ), maintains a catalog, and maintains an archive of the products. It is strictly a "come and get it" operation.
A change can be entirely IP-business motivated, entirely IT motivated, imposed by Federal regulation, or some combination of the above. Some of these changes will alter the meaning of the data without altering the data or its syntax. Others will alter the syntax without altering the meaning or the data. Still others will alter the data without altering its meaning or syntax.
This is relevant because it determines whether there is any sense in which a Schema change could be forward or backward compatible. Those who set policy, pass laws, write regulations, or set the rules for patent examination do not, on the whole, concern themselves with forward compatibility of the corresponding XML schema (nor should they).
It is a paramount rule for our XML Schema creation, however, that the Schema be as faithful a reflection of the business as we can make it, generally without regard to the consequences for IT. This tight coupling of the Schema and the business makes the XML very sensitive to change in the business. The agility needed to adapt quickly to those changes must therefore be achieved by loosely coupled widgets in the implementation. (I mention this in case someone wants to decouple the schema from the business to solve the problem of change management; in my judgment, that would deflate the primary selling point for XML.)
On the other hand, if a change is motivated at the IT level or the non-IP regulatory level, there might be greater opportunity for achieving forward or backward compatibility. In actual practice, while backward compatibility is considered and incorporated where possible, we have never considered forward compatibility a desirable goal. I think this is largely because patent data is document-centric, unlike the more data-centric CDC example that Roger gave.
I’ve been reviewing our attempts at change management for XML-based patent publications over the past eight years, and I can think of no occasion when life was simple enough to fit Roger’s model. (There may be such situations, just not in my experience.) Where we succeeded, it was largely because, as Ian Graham said he does, we assessed the impact of a change on the complete life cycle of the data, within the context of all the layers I described above. Just last week, we stopped a change that had been planned for nearly a year, because of previously unconsidered consequences for downstream USPTO systems that had no funding to adapt, and because the change would have made the data diverge arbitrarily from the relevant international standard.
I also want to thank Ian for his elegant phrase “suitably qualified,” which is much less pejorative than “crackpot.”
Bruce B Cox
Manager, Standards Development Division
US Patent & Trademark Office
The contents of this message are the personal opinions of the author and must not be construed as an official statement of the USPTO.
-----Original Message-----
From: Costello, Roger L. [mailto:costello@mitre.org]
Sent: 2007 December 26, Wednesday 11:09
To: xml-dev@lists.xml.org
Subject: Caution using XML Schema backward- or forward-compatibility as a versioning strategy for data exchange
Hi Folks,
Designing XML Schemas to be backward- or forward-compatible is a popular approach to data versioning.
I think some cautions need to be raised about versioning strategies based on XML Schema backward- or forward-compatibility.
Below I list the cautions. Do you agree with these cautions? Are there cautions I have missed?
SCENARIO
Consider deploying a web service. Assume the web service has no knowledge of who its clients are, or how clients use the data they retrieve from the web service.
The web service uses an XML Schema to describe the syntax of the data it exchanges with clients.
The web service uses the following data versioning strategy:
Each new version of the XML Schema is designed to be forward-compatible.
Thus a client with an old XML Schema can validate an XML instance document generated by the web service using a new XML Schema.
ISSUE
Given the scenario described, what cautions should be raised about the use of forward-compatible XML Schemas as a versioning strategy for data exchange?
NOTE
A versioning strategy based on backward-compatibility has the same cautions. I will not explicitly mention backward-compatibility in the rest of this message, but bear in mind that the comments apply to it as well.
CAUTION #1: JUST BECAUSE A CLIENT CAN VALIDATE THE DATA IT RETRIEVES DOESN'T MEAN IT CAN PROCESS THE DATA
Consider a client application implemented to process version 1 data from the web service.
Suppose the web service changes its XML Schema in a forward-compatible fashion. Will the client application be able to process the new (version 2) data?
Since the XML Schema is forward-compatible, the application will be able to "validate" the new data.
But it is not necessarily the case that the application will be able to "process" the new data.
Example #1: Suppose that in the first version of the XML Schema this element:
<distance>100</distance>
means "distance from center of town." Accordingly, the client's application does calculations based on that meaning.
In the version 2 data the syntax is changed in a forward-compatible fashion. In addition, the semantics of the <distance> element are changed to "distance from town line."
The client application will be able to validate the version 2 data, but its calculations will be incorrect.
Example #2: If the version 1 XML Schema defaults the <distance> units to miles and the version 2 XML Schema defaults the <distance> units to kilometers, then the data will validate but the client's application will make incorrect calculations.
Lesson Learned #1: Data may change syntactically in such a way that validation is not impacted, and yet applications break.
Lesson Learned #2: Just because an application can validate data doesn't mean it can process the data.
Lesson Learned #3: Forward-compatible XML Schemas yield increased validation but not necessarily increased application processing.
Lesson Learned #4: There is no necessary correlation between the ability to validate data and the ability to process data.
Lesson Learned #5: A versioning strategy must take into account:
1. Syntactic changes
2. Relationship changes
3. Semantic changes
CAUTION #2: FORWARD-COMPATIBLE CHANGES ARE BASED ON TECHNOLOGY LIMITATIONS RATHER THAN APPLICATION REQUIREMENTS
Designing a new version of an XML Schema to be forward-compatible with an old version necessitates that the only changes made in the new version are "subset" changes, such as:
- constrain an element's or attribute's datatype
- reduce the number of occurrences of an element
- eliminate an optional element or attribute
- remove an element from a choice
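The "subset" idea can be sketched in Python by modeling each schema version as a predicate over a document: forward compatibility then means every document that version 2 accepts is also accepted by version 1. The <trip>, <distance>, and <note> vocabulary and both validators are hypothetical, and version 1 ignores element order for brevity, so this is an illustration rather than real XML Schema validation.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def _distance_ok(root: ET.Element, max_value: Optional[int] = None) -> bool:
    """True if <distance> holds a non-negative integer (optionally capped)."""
    d = root.find("distance")
    text = (d.text or "").strip() if d is not None else ""
    if not re.fullmatch(r"[0-9]+", text):
        return False
    return max_value is None or int(text) <= max_value


def v1_valid(doc: str) -> bool:
    """Hypothetical version 1: <trip> holds exactly one <distance>
    (non-negative integer) and at most one optional <note>."""
    root = ET.fromstring(doc)
    kids = [child.tag for child in root]
    return (root.tag == "trip"
            and set(kids) <= {"distance", "note"}
            and kids.count("distance") == 1
            and kids.count("note") <= 1
            and _distance_ok(root))


def v2_valid(doc: str) -> bool:
    """Hypothetical version 2, using only subset changes: the datatype of
    <distance> is constrained to 0..500 and the optional <note> is eliminated."""
    root = ET.fromstring(doc)
    kids = [child.tag for child in root]
    return root.tag == "trip" and kids == ["distance"] and _distance_ok(root, 500)


samples = [
    "<trip><distance>100</distance></trip>",
    "<trip><distance>900</distance></trip>",
    "<trip><distance>100</distance><note>scenic</note></trip>",
    "<trip><distance>abc</distance></trip>",
]

# Forward compatibility on these samples: whatever v2 accepts, v1 accepts too.
assert all(v1_valid(s) for s in samples if v2_valid(s))
print([(v1_valid(s), v2_valid(s)) for s in samples])
# [(True, True), (True, False), (True, False), (False, False)]
```

The check only samples a handful of documents, so it illustrates the subset relationship rather than proving it. The second and third samples, valid under version 1, are rejected by version 2: under this strategy the producer can only narrow what it emits, never widen it.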
This is very restrictive. And to what avail? Answer: to enable validation of new XML instance documents against an old XML Schema.
But as described above, just because data can be validated doesn't mean it can be processed.
Further, for the scenario we have been considering, the web service has no idea how its data will be processed by clients. Accordingly, there is no evidence that the additional validation provided by forward-compatible XML Schemas will help clients.
Lesson Learned #6: A versioning strategy based on forward-compatible XML Schemas imposes limitations on the types of changes; those limitations may not be consistent with the actual changes needed by an application.
Lesson Learned #7: Version data based on data requirements rather than technology limitations.
QUESTIONS
1. Do you agree with the cautions listed above?
2. Are there other cautions?
3. Do you agree with the Lessons Learned?
4. Given the scenario described above, is it wise to base a versioning strategy on forward-compatible XML Schemas?
/Roger