Re: [xml-dev] Stability of schemas -- frequency of versioning

From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
Date: Mon, 21 Nov 2011 08:32:34 -0500
At 2011-11-21 12:58 +0000, Costello, Roger L. wrote:
>How frequently should schemas be allowed to change?

I think the answer depends on the user community for that schema.

>Let "schemas" refer to XML Schema, Relax NG, DTD, or Schematron.
>
>Let "change" refer to non-backward compatible changes such as 
>requiring a new element.

In the UBL project, that is referred to as a major revision.  A minor 
revision is one that doesn't invalidate instances written according 
to the previous version:  new elements added to old elements must be 
optional, and old elements can have their minimum cardinality lowered 
or their maximum raised (but not the other way around).  BTW, 
newly added optional elements can, of course, have mandatory 
children.
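
As a rough sketch of that minor-revision rule (this is not UBL tooling; 
the function names and the (min, max) tuple representation are my own 
assumptions for illustration), the cardinality check could look like this 
in Python:

```python
from typing import Optional, Tuple

# Assumed representation: (min_occurs, max_occurs), where a max_occurs
# of None stands for "unbounded".
Cardinality = Tuple[int, Optional[int]]


def is_minor_change(old: Cardinality, new: Cardinality) -> bool:
    """Return True if every occurrence count allowed by `old` is still
    allowed by `new`: the minimum may only go down, the maximum only up."""
    old_min, old_max = old
    new_min, new_max = new
    min_ok = new_min <= old_min
    max_ok = new_max is None or (old_max is not None and new_max >= old_max)
    return min_ok and max_ok


def is_minor_addition(new: Cardinality) -> bool:
    """Return True if a newly added element is optional (minimum of zero)."""
    return new[0] == 0
```

Changing an element from (1, 1) to (0, 1) or to (1, None) passes the 
check, while tightening (0, 1) to (1, 1) does not, and a new element 
must come in with a minimum of 0.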

>I will attempt to persuade you of the following:
>
>      To be effectively deployed, schemas require a certain amount 
> of stability.
>     That is, they shouldn't change too often. Further, any changes 
> that do occur
>     should be backward compatible.

UBL works that way.  2.0 was released in 2006, 2.1 should be out 
sometime in 2012.  As a minor revision, all changes are 
backward-compatible.  All instances of UBL 2.0 are valid instances of UBL 2.1.

>That says, for example, that if your domain is Books then the kind 
>of information that goes into Books is stable; if your domain is 
>financial contracts -- swaps, options, futures -- then the kind of 
>information that goes into financial contracts is 
>stable.  Consequently your schemas are stable. Conversely, if your 
>Book or financial contract schemas are constantly changing then your 
>schema development and software development will thrash and users 
>will be constantly confused.

For UBL the concern is about a hybrid distribution in a wide 
network.  Consider the situation where hundreds of thousands of users 
are using UBL 2.0 and the edict comes down to start using UBL 2.1 
when it is necessary to take advantage of new 2.1 features.  This is 
the case for over 400,000 businesses in Denmark, and tens of 
thousands of users of Tradeshift worldwide.  The changeover will not 
be instantaneous.  A user need only change when they need to take 
advantage of the new features.  If they don't need the new features, 
and most of them likely don't since they are already successfully 
doing business with their existing systems, they can take their time migrating.

A UBL 2.1 system can accept a UBL 2.0 instance without any 
problems.  So in a hybrid environment as new 2.1 systems are added to 
the network, they continue to be able to accept instances of UBL 2.0 
created by systems that have not yet taken the time to upgrade.

For a discussion regarding *forward-compatibility*, I've proposed a 
processing model for UBL 2.0 that will be able to accept instances of 
UBL 2.1.  This is documented and illustrated in section 4 of our 
customization guidelines:

   http://docs.oasis-open.org/ubl/guidelines/UBL2-Customization1.0cs01.pdf
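
To give a flavour of what forward compatibility can involve, here is a 
minimal Python sketch in which an older receiver drops elements it does 
not recognize before validating what is left.  It does not reproduce the 
processing model in the guidelines; the element names and the KNOWN_V20 
set are invented for illustration and are not real UBL names.

```python
import xml.etree.ElementTree as ET


def prune_unknown(elem, known_tags):
    """Remove, in place, every descendant whose tag is not in known_tags."""
    for child in list(elem):  # copy the child list, since we mutate elem
        if child.tag not in known_tags:
            elem.remove(child)
        else:
            prune_unknown(child, known_tags)


# Hypothetical vocabulary understood by the older receiver.
KNOWN_V20 = {"Invoice", "ID", "Amount"}

# Hypothetical instance from a newer sender, carrying one unknown element.
doc = ET.fromstring(
    "<Invoice><ID>42</ID><Amount>10.00</Amount><NewIn21>x</NewIn21></Invoice>"
)
prune_unknown(doc, KNOWN_V20)
remaining = [child.tag for child in doc]
```

After pruning, only the elements the older system understands remain, 
and those can be validated against the older schema.  Whether silently 
ignoring the rest is acceptable is a business decision, not a technical 
one.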

Remember that a schema only addresses interchange issues of ensuring 
the syntax and structure of the document are agreed on between two parties.

Sure, you cite Schematron, but that to me is another animal.  Value 
constraints are far more fluid than schema constraints.  One morning 
a cheque from a customer might bounce for me and so that afternoon I 
would want to constrain the payment method for invoices from that 
customer to be cash only.  I can quickly change such value 
constraints on a UBL instance, but I would not want to have to 
quickly change value constraints in my UBL schema.  Think of the 
problems trying to reliably change, test, validate and deploy a new 
schema, let alone a schema that is now customized for one client and 
so differs from the schema used for all other clients.
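
A small Python sketch of why value constraints are easier to change when 
they live outside the schema.  The rule table, customer names and payment 
method codes are all hypothetical; in practice this layer might be 
Schematron or code lists rather than Python.

```python
# Hypothetical rule table: customer id -> payment methods I will accept.
# The "*" entry is the default for customers without a specific rule.
allowed_payment = {"*": {"cash", "cheque", "credit"}}


def payment_allowed(customer, method, rules):
    """Check a payment method against the customer's rule, else the default."""
    return method in rules.get(customer, rules["*"])


# Morning: cheques from the (hypothetical) customer "acme" are fine.
before = payment_allowed("acme", "cheque", allowed_payment)

# Afternoon: the cheque bounced, so tighten the rule for that customer only.
allowed_payment["acme"] = {"cash"}
after = payment_allowed("acme", "cheque", allowed_payment)
```

The structural schema is untouched by the change, and no other 
customer's documents are affected.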

So in this discussion, I think one needs to distinguish 
syntax/structural constraints from value constraints.

Note that there are some value constraints that are structural in 
nature:  a community of users using UBL 2.0 may agree amongst 
themselves that an optional construct found in UBL 2.0 is mandatory 
for them.  They could make this a value constraint and add it to a 
layer, such as a Schematron layer.  They could customize their 
community's schema to make it mandatory, but then only they have to 
use their community's schema.  Instances they produce with the 
mandatory item are still valid UBL 2.0 instances for users outside of 
the community because it is optional for them and just happens to be specified.
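
As an illustration of such a layer, here is a minimal Python stand-in for 
what a Schematron assertion would normally do.  The element names and the 
COMMUNITY_REQUIRED list are made up for the example and are not real UBL 
constructs.

```python
import xml.etree.ElementTree as ET


def community_violations(root, required_paths):
    """Return the paths the community requires that are absent from root."""
    return [path for path in required_paths if root.find(path) is None]


# Hypothetical community rule: the optional Note element is mandatory for us.
COMMUNITY_REQUIRED = ["Note"]

with_note = ET.fromstring("<Invoice><ID>1</ID><Note>rush</Note></Invoice>")
without_note = ET.fromstring("<Invoice><ID>1</ID></Invoice>")
```

Both documents would pass the base schema, where the element is 
optional; only the community layer reports the second one as missing 
the item.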

>An example of a rock-solid schema is the XML Schema for XML Schemas. 
>It hasn't changed in 10 years. And the new version is backward 
>compatible with the old. Ditto for the Relax NG schema for Relax NG schemas.

I would posit that the constraints imposed by backward compatibility 
are more important in an environment where instances of the schemas 
are interchanged with other parties.  In a closed environment, where 
I'm only creating schemas for myself and for my own documents, I can 
control using an older schema validator with an older schema, and a 
newer schema validator with a newer schema.

>Suppose, however, that the information for a domain is required to 
>frequently change, say, three times a year.

If the changes are simply accommodating newly identified requirements 
for some users, backwards compatibility isn't a problem:  make the 
new constructs optional in the schemas so that existing users without 
the new requirements are not inconvenienced.  Users who are impacted 
by the new requirements can then migrate to the new schemas when they 
are ready to or are obliged to.  Layer on some value constraints on 
top of the structural constraints if you want the users who are 
migrating to be obliged to use the new constructs.

>I have attempted to persuade you that a schema may not be a good fit 
>for describing that type of information. But I am at a loss for what 
>is a good fit. What is a good fit?

Someone in our community once said something along the lines of 
"systems should be permissive in what they accept and restrictive in 
what they produce".  In such a system, layered constraints can 
address subsets of the community.  The whole community can agree upon 
a permissive schema and the subset can agree on more constraints on 
top of that permissiveness.

If an environment fundamentally changes three times a year for all 
users, then I think they just need to accept the obligation that they 
have to change their systems in lock-step ... not very nice or 
comfortable, but if the business requirements are there, I guess they 
are stuck.  But I wouldn't take the benefits of schema technology 
away from them once all the systems in their network are brought up 
to the same level of functionality.  Schemas still play a role.

I hope this helps.

. . . . . . . . . . . Ken


--
Contact us for world-wide XML consulting and instructor-led training
Free 5-hour video lecture: XSLT/XPath 1.0 & 2.0 http://ude.my/t37DVX
Crane Softwrights Ltd.            http://www.CraneSoftwrights.com/x/
G. Ken Holman                   mailto:gkholman@CraneSoftwrights.com
Google+ profile: https://plus.google.com/116832879756988317389/about
Legal business disclaimers:    http://www.CraneSoftwrights.com/legal
References:
- Stability of schemas -- frequency of versioning
  - From: "Costello, Roger L." <costello@mitre.org>