xml-dev - Schema Extensibility

To: xml-dev@lists.xml.org
Subject: Schema Extensibility
From: "Fraser Goffin" <goffinf@hotmail.com>
Date: Wed, 01 Mar 2006 11:47:56 +0000
For a while I have been continuing a thread which started out thinking about 
versioning of XML schema types, in particular enums. The debate broadened 
and a variety of helpful and interesting views were voiced about versioning 
in general and, as a related subject, extensibility. Personally I have been 
relating these comments to XML schema structures but I could just as easily 
have been talking about the service interface supported by those schemas. This has 
highlighted some different opinions about the value of various approaches to 
this problem which I hope have resonated with those following the thread.

I have become quite interested in the UBL work that Ken Holman has 
introduced and the position UBL is taking on separating the validation of 
structural conformance from value-based validation.

I guess the thing that I am still mostly undecided about is whether to allow 
for schema extensibility (using xs:any together with the 'sentry' approach 
proposed by David Orchard and others) or whether this is a recipe for an 
uncontrollable vocabulary.

I think the battle-ground is in part characterised by a schema (or service) 
that, once published, is considered immutable, hence any changes REQUIRE a 
NEW VERSION with a NEW NAMESPACE, versus a schema which allows non-breaking 
changes to be introduced by both the schema owner and other parties and 
supports both forwards and backwards compatibility.

The first situation is a 'clean' and explicit model where the semantics are 
guaranteed not to be usurped by a non schema owner but where even relatively 
minor change requirements can have a large impact on implementations 
(especially when there are a large number of external users of this 
vocabulary). Changes often take a relatively long while to surface through 
into the standard and this may impact business priorities. Versioning is 
handled by supporting one or more of the available schema versions, with old 
versions deprecated from time to time.
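
As a rough illustration of this first model, here is a minimal Python sketch 
of a receiver that supports several immutable versions side by side and picks 
a handler from the namespace of the root element. The namespace URIs, the 
handler results and the DEPRECATED set are all made up for the example; 
nothing here comes from a real standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces: each published, immutable version gets its own.
HANDLERS = {
    "urn:example:order:v1": lambda root: "handled as v1",
    "urn:example:order:v2": lambda root: "handled as v2",
}
DEPRECATED = {"urn:example:order:v1"}  # still accepted, flagged for removal


def namespace_of(element):
    # ElementTree spells qualified names as "{namespace}local".
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def dispatch(xml_text):
    root = ET.fromstring(xml_text)
    ns = namespace_of(root)
    handler = HANDLERS.get(ns)
    if handler is None:
        # An unknown namespace is an unknown version: reject it outright.
        raise ValueError(f"unsupported schema version: {ns!r}")
    if ns in DEPRECATED:
        print(f"warning: {ns} is deprecated")
    return handler(root)
```

Calling dispatch('<o:Order xmlns:o="urn:example:order:v2"/>') reaches the v2 
handler, while a document in a namespace that is not listed is rejected. The 
cost shows up in HANDLERS: every change, however small, adds another entry 
that each implementation has to carry until the old version is retired.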

The schema extensibility approach promotes the idea that organisations may 
want to represent private relationships using data carried at specified 
points within the standard schema, in such a way that the data is only 
relevant between those parties (using a foreign namespace) and all others 
can safely ignore it (and that the schema author should not necessarily 
attempt to constrain this type of usage). It recognises that the pace of 
change to a standard schema often lags behind the operational requirements 
of user organisations, but those organisations don't want to throw out the 
whole standard and 'go private'. It can imply that some trading partner (TP) 
extensions may be incorporated back into the main body of the standard at a 
later point, in which case any pair of parties using that extension can agree 
to move back to the standard definition at a time of their choosing. It also 
allows the schema owner to add non-breaking 'compatible' changes to a schema. 
The downsides seem to be that a TP could introduce changes which subvert the 
intended semantics, and that, over time, what might have started out as a 
temporary expedient turns into an entrenched working implementation that is 
unlikely to be allocated budget to be re-synchronised with the standard.
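
To make the 'safely ignore' part concrete, here is a minimal Python sketch of 
how a consumer might separate the content it knows from foreign-namespace 
extensions. The namespaces, the element names and the split_children helper 
are invented for illustration, and the sketch only looks at the instance 
document; it says nothing about how the xs:any wildcard itself is declared.

```python
import xml.etree.ElementTree as ET

STANDARD_NS = "urn:example:standard"   # hypothetical standard namespace
PARTNER_NS = "urn:example:partner-a"   # hypothetical private extension

DOC = f"""
<s:Order xmlns:s="{STANDARD_NS}" xmlns:p="{PARTNER_NS}">
  <s:Id>42</s:Id>
  <s:Quantity>3</s:Quantity>
  <p:LoyaltyCode>GOLD</p:LoyaltyCode>
</s:Order>
"""


def qname(tag):
    # ElementTree spells qualified names as "{namespace}local".
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag


def split_children(root, known_ns):
    """Separate children in the known namespace from foreign extensions."""
    known, foreign = {}, {}
    for child in root:
        ns, local = qname(child.tag)
        if ns == known_ns:
            known[local] = child.text
        else:
            foreign[(ns, local)] = child.text
    return known, foreign


known, foreign = split_children(ET.fromstring(DOC), STANDARD_NS)
```

A consumer with no private agreement processes only `known` and drops 
`foreign`, so the extension does not break it. The partner who agreed the 
extension out of band looks up (PARTNER_NS, "LoyaltyCode") in `foreign`.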

So, in part the question is, should a schema allow for unknown extensions 
for unknown purposes (but in specified locations) and still be considered as 
'compliant', or should schema authors attempt to constrain (eliminate) that 
behaviour? I can't help feeling the attraction of the second model, but my 
'gut' tells me that something that inflexible will soon become a business 
constraint and that will signal its demise.

With my SOA hat on I would recognise the importance of interoperability and 
the significant role that standardised vocabularies have to play. I also 
don't especially want to promote the myriad of point-to-point relationships 
that 'going private' implies and instead want to leverage the 'reach' of a 
market standard.

Personally I still have no definitive conclusion that I feel comfortable 
turning into a recommended approach within my own organisation and within 
the industry standards body that I work with from time to time, so I thought 
I'd give it one more go.

Some of the issues and comments highlighted by the earlier thread are 
provided below. Some are direct quotes from contributors, others are excerpts 
from various ramblings :-)

Cheers

Fraser

========================

- extensibility is a critical aspect of any data [or service] model. Without 
extensibility all changes (however minor) effectively 'break' all provider 
and consumer implementations.

- there are no 'minor' changes, any change implies a semantic difference.

- backwards compatible yes (every instance that is valid against the previous 
version of a schema must also be valid against the new version), but not 
necessarily the other way around

- xs:any together with the 'sentry' approach proposed by David Orchard (and 
others) provides a mechanism that allows XML schema to be extended by both 
the schema namespace owner and a non schema author independently, in a 
manner which supports forwards and backwards compatibility for instance 
documents. That is, some categories of change can be accommodated which do 
NOT cause either the consumer or provider implementation to REQUIRE change. 
Of course extensions added by non schema owners represent a private 
relationship between the communicating parties and therefore require an out 
of band exchange of the type definitions and semantics. Also such extensions 
can only be applied to specific locations in the base schema AND using a 
foreign namespace. This is sometimes referred to as the 'must ignore' 
pattern.

- A 'big bang' approach to versioning is not usually achievable in any 
practical sense. That is, it is generally not possible to enforce a 
'breaking' change on all users of a schema/service simultaneously (or even 
within a constrained time window).

- Support for a version of a schema/service can in some cases be self 
regulating. That is, if provider A only supports version 1.0 of a service 
whilst the majority of consumers expect to be able to integrate with version 
1.1 (or 2.0), then chances are that provider A will be unable to win any 
business and will therefore be forced to upgrade. If a consumer supports 
version 1.0 but all potential [preferred] providers have upgraded to a later 
version, the consumer may not be able to place any business on behalf of its 
customers, and will therefore be forced to upgrade (assuming that version 
1.0 and later versions are NOT backwards compatible).

- a schema or service interface is immutable. Once published it should never 
be changed (perhaps this is better stated as the operations which make up 
the service interface should never be changed).

- support for concurrent versions of a schema/service is a more effective 
method of dealing with change than schema extensibility. It makes 
versions explicitly typed without the ambiguity of untyped sections (xs:any) 
which require some out of band mechanism to be entered into by each 
participant. Implementing an explicit new version has the crucial advantage 
that it is guaranteed NOT to break a consumer implementation using the 
current version unless the provider removes that version.

- Any change to a schema represents a semantic difference and therefore 
cannot be considered as 'minor' and therefore requires a new version.

- We have come to the conclusion that semantically the definition of an 
enumerated field is its enumerations.  Therefore changing the enumerations 
changes the definition. Adding enumerations locally seems like a poor 
practice.

- Adding a new value to an enumeration is not a compatible change if that 
value could be returned to a consumer who currently doesn't know about it 
(using the previous schema definition). If the new value is only ever 
accepted on the receiving side, it MAY be compatible since the previous 
version remains a valid sub-set.

- schemas defined and managed by a standards body often move too slowly to 
accommodate the business priorities of participants. Allowing local extensions 
can enable an organisation to gain advantage from the broader 'reach' of the 
base standard to the majority of its partners whilst supporting specific 
third party relationships which require additional [private] data not 
[currently] available within the base standard. Sometimes this additional 
data can represent a 'candidate' standard which may be incorporated at a 
future time.

- When standards become an inhibitor to business operations they will be 
usurped by local arrangements.

- Value based validation can be implemented as a separate layer, on top of 
structural conformance.

- Synchronisation of schema variants is necessary at the point when the 
number of variants indicates that the original semantics may have become 
obfuscated or a new [related] semantic ecosystem is emerging.

- If a large number (more than 1 :-) of business transactional schema 
include a common complex type, and that complex type needs to be changed, 
this can create a synchronisation problem. So is there a different approach 
to dealing with versioning of shared types?

- We are undertaking a new position where the schema are going to be used 
solely for structural validation, and code list value validation (as agreed 
upon by trading partners) is a separate step.
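
The two-step idea in the last point (and in the earlier one about value based 
validation as a separate layer) can be sketched in Python roughly as below. 
This is only an illustration under assumptions: the REQUIRED tuple is a crude 
stand-in for real XSD validation, and the partner names, element names and 
code lists are invented.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("Id", "Currency")  # stand-in for the structural schema

# Hypothetical code lists, agreed per trading partner outside the schema.
CODE_LISTS = {
    "partner-a": {"Currency": {"GBP", "EUR"}},
    "partner-b": {"Currency": {"GBP", "EUR", "USD"}},
}


def check_structure(root):
    """Step 1: structure only; any string is accepted as a code value."""
    return [f"missing element: {name}"
            for name in REQUIRED if root.find(name) is None]


def check_values(root, partner):
    """Step 2: code values against the list agreed with this partner."""
    errors = []
    for name, allowed in CODE_LISTS[partner].items():
        element = root.find(name)
        if element is not None and element.text not in allowed:
            errors.append(f"{name}: {element.text!r} not in agreed code list")
    return errors


def validate(xml_text, partner):
    root = ET.fromstring(xml_text)
    # Value checks only run once the structure is known to be sound.
    return check_structure(root) or check_values(root, partner)
```

With this split, the document "<Order><Id>1</Id><Currency>USD</Currency></Order>" 
passes for partner-b but fails the value step for partner-a, and adding a code 
for one partner is a change to CODE_LISTS rather than a new schema version.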