xml-dev - XML document/message versioning -- possible model?

To: xml-dev@lists.xml.org
Subject: XML document/message versioning -- possible model?
From: Ian Graham <igraham@ic-unix.ic.utoronto.ca>
Date: Sun, 29 Sep 2002 16:45:00 -0400 (EDT)
Reply-to: Ian Graham <ian.graham@utoronto.ca>


I have a question about best practices for version identification
in XML documents/messages. I'll start by explaining the situation 
I'm trying to fathom, and will finish off with my tentative thoughts
... and then hope discussion here will help me understand this 
better. Hopefully this issue is well understood (and has been 
solved long ago), in which case I can happily take that 
model, and move on. 

I have a bunch of applications exchanging XML messages, the 
messages employing multiple namespaces.  The models for 
each namespace are, to some degree, independently developed,
and are formally defined using XML Schema. As each 
namespace 'module' evolves, changes in the module will be
reflected in new, updated versions of the associated 
schema, to be archived at (and accessible from) unique, 
well-defined URIs. 

The namespace URIs themselves will only change when there are 
'substantial' semantic changes to the module (...intentionally 
avoiding discussion of what 'substantial' means...)

When an application receives a message, it needs to determine 
if the message can be processed, and how.  The first cut is 
to ask: "Do I know the namespaces?" If I do, then I continue. 
If I don't, then I ignore the unknown namespaces, and have 
relatively straightforward rules for determining if and how 
I can proceed.
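
A minimal sketch of this first-cut check, in Python using only the 
standard library. The namespace URIs and the function names are 
hypothetical, chosen for illustration; the rules for what to do with 
the unknown namespaces are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs that this recipient understands.
KNOWN_NAMESPACES = {"urn:example:orders", "urn:example:customers"}

def namespaces_in(message):
    """Collect the namespace URI of every qualified element and attribute."""
    found = set()
    for elem in ET.fromstring(message).iter():
        # ElementTree spells qualified names as "{namespace-uri}local-name".
        for name in [elem.tag, *elem.attrib]:
            if name.startswith("{"):
                found.add(name[1:].split("}", 1)[0])
    return found

def split_known_unknown(message, known=KNOWN_NAMESPACES):
    """Return (known, unknown) namespace URIs used in the message."""
    used = namespaces_in(message)
    return used & known, used - known
```

This only answers "do I know the namespaces?"; it says nothing about 
which revision of a schema the sender used.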

However, that's a very coarse level of selection. We can 
already envisage cases where small variations in a schema 
can lead to changes in message structure or content that can 
render the message unacceptable to some recipient. But the 
recipient won't know this has happened, as the namespaces 
are unchanged, and there's nothing else in the message to 
indicate the difference.

So the team codes _very_ defensively, and hopes for the best.

Ideally I (the application) would like to know which 
'version' of the message I've received, and then choose 
whether or not to process it: and hopefully have a 
better sense of which parts of the message are 'safe' 
(consistent with the models I know), and which are potential 
problems.

But that identifier can't be a simple version number, as a 
single number can't easily identify all the ways a new 
'version' may arise.

First thought: the version should be a 'version set' consisting 
of the set of namespace URI / schema URI pairs relevant to
the namespaces in the message:

  {nsURI, schemaURI}

I could then use a schemaLocation attribute to explicitly 
include this information in messages, and use the _value_
as the version vector.

Advantages: uses existing mechanism to pass version information;
            doesn't require centralized management of all the 
            schema files
Problems:   schemaLocation usage is not well specified; not 
            clear (to me) how to combine the referenced schema 
            files to validate the message data;  also... 
            several on this list have suggested schemaLocation 
            is Not a Good Idea In The First Place ....
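
To make the 'version set' idea concrete, here is a rough sketch in 
Python (standard library only) that reads the xsi:schemaLocation 
value as a set of {nsURI, schemaURI} pairs and reports the pairs a 
recipient does not recognize. The URIs, the SUPPORTED table and the 
function names are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"

# Hypothetical: schema URIs this recipient accepts, per namespace URI.
SUPPORTED = {
    "urn:example:orders": {
        "http://example.org/schemas/orders-1.0.xsd",
        "http://example.org/schemas/orders-1.1.xsd",
    },
    "urn:example:customers": {"http://example.org/schemas/customers-2.0.xsd"},
}

def version_set(message):
    """Return the set of (nsURI, schemaURI) pairs declared in the message."""
    pairs = set()
    for elem in ET.fromstring(message).iter():
        # The attribute value is a whitespace-separated list that
        # alternates namespace URI and schema URI.
        tokens = elem.get(XSI_SCHEMA_LOCATION, "").split()
        pairs.update(zip(tokens[0::2], tokens[1::2]))
    return frozenset(pairs)

def unsupported_pairs(message, supported=SUPPORTED):
    """Return the declared pairs whose schema URI is not in the supported table."""
    return {
        (ns, schema)
        for ns, schema in version_set(message)
        if schema not in supported.get(ns, set())
    }
```

An empty result from unsupported_pairs() would mean every declared 
pair is one the recipient knows; it does not cover namespaces that 
the message uses but does not list in schemaLocation.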

Second thought:  Require definition of the message structure 
via a master schema file that imports all the schemas relevant 
to all namespaces in the message. The URI for this file becomes
a unique version identifier for the message type.

Advantages: seems simpler in some weird way
Problems:   still coarse grained - recipient can't know which
            schemas correspond to which namespace without 
            accessing the master schema; requires centralized
            management (to create/assign/design 'master' schema
            files); still need to use schemaLocation to pass
            on schema URI
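
For comparison, a sketch of what a recipient would have to do under 
the master-schema approach to recover the per-namespace detail: 
fetch the master schema and read its xs:import elements. This is 
Python with the standard library only; the function name is 
hypothetical, and fetching the file from its URI is left out.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def imported_schemas(master_xsd):
    """Map each imported namespace URI to the schemaLocation given for it.

    master_xsd is the text of the master schema file. xs:import
    elements are direct children of the xs:schema root element.
    """
    root = ET.fromstring(master_xsd)
    return {
        imp.get("namespace"): imp.get("schemaLocation")
        for imp in root.findall(XS + "import")
    }
```

The result is the same kind of namespace-to-schema mapping that the 
first approach carries in the message itself, obtained at the cost of 
one extra retrieval.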


I'm leaning to the former approach,  but am willing to be 
convinced otherwise.  So comments / criticisms / suggestions are
more than welcome! 

Also (and unfortunately), I don't have mail/list access 
from work (I'm posting from home) ... so please don't confuse 
silence with consent ;-) -- I'll be back to follow up... 

Ian
