I have a question about best practices for version identification
in XML documents/messages. I'll start by explaining the situation
I'm trying to fathom, and will finish off with my tentative thoughts
... and then hope discussion here will help me understand this
better. Hopefully this issue is well understood (and has been
solved long ago), in which case I can happily take that
model, and move on.
I have a bunch of applications exchanging XML messages, the
messages employing multiple namespaces. The models for
each namespace are, to some degree, independently developed,
and are formally defined using XML Schema. As each
namespace 'module' evolves, changes in the module will be
reflected in new, updated versions of the associated
schema, to be archived at (and accessible from) unique,
well-defined URIs.
The namespace URIs themselves will only change when there are
'substantial' semantic changes to the module (...intentionally
avoiding discussion of what 'substantial' means...)
When an application receives a message, it needs to determine
if the message can be processed, and how. The first cut is
to ask: "Do I know the namespaces?" If I do, then I continue.
If I don't, then I ignore the unknown namespaces, and have
relatively straightforward rules for determining if and how
I can proceed.
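A minimal sketch of that first-cut "do I know the namespaces?" check, assuming Python's standard xml.etree.ElementTree. The KNOWN_NAMESPACES set and the example.org URIs are hypothetical placeholders; the sketch only collects namespace URIs from element and attribute names and splits them into known and unknown, leaving the "if and how I can proceed" rules to the application.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces this application already understands.
KNOWN_NAMESPACES = {
    "http://example.org/ns/order",
    "http://example.org/ns/party",
}


def namespaces_in(message: str) -> set:
    """Collect the namespace URIs used by element and attribute names."""
    found = set()
    for elem in ET.fromstring(message).iter():
        for name in (elem.tag, *elem.attrib):
            # ElementTree expands qualified names to "{nsURI}local".
            if name.startswith("{"):
                found.add(name[1:name.index("}")])
    return found


def split_namespaces(message: str, known: set) -> tuple:
    """Return (known, unknown) namespace URIs used in the message."""
    used = namespaces_in(message)
    return used & known, used - known


msg = ('<o:order xmlns:o="http://example.org/ns/order" '
       'xmlns:x="http://example.org/ns/extra"><x:note/></o:order>')
known, unknown = split_namespaces(msg, KNOWN_NAMESPACES)
print(sorted(known), sorted(unknown))
```

Note that this check is exactly as coarse as described: a message built against a newer schema for the same namespace passes it unchanged.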
However, that's a very coarse level of selection. We can
already envisage cases where small variations in a schema
can lead to changes in message structure or content that can
render the message unacceptable to some recipient. But the
recipient won't know this has happened, as the namespaces
are unchanged, and there's nothing else in the message to
indicate the difference.
So the team codes _very_ defensively, and hopes for the best.
Ideally I (the application) would like to know which
'version' of the message I've received, and then choose
whether or not to process it: and hopefully have a
better sense of which parts of the message are 'safe'
(consistent with the models I know), and which are potential
problems.
But that identifier can't be a simple version number, as a
single number can't easily identify all the ways a new
'version' may arise.
First thought: the version should be a 'version set' consisting
of the set of namespace URI / schema URI pairs relevant to
the namespaces in the message:
{nsURI, schemaURI}
I could then use a schemaLocation attribute to explicitly
include this information in messages, and use the _value_
as the version vector.
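A rough sketch of reading such a version set, assuming Python's xml.etree.ElementTree. The xsi:schemaLocation value is a whitespace-separated list of namespace URI / schema URI pairs and may appear on any element, so the sketch gathers pairs from every element. The SUPPORTED set and the example.org URIs are hypothetical; treating the pairs as a version vector is the proposal above, not a meaning XML Schema itself assigns (it calls them hints).

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_LOCATION = f"{{{XSI}}}schemaLocation"

# Hypothetical {nsURI, schemaURI} pairs this application was built against.
SUPPORTED = {
    ("http://example.org/ns/order", "http://example.org/schemas/order-1.2.xsd"),
    ("http://example.org/ns/party", "http://example.org/schemas/party-1.0.xsd"),
}


def version_set(message: str) -> frozenset:
    """Gather (nsURI, schemaURI) pairs from every xsi:schemaLocation in a message."""
    pairs = set()
    for elem in ET.fromstring(message).iter():
        # The value is a whitespace-separated list: ns1 loc1 ns2 loc2 ...
        tokens = elem.get(SCHEMA_LOCATION, "").split()
        if len(tokens) % 2:
            raise ValueError("xsi:schemaLocation must hold URI pairs")
        pairs.update(zip(tokens[0::2], tokens[1::2]))
    return frozenset(pairs)


def unfamiliar_pairs(message: str) -> frozenset:
    """Pairs in the message's version set that this application does not know."""
    return version_set(message) - SUPPORTED


msg = ('<o:order xmlns:o="http://example.org/ns/order" '
       f'xmlns:xsi="{XSI}" '
       'xsi:schemaLocation="http://example.org/ns/order '
       'http://example.org/schemas/order-1.3.xsd"/>')
print(unfamiliar_pairs(msg))
```

Here the namespace is known but the schema URI is not, which is precisely the case the namespace check alone cannot detect. Per-namespace granularity also means the recipient can tell which parts of the message come from an unfamiliar schema.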
Advantages: uses existing mechanism to pass version information;
doesn't require centralized management of all the
schema files;
Problems: schemaLocation usage is not well specified; not
clear (to me) how to combine the referenced schema
files to validate the message data; also...
several on this list have suggested schemaLocation
is Not a Good Idea In The First Place ....
Second thought: Require definition of the message structure
via a master schema file that imports all the schemas relevant
to all namespaces in the message. The URI for this file becomes
a unique version identifier for the message type.
Advantages: seems simpler in some weird way
Problems: still coarse grained - recipient can't know which
schemas correspond to which namespace without
accessing the master schema; requires centralized
management (to create/assign/design 'master' schema
files); still need to use schemaLocation to pass
on schema URI
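To illustrate the "can't know which schemas correspond to which namespace without accessing the master schema" problem: a recipient holding only the master schema URI would have to fetch that file and read its xs:import elements to recover the same pairs. A minimal sketch, assuming Python's xml.etree.ElementTree and a hypothetical master schema inlined as a string (fetching it from its URI is left out):

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def imports_of(master_schema: str) -> dict:
    """Map each namespace imported by a master schema to its schemaLocation."""
    root = ET.fromstring(master_schema)
    # Only direct <xs:import> children are read; nested imports inside the
    # imported schemas would need further fetching and parsing.
    return {
        imp.get("namespace"): imp.get("schemaLocation")
        for imp in root.findall(f"{{{XS}}}import")
    }


master = f'''<xs:schema xmlns:xs="{XS}">
  <xs:import namespace="http://example.org/ns/order"
             schemaLocation="http://example.org/schemas/order-1.2.xsd"/>
  <xs:import namespace="http://example.org/ns/party"
             schemaLocation="http://example.org/schemas/party-1.0.xsd"/>
</xs:schema>'''
print(imports_of(master))
```

The result is the same {nsURI, schemaURI} set as in the first approach, just one network fetch further away.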
I'm leaning to the former approach, but am willing to be
convinced otherwise. So comments / criticisms / suggestions are
more than welcome!
Also (and unfortunately), I don't have mail/list access
from work (I'm posting from home) ... so please don't confuse
silence with consent ;-) -- I'll be back to follow up...
Ian