




   RE: [xml-dev] XML document/message versioning -- possible model?


On Sun, 29 Sep 2002, Dare Obasanjo wrote:

> Sorry to burst the bubble but versioning in XML applications is
> neither well understood nor a solved problem. My opinion on this is in
> the TAG archive[0] in a recent thread on the topic[1].
>  Your question seems to mix validating the document interchangeably
> with performing whatever processing you need to do on the document. If
> you use xsi:schemaLocation then there isn't a problem for validation
> since the XML instance tells you where its schema is located. As for
> whether to use schema locations as a versioning mechanism to tell if
> you can process the namespace, I'd suggest using version numbers
> instead. Version numbers are very useful because you can perform
> comparisons to tell if the schema revision is one your application
> doesn't know about.

I was kinda expecting bubble bursting, but thanks for the gentle letdown.

I see what you mean about mixed roles. I do want each document to contain
xsi:schemaLocation so that each application knows precisely which
schema(s) apply. But I also want to have (the potential for) programmatic
logic in the applications that can make choices about what to do with the
document, based on the (schema) versions but independent of any formal
schema processing.

So as per your comment below, we'd already decided to add 'version-like'
information into the schema URIs (basically a timestamp) to make version
comparisons easier.
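
A minimal sketch of that kind of comparison, assuming (hypothetically) that every schema URI embeds its date as /YYYY/MM/DD/ in the way the example URI quoted further down does. The function names and the NEWEST_KNOWN cutoff are illustrative only and do not come from the thread:

```python
import re
from datetime import date
from typing import Optional

# Assumed convention: the schema URI contains /YYYY/MM/DD/ before the file name.
_DATE_IN_URI = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")

# Hypothetical cutoff: newest schema revision this application was built against.
NEWEST_KNOWN = date(2002, 9, 29)


def schema_date(schema_uri: str) -> Optional[date]:
    """Return the date embedded in a schema URI, or None if none is found."""
    match = _DATE_IN_URI.search(schema_uri)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # The digits matched the pattern but are not a real calendar date.
        return None


def is_known_revision(schema_uri: str) -> bool:
    """True when the URI carries a date no later than NEWEST_KNOWN."""
    found = schema_date(schema_uri)
    return found is not None and found <= NEWEST_KNOWN
```

A URI without a recognizable date is treated as unknown here rather than guessed at, which matches the defensive stance described in the original message.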

But, as you say, this overloads the meaning of schemaLocation .... 

On the other hand, it makes me uncomfortable to have both schema URIs and
version numbers in a 1-1 relationship (which we'd need in order to track
new versions of each schema).  That just seems like unnecessary duplication,
and moreover just one more thing to go wrong (i.e., version numbers and
schemas getting out of sync).

Thanks for the references -- I will read through the TAG discussions, and
follow up tonight if anything else comes to mind.

>  For example, application A understands how to process documents with
> elements from the "http://www.example.org" namespace in revisions 1.0,
> 1.1, 2.0 etc. up to version 3.0 of the schema. If elements from that
> namespace from revision 2.1 or revision 4.7 of the schema show up
> there is an easy way to know if they can be handled and the code is a
> lot simpler than switching on URI names.
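
The comparison described in the quoted paragraph can be sketched as follows. This assumes the revision reaches the application as a dotted string such as "2.1"; where that string lives in the message (an attribute, a header, and so on) is not specified in the thread, and the names below are hypothetical:

```python
from typing import Tuple

# Highest schema revision the (hypothetical) application A can process.
MAX_SUPPORTED: Tuple[int, ...] = (3, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '2.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def can_process(version_text: str) -> bool:
    """True when the revision is no newer than MAX_SUPPORTED."""
    return parse_version(version_text) <= MAX_SUPPORTED
```

Comparing tuples of integers rather than raw strings keeps "1.10" ordered after "1.9", which a plain string comparison would get wrong.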

>  Of course, you could structure your schema locations in a manner that
> allows such comparisons (e.g.
> "http://www.example.org/2002/09/29/schema.xsd") and you'd get
> similar functionality, although you would be overloading the meaning of
> the xsi:schemaLocation attribute, which is unwise and should be very well
> documented in your business logic and to all partners involved.
> [0] http://lists.w3.org/Archives/Public/www-tag/2002Sep/0092.html
> [1] http://lists.w3.org/Archives/Public/www-tag/2002Sep/0082.html
> 	-----Original Message----- 
> 	From: Ian Graham [mailto:igraham@ic-unix.ic.utoronto.ca] 
> 	Sent: Sun 9/29/2002 1:45 PM 
> 	To: xml-dev@lists.xml.org 
> 	Cc: 
> 	Subject: [xml-dev] XML document/message versioning -- possible model?
> 	I have a question about best practices for version identification
> 	in XML documents/messages. I'll start by explaining the situation
> 	I'm trying to fathom, and will finish off with my tentative thoughts
> 	... and then hope discussion here will help me understand this
> 	better. Hopefully this issue is well understood (and has been
> 	solved long ago), in which case I can happily take that
> 	model, and move on.
> 	I have a bunch of applications exchanging XML messages, the
> 	messages employing multiple namespaces.  The models for
> 	each namespace are, to some degree, independently developed,
> 	and are formally defined using XML Schema. As each
> 	namespace 'module' evolves, changes in the module will be
> 	reflected in new, updated versions of the associated
> 	schema, to be archived at (and accessible from) unique,
> 	well-defined URIs.
> 	The namespace URIs themselves will only change when there are
> 	'substantial' semantic changes to the module (...intentionally
> 	avoiding discussion of what 'substantial' means...)
> 	When an application receives a message, it needs to determine
> 	if the message can be processed, and how.  The first cut is
> 	to ask: "Do I know the namespaces?" If I do, then I continue.
> 	If I don't, then I ignore the unknown namespaces, and have
> 	relatively straightforward rules for determining if and how
> 	I can proceed.
> 	However, that's a very coarse level of selection. We can
> 	already envisage cases where small variations in a schema
> 	can lead to changes in message structure or content that can
> 	render the message unacceptable to some recipient. But the
> 	recipient won't know this has happened, as the namespaces
> 	are unchanged, and there's nothing else in the message to
> 	indicate the difference.
> 	So the team codes _very_ defensively, and hopes for the best.
> 	Ideally I (the application) would like to know which
> 	'version' of the message I've received, and then choose
> 	whether or not to process it: and hopefully have a
> 	better sense of which parts of the message are 'safe'
> 	(consistent with the models I know), and which are potential
> 	problems.
> 	But that identifier can't be a simple version number, as a
> 	single number can't easily identify all the ways a new
> 	'version' may arise.
> 	First thought: the version should be a 'version set' consisting
> 	of the set of namespace URI / schema URI pairs relevant to
> 	the namespaces in the message:
> 	  {nsURI, schemaURI}
> 	I could then use a schemaLocation attribute to explicitly
> 	include this information in messages, and use the _value_
> 	as the version vector.
> 	Advantages: uses existing mechanism to pass version information;
> 	            doesn't require centralized management of all the
> 	            schema files;
> 	Problems:   schemaLocation usage is not well specified; not
> 	            clear (to me) how to combine the referenced schema
> 	            files to validate the message data;  also...
> 	            several on this list have suggested schemaLocation
> 	            is Not a Good Idea In The First Place ....
> 	Second thought:  Require definition of the message structure
> 	via a master schema file that imports all the schemas relevant
> 	to all namespaces in the message. The URI for this file becomes
> 	a unique version identifier for the message type.
> 	Advantages: seems simpler in some weird way
> 	Problems:   still coarse grained - recipient can't know which
> 	            schemas correspond to which namespace without
> 	            accessing the master schema; requires centralized
> 	            management (to create/assign/design 'master' schema
> 	            files); still need to use schemaLocation to pass
> 	            on schema URI
> 	I'm leaning to the former approach,  but am willing to be
> 	convinced otherwise.  So comments / criticisms / suggestions are
> 	more than welcome!
> 	Also (and unfortunately), I don't have mail/list access
> 	from work (I'm posting from home) ... so please don't confuse
> 	silence with consent ;-) -- I'll be back to follow up...
> 	Ian
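
The "version set" from the first thought in the original message could be read out of a received message along these lines. This is only a sketch using Python's standard xml.etree.ElementTree; it assumes the sender puts whitespace-separated namespace URI / schema URI pairs in xsi:schemaLocation, and the function name is made up for illustration:

```python
import xml.etree.ElementTree as ET
from typing import Dict

# ElementTree spells a namespaced attribute as "{namespace-uri}local-name".
XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"


def version_set(xml_text: str) -> Dict[str, str]:
    """Map each namespace URI to the schema URI declared for it.

    Collects xsi:schemaLocation from every element in the message, since
    the attribute may appear on more than one element.
    """
    root = ET.fromstring(xml_text)
    pairs: Dict[str, str] = {}
    for element in root.iter():
        tokens = element.get(XSI_SCHEMA_LOCATION, "").split()
        # Tokens alternate nsURI, schemaURI; a trailing unpaired token is dropped.
        for ns_uri, schema_uri in zip(tokens[0::2], tokens[1::2]):
            pairs[ns_uri] = schema_uri
    return pairs
```

The resulting dictionary can then be compared, namespace by namespace, against the schema URIs the recipient knows, without running a schema validator at all.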



Copyright 2001 XML.org. This site is hosted by OASIS