Re: [xml-dev] Caution using XML Schema backward- or forward-compatibility as a versioning strategy for data exchange

From: noah_mendelsohn@us.ibm.com
To: "Fraser Goffin" <goffinf@googlemail.com>
Date: Thu, 3 Jan 2008 17:32:36 -0500

Fraser Goffin writes:

> yes I agree that structural validation is important, and I 
> further agree that the various checks that are made on the data
> are cumulative and go to the heart of data integrity.

I think there are some further nuances worth setting out.  In most of this 
discussion, there has been an implicit assumption that the schema 
validation languages are not Turing Complete [1].  For those unfamiliar 
with the term, what I mean is that languages like XSD or RelaxNG aren't 
powerful enough to compute all the things you can with languages like C, 
Java, or Cobol.  For example, you can't compute all the prime numbers in 
XSD or RelaxNG, so you can't in practice write a schema type that would 
validate only prime integers as the content of some element.  If your 
schema language were, say, Java, then you could write a schema to make sure 
that your XML element contained a prime number, and for a mathematician 
that would be a very sensible check to attempt.  There are, of course, 
good reasons for not using Turing Complete languages as our main schema 
languages.  One obvious one is that programs in Turing Complete languages 
don't necessarily execute in bounded time.  You can always check an XML 
instance against an XSD or RelaxNG schema in bounded time, and usually 
quite quickly.  Most of our schema languages also handle the simple cases, 
such as looking for a fixed sequence of elements, very easily.  Incidentally, 
all the Turing Complete languages like C and Java have the same 
computational power:  if you can compute prime numbers in one, you can do 
it in all the others.
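
As a rough illustration of the prime example, here is a minimal sketch in 
Python (any Turing Complete language would do).  The function names are 
hypothetical, and the integer pattern is only a simplified stand-in for 
what a schema integer type checks.  The first function is the kind of test 
a schema language handles easily; the second needs a loop that a schema 
language has no way to express.

```python
import re

# Simplified stand-in for the lexical check of a schema integer type
# (assumption: optional sign followed by ASCII digits).
_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_schema_integer(text: str) -> bool:
    """The kind of check a schema language can express: 'this is an integer'."""
    return _INTEGER.fullmatch(text.strip()) is not None


def is_prime(n: int) -> bool:
    """Trial division; needs a loop, so it lives in application code."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def validate_prime_element(text: str) -> bool:
    """Schema-style check first, then the check the schema cannot do."""
    return is_schema_integer(text) and is_prime(int(text.strip()))
```

With this sketch, content of "15" passes the integer check but fails the 
prime check, which is exactly the gap between the two kinds of validation.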

Anyway, I'd say there are at least four shades of grey to consider:

* Content validation that can be implemented in your schema language (the 
element name is legal, and the content is an integer)
* Content validation that your schema language can't handle (the number is 
prime)
* Business validation (that looks like a credit card number, but our 
records show that the card was stolen, so it's not "valid" for use in a 
purchasing transaction)
* Semantic incompatibility (we used to use the field for an account 
number, but in Version 2 of the language it identifies a particular credit 
card)
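
The four shades above can be sketched as successive checks on one field.  
This is only an illustration under stated assumptions: the function names, 
the stolen-card set, and the version numbering are all hypothetical, and a 
Luhn checksum stands in for "validation the schema language can't handle" 
(it plays the role the prime test played earlier).

```python
import re

# Hypothetical business records; a real system would query a service.
STOLEN_CARDS = {"4111111111111111"}


def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def check_field(value: str, version: int) -> str:
    # Shade 1: what a schema can express (a string of digits).
    if re.fullmatch(r"[0-9]+", value) is None:
        return "schema: not a digit string"
    # Shade 4: same syntax, different meaning. In the hypothetical
    # Version 1 the field is an account number, so card checks don't apply.
    if version == 1:
        return "ok: account number"
    # Shade 2: content validation beyond the schema language.
    if not luhn_ok(value):
        return "content: checksum fails"
    # Shade 3: business validation against external records.
    if value in STOLEN_CARDS:
        return "business: card reported stolen"
    return "ok: credit card"
```

Note that nothing in the value itself reveals the semantic change; the 
receiver has to know which version of the language it is reading before it 
can decide which of the other checks even make sense.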

BTW: I know I've sent this link from time to time before, but if you're 
interested in the tradeoffs between using powerful vs. less powerful 
languages, Tim BL did a very nice analysis, and I helped him edit it as a 
TAG finding last year.  It's at [2]. 

Noah

[1] http://en.wikipedia.org/wiki/Turing_complete
[2] http://www.w3.org/2001/tag/doc/leastPower.html

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
