XML.org: OASIS Mailing List Archives

 


RE: [xml-dev] Caution using XML Schema backward- or forward-compatibility as a versioning strategy for data exchange

Excellent discussion!

Michael has brought into the discussion a very useful idea: semantic
drift.  He asserts that it "happens naturally in the real world".  

I assert that it also occurs naturally and often in data versioning.

Here are two examples of semantic drift in data versioning:

EXAMPLE #1

Consider this simple XML document:

    <distance>100</distance>

In the v1 XML Schema the <distance> element is declared as follows:

    <element name="distance" type="nonNegativeInteger"/>

The data specification document defines distance as: 

    "Distance represents the length measurement from center of town."

In the v2 XML Schema there is no change to the declaration of the
<distance> element:

    <element name="distance" type="nonNegativeInteger"/>

However, the data specification document redefines distance: 

    "Distance represents the length measurement from the town line."

Thus we have an example of two versions that are "validation-compatible"
but "semantically incompatible."

The semantics of "distance" has drifted from v1 to v2.
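
To make the drift concrete, here is a minimal Python sketch. It is not
part of either schema or specification. It assumes, purely for
illustration, that the town line lies 3 units from the center of town
along the direction being measured; TOWN_LINE_FROM_CENTER,
distance_from_center, and the "v1"/"v2" labels are hypothetical names.
The same instance yields two different positions depending on which
definition the reader applies:

```python
import xml.etree.ElementTree as ET

# Hypothetical figure: how far the town line lies from the center of town.
# It appears in neither the schema nor the instance; each application
# has to know it (and know which definition applies) on its own.
TOWN_LINE_FROM_CENTER = 3


def distance_from_center(xml_text: str, spec_version: str) -> int:
    """Return the distance from the center of town, given which version
    of the data specification the sender was following."""
    value = int(ET.fromstring(xml_text).text)
    if spec_version == "v1":
        return value                          # measured from center of town
    if spec_version == "v2":
        return value + TOWN_LINE_FROM_CENTER  # measured from the town line
    raise ValueError(f"unknown specification version: {spec_version}")


doc = "<distance>100</distance>"
v1_reading = distance_from_center(doc, "v1")
v2_reading = distance_from_center(doc, "v2")
```

Under these assumptions v1_reading is 100 and v2_reading is 103, yet
nothing in the schema or the instance distinguishes the two cases.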

EXAMPLE #2 

Consider the same simple XML document:

    <distance>100</distance>

This time, the v1 XML Schema declares <distance> differently than in
Example #1:

    <element name="distance">
        <complexType>
            <simpleContent>
                <extension base="nonNegativeInteger">
                    <attribute name="units" fixed="miles"/>
                </extension>
            </simpleContent>
        </complexType>
    </element>

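The <distance> element now has an optional "units" attribute whose value
is fixed at "miles." The instance above omits the attribute, so it is
still valid, and a schema-aware processor supplies units="miles" for it.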

The data specification document defines distance as: 

    "Distance represents the length measurement from center of town."

In the v2 XML Schema the declaration of the <distance> element is
modified; the units attribute is fixed at "kilometers":

    <element name="distance">
        <complexType>
            <simpleContent>
                <extension base="nonNegativeInteger">
                    <attribute name="units" fixed="kilometers"/>
                </extension>
            </simpleContent>
        </complexType>
    </element>

The data specification document is unchanged in its definition of
distance: 

    "Distance represents the length measurement from center of town."

Thus, we see a second example of two versions that are
"validation-compatible" but "semantically incompatible."

The semantics of "distance" has drifted from v1 to v2.
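
A rough Python sketch of what this means for a consumer follows. The
standard-library ElementTree parser does not apply schema-supplied
attribute values, so the hypothetical FIXED_UNITS table stands in for
what a schema-aware processor would report under each schema version.
The function name distance_in_km and the "v1"/"v2" keys are likewise
assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

# Value a schema-aware processor would supply for an absent "units"
# attribute, hand-copied from the two schema declarations.
FIXED_UNITS = {"v1": "miles", "v2": "kilometers"}

# Conversion factors to kilometers (1 mile is exactly 1.609344 km).
KM_PER_UNIT = {"miles": 1.609344, "kilometers": 1.0}


def distance_in_km(xml_text: str, schema_version: str) -> float:
    """Interpret <distance> in kilometers under the given schema version."""
    elem = ET.fromstring(xml_text)
    # Fall back to the schema's fixed value when the attribute is absent.
    units = elem.get("units", FIXED_UNITS[schema_version])
    return int(elem.text) * KM_PER_UNIT[units]


doc = "<distance>100</distance>"
km_v1 = distance_in_km(doc, "v1")  # roughly 160.93 km
km_v2 = distance_in_km(doc, "v2")  # 100 km
```

An application still built around the v1 table would read a v2 sender's
100 kilometers as 100 miles, an error of about 61 kilometers, without
any validation failure to flag it.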

COMMENTS

1. I think that these examples illustrate two common changes in data.
Do you agree?

2. In the examples, the XML instance document:

    <distance>100</distance>

validates fine against both the v1 and v2 XML Schemas.  But if the
applications that process the XML instance aren't changed, then the
processing results may be incorrect.  

CAUTION

The fact that an application can validate an XML instance document
does not mean that it can process that document correctly.
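
One way an application might act on this caution is to keep "is it
valid?" and "do I understand it?" as two separate checks. The Python
sketch below is only an illustration under stated assumptions: the
specification version is assumed to reach the application out of band
as declared_version (the instance itself carries no version), the
structural test is a simplified stand-in for real XML Schema validation,
and UNDERSTOOD_VERSIONS, UnsupportedSemantics, and process_distance are
hypothetical names:

```python
import xml.etree.ElementTree as ET

# Hypothetical: the specification versions whose meaning of <distance>
# this application was written and tested against.
UNDERSTOOD_VERSIONS = {"v1"}


class UnsupportedSemantics(Exception):
    """The document may be valid, but its meaning is not known to us."""


def process_distance(xml_text: str, declared_version: str) -> int:
    elem = ET.fromstring(xml_text)
    text = (elem.text or "").strip()
    # Structural check only: a simplified stand-in for schema validation,
    # which both the v1 and v2 schemas would pass for this instance.
    if elem.tag != "distance" or not text.isdigit():
        raise ValueError("not a valid <distance> document")
    # Semantic check: validity alone says nothing about meaning.
    if declared_version not in UNDERSTOOD_VERSIONS:
        raise UnsupportedSemantics(declared_version)
    return int(text)


doc = "<distance>100</distance>"
accepted = process_distance(doc, "v1")

try:
    process_distance(doc, "v2")
    rejected = False
except UnsupportedSemantics:
    rejected = True
```

The same document is accepted when declared as v1 and refused when
declared as v2, even though it is structurally valid in both cases.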

QUESTION

Can you state in one sentence the fundamental lesson to be learned in
our discussion?

/Roger

 






-----Original Message-----
From: Michael Kay [mailto:mike@saxonica.com] 
Sent: Thursday, December 27, 2007 6:13 AM
To: 'Stephen Green'; Costello, Roger L.; xml-dev@lists.xml.org
Subject: RE: [xml-dev] Caution using XML Schema backward- or
forward-compatibility as a versioning strategy for data exchange

> e.g. because an element wasn't made optional it 
> cannot be removed and so there is a temptation to change its 
> semantics - to reuse it for something else rather than remove 
> it. 

Yes, "semantic drift" is a big problem and of course it happens even in
the absence of schema change.

Semantic drift happens naturally in the real world, for example credit
card numbers which once identified an account might start to identify a
specific card with access to that account. It's not surprising that it
happens, because if a system is capable of meeting new requirements
without requiring any software changes then people will use it
creatively in new ways to meet those requirements. One of the challenges
in designing schemas (or database integrity constraints) is knowing
whether you should try to resist semantic drift as a menace to
information integrity, or whether you should allow your system to ride
the waves, thus increasing its flexibility and longevity.

System designers often underestimate the creativity of users in
applying semantic overloading to data structures. I saw one system where
users were marking certain records for review the following day, simply
by entering a particular code that was known to be invalid and would
therefore appear in tomorrow's validation report. The system designers
helpfully introduced stronger validation at data-entry time, and chaos
ensued because the users had to invent a new process.

Michael Kay
http://www.saxonica.com/




Copyright 1993-2007 XML.org. This site is hosted by OASIS