In my discussions internally at my organization, I try to couch the versioning discussion as an impact discussion. That is, I ask the question: "If we change something to a new version, what impact will this change have?" Impact can be things like code rework, degree of regression testing, etc. The goal, of course, is to tune the change to minimize the impact. That proves to be hard unless you control or know how consumers use your XML data, schema, SOAP message, or whatever. So I go with Greg's comments, because when you pop up a level of abstraction, this also applies to business processes.

Truth be told, no matter how idiot- or future-proof you make your design, there is always a suitably qualified idiot who can do something you didn't expect.

Ian

Costello, Roger L. wrote:

Hi Folks,

Thanks for your excellent insights into the creation of a data versioning strategy! I am still in the process of assimilating all of your ideas. The discussion has given me a glimpse into the immensity and complexity of the "versioning strategy problem." To help me cope with all the information, I have focused on a few comments that were made.

A FEW SELECT COMMENTS

Greg Hunt challenges us to think in terms of managing change as part of a "business process":

    I think that you need to look at some other things, semantics, structure and syntax are at too low a level because useful version management needs to be embedded in a business process or a set of business agreements.

Greg notes that a change may not cause syntax problems or semantic problems, but may nonetheless cause problems:

    A semantically non-breaking change for one class of consumer might present problems for another. Consider a statistical data flow with a number of elements in it that are not summed (e.g. a structure containing a count of heart attacks, count of ambulance movements and a textual status report). On the face of it, in semantic terms adding another statistical element for morbidity should not be a problem if the element can be ignored.
    However, someone out there will eventually try to count instances of morbidity statistics.

Bruce Cox challenges us to create a change management strategy that makes no assumptions about the consumers of the data:

    We cannot even dream of placing any constraints on the consumers of the data.

CLARITY SOUGHT

What does this mean: "The version management needs to be embedded in a business process"?

What does it mean: "Avoid placing constraints on consumers of the data"?

Can we view an example of: "A semantically non-breaking change for one class of consumer might present problems for another"?

EXAMPLE

Let's take an example to illustrate the ideas that Greg and Bruce are raising.

Suppose that the Centers for Disease Control and Prevention (CDC) makes available data about deaths in the U.S. Here is sample data:

VERSION 1 DATA

<deaths year="2004" source="http://www.cdc.gov/nchs/fastats/lcod.htm">
    <heart-disease>652486</heart-disease>
    <cancer>553888</cancer>
    <stroke>150074</stroke>
    <chronic-lower-respitory-diseases>121987</chronic-lower-respitory-diseases>
    <accidents>112012</accidents>
    <diabetes>73138</diabetes>
    <alzheimers>65965</alzheimers>
    <influenza-and-pneumonia>59664</influenza-and-pneumonia>
    <nephritis-and-nephrotic-syndrome-and-nephrosis>42480</nephritis-and-nephrotic-syndrome-and-nephrosis>
</deaths>

The data conforms to an XML Schema that the CDC created [see the schema below]. Further, the CDC has documented the meaning of each piece of data. [The document defines, for example, what is meant by "the number of deaths due to accidents."]

Consumers of the CDC data happily use it.

Later, the CDC updates its data to also provide information on "the number of deaths due to septicemia."
Here is a sample of the updated data:

VERSION 2 DATA

<deaths year="2004" source="http://www.cdc.gov/nchs/fastats/lcod.htm">
    <heart-disease>652486</heart-disease>
    <cancer>553888</cancer>
    <stroke>150074</stroke>
    <chronic-lower-respitory-diseases>121987</chronic-lower-respitory-diseases>
    <accidents>112012</accidents>
    <diabetes>73138</diabetes>
    <alzheimers>65965</alzheimers>
    <influenza-and-pneumonia>59664</influenza-and-pneumonia>
    <nephritis-and-nephrotic-syndrome-and-nephrosis>42480</nephritis-and-nephrotic-syndrome-and-nephrosis>
    <septicemia>33373</septicemia>
</deaths>

This data conforms to the CDC's updated XML Schema, which now includes a declaration of the <septicemia> element [see updated schema below]. The document containing the meaning of each piece of data is also updated to define what is meant by "the number of deaths due to septicemia."

BREAKAGE?

What will break as a result of the CDC adding the data on septicemia?

VALIDATE NEW DATA AGAINST OLD SCHEMA

Validation of the new data against the old XML Schema will result in validation errors, because the version 1 schema's <sequence> has no declaration for a <septicemia> element.

AVERAGE NEW DATA AGAINST OLD COUNT OF DEATH CAUSES

In the version 1 data there are nine causes of death listed (heart-disease, cancer, stroke, etc). An application that computes the average number of deaths per cause by summing all the values and dividing by nine will produce an incorrect answer with the new data, which lists ten causes.

UNANTICIPATED PROBLEMS

We cannot anticipate or control what consumers of the data do with the data or how they write their applications. The new data could cause problems that we cannot anticipate.

LESSONS LEARNED?

1. Greg challenges us to think in terms of managing change as part of a "business process." What does this mean for the CDC example? For example, should the CDC post "usage rules" for consumers of its data, such as:

    --> Do not validate the data
    --> Anticipate that new data will be added

2. Bruce challenges us to create a change management strategy that makes no assumptions about the consumers of the data. What does this mean for the CDC, which wants to add data about the number of deaths due to septicemia? Can the CDC meet the challenge by simply setting up two URLs, one for the old version and one for the new version?

/Roger

----------------------------------------------

CDC VERSION 1 SCHEMA

<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified">
    <element name="deaths">
        <complexType>
            <sequence>
                <element name="heart-disease" type="unsignedInt"/>
                <element name="cancer" type="unsignedInt"/>
                <element name="stroke" type="unsignedInt"/>
                <element name="chronic-lower-respitory-diseases" type="unsignedInt"/>
                <element name="accidents" type="unsignedInt"/>
                <element name="diabetes" type="unsignedInt"/>
                <element name="alzheimers" type="unsignedInt"/>
                <element name="influenza-and-pneumonia" type="unsignedInt"/>
                <element name="nephritis-and-nephrotic-syndrome-and-nephrosis" type="unsignedInt"/>
            </sequence>
            <attribute name="year" type="gYear"/>
            <attribute name="source" type="anyURI"/>
        </complexType>
    </element>
</schema>

CDC VERSION 2 SCHEMA

<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified">
    <element name="deaths">
        <complexType>
            <sequence>
                <element name="heart-disease" type="unsignedInt"/>
                <element name="cancer" type="unsignedInt"/>
                <element name="stroke" type="unsignedInt"/>
                <element name="chronic-lower-respitory-diseases" type="unsignedInt"/>
                <element name="accidents" type="unsignedInt"/>
                <element name="diabetes" type="unsignedInt"/>
                <element name="alzheimers" type="unsignedInt"/>
                <element name="influenza-and-pneumonia" type="unsignedInt"/>
                <element name="nephritis-and-nephrotic-syndrome-and-nephrosis" type="unsignedInt"/>
                <element name="septicemia" type="unsignedInt"/>
            </sequence>
            <attribute name="year" type="gYear"/>
            <attribute name="source" type="anyURI"/>
        </complexType>
    </element>
</schema>

--
Ian Graham // <http://www.iangraham.org>
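
To make the averaging breakage described under "AVERAGE NEW DATA AGAINST OLD COUNT OF DEATH CAUSES" concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not anything the CDC or the posters wrote: the function names `naive_average` and `counting_average` are hypothetical, and the version 2 document is built here by appending the <septicemia> element to the version 1 sample from the post.

```python
import xml.etree.ElementTree as ET

# Version 1 sample data from the post (element names kept verbatim).
V1_XML = """<deaths year="2004" source="http://www.cdc.gov/nchs/fastats/lcod.htm">
  <heart-disease>652486</heart-disease>
  <cancer>553888</cancer>
  <stroke>150074</stroke>
  <chronic-lower-respitory-diseases>121987</chronic-lower-respitory-diseases>
  <accidents>112012</accidents>
  <diabetes>73138</diabetes>
  <alzheimers>65965</alzheimers>
  <influenza-and-pneumonia>59664</influenza-and-pneumonia>
  <nephritis-and-nephrotic-syndrome-and-nephrosis>42480</nephritis-and-nephrotic-syndrome-and-nephrosis>
</deaths>"""

# Version 2 differs only by the added <septicemia> element.
V2_XML = V1_XML.replace(
    "</deaths>", "  <septicemia>33373</septicemia>\n</deaths>"
)


def naive_average(xml_text):
    """Hypothetical consumer that hard-codes the version 1 count of causes."""
    root = ET.fromstring(xml_text)
    total = sum(int(child.text) for child in root)
    return total / 9  # assumes exactly nine causes, as in version 1


def counting_average(xml_text):
    """Hypothetical consumer that derives the divisor from the data itself."""
    root = ET.fromstring(xml_text)
    values = [int(child.text) for child in root]
    return sum(values) / len(values)


if __name__ == "__main__":
    for label, doc in (("version 1", V1_XML), ("version 2", V2_XML)):
        print(label, naive_average(doc), counting_average(doc))
```

Both functions agree on the version 1 data. On the version 2 data the hard-coded divisor sums ten values but still divides by nine, so it overstates the average without raising any error. Counting the elements avoids this particular failure, but it only covers the one use the publisher happened to foresee, which is the point made under "UNANTICIPATED PROBLEMS."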