OASIS Mailing List Archives





   On Schemas, Namespaces and Syntax vs. Semantics (long but worth it) :)


A recent discussion on XML-DEV[0] highlighted that practitioners of XML disagree about whether XML schemas and XML namespaces are meant to express syntax alone or both syntax and semantics. The disagreement is exacerbated by the fact that the W3C recommendations describing both technologies offer no guidance on the question.
My personal and professional opinion is that neither namespaces nor XML Schema offers enough capability to specify, or even imply, semantic differences beyond the trivial. I recently came across a post by Clemens Vasters[1] that helped crystallize why I am wary of using either mechanism to specify semantic differences. Below is a condensed excerpt from his post:
"Assume you do work for a bank. The bank has, of course, multiple departments. The first department is corporate loans. They create an XML Schema to exchange information about customers between the main branch and their satellite branches. The schema only covers the base customer information (name, address, primary contacts, etc.). 
'Fine! This Schema fits our needs just as well for exchanging all base information about our customers with our branches,' says the financial collections division.
Note that the term customer means two different things here. A customer of corporate loans fails to meet two consecutive payment deadlines. So, the corporate loans department hands over its customer data as a SOAP-wrapped document to the "EDI mailbox" of the financial collections business, where it is queued for processing. At the same time, the usual swapping of customer data between the branches of the financial collections business goes on, and those documents are queued in the same inbound queue. Now, the customer in the sense of corporate loans is a debtor for financial collections, and corporate finance itself is a customer in the sense of financial collections. 
Now we have two documents with the same namespace but entirely different business semantics stuck in the same queue. What to do?"
Sam Ruby offered the following response:
"As to your example, there is no question that two 'customer' entities that are semantically different should be modelled separately."
The funny thing is that I completely disagree with both authors. I disagree with Clemens Vasters for claiming to have posed a problem, since what he actually described is an incompetent organization, not a business case that highlights any failing of XML technologies. I disagree with Sam Ruby for a.) implying that using namespaces and/or schemas is data modelling and b.) implying that this is a data modelling problem at all. 
I thought of three possible solutions to this "problem" while reading Vasters' weblog, and I'm sure XML-DEVers could think of more. The purpose of posting them to such a wide audience is to dispel the misconception among users of XML (especially the web services folks) that XML technologies are magic pixie dust that can transform poor business processes and bad design decisions into a well-oiled machine. 
Sending an XML document containing raw customer data without any extra semantic information is as useless as sending a fax without a cover sheet. Sam Ruby's suggestion [which implies using namespaces to disambiguate] is akin to suggesting that, instead of using cover sheets, every fax should be identified by appending the receiver's email address, office number, or name to the end of the document. In most cases this will be sufficient, but in any case that requires additional semantic information there will be no way to convey it. With a cover sheet, however, one can scribble any extra instructions and explanation in the section provided. With the analogy done, here are some [of many] potential solutions to this "problem": 
a.) PI solution: If some executive fiat makes it mandatory that only raw customer data conforming to the shared schema may be sent, then processing instructions can be embedded in the message to indicate to the final receiver what to do with each customer. 
b.) Extensible schema solution: If the same executive fiat applies, the original customer data schema is written to allow for W3C XML Schema extensibility (wildcards, xsi:type, substitution groups, abstract elements and types, redefinition, etc.), so that each customer, while still conforming to the schema, carries enough identifying information to make clear how it should be processed. 
c.) No-brainer solution: Place the <customer> element information item within other XML elements that indicate which semantics to use. For example, loan defaulters are sent as <loan-defaulter> element information items, each containing a <customer> element information item.  
For completeness, I should add the following:
d.) Multiple schema solution: This is Sam Ruby's solution. Each division creates a schema for its customer data [although some repetition can be avoided by using a chameleon schema], and each customer element information item is then processed based on its namespace name. This solution, however, does not tell us what to do if one division sends customers to the collections department for different reasons that affect how they should be treated.  
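To make the contrast between a.) through d.) concrete, here is a minimal sketch, in Python with the standard-library `xml.etree.ElementTree` (Python 3.8+ for `insert_pis`), of how a receiver draining the shared queue might recover each customer's role. The namespace URIs, the `route` PI target, the `c:Debtor` type name and the `<batch>` envelope are all hypothetical illustrations, not anything defined in Vasters' or Ruby's posts. One caveat on a.): the SOAP 1.1 and 1.2 specifications forbid processing instructions inside a SOAP message, so the PI variant assumes the batch travels outside the SOAP envelope proper.

```python
import xml.etree.ElementTree as ET

# Hypothetical names for illustration only; none come from the original thread.
CUST = "{urn:example:bank:customer}"  # shared customer schema namespace (Clark notation)
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def parse_batch(text):
    """Parse a queued batch, keeping processing instructions in the tree (Python 3.8+)."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    return ET.fromstring(text, parser=parser)


def route_by_pi(batch):
    """a.) A hypothetical <?route ...?> PI just before each customer says how to treat it."""
    routes, pending = [], None
    for child in batch:
        if child.tag is ET.ProcessingInstruction:
            target, _, data = child.text.partition(" ")  # PI text is "target data"
            if target == "route":
                pending = data.strip()
        elif child.tag == CUST + "customer":
            routes.append((pending, child.findtext(CUST + "name")))
            pending = None
    return routes


def route_by_xsi_type(batch):
    """b.) An extensible schema lets each customer carry an xsi:type.

    ElementTree does not resolve the QName prefix; the value is returned as written.
    """
    return [(c.get(XSI_TYPE), c.findtext(CUST + "name"))
            for c in batch.iter(CUST + "customer")]


def route_by_wrapper(batch):
    """c.) The element wrapping <customer>, e.g. <loan-defaulter>, names the role."""
    return [(w.tag, w.findtext(f"{CUST}customer/{CUST}name"))
            for w in batch if isinstance(w.tag, str)]


def route_by_namespace(batch):
    """d.) Each division has its own schema namespace; the namespace is the only signal."""
    out = []
    for c in batch:
        if not isinstance(c.tag, str) or not c.tag.startswith("{"):
            continue  # skip PIs and unqualified elements
        ns = c.tag[1:].split("}", 1)[0]  # tags look like '{namespace}local'
        out.append((ns, c.findtext(f"{{{ns}}}name")))
    return out


PI_DOC = """<batch xmlns:c="urn:example:bank:customer">
  <?route debtor?>
  <c:customer><c:name>Acme Ltd</c:name></c:customer>
  <?route client?>
  <c:customer><c:name>Corporate Loans</c:name></c:customer>
</batch>"""

XSI_DOC = """<batch xmlns:c="urn:example:bank:customer"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <c:customer xsi:type="c:Debtor"><c:name>Acme Ltd</c:name></c:customer>
</batch>"""

WRAP_DOC = """<batch xmlns:c="urn:example:bank:customer">
  <loan-defaulter><c:customer><c:name>Acme Ltd</c:name></c:customer></loan-defaulter>
</batch>"""

NS_DOC = """<batch xmlns:l="urn:example:bank:loans">
  <l:customer><l:name>Acme Ltd</l:name></l:customer>
</batch>"""

if __name__ == "__main__":
    print(route_by_pi(parse_batch(PI_DOC)))
    print(route_by_xsi_type(parse_batch(XSI_DOC)))
    print(route_by_wrapper(parse_batch(WRAP_DOC)))
    print(route_by_namespace(parse_batch(NS_DOC)))
```

In a.) through c.) the role travels alongside otherwise identical customer data (a PI, a type annotation, a wrapper element), so the shared schema can stay as it is. In d.) the namespace name is the only signal, which is why it has no room for a second reason to send the same kind of customer.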
I prefer my solutions to Sam Ruby's because they clearly emphasize the "it's just data" aspect of XML instead of embedding accidental semantics where one wants only syntax and data. 
[0] The thread entitled "Schema Namespace name, schemaLocation, and Schema Versioning" started by Mark Feblowitz earlier this month 



Copyright 2001 XML.org. This site is hosted by OASIS