xml-dev - RE: On Schemas, Namespaces and Syntax vs. Semantics (long but worth it)

To: "Clemens Vasters" <clemensv@newtelligence.com>,<xml-dev@lists.xml.org>
Subject: RE: On Schemas, Namespaces and Syntax vs. Semantics (long but worth it) :)
From: "Dare Obasanjo" <dareo@microsoft.com>
Date: Sun, 28 Jul 2002 02:53:17 -0700
Cc: <rubys@us.ibm.com>
I once sat through a presentation where someone pointed out that the worst security flaws are typically caused by situations where code [or objects/components] is reused outside of its original purpose or design guidelines. Trying to make technology that is immune to incompetence is a losing battle because the Universe is one step ahead of you, creating the one person who will outwit your foolproof design. The fact that your attempts at foolproofing involve W3C XML Schema already assures me they aren't foolproof. As for your responses:
 
a.) The only difference between using attributes from a special processing namespace and PIs is that the attributes can be validated while the PIs cannot. Everything else you mentioned is a red herring. 
 
b.) Given the complexity of W3C XML Schema and the lack of interoperability between implementations, I tend to avoid solutions that involve having people author more schemas than they have to. 
 
c.) Here's one attempt to provide a schema for the no-brainer approach of wrapping the customer data in an element with semantic meaning 
 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cust="urn:xmlns:25hoursaday-com:customer"
           targetNamespace="urn:xmlns:25hoursaday-com:customer"
           elementFormDefault="qualified">

  <!-- abstract base type; departments derive their action types from it -->
  <xs:complexType name="CustomerActionBase" abstract="true">
    <xs:sequence>
      <xs:element ref="cust:Customer" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>

  <xs:element name="Customer" type="cust:CustomerType" />

  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element ref="cust:FirstName" />
      <xs:element ref="cust:LastName" />
      <xs:element ref="cust:PhoneNumber" />
      <!-- more stuff goes here like address, etc -->
    </xs:sequence>
    <xs:attribute name="customerID" type="xs:integer" />
  </xs:complexType>

  <xs:element name="FirstName" type="xs:string" />
  <xs:element name="LastName" type="xs:string" />
  <xs:element name="PhoneNumber" type="xs:string" />
  <!-- more stuff goes here -->
</xs:schema>
 
CustomerActionBase is an abstract type, which means the various departments can derive specific elements that name the action to perform on a customer, such as loans:collect-loan, loans:send-refund, loans:foreclose, etc., while all sharing the same definition of customer data.
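
To make the wrapper idea concrete, here is a rough sketch of how a receiver might dispatch on the wrapping element while reusing the shared customer definition. It is only an illustration: the loans namespace URI, the sample customer, and the handler functions are assumptions of mine, and only the customer namespace and element names come from the schema above.

```python
import xml.etree.ElementTree as ET

# The customer namespace is taken from the schema above.
CUST = "urn:xmlns:25hoursaday-com:customer"
# Hypothetical namespace for the loans department's action elements.
LOANS = "urn:example:loans"

# Sample message: the wrapper element says what to do, the payload is plain customer data.
MESSAGE = f"""
<loans:collect-loan xmlns:loans="{LOANS}" xmlns:cust="{CUST}">
  <cust:Customer customerID="42">
    <cust:FirstName>Jane</cust:FirstName>
    <cust:LastName>Doe</cust:LastName>
    <cust:PhoneNumber>555-0100</cust:PhoneNumber>
  </cust:Customer>
</loans:collect-loan>
"""

def customer_names(action):
    """Read the shared customer data, whatever the wrapper element is."""
    names = []
    for customer in action.findall(f"{{{CUST}}}Customer"):
        first = customer.findtext(f"{{{CUST}}}FirstName")
        last = customer.findtext(f"{{{CUST}}}LastName")
        names.append(f"{first} {last}")
    return names

# Hypothetical handlers, keyed by the qualified name of the wrapper element.
HANDLERS = {
    f"{{{LOANS}}}collect-loan": lambda names: [f"collect from {n}" for n in names],
    f"{{{LOANS}}}send-refund": lambda names: [f"refund {n}" for n in names],
}

def process(xml_text):
    """Pick the handler from the wrapper element, not from the customer data."""
    root = ET.fromstring(xml_text)
    handler = HANDLERS.get(root.tag)
    if handler is None:
        raise ValueError(f"no handler for {root.tag}")
    return handler(customer_names(root))
```

The same Customer payload produces different work depending on the wrapper, so the customer schema itself never has to carry that meaning.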
 
 
d.) I got your point, I just didn't agree with it. Let's roll this a notch back and look at relational databases since this is where most likely the customer information being stored will come from in the first place. Now it is quite possible that many thousands of departments in hundreds  of businesses have a database with the following schema 
 
create table foo (
  id integer,
  fname varchar(50) not null,
  lname varchar(50) not null,
  -- various fields including address, phone number, etc
  primary key (id)
);
 
 
Your protestations about XML Schema being relevant to providing semantics are like saying that sending the results of select queries prepended with the table name is enough to provide all the information necessary to process the data at any endpoint. For instance, imagine the above table was named "students" and was sent via EDI, SOAP, whatever to the financial aid department of a college and was now sitting in a processing queue. Does the fact that the name, ID, etc. came from a table containing student information tell the financial aid department whether this message is directing them to disburse the student's loan, cut a refund check, or defer the student's payment obligations?
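
As a small illustration of the point, the sketch below uses an in-memory SQLite database to export rows tagged with their table name. The sample row, the message layout, and the action name are all made up for the example; the table follows the definition above, renamed to "students".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table students (
        id integer,
        fname varchar(50) not null,
        lname varchar(50) not null,
        primary key (id)
    )
""")
conn.execute("insert into students values (1, 'Jane', 'Doe')")

def export_rows(table):
    """Return each row as a dict prepended with the table name.

    The table name is interpolated directly because this is a sketch
    with a trusted, hard-coded name; do not do this with user input.
    """
    cur = conn.execute(f"select id, fname, lname from {table}")
    cols = [d[0] for d in cur.description]
    return [{"table": table, **dict(zip(cols, row))} for row in cur]

def with_action(message, action):
    """Wrap the raw data in an envelope that states what to do with it."""
    return {"action": action, "payload": message}

# What lands in the queue: structure and origin, but no instruction.
queued = export_rows("students")
```

Nothing in the queued row says whether to disburse, refund, or defer; that only appears once something like with_action(queued[0], "disburse-loan") wraps it.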
 
XML Schemas are about syntax, not semantics. Thinking otherwise is folly and will lead to fundamentally flawed architecture. 
 
PS: XML-DEVers, Clemens has indicated to me off-list that he cannot send mail to the list directly since he isn't subscribed, but would like me to provide the following link to give more context to the original weblog post that started this thread:
 
<http://radio.weblogs.com/0108971/stories/2002/07/23/stayingSaneInAnXmlWebServicesWorld.html>
 
 
-----Original Message----- 
From: Clemens Vasters [mailto:clemensv@newtelligence.com] 
Sent: Sat 7/27/2002 11:16 PM 
To: Dare Obasanjo; xml-dev@lists.xml.org 
Cc: rubys@us.ibm.com 
Subject: RE: On Schemas, Namespaces and Syntax vs. Semantics (long but worth it) :)



	In fact, "describing an incompetent organization" is my whole point.
	
	What I am saying is that since the folks in that other department didn't
	think long enough and had no rule written up somewhere that forced them
	to make the schema their own (because they clearly used the schema for
	another purpose than designed), the whole situation did happen.
	
	I've worked quite a few years on projects that deal with this exact
	business environment and you wouldn't believe how many terms' meanings
	are entirely reversed from "giving credit" to "reclaiming bad credit".
	So, the customer is a bit of a broad example. And, quote me fully:
	"[Author's note: The "customer" is an illustrative example to carry my
	point, but there are other, typically much simpler schemas, where such
	problems are even more likely to occur in the wild]"
	
	As to your proposals:
	
	(a) Processing instructions are, in my view, things that only a
	low-level technical infrastructure should ever worry about. If you want
	processing hints, create an additional schema/namespace and use
	attributes from that schema
	(b) xsd:any, xsi:type are things I like there.
	(c) How do you express that in schema properly? When you have expressed
	it in schema, how does it make sense?
	
	(d) You likely didn't get my point. Both organizations MUST HAVE their
	own schema, because for both organisations, the "customer" is a wholly
	different thing. A customer for /corporate credit/ is called a /debtor/
	at financial collections and submitted by their type of /customer/. The
	solution with two different schemas tells you exactly what to do with
	them, because you need to treat them differently from the start;
	otherwise your system will not understand....
	
	
	Sam Ruby's blog can be found here: http://radio.weblogs.com/0101679/
	Mine can be found here: http://radio.weblogs.com/0108971/
	
	
	Best Regards,
	Clemens
	
	---------------------
	Clemens F. Vasters
	CTO, newtelligence AG
	Gilleshütte 99
	D-41352 Korschenbroich
	Germany
	
	MSDN Regional Director
	clemensv@newtelligence.com
	v-clevas@microsoft.com
	
	-----Original Message-----
	From: Dare Obasanjo [mailto:dareo@microsoft.com]
	Sent: Sunday, July 28, 2002 04:21
	To: xml-dev@lists.xml.org
	Cc: clemensv@newtelligence.com; rubys@us.ibm.com
	Subject: On Schemas, Namespaces and Syntax vs. Semantics (long but worth
	it) :)
	
	
	A recent discussion on XML-DEV[0] highlighted the fact that there is
	some disagreement amongst practitioners of XML as to whether XML schemas
	and XML namespaces are meant as a means of expressing syntax or both
	syntax and semantics, which is exacerbated by the fact that the W3C
	recommendations describing both of the aforementioned technologies do
	not offer any guidance.
	
	My personal and professional opinion is that neither namespaces nor XML
	schema offer enough capabilities to specify or even imply semantic
	differences to any degree beyond the trivial. I recently came across a
	post by Clemens Vasters[1] which helped condense for me why exactly I am
	wary of using either of these mechanisms for specifying semantic
	differences. Below is a condensation excerpted from his post
	
	"Assume you do work for a bank. The bank has, of course, multiple
	departments. The first department is corporate loans. They create an XML
	Schema to exchange customer information between the main branch
	and their satellite branches. The schema only covers the base customer
	information (name, address, primary contacts, etc.).
	
	'Fine! This Schema fits our need just well to exchange all base
	information about our customers with our branches, as well' says the
	financial collections division,
	
	Note that the customer term means two different things here. A customer
	of corporate loans doesn't meet two consecutive payment deadlines. So,
	the corporate loans department hands over their customer data as a SOAP
	wrapped document to the "EDI mailbox" of the financial collections
	business where it is queued for processing. At the same time, the usual
	swapping of customer data happens between the branches of the financial
	collections business and get queued for processing in that inbound
	queue. Now, the customer in the sense of corporate loans is a debtor for
	financial collections and corporate finance itself is a customer in the
	sense of financial collections.
	
	Now we have two documents with the same namespace but entirely different
	business semantics stuck in the same queue. What to do?"
	
	Sam Ruby offered the following
	"As to your example, there is no question that two 'customer' entities
	that are semantically different should be modelled separately."
	
	The funny thing is that I completely disagree with both authors. I
	disagree with Clemens Vasters for actually claiming that he has posed a
	problem since what he actually described is an incompetent organization
	and not a business case that highlights any failings of XML
	technologies. I disagree with Sam Ruby for a.) implying that using
	namespaces and/or schemas is data modelling and b.) that this is a data
	modelling problem.
	
	I thought of three possible solutions to this "problem" while reading
	Vasters' weblog and I'm sure XML-DEVers could think of more. The purpose
	of posting them to such a wide audience is to dispel the misconception
	amongst users of XML (especially the web service folks) that XML
	technologies are a magic pixie dust that can transform poor business
	processes and bad design decisions into a smooth oiled machine.
	
	Sending an XML document containing raw customer data without any extra
	semantic information is as useless as sending a fax without a cover
	sheet. Sam Ruby's suggestion [which implies using namespaces to
	disambiguate] is akin to suggesting that instead of using cover sheets
	all faxes should be identified by appending the receiver's email
	address/office number/name to the end of each document. In most cases
	this will be sufficient but in any that require additional semantic
	information there will be no way to convey this. However, with a
	cover sheet one can scribble any extra instructions and explanation in
	the section provided. With my analogy done, here are some [of many]
	potential solutions to this "problem":
	
	a.) PI solution: If some executive fiat makes it mandatory that only raw
	customer data conforming to the shared schema may be sent, then
	processing instructions can be embedded in the SOAP message to
	indicate to the final receiver of the message what to do with each
	customer.
	
	b.) Extensible schema solution: Again, if the same executive fiat
	applies then the original customer data schema is written in such a way
	as to allow for W3C XML schema extensibility (wildcards,  xsi:type,
	substitution groups, abstract elements and types, redefinition etc) and
	each customer although conforming to the schema has enough identifying
	information that it is clear how the data should be processed.
	
	c.) No-brainer solution: Place the <customer> element information item
	within other XML elements which indicate what semantics to use. E.g.
	loan defaulters are sent as a <loan-defaulter> element information item
	containing a <customer> element information item.
	
	For completeness I should add the
	
	d.) Multiple schema solution: This is Sam Ruby's solution and it
	involves each division creating a schema for their customer data
	[although some repetition can be avoided by using a chameleon schema]
	then each customer element information item is processed based on its
	namespace name. This solution however doesn't tell us what to do if one
	division may send customers to the collections department for different
	reasons which impact how they are to be treated. 
	
	I prefer my solutions to Sam Ruby's because they clearly emphasize the
	"it's just data" aspect of XML instead of embedding accidental semantics
	where one wants only syntax and data.
	
	[0] The thread entitled "Schema Namespace name, schemaLocation, and
	Schema Versioning" started by Mark Feblowitz earlier this month