I once sat through a presentation where someone pointed out that the worst security flaws are typically caused by situations where code [or objects/components] is reused outside of its original purpose or design guidelines. Trying to make technology that is immune to incompetence is a losing battle because the Universe is one step ahead of you, creating the one person who will outwit your fool-proof design. The fact that your attempts at foolproofing involve W3C XML Schema already assures me they aren't foolproof.
As for your responses:
a.) The only difference between using attributes from a special processing namespace and PIs is that the attributes can be validated while the PIs cannot. Everything else you mentioned is a red herring.
b.) Given the complexity of W3C XML Schema and the lack of interoperability between implementations, I tend to avoid solutions that involve having people author more schemas than they have to.
c.) Here's one attempt to provide a schema for the no-brainer approach of wrapping the customer data in an element with semantic meaning:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:cust="urn:xmlns:25hoursaday-com:customer"
targetNamespace="urn:xmlns:25hoursaday-com:customer"
elementFormDefault="qualified">
<xs:complexType name="CustomerActionBase" abstract="true">
<xs:sequence>
<xs:element ref="cust:Customer" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
<xs:element name="Customer" type="cust:CustomerType" />
<xs:complexType name="CustomerType" >
<xs:sequence>
<xs:element ref="cust:FirstName" />
<xs:element ref="cust:LastName" />
<xs:element ref="cust:PhoneNumber" />
<!-- more stuff goes here like address, etc -->
</xs:sequence>
<xs:attribute name="customerID" type="xs:integer" />
</xs:complexType>
<xs:element name="FirstName" type="xs:string" />
<xs:element name="LastName" type="xs:string" />
<xs:element name="PhoneNumber" type="xs:string" />
<!-- more stuff goes here -->
</xs:schema>
CustomerActionBase is an abstract type, which means the various departments can derive from it to define specific elements for the actions to perform on a customer, such as loans:collect-loan, loans:send-refund, and loans:foreclose, while all sharing the same definition of customer data.
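Here is a minimal sketch of how a receiver might use the wrapper element to decide what to do, written in Python with the standard library's xml.etree.ElementTree. Only the cust namespace comes from the schema above; the loans namespace URI, the sample message, and the handler table are hypothetical and exist purely for illustration.

```python
import xml.etree.ElementTree as ET

CUST_NS = "urn:xmlns:25hoursaday-com:customer"  # from the schema above
LOANS_NS = "urn:example:loans"                  # hypothetical namespace

# Hypothetical message: the wrapper element names the action,
# the Customer element inside it is plain shared data.
MESSAGE = f"""
<loans:collect-loan xmlns:loans="{LOANS_NS}" xmlns:cust="{CUST_NS}">
  <cust:Customer customerID="42">
    <cust:FirstName>Jane</cust:FirstName>
    <cust:LastName>Doe</cust:LastName>
    <cust:PhoneNumber>555-0100</cust:PhoneNumber>
  </cust:Customer>
</loans:collect-loan>
"""

def customer_ids(action_element):
    """Collect the customerID attribute of every Customer child."""
    customers = action_element.findall(f"{{{CUST_NS}}}Customer")
    return [c.get("customerID") for c in customers]

# Hypothetical handlers keyed by the wrapper's qualified name.
HANDLERS = {
    f"{{{LOANS_NS}}}collect-loan": lambda ids: ("collect", ids),
    f"{{{LOANS_NS}}}send-refund": lambda ids: ("refund", ids),
}

def dispatch(xml_text):
    """Pick the action from the wrapper element, not from the customer data."""
    root = ET.fromstring(xml_text)
    handler = HANDLERS.get(root.tag)
    if handler is None:
        raise ValueError(f"no handler for {root.tag}")
    return handler(customer_ids(root))
```

Calling dispatch(MESSAGE) routes the customer to the collect handler. The Customer element is identical no matter which action wraps it, which is the point of approach (c).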
d.) I got your point, I just didn't agree with it. Let's roll this back a notch and look at relational databases, since that is most likely where the stored customer information comes from in the first place. It is quite possible that many thousands of departments in hundreds of businesses have a database with the following schema:
create table foo (
id integer,
fname varchar(50) not null,
lname varchar(50) not null,
-- various fields including address, phone number, etc
primary key(id));
Your protestations about XML Schema being relevant to providing semantics are like saying that sending the results of select queries prepended with the table name is enough to provide all the information necessary to process the data at any endpoint. For instance, imagine the above table was named "students" and was sent via EDI, SOAP, or whatever to the financial aid department of a college and was now sitting in a processing queue. Does the fact that the name, ID, etc. came from a table containing student information tell the financial aid department whether this message is directing them to disburse the student's loan, cut a refund check, or defer the student's payment obligations?
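The same point as a rough sketch in Python with the standard library's sqlite3 module. The in-memory database, the sample row, the dictionary message shape, and the "disburse-loan" action name are all assumptions made up for the illustration; nothing here is a real EDI or SOAP payload.

```python
import sqlite3

# Build a throwaway "students" table with the shape described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table students ("
    " id integer,"
    " fname varchar(50) not null,"
    " lname varchar(50) not null,"
    " primary key(id))"
)
conn.execute("insert into students values (1, 'Jane', 'Doe')")

rows = conn.execute("select id, fname, lname from students").fetchall()

# Query results prepended with the table name: we know where the data
# came from, but not what the receiver is supposed to do with it.
labelled = {"table": "students", "rows": rows}

# The same data with an explicit (hypothetical) instruction attached.
message = {"action": "disburse-loan", "table": "students", "rows": rows}

def process(msg):
    """Return the requested action and the affected ids, or fail loudly."""
    action = msg.get("action")
    if action is None:
        raise ValueError("message names its source but not what to do")
    return action, [row[0] for row in msg["rows"]]
```

process(message) succeeds, while process(labelled) raises, because the table name alone never says whether to disburse, refund, or defer.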
XML Schemas are about syntax, not semantics. Thinking otherwise is folly and will lead to fundamentally flawed architecture.
PS: XML-DEVers, Clemens has indicated to me offlist that he cannot send mail to the list directly since he isn't subscribed but would like me to provide the following link to give more context to the original weblog post that started this thread
http://radio.weblogs.com/0108971/stories/2002/07/23/stayingSaneInAnXmlWebServicesWorld.html
-----Original Message-----
From: Clemens Vasters [mailto:clemensv@newtelligence.com]
Sent: Sat 7/27/2002 11:16 PM
To: Dare Obasanjo; xml-dev@lists.xml.org
Cc: rubys@us.ibm.com
Subject: RE: On Schemas, Namespaces and Syntax vs. Semantics (long but worth it) :)
In fact, "describing an incompetent organization" is my whole point.
What I am saying is that since the folks in that other department didn't
think long enough and had no rule written up somewhere that forced them
to make the schema their own (because they clearly used the schema for
another purpose than designed), the whole situation did happen.
I've worked quite a few years on projects that deal with this exact
business environment and you wouldn't believe how many terms' meanings
are entirely reversed from "giving credit" to "reclaiming bad credit".
So, the customer is a bit of a broad example. And, quote me fully:
"[Author's note: The "customer" is an illustrative example to carry my
point, but there are other, typically much simpler schemas, where such
problems are even more likely to occur in the wild]"
As to your proposals:
(a) Processing instructions are, in my view, things that only a
low-level technical infrastructure should ever worry about. If you want
processing hints, create an additional schema/namespace and use
attributes from that schema
(b) xsd:any, xsi:type are things I like there.
(c) How do you express that in schema properly? When you have expressed
it in schema, how does it make sense?
(d) You likely didn't get my point. Both organizations MUST HAVE their
own schema, because for both organisations, the "customer" is a wholly
different thing. A customer for /corporate credit/ is called a /debtor/
at financial collections and submitted by their type of /customer/. The
solution with two different schemas tells you exactly what to do with
them, because you need to treat them differently from the start;
otherwise your system will not understand....
Sam Ruby's blog can be found here: http://radio.weblogs.com/0101679/
Mine can be found here: http://radio.weblogs.com/0108971/
Best Regards,
Clemens
---------------------
Clemens F. Vasters
CTO, newtelligence AG
Gilleshütte 99
D-41352 Korschenbroich
Germany
MSDN Regional Director
clemensv@newtelligence.com
v-clevas@microsoft.com
-----Original Message-----
From: Dare Obasanjo [mailto:dareo@microsoft.com]
Sent: Sunday, July 28, 2002 04:21
To: xml-dev@lists.xml.org
Cc: clemensv@newtelligence.com; rubys@us.ibm.com
Subject: On Schemas, Namespaces and Syntax vs. Semantics (long but worth
it) :)
A recent discussion on XML-DEV[0] highlighted the fact that there is
some disagreement amongst practitioners of XML as to whether XML schemas
and XML namespaces are meant as a means of expressing syntax or both
syntax and semantics, which is exacerbated by the fact that the W3C
recommendations describing both of the aforementioned technologies do
not offer any guidance.
My personal and professional opinion is that neither namespaces nor XML
schema offer enough capabilities to specify or even imply semantic
differences to any degree beyond the trivial. I recently came across a
post by Clemens Vasters[1] which helped condense for me why exactly I am
wary of using either of these mechanisms for specifying semantic
differences. Below is a condensation excerpted from his post
"Assume you do work for a bank. The bank has, of course, multiple
departments. The first department is corporate loans. They create an XML
Schema to exchange customer information between the main branch
and their satellite branches. The schema only covers the base customer
information (name, address, primary contacts, etc.).
'Fine! This Schema fits our need just well to exchange all base
information about our customers with our branches, as well' says the
financial collections division,
Note that the customer term means two different things here. A customer
of corporate loans doesn't meet two consecutive payment deadlines. So,
the corporate loans department hands over their customer data as a SOAP
wrapped document to the "EDI mailbox" of the financial collections
business where it is queued for processing. At the same time, the usual
swapping of customer data happens between the branches of the financial
collections business and get queued for processing in that inbound
queue. Now, the customer in the sense of corporate loans is a debtor for
financial collections and corporate finance itself is a customer in the
sense of financial collections.
Now we have two documents with the same namespace but entirely different
business semantics stuck in the same queue. What to do?"
Sam Ruby offered the following:
"As to your example, there is no question that two 'customer' entities
that are semantically different should be modelled separately."
The funny thing is that I completely disagree with both authors. I
disagree with Clemens Vasters for actually claiming that he has posed a
problem since what he actually described is an incompetent organization
and not a business case that highlights any failings of XML
technologies. I disagree with Sam Ruby for a.) implying that using
namespaces and/or schemas is data modelling and b.) that this is a data
modelling problem.
I thought of three possible solutions to this "problem" while reading
Vasters' weblog and I'm sure XML-DEVers could think of more. The purpose
of posting them to such a wide audience is to dispel the misconception
amongst users of XML (especially the web service folks) that XML
technologies are a magic pixie dust that can transform poor business
processes and bad design decisions into a well-oiled machine.
Sending an XML document containing raw customer data without any extra
semantic information is as useless as sending a fax without a cover
sheet. Sam Ruby's suggestion [which implies using namespaces to
disambiguate] is akin to suggesting that instead of using coversheets
all faxes should be identified by appending the receiver's email
address/office number/name to the end of each document. In most cases
this will be sufficient but in any that require additional semantic
information there will be no way to convey this. However with a
coversheet one can scribble any extra instructions and explanation in
the section provided. With my analogy done here are some [of many]
potential solutions to this "problem"
a.) PI solution: If some executive fiat makes it mandatory that only raw
customer data conforming to the shared schema be sent, then
processing instructions can be embedded in the SOAP message which then
indicate to the final receiver of the message what to do with each
customer.
b.) Extensible schema solution: Again, if the same executive fiat
applies then the original customer data schema is written in such a way
as to allow for W3C XML schema extensibility (wildcards, xsi:type,
substitution groups, abstract elements and types, redefinition etc) and
each customer although conforming to the schema has enough identifying
information that it is clear how the data should be processed.
c.) No brainer solution: Place the <customer> element information item
within other XML elements which indicate what semantics to use. E.g.
loan defaulters are sent as a <loan-defaulter> element information item
containing a <customer> element information item.
For completeness I should add the
d.) Multiple schema solution: This is Sam Ruby's solution and it
involves each division creating a schema for their customer data
[although some repetition can be avoided by using a chameleon schema]
then each customer element information item is processed based on its
namespace name. This solution however doesn't tell us what to do if one
division may send customers to the collections department for different
reasons which impact how they are to be treated.
I prefer my solutions to Sam Ruby's because they clearly emphasize the
"it's just data" aspect of XML instead of embedding accidental semantics
where one wants only syntax and data.
[0] The thread entitled "Schema Namespace name, schemaLocation, and
Schema Versioning" started by Mark Feblowitz earlier this month