Hi Folks,
Thanks again for all your input. It has been most enlightening!
I have taken all your comments, assimilated them, and I believe that I
have found a new perspective on this issue that may alter things quite a
bit. Before I present this "new perspective" it is important that I
recap what was discussed.
RECAP
Issue: What are the defining characteristics of data that is "suitable"
for being interchanged? Conversely, what are the defining
characteristics of data that is "not suitable" for being interchanged?
To provide a basis for discussion, I present two examples:
1. AIRCRAFT EXAMPLE: onboard an aircraft is a system that periodically
transmits to a ground station some information. Included in this
information is:
- Distance to Destination Airport
- Distance to Navigation Aid
- Distance to Emergency Airport
2. FINANCIAL INSTITUTION EXAMPLE: a financial institution provides
recommendations to its clients on what to do with their stock shares.
The information sent to its clients is a recommendation:
- Buy, Sell, Hold
Let's look at each of these examples.
AIRCRAFT EXAMPLE
What data should be sent to the ground station? That is, what data is
"suitable" for interchange? Your first instinct might be to send
"Distance-to-X" data. For example:
<distance to="destination airport">590</distance>
<distance to="navigation aid">140</distance>
<distance to="emergency airport">75</distance>
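As a rough illustration of what a recipient does with such a message, here is a minimal Python sketch that reads the three values. The <report> wrapper element and the variable names are my assumptions; only the three <distance> elements come from the example above.

```python
import xml.etree.ElementTree as ET

# The <report> wrapper is an assumption made so the fragment is
# well-formed; the example above shows only the <distance> elements.
message = """
<report>
  <distance to="destination airport">590</distance>
  <distance to="navigation aid">140</distance>
  <distance to="emergency airport">75</distance>
</report>
"""

root = ET.fromstring(message)

# Map each "to" attribute to its numeric distance value.
distances = {d.get("to"): float(d.text) for d in root.findall("distance")}
```

The recipient gets three ready-made numbers and nothing else; it cannot recover the aircraft's position from them.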
Let's consider the advantages and disadvantages of interchanging
distance data.
Advantages:
- The sender and receiver will have the same value for the
distance. There is no ambiguity due to miscalculation.
- If we think of the aircraft as a "service" then it is
providing a service by taking its raw position data and
calculating the distance. A client may not have the ability
to compute the distance from raw data, so being able to receive
the distance data is valuable to the client.
Disadvantages:
- Distance data may be important to the client, but it has
limited utility. It discards much of the information
that recipients of the data could potentially want. For
example, it doesn't allow recipients to compute things like
heading, location relative to another aircraft, etc. Thus,
the client has paid for the service with a loss of
flexibility in the data.
Rather than interchanging Distance-to-X data the aircraft system could
send more fundamental data - position data. Let's consider the
advantages and disadvantages of interchanging position data.
Advantages:
- With position data a recipient could not only know the distance
of the aircraft, but could also determine the aircraft heading,
time to fly-over, relationship to other aircraft. Also, it could
plug the position data into other contexts, such as a map
application.
Disadvantages:
- It is possible that a recipient may calculate a distance that
differs from the distance the aircraft system calculates
(due to differing algorithms, rounding errors, etc). Thus,
there is the possibility for ambiguity and misinterpretation
between the sender and the receiver.
- The recipient may not have the computational power to take the
raw position data and calculate the distance.
- The aircraft "service" is not providing much service to
the client. Instead of the service doing value-add, the
recipient is expected to do value-add.
Some things to note from this example:
a. Where is "value-add" done? ... server-side or client-side?
- sending distance data implies doing value-add on the
server-side.
- sending position data implies doing value-add on the
client-side.
b. Value-adding on the server-side makes it easier on the client,
but restricts the utility of the data.
c. Value-adding on the client-side allows the data to be used in
more ways, but burdens the client with more work.
At this point, based upon the above example, I would like to introduce
some terminology:
DERIVED DATA: Distance represents data that is the output from doing a
calculation on raw, position data. The distance data is called derived
data.
FUNDAMENTAL DATA: Position is the raw, basic data. Position data is
called fundamental data.
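To make the distinction concrete, here is a minimal Python sketch of deriving a distance from two positions. The great-circle (haversine) formula, the Earth radius constant, and the sample coordinates are all my assumptions for illustration; the aircraft system may well compute distance differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def great_circle_km(lat1, lon1, lat2, lon2):
    """Derive a distance (km) from two positions given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Fundamental data: hypothetical (latitude, longitude) pairs in degrees.
aircraft = (40.0, -75.0)
destination = (42.0, -71.0)

# Derived data: computed by whoever holds the fundamental data.
distance = great_circle_km(*aircraft, *destination)
```

Whoever runs this calculation, sender or recipient, is the one doing the "value-add".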
FINANCIAL INSTITUTION EXAMPLE
A service that financial institutions provide is to give their clients
advice on when to buy, sell, or hold a stock. The algorithms used to
determine whether to buy, sell, or hold a stock are based upon some
fundamental data (plus other data not shown):
- performance of the stock over the past few months
- performance of the entire stock market
- performance of other stock markets
Typically financial institutions send to their clients just the
recommendation - "buy", or "sell", or "hold". The advantages of this
are pretty clear-cut:
Advantages:
- The factors that a financial institution takes into
account to generate its "recommendation" are proprietary.
It has no wish to reveal all the data that it uses, nor the
algorithm that it employs.
- Typically clients are of the mindset, "just tell me what to do".
Thus, the financial institution is providing them a quick, easy
service.
The alternative to sending the derived, recommendation data is for the
financial institution to send the fundamental data (the performances
data shown above). However, the disadvantages to this are also
clear-cut:
Disadvantages:
- A client typically does not have all the data necessary, nor
the algorithms necessary to generate a good recommendation.
In this example, the advantages of sending derived data are overwhelming.
SUMMARY
When the data being interchanged is data from a Service then most likely
the data should be "derived data". Otherwise, the "Service" is not
providing much of a "service".
...
A DIFFERENT PERSPECTIVE
The above examples focused on data interchange from the perspective of
the sender providing a "Service" to clients. Now I'd like to shift
perspective to this:
TWO SYSTEMS SHARING DATA
That is, System B needs data from System A. System A wishes to share
its data with System B.
Let's take the above two examples again. This time we will look at them
from the perspective of sharing data with another system.
AIRCRAFT EXAMPLE
Let's suppose that both System A and System B have requirements for
aircraft distance data. They would like to share the data:
System A <----------------> System B
distance
That is, if System A has aircraft distance data then it would like to be
able to share it with System B, or vice versa. (System
interoperability.)
Let's suppose that both systems are using XML to organize their data.
Amazingly, they both use the same element and attribute names! For the
above example, here's what the data looks like in both systems:
System A:
<distance to="destination airport">590</distance>
System B:
<distance to="destination airport">545</distance>
The careful reader will have noted that, while the element and attribute
names are identical, the "data" differs slightly (590 vs 545). Let's
examine why this is the case.
System A keeps a record of the "line-of-sight distance" between the
aircraft and the destination airport.
System B keeps a record of the "ground distance" between the aircraft
and the destination airport.
Thus, the two systems have slightly differing "semantics" with regard
to the "distance".
How do the two systems "bridge the semantic gap"?
One approach would be for one system to "give in" and adopt the other
system's semantics. That is highly unlikely. Both systems will fight
like "cats and dogs" to keep their semantics. And there is good reason
for this - they have invested a lot of time and money to build
applications that process the data with those semantics.
With both systems the distance data is derived data - the data is
derived from the fundamental position data. If we recognize this then
lots of grief and arguing can be avoided by agreeing to interchange the
fundamental position data.
Thus, we bridge the semantic gap by simply side-stepping it!
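Here is a minimal Python sketch of the idea: both systems receive the same position data, and each derives the distance under its own semantics. The flat-earth frame, the field names, and the numbers are assumptions chosen for brevity; they are not meant to reproduce the 590 and 545 values above.

```python
import math

# Fundamental data: a hypothetical aircraft position relative to the
# reference point, in a local flat-earth frame (an assumption for brevity).
position = {"east_km": 300.0, "north_km": 400.0, "altitude_km": 10.0}


def ground_distance(pos):
    """System B's semantics: distance along the ground, ignoring altitude."""
    return math.hypot(pos["east_km"], pos["north_km"])


def line_of_sight_distance(pos):
    """System A's semantics: straight-line distance, including altitude."""
    return math.sqrt(ground_distance(pos) ** 2 + pos["altitude_km"] ** 2)


# Each system derives its own value from the same interchanged position.
system_a_value = line_of_sight_distance(position)
system_b_value = ground_distance(position)
```

Neither system has to change its applications. Only the interchanged data changes, from distance to position.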
FINANCIAL INSTITUTION EXAMPLE
Let's suppose that Institution A and Institution B are partners and both
have requirements to compute a recommendation (buy, sell, hold) for
their clients. They would like to be able to share their data:
Institution A <----------------> Institution B
what data?
What data should they share? One possibility is for them to share their
recommendation:
Institution A <----------------> Institution B
recommendation
Let's even suppose that both systems are using XML to organize their
data. Amazingly, they both use the same element name! Here's what
their data looks like:
Institution A:
<recommendation>Sell</recommendation>
Institution B:
<recommendation>Hold</recommendation>
The careful reader will have noted that, while the element names are
identical, the "data" differs (Sell vs Hold). Let's examine why this is
the case.
Institution A uses a different algorithm for computing its
recommendation than does Institution B.
Thus, the two Institutions have differing "semantics" with regard to
the "recommendation".
How do the two Institutions "bridge the semantic gap"? One approach
would be for one Institution to "give in" and adopt the other Institution's
semantics (i.e., the other Institution's algorithm). That is highly
unlikely. Both Institutions will fight like "cats and dogs" to keep
their semantics. And there is good reason for this - they have invested
a lot of time and money to create their recommendation-producing
algorithms.
Both Institutions' recommendation data is derived data - the data is
derived from the fundamental market performances data. If we recognize
this then lots of grief and arguing can be avoided by agreeing to
interchange the fundamental performances data:
Institution A <----------------> Institution B
performances data
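A minimal Python sketch of how the same performances data can yield different recommendations follows. Both rules, their thresholds, and the figures are invented purely for illustration; real institutions' algorithms are proprietary and surely more involved.

```python
# Fundamental data: hypothetical performance figures (percent change
# over the past three months).
performances = {"stock_3m": -4.0, "market_3m": 2.0, "other_markets_3m": 1.0}


def recommend_a(p):
    """Hypothetical Institution A rule: judge the stock on its own trend."""
    if p["stock_3m"] < -3.0:
        return "Sell"
    if p["stock_3m"] > 3.0:
        return "Buy"
    return "Hold"


def recommend_b(p):
    """Hypothetical Institution B rule: judge the stock against the market."""
    gap = p["stock_3m"] - p["market_3m"]
    if gap < -10.0:
        return "Sell"
    if gap > 10.0:
        return "Buy"
    return "Hold"


recommendation_a = recommend_a(performances)
recommendation_b = recommend_b(performances)
```

With these made-up figures Institution A arrives at "Sell" and Institution B at "Hold", yet the two can share the performances data without either giving up its algorithm.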
SUMMARY
When the data being interchanged is for the purpose of "sharing data"
then it seems most prudent to interchange fundamental data. This will
enable you to gracefully avoid the "semantic mismatch" problem.
** GRAND SUMMARY **
We need to distinguish between:
- data interchange for the purpose of Services,
- data interchange for the purpose of sharing data.
For Service data interchange --> use derived data.
For data sharing --> use fundamental data.
QUESTIONS
1. Schema Implications: suppose that a System/Institution deploys a Web
service, and also shares data with a partner. Does it create two
schemas:
- one that describes the interchange data with Web service clients,
- one that describes the interchange data for data sharing.
Are two schemas needed?
Perhaps there should always be just one schema - that describes the
fundamental data. After all, derived data is "application generated
data". Here's my argument: Applications come and go, but fundamental
data endures. Thus, "application (derived) data" should not be part of
a schema. So, I will argue in favor of creating schemas just for
fundamental data. What do you think?
2. Data Sharing = Service? You can certainly think of data sharing as a
very simple form of Service. And yet, with the former it is best to
interchange fundamental data, whereas with the latter it is best to
interchange derived data. At what point does "switchover" occur? That
is, at what point does it stop being a data sharing situation using
fundamental data, and become a Service using derived data?
I eagerly look forward to your thoughts and comments. /Roger