xml-dev - Round 2: Identifying Data for Interchange

To: xml-dev@lists.xml.org
Subject: Round 2: Identifying Data for Interchange
From: "Roger L. Costello" <costello@mitre.org>
Date: Tue, 07 Jan 2003 11:28:00 -0500
Cc: "Costello,Roger L." <costello@mitre.org>
Organization: The MITRE Corporation

Hi Folks,

Thanks again for all your input.  It has been most enlightening!

I have taken all your comments, assimilated them, and I believe that I
have found a new perspective on this issue that may alter things quite a
bit.  Before I present this "new perspective" it is important that I
recap what was discussed.

RECAP

Issue: What are the defining characteristics of data that is "suitable"
for being interchanged?  Conversely, what are the defining
characteristics of data that is "not suitable" for being interchanged?

To provide a basis for discussion, I present two examples:

1. AIRCRAFT EXAMPLE: onboard an aircraft is a system that periodically
transmits to a ground station some information.  Included in this
information is:

     - Distance to Destination Airport
     - Distance to Navigation Aid
     - Distance to Emergency Airport

2. FINANCIAL INSTITUTION EXAMPLE: a financial institution provides
recommendations to its clients on what to do with their stock shares. 
The information sent to its clients is a recommendation:

      - Buy, Sell, Hold

Let's look at each of these examples.

AIRCRAFT EXAMPLE

What data should be sent to the ground station? That is, what data is
"suitable" for interchange?  Your first instinct might be to send
"Distance-to-X" data.  For example:

     <distance to="destination airport">590</distance>
     <distance to="navigation aid">140</distance>
     <distance to="emergency airport">75</distance>

Let's consider the advantages and disadvantages of interchanging
distance data.

Advantages:

   - The sender and receiver will have the same value for the 
     distance.  There is no ambiguity due to miscalculation.

   - If we think of the aircraft as a "service" then it is
     providing a service by taking its raw position data and 
     calculating the distance.  A client may not have the ability
     to compute the distance from raw data, so being able to receive
     the distance data is valuable to the client.

Disadvantages:

    - Distance data may be important to the client, but it has 
      limited utility.  It discards much of the information 
      that recipients of the data could potentially want. For 
      example, it doesn't allow recipients to compute things like 
      heading, location relative to another aircraft, etc.  Thus,
      the client has paid for the service with a loss of
      flexibility in the data.

Rather than interchanging Distance-to-X data the aircraft system could
send more fundamental data - position data.  Let's consider the
advantages and disadvantages of interchanging position data.

Advantages:

   - With position data a recipient could not only know the distance
     of the aircraft, but could also determine the aircraft heading,
     time to fly-over, relationship to other aircraft.  Also, it could
     plug the position data into other contexts, such as a map 
     application.

Disadvantages:

    - A recipient may calculate a distance that differs from the
      distance the aircraft system calculates (due to differing
      algorithms, rounding errors, etc).  Thus, sender and receiver
      may end up with different values for what is nominally the
      same quantity.

    - The recipient may not have the computational power to take the
      raw position data and calculate the distance.  

    - The aircraft "service" is not providing much service to 
      the client.  Instead of the service doing value-add, the 
      recipient is expected to do value-add.

Some things to note from this example:

a. Where is "value-add" done? ... server-side or client-side?

     - sending distance data implies doing value-add on the
       server-side.
     - sending position data implies doing value-add on the
       client-side.

b. Value-adding on the server-side makes it easier on the client,
   but restricts the utility of the data.

c. Value-adding on the client-side allows the data to be used in
   more ways, but burdens the client with more work.

At this point, based upon the above example, I would like to introduce
some terminology:

DERIVED DATA: Distance represents data that is the output from doing a
calculation on raw, position data.  The distance data is called derived
data.

FUNDAMENTAL DATA: Position is the raw, basic data.  Position data is
called fundamental data.
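
To make the distinction concrete, here is a minimal Python sketch (not
part of the original discussion) in which a distance is derived from
position data.  The <position> element, its lat/lon attributes, the
coordinates, the Earth radius constant, and the choice of the haversine
great-circle formula are all assumptions made for illustration; a real
aircraft system may well use a different data model and algorithm.

```python
import math
import xml.etree.ElementTree as ET

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def parse_position(xml_text):
    """Read a hypothetical <position lat=".." lon=".."/> element."""
    elem = ET.fromstring(xml_text)
    return float(elem.get("lat")), float(elem.get("lon"))


def great_circle_km(a, b):
    """Derive the distance (km) between two (lat, lon) points (haversine)."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Fundamental data: positions (illustrative coordinates, not from the post).
aircraft = parse_position('<position lat="40.0" lon="-75.0"/>')
airport = parse_position('<position lat="42.0" lon="-71.0"/>')

# Derived data: the distance is the output of a calculation on positions.
distance = great_circle_km(aircraft, airport)
print(f'<distance to="destination airport">{distance:.0f}</distance>')
```

A recipient holding only the printed <distance> element cannot recover
either position, whereas a recipient holding the positions can recompute
the distance with whatever formula it prefers.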

FINANCIAL INSTITUTION EXAMPLE

A service that financial institutions provide is to give their clients
advice on when to buy, sell, or hold a stock.  The algorithms used to
determine whether to buy, sell, or hold a stock are based upon some
fundamental data (plus other data not shown):

    - performance of the stock over the past few months
    - performance of the entire stock market
    - performance of other stock markets

Typically financial institutions send to their clients just the
recommendation - "buy", or "sell", or "hold".  The advantages of this
are pretty clear-cut:

Advantages:

    - The factors that a financial institution takes into 
      account to generate its "recommendation" are proprietary.
      It has no wish to reveal all the data that it uses, nor the
      algorithm that it employs.

    - Typically clients are of the mindset, "just tell me what to do".
      Thus, the financial institution is providing them a quick, easy
      service.

The alternative to sending the derived recommendation data is for the
financial institution to send the fundamental data (the performance
data shown above).  However, the disadvantages of this are also
clear-cut:

Disadvantages:

    - A client typically does not have all the data necessary, nor
      the algorithms necessary to generate a good recommendation.

In this example, the advantages of sending derived data are overwhelming.

SUMMARY

When the data being interchanged is data from a Service then most likely
the data should be "derived data".  Otherwise, the "Service" is not
providing much of a "service".

...

A DIFFERENT PERSPECTIVE

The above examples focused on data interchange from the perspective of
the sender providing a "Service" to clients.  Now I'd like to shift
perspective to this:

    TWO SYSTEMS SHARING DATA

That is, System B needs data from System A.  System A wishes to share
its data with System B.  

Let's take the above two examples again.  This time we will look at them
from the perspective of sharing data with another system.

AIRCRAFT EXAMPLE

Let's suppose that both System A and System B have requirements for
aircraft distance data.  They would like to share the data:

     System A  <----------------> System B
                    distance

That is, if System A has aircraft distance data then it would like to be
able to share it with System B, or vice versa. (System
interoperability.)

Let's suppose that both systems are using XML to organize their data. 
Amazingly, they both use the same element and attribute names!  For the
above example, here's what the data looks like in both systems:

System A:

    <distance to="destination airport">590</distance>

System B:

    <distance to="destination airport">545</distance>

The careful reader will have noted that, while the element and attribute
names are identical, the "data" differs slightly (590 vs 545).  Let's
examine why this is the case.

System A keeps a record of the "line-of-sight distance" between the
aircraft and the ground station.

System B keeps a record of the "ground distance" between the aircraft
and the ground station.

Thus, the two systems have slightly differing "semantics" with regard
to the "distance".

How do the two systems "bridge the semantic gap"?  

One approach would be for one system to "give in" and adopt the other
system's semantics.  That is highly unlikely.  Both systems will fight
like "cats and dogs" to keep their semantics.  And there is good reason
for this - they have invested a lot of time and money to build
applications that process the data with those semantics.

With both systems the distance data is derived data - the data is
derived from the fundamental position data.  If we recognize this then
lots of grief and arguing can be avoided by agreeing to interchange the
fundamental position data.

Thus, we bridge the semantic gap by simply side-stepping it!
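
As a rough Python sketch of this idea (not from the original post), each
system can apply its own semantics to the same shared position data.
The flat-ground geometry, the x/y/alt field names, and the coordinate
values are all hypothetical; the values are unitless and were picked
only so that the results match the 590 and 545 figures above, not to be
realistic.

```python
import math


def ground_distance(aircraft, station):
    """System B's semantics: distance along the ground, ignoring altitude."""
    return math.hypot(aircraft["x"] - station["x"],
                      aircraft["y"] - station["y"])


def line_of_sight_distance(aircraft, station):
    """System A's semantics: straight-line distance, including altitude."""
    return math.sqrt((aircraft["x"] - station["x"]) ** 2
                     + (aircraft["y"] - station["y"]) ** 2
                     + (aircraft["alt"] - station["alt"]) ** 2)


# Shared fundamental data: hypothetical positions in one common unit.
aircraft = {"x": 545.0, "y": 0.0, "alt": 226.0}
station = {"x": 0.0, "y": 0.0, "alt": 0.0}

print(round(line_of_sight_distance(aircraft, station)))  # 590 (System A)
print(round(ground_distance(aircraft, station)))         # 545 (System B)
```

Neither system has to change its definition of "distance"; they only
have to agree on how position is represented.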

FINANCIAL INSTITUTION EXAMPLE

Let's suppose that Institution A and Institution B are partners and both
have requirements to compute a recommendation (buy, sell, hold) for
their clients.  They would like to be able to share their data:

     Institution A  <----------------> Institution B
                        what data?

What data should they share?  One possibility is for them to share their
recommendation:

     Institution A  <----------------> Institution B
                      recommendation

Let's even suppose that both systems are using XML to organize their
data.  Amazingly, they both use the same element name!  Here's what
their data looks like:

Institution A:

    <recommendation>Sell</recommendation>

Institution B:

    <recommendation>Hold</recommendation>

The careful reader will have noted that, while the element names are
identical, the "data" differs (Sell vs Hold).  Let's examine why this is
the case.

Institution A uses a different algorithm for computing its
recommendation than does Institution B.

Thus, the two Institutions have differing "semantics" with regard to
the "recommendation".

How do the two Institutions "bridge the semantic gap"?  One approach
would be for one Institution to "give in" and adopt the other
Institution's semantics (i.e., the other Institution's algorithm).  That
is highly unlikely.  Both Institutions will fight like "cats and dogs"
to keep their semantics.  And there is good reason for this - they have
invested a lot of time and money to create their
recommendation-producing algorithms.

Both Institutions' recommendation data is derived data - the data is
derived from the fundamental market performance data.  If we recognize
this then lots of grief and arguing can be avoided by agreeing to
interchange the fundamental performance data:

     Institution A  <----------------> Institution B
                     performance data
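
A small Python sketch of this arrangement, offered only as an
illustration: both Institutions receive the same fundamental data and
each applies its own algorithm.  The Performance fields, the thresholds,
and both recommendation rules are invented here; they do not describe
how any real institution works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Performance:
    """Shared fundamental data (field names and units are hypothetical)."""
    stock_change_pct: float   # stock performance over the past few months
    market_change_pct: float  # performance of the entire stock market


def recommend_a(p):
    """Institution A's made-up rule: sell anything lagging the market."""
    if p.stock_change_pct < p.market_change_pct:
        return "Sell"
    return "Buy" if p.stock_change_pct > p.market_change_pct + 5 else "Hold"


def recommend_b(p):
    """Institution B's made-up rule: act only on large absolute moves."""
    if p.stock_change_pct <= -10:
        return "Sell"
    return "Buy" if p.stock_change_pct >= 10 else "Hold"


shared = Performance(stock_change_pct=2.0, market_change_pct=4.0)
print(recommend_a(shared))  # Sell
print(recommend_b(shared))  # Hold
```

The same input yields "Sell" from one rule and "Hold" from the other, so
the disagreement lives entirely in the derivation, not in the data being
interchanged.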

SUMMARY

When the data being interchanged is for the purpose of "sharing data"
then it seems most prudent to interchange fundamental data.  This will
enable you to gracefully avoid the "semantic mismatch" problem.

** GRAND SUMMARY **

We need to distinguish between: 

    - data interchange for the purpose of Services, 
    - data interchange for the purpose of sharing data.

For Service data interchange --> use derived data.
For data sharing --> use fundamental data.

QUESTIONS

1. Schema Implications: suppose that a System/Institution deploys a Web
service, and also shares data with a partner.  Does it create two
schemas:

    - one that describes the interchange data with Web service clients,
    - one that describes the interchange data for data sharing.

Are two schemas needed?

Perhaps there should always be just one schema - that describes the
fundamental data.  After all, derived data is "application generated
data". Here's my argument: Applications come and go, but fundamental
data endures.  Thus, "application (derived) data" should not be part of
a schema. So, I will argue in favor of creating schemas just for
fundamental data.  What do you think?
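
One way to picture this argument, as a hedged Python sketch rather than
a proposal: the interchange format carries only fundamental fields,
while derived values are computed by whichever application needs them.
The Position class, its x/y fields, and the <position> element are
hypothetical names chosen for the example.

```python
import math
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """Fundamental data: the only thing the interchange format describes."""
    x: float
    y: float

    def to_xml(self):
        # Only fundamental fields are serialized (hypothetical names).
        elem = ET.Element("position", x=str(self.x), y=str(self.y))
        return ET.tostring(elem, encoding="unicode")

    def distance_to(self, other):
        # Derived data: computed by the application, never serialized.
        return math.hypot(self.x - other.x, self.y - other.y)


aircraft = Position(3.0, 4.0)
airport = Position(0.0, 0.0)
print(aircraft.to_xml())
print(aircraft.distance_to(airport))  # 5.0
```

Under this design a change to the distance calculation touches only
application code, and the schema for the interchanged data stays the
same.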

2. Data Sharing = Service?  You can certainly think of data sharing as a
very simple form of Service.   And yet, with the former it is best to
interchange fundamental data, whereas with the latter it is best to
interchange derived data.  At what point does "switchover" occur?  That
is, at what point does it stop being a data sharing situation using
fundamental data, and become a Service using derived data?

I eagerly look forward to your thoughts and comments.  /Roger
Follow-Ups:
- Re: [xml-dev] Round 2: Identifying Data for Interchange
  - From: Don Bate <donbate@iadfw.net>
- Re: [xml-dev] Round 2: Identifying Data for Interchange
  - From: "Chiusano Joseph" <chiusano_joseph@bah.com>