xml-dev - Re: [xml-dev] Round 2: Identifying Data for Interchange


To: "Roger L. Costello" <costello@mitre.org>
Subject: Re: [xml-dev] Round 2: Identifying Data for Interchange
From: "Chiusano Joseph" <chiusano_joseph@bah.com>
Date: Tue, 07 Jan 2003 11:48:50 -0500
Cc: xml-dev@lists.xml.org
Organization: BAH
References: <3E1B0010.1F622F8A@mitre.org>

<Excerpt1>
Disadvantages:

    - A client typically does not have all the data necessary, nor
      the algorithms necessary to generate a good recommendation.

In this example, the advantages of sending derived data are overwhelming.

SUMMARY

When the data being interchanged is data from a Service then most likely
the data should be "derived data".  Otherwise, the "Service" is not
providing much of a "service".
</Excerpt1>

An additional example may be the calculation of a value, such as the
monthly payment for an auto loan.  Should the components of the payment
calculation (Annual Percentage Rate, principal, payment frequency, etc.)
be sent over the wire, or the calculated value itself?  Bandwidth
considerations would certainly point to sending the calculated value,
if the sender has it available; so would performance on the receiver
side, which is spared from recalculating the value.  However, there may
be distinct cases in which the individual components need to be sent,
perhaps because an additional parameter is to be factored into the
calculation on the receiver end (and only the receiver has that
parameter value).  In a nutshell, I believe it depends on the specific
case, and so it is very difficult to define a best practice for this.
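
To illustrate the trade-off, here is a minimal Python sketch of the
payment calculation.  It assumes the standard fixed-rate amortization
formula and treats the APR as a nominal annual rate split evenly across
payment periods (a simplification, since a true APR can also fold in
fees).  The `monthly_payment` function, the sample figures, and the
`trade_in_credit` value standing in for a parameter "only the receiver
has" are all hypothetical.

```python
def monthly_payment(principal, apr_percent, periods_per_year, years):
    """Level payment per period for a fixed-rate, fully amortizing loan.

    Assumes the APR is a nominal annual rate split evenly across periods.
    """
    n = periods_per_year * years              # total number of payments
    r = apr_percent / 100 / periods_per_year  # interest rate per period
    if r == 0:
        return principal / n                  # no interest: straight division
    return principal * r / (1 - (1 + r) ** -n)


# Option 1: the sender computes and transmits only the derived value.
derived_payload = {"monthly_payment": round(monthly_payment(20000, 6.0, 12, 5), 2)}

# Option 2: the sender transmits the fundamental components instead ...
components_payload = {"principal": 20000, "apr_percent": 6.0,
                      "periods_per_year": 12, "years": 5}

# ... so the receiver can factor in a parameter only it knows
# (hypothetical: a trade-in credit that reduces the amount financed).
trade_in_credit = 3000
receiver_payment = monthly_payment(
    components_payload["principal"] - trade_in_credit,
    components_payload["apr_percent"],
    components_payload["periods_per_year"],
    components_payload["years"],
)

print(derived_payload)  # {'monthly_payment': 386.66}
print(round(receiver_payment, 2))
```

When the receiver has nothing to add, both payloads yield the same
payment and the derived value is the cheaper message; the components
only earn their extra bytes when the receiver must inject a value the
sender cannot see.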

<Excerpt2>
How do the two systems "bridge the semantic gap"?  

One approach would be for one system to "give in" and adopt the other
system's semantics.  That is highly unlikely.  Both systems will fight
like "cats and dogs" to keep their semantics.  And there is good reason
for this - they have invested a lot of time and money to build
applications that process the data with those semantics.

With both systems the distance data is derived data - the data is
derived from the fundamental position data.  If we recognize this then
lots of grief and arguing can be avoided by agreeing to interchange the
fundamental position data.

Thus, we bridge the semantic gap by simply side-stepping it!
</Excerpt2>
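
The side-stepping idea in the excerpt can be sketched in Python: both
systems receive the same fundamental position data, and each derives its
own flavour of distance locally.  This is a minimal sketch assuming a
spherical Earth (mean radius 6371 km) and positions given as latitude,
longitude and altitude; the function names and sample coordinates are
hypothetical, and a real system would more likely use an ellipsoidal
model such as WGS 84.  `math.dist` requires Python 3.8 or later.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth simplification


def ground_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance along the surface (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def _to_xyz(lat, lon, alt_km):
    """Earth-centred Cartesian coordinates of a point on or above the sphere."""
    r = EARTH_RADIUS_KM + alt_km
    phi, lam = math.radians(lat), math.radians(lon)
    return (r * math.cos(phi) * math.cos(lam),
            r * math.cos(phi) * math.sin(lam),
            r * math.sin(phi))


def line_of_sight_km(lat1, lon1, alt1_km, lat2, lon2, alt2_km):
    """Straight-line (slant) distance between two points, altitude included."""
    return math.dist(_to_xyz(lat1, lon1, alt1_km), _to_xyz(lat2, lon2, alt2_km))


# Hypothetical fundamental position message shared by both systems.
aircraft = {"lat": 39.0, "lon": -77.0, "alt_km": 10.0}
station = {"lat": 40.0, "lon": -75.0, "alt_km": 0.0}

# System A derives line-of-sight distance; System B derives ground distance.
system_a = line_of_sight_km(aircraft["lat"], aircraft["lon"], aircraft["alt_km"],
                            station["lat"], station["lon"], station["alt_km"])
system_b = ground_distance_km(aircraft["lat"], aircraft["lon"],
                              station["lat"], station["lon"])
print(f"System A (line-of-sight): {system_a:.1f} km")
print(f"System B (ground):        {system_b:.1f} km")
```

Because the shared message carries position rather than distance,
neither system has to give up its own definition of distance; each
keeps its existing semantics as a local computation.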

Another approach would be for the two systems ("trading partners") to
establish a policy (AKA agreement/contract) stipulating which "distance
type" (line-of-sight or ground) all distance values exchanged between
them represent.  Perhaps this could be stipulated as part of the "Web
Service Policy".

An additional approach would be for both partners to adopt a common,
publicly developed standard (vocabulary) for the exchange of this
information (hypothetically assuming there is one).  The "distance type"
would then be stipulated by whatever means the vocabulary dictates,
perhaps by an additional attribute as follows:

<distance to="destination airport" type="line-of-sight">590</distance>
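
On the receiving side, an explicit `type` attribute lets software refuse
to guess.  A minimal Python sketch using the standard library's
`xml.etree.ElementTree`, assuming (hypothetically) that the shared
vocabulary allows only the two distance types discussed here; the
`read_distance` function and `ALLOWED_DISTANCE_TYPES` set are
illustrative names, not part of any existing vocabulary.

```python
import xml.etree.ElementTree as ET

# Distance types permitted by the (hypothetical) shared vocabulary.
ALLOWED_DISTANCE_TYPES = {"line-of-sight", "ground"}


def read_distance(xml_text):
    """Parse a <distance> element and return (to, type, value).

    Raises ValueError when the distance type is missing or unknown, so an
    untyped value is never silently read with the receiver's own semantics.
    """
    elem = ET.fromstring(xml_text)
    if elem.tag != "distance":
        raise ValueError(f"expected <distance>, got <{elem.tag}>")
    distance_type = elem.get("type")
    if distance_type not in ALLOWED_DISTANCE_TYPES:
        raise ValueError(f"missing or unknown distance type: {distance_type!r}")
    value = float((elem.text or "").strip())  # ValueError if empty or non-numeric
    return elem.get("to"), distance_type, value


print(read_distance('<distance to="destination airport" type="line-of-sight">590</distance>'))
# ('destination airport', 'line-of-sight', 590.0)
```

Under the policy approach, the same check could instead compare `type`
against the single distance type the partners agreed on.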

Kind Regards,
Joe Chiusano
Booz | Allen | Hamilton

"Roger L. Costello" wrote:
> 
> Hi Folks,
> 
> Thanks again for all your input.  It has been most enlightening!
> 
> I have taken all your comments, assimilated them, and I believe that I
> have found a new perspective on this issue that may alter things quite a
> bit.  Before I present this "new perspective" it is important that I
> recap what was discussed.
> 
> RECAP
> 
> Issue: What are the defining characteristics of data that is "suitable"
> for being interchanged?  Conversely, what are the defining
> characteristics of data that is "not suitable" for being interchanged?
> 
> To provide a basis for discussion, I present two examples:
> 
> 1. AIRCRAFT EXAMPLE: onboard an aircraft is a system that periodically
> transmits some information to a ground station.  Included in this
> information is:
> 
>      - Distance to Destination Airport
>      - Distance to Navigation Aid
>      - Distance to Emergency Airport
> 
> 2. FINANCIAL INSTITUTION EXAMPLE: a financial institution provides
> recommendations to its clients on what to do with their stock shares.
> The information sent to its clients is a recommendation:
> 
>       - Buy, Sell, Hold
> 
> Let's look at each of these examples.
> 
> AIRCRAFT EXAMPLE
> 
> What data should be sent to the ground station? That is, what data is
> "suitable" for interchange?  Your first instinct might be to send
> "Distance-to-X" data.  For example:
> 
>      <distance to="destination airport">590</distance>
>      <distance to="navigation aid">140</distance>
>      <distance to="emergency airport">75</distance>
> 
> Let's consider the advantages and disadvantages of interchanging
> distance data.
> 
> Advantages:
> 
>    - The sender and receiver will have the same value for the
>      distance.  There is no ambiguity due to miscalculation.
> 
>    - If we think of the aircraft as a "service" then it is
>      providing a service by taking its raw position data and
>      calculating the distance.  A client may not have the ability
>      to compute the distance from raw data, so being able to receive
>      the distance data is valuable to the client.
> 
> Disadvantages:
> 
>     - Distance data may be important to the client, but it has
>       limited utility.  It discards much of the information
>       that recipients of the data could potentially want. For
>       example, it doesn't allow recipients to compute things like
>       heading, location relative to another aircraft, etc.  Thus,
>       the client has paid for the service with a loss of
>       flexibility in the data.
> 
> Rather than interchanging Distance-to-X data the aircraft system could
> send more fundamental data - position data.  Let's consider the
> advantages and disadvantages of interchanging position data.
> 
> Advantages:
> 
>    - With position data a recipient could not only know the distance
>      of the aircraft, but could also determine the aircraft heading,
>      time to fly-over, relationship to other aircraft.  Also, it could
>      plug the position data into other contexts, such as a map
>      application.
> 
> Disadvantages:
> 
>     - It is possible that a recipient may calculate a distance that is
>       different from the distance the aircraft system calculates
>       (due to differing algorithms, rounding errors, etc). Thus,
>       there is the possibility for ambiguity and misinterpretation
>       between the sender and the receiver.
> 
>     - The recipient may not have the computational power to take the
>       raw position data and calculate the distance.
> 
>     - The aircraft "service" is not providing much service to
>       the client.  Instead of the service doing value-add, the
>       recipient is expected to do value-add.
> 
> Some things to note from this example:
> 
> a. Where is "value-add" done? ... server-side or client-side?
> 
>      - sending distance data implies doing value-add on the
>        server-side.
>      - sending position data implies doing value-add on the
>        client-side.
> 
> b. Value-adding on the server-side makes it easier on the client,
>    but restricts the utility of the data.
> 
> c. Value-adding on the client-side allows the data to be used in
>    more ways, but burdens the client with more work.
> 
> At this point, based upon the above example, I would like to introduce
> some terminology:
> 
> DERIVED DATA: Distance represents data that is the output from doing a
> calculation on raw, position data.  The distance data is called derived
> data.
> 
> FUNDAMENTAL DATA: Position is the raw, basic data.  Position data is
> called fundamental data.
> 
> FINANCIAL INSTITUTION EXAMPLE
> 
> A service that financial institutions provide is to give their clients
> advice on when to buy, sell, or hold a stock.  The algorithms used to
> determine whether to buy, sell, or hold a stock are based upon some
> fundamental data (plus other data not shown):
> 
>     - performance of the stock over the past few months
>     - performance of the entire stock market
>     - performance of other stock markets
> 
> Typically financial institutions send to their clients just the
> recommendation - "buy", or "sell", or "hold".  The advantages of this
> are pretty clear-cut:
> 
> Advantages:
> 
>     - The factors that a financial institution takes into
>       account to generate its "recommendation" are proprietary.
>       It has no wish to reveal all the data that it uses, nor the
>       algorithm that it employs.
> 
>     - Typically clients are of the mindset, "just tell me what to do".
>       Thus, the financial institution is providing them a quick, easy
>       service.
> 
> The alternative to sending the derived recommendation data is for the
> financial institution to send the fundamental data (the performance
> data shown above).  However, the disadvantages of this are also
> clear-cut:
> 
> Disadvantages:
> 
>     - A client typically does not have all the data necessary, nor
>       the algorithms necessary to generate a good recommendation.
> 
> In this example, the advantages of sending derived data are overwhelming.
> 
> SUMMARY
> 
> When the data being interchanged is data from a Service then most likely
> the data should be "derived data".  Otherwise, the "Service" is not
> providing much of a "service".
> 
> ...
> 
> A DIFFERENT PERSPECTIVE
> 
> The above examples focused on data interchange from the perspective of
> the sender providing a "Service" to clients.  Now I'd like to shift
> perspective to this:
> 
>     TWO SYSTEMS SHARING DATA
> 
> That is, System B needs data from System A.  System A wishes to share
> its data with System B.
> 
> Let's take the above two examples again.  This time we will look at them
> from the perspective of sharing data with another system.
> 
> AIRCRAFT EXAMPLE
> 
> Let's suppose that both System A and System B have requirements for
> aircraft distance data.  They would like to share the data:
> 
>      System A  <----------------> System B
>                     distance
> 
> That is, if System A has aircraft distance data then it would like to be
> able to share it with System B, or vice versa. (System
> interoperability.)
> 
> Let's suppose that both systems are using XML to organize their data.
> Amazingly, they both use the same element and attribute names!  For the
> above example, here's what the data looks like in both systems:
> 
> System A:
> 
>     <distance to="destination airport">590</distance>
> 
> System B
> 
>     <distance to="destination airport">545</distance>
> 
> The careful reader will have noted that, while the element and attribute
> names are identical, the "data" differs slightly (590 vs 545).  Let's
> examine why this is the case.
> 
> System A keeps a record of the "line-of-sight distance" between the
> aircraft and the ground station.
> 
> System B keeps a record of the "ground distance" between the aircraft
> and the ground station.
> 
> Thus, the two systems have slightly differing "semantics" with regard
> to the "distance".
> 
> How do the two systems "bridge the semantic gap"?
> 
> One approach would be for one system to "give in" and adopt the other
> system's semantics.  That is highly unlikely.  Both systems will fight
> like "cats and dogs" to keep their semantics.  And there is good reason
> for this - they have invested a lot of time and money to build
> applications that process the data with those semantics.
> 
> With both systems the distance data is derived data - the data is
> derived from the fundamental position data.  If we recognize this then
> lots of grief and arguing can be avoided by agreeing to interchange the
> fundamental position data.
> 
> Thus, we bridge the semantic gap by simply side-stepping it!
> 
> FINANCIAL INSTITUTION EXAMPLE
> 
> Let's suppose that Institution A and Institution B are partners and both
> have requirements to compute a recommendation (buy, sell, hold) for
> their clients.  They would like to be able to share their data:
> 
>      Institution A  <----------------> Institution B
>                         what data?
> 
> What data should they share?  One possibility is for them to share their
> recommendation:
> 
>      Institution A  <----------------> Institution B
>                       recommendation
> 
> Let's even suppose that both systems are using XML to organize their
> data.  Amazingly, they both use the same element name!  Here's what
> their data looks like:
> 
> Institution A:
> 
>     <recommendation>Sell</recommendation>
> 
> Institution B
> 
>     <recommendation>Hold</recommendation>
> 
> The careful reader will have noted that, while the element names are
> identical, the "data" differs (Sell vs Hold).  Let's examine why this is
> the case.
> 
> Institution A uses a different algorithm for computing its
> recommendation than does Institution B.
> 
> Thus, the two Institutions have differing "semantics" with regard to
> the "recommendation".
> 
> How do the two Institutions "bridge the semantic gap"?  One approach
> would be for one Institution to "give in" and adopt the other
> Institution's semantics (i.e., its algorithm).  That is highly
> unlikely.  Both Institutions will fight like "cats and dogs" to keep
> their semantics.  And there is good reason for this - they have invested
> a lot of time and money to create their recommendation-producing
> algorithms.
> 
> Both Institutions' recommendation data is derived data - the data is
> derived from the fundamental market performance data.  If we recognize
> this then lots of grief and arguing can be avoided by agreeing to
> interchange the fundamental performance data:
> 
>      Institution A  <----------------> Institution B
>                     performance data
> 
> SUMMARY
> 
> When the data being interchanged is for the purpose of "sharing data"
> then it seems most prudent to interchange fundamental data.  This will
> enable you to gracefully avoid the "semantic mismatch" problem.
> 
> ** GRAND SUMMARY **
> 
> We need to distinguish between:
> 
>     - data interchange for the purpose of Services,
>     - data interchange for the purpose of sharing data.
> 
> For Service data interchange --> use derived data.
> For data sharing --> use fundamental data.
> 
> QUESTIONS
> 
> 1. Schema Implications: suppose that a System/Institution deploys a Web
> service, and also shares data with a partner.  Does it create two
> schemas:
> 
>     - one that describes the interchange data with Web service clients,
>     - one that describes the interchange data for data sharing.
> 
> Are two schemas needed?
> 
> Perhaps there should always be just one schema - that describes the
> fundamental data.  After all, derived data is "application generated
> data". Here's my argument: Applications come and go, but fundamental
> data endures.  Thus, "application (derived) data" should not be part of
> a schema. So, I will argue in favor of creating schemas just for
> fundamental data.  What do you think?
> 
> 2. Data Sharing = Service?  You can certainly think of data sharing as a
> very simple form of Service.  And yet, with the former it is best to
> interchange fundamental data, whereas with the latter it is best to
> interchange derived data.  At what point does the "switchover" occur?
> That is, at what point does it stop being a data-sharing situation
> using fundamental data and become a Service using derived data?
> 
> I eagerly look forward to your thoughts and comments.  /Roger

begin:vcard 
n:Chiusano;Joseph
tel;work:(703) 902-6923
x-mozilla-html:FALSE
url:www.bah.com
org:Booz | Allen | Hamilton;IT Digital Strategies Team
adr:;;8283 Greensboro Drive;McLean;VA;22012;
version:2.1
email;internet:chiusano_joseph@bah.com
title:Senior Consultant
fn:Joseph M. Chiusano
end:vcard
