----- Original Message ----- 
From: "Roger L. Costello" <costello@mitre.org>
To: <xml-dev@lists.xml.org>
Sent: Monday, October 11, 2004 5:45 AM
Subject: [xml-dev] [Shannon: information ~ uncertainty] Ramifications to XML
data exchange?

> 1. How does this aspect (information ~ uncertainty) of Shannon's work
> relate to data exchange using XML?  (I realize that this is a very broad
> question.  Its intent is to stimulate discussion on the application of
> Shannon's information/uncertainty ideas to XML data exchange.)
> 2. A schema is used to restrict the allowable forms that an instance
> document may take.  So doesn't a schema reduce information?

My response is not meant to insult the intelligence and training of the
members of this list, but to present the concepts in elementary fashion in
what seemed to be the spirit of Roger's questions.

If you ask me to tell you a number from 1 to 8, and every answer is equally
likely, the 8 possible answers represent log2(8) = 3 bits of information.
This is the information content of the response I send you.  But how many
bits do I need to send the information?
This depends upon the agreed-upon encoding.  If we exchange other kinds of
messages, then some other bits might be needed to distinguish my answer to
the 'pick a number from 1 to 8' question from the other things I could be
telling you.  If there were a total of 16 distinct questions I might be
answering, another 4 bits would be the minimum needed to identify which
question my answer belongs to.
If you asked me the same question repeatedly and I didn't answer in order,
your question might have some extra bits attached telling you which one in
the sequence it was, and I might have to pass those back in the answer.
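
The arithmetic above can be checked with a small sketch.  It assumes every
choice is equally likely, and the function name bits_needed is just
something I made up for the illustration:

```python
def bits_needed(n_choices: int) -> int:
    """Minimum whole bits needed to tell apart n equally likely choices."""
    if n_choices < 1:
        raise ValueError("need at least one choice")
    # (n - 1).bit_length() is an exact integer form of ceil(log2(n)).
    return (n_choices - 1).bit_length()

answer_bits = bits_needed(8)      # a number from 1 to 8
question_bits = bits_needed(16)   # one of 16 distinct questions
total_bits = answer_bits + question_bits

print(answer_bits, question_bits, total_bits)
```

If the answers were not equally likely, the average information per answer
would be lower than this, so these figures are an upper bound for a
fixed-length code.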

Clearly, XML represents a large family of possible encodings of the
information, but all the bits beyond the 3 containing the answer and
whatever is necessary to specify the question corresponding to the response
that I send you represent wasteful excess.  The point is to use our
communication bandwidth to send the minimum information necessary to
accomplish the purpose of the communication, or in other words, to maximize
the useful information content in the message that is sent.

In general it is presumed we are sending many messages, so an out-of-band
encoding description can be assumed to be amortized over those many messages
(not transmitted each time).  If I tell you that all my responses correspond
to an XML Schema I've sent you in the past, or which you can look up
somewhere, the encoding's excess bits may be reduced from some other XML
form of exchange.  Thus, I could send you the XML
indicating that my answer to the question of type 3 is 7.  This is still a
lot more than the 7 bits (3 for the answer plus 4 for the question type)
that is the minimum, but a lot fewer than
  <answer><question>Pick a number from 1 to 8.</question>The number from 1
to 8 that I picked is 7.</answer>.
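
To put rough numbers on the excess, here is a sketch that counts the bits in
each encoding.  The compact element is a hypothetical stand-in for a
schema-backed short form (its tag and attribute names are my invention), and
UTF-8 with no XML declaration is assumed:

```python
# The verbose form is the long example from the text.
verbose = ("<answer><question>Pick a number from 1 to 8.</question>"
           "The number from 1 to 8 that I picked is 7.</answer>")

# Hypothetical compact form: question type 3, answer 7.
compact = '<a q="3">7</a>'

MIN_BITS = 7  # 3 bits for the answer plus 4 bits for the question type

def encoded_bits(text: str) -> int:
    """Bits on the wire if the text is sent as UTF-8."""
    return len(text.encode("utf-8")) * 8

for label, text in (("verbose", verbose), ("compact", compact)):
    bits = encoded_bits(text)
    print(f"{label}: {bits} bits, {bits - MIN_BITS} beyond the minimum")
```

Even the short form spends most of its bits on markup, which is the excess
the schema agreement lets us reason about.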

Given the appropriate level of need and regularity of communication, the
schema-described XML could be transmitted over the wire in binary form using
ASN.1 encoding or something else, getting a little closer to the minimum
information content, though you and I would see the same text at our
respective ends of the channel.
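
As a rough illustration of how close a binary form can get, the sketch below
packs the question type and the answer into a single byte.  This is not
ASN.1, just a hand-rolled bit layout I am assuming for the example, and the
names pack and unpack are hypothetical:

```python
def pack(question_type: int, answer: int) -> bytes:
    """Pack a question type (0-15) and an answer (1-8) into one byte."""
    if not 0 <= question_type < 16:
        raise ValueError("question_type must fit in 4 bits")
    if not 1 <= answer <= 8:
        raise ValueError("answer must be from 1 to 8")
    # Upper 4 of the 7 used bits: question type.  Lower 3 bits: answer - 1.
    return bytes([(question_type << 3) | (answer - 1)])

def unpack(payload: bytes):
    """Recover (question_type, answer) from a byte produced by pack()."""
    value = payload[0]
    return value >> 3, (value & 0b111) + 1

wire = pack(3, 7)           # answer 7 to the question of type 3
print(len(wire), unpack(wire))
```

Seven of the eight bits carry information here; the receiver can still
present the result as the same XML text at its end of the channel.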



