To: Irene Polikoff <irene@topquadrant.com>
Subject: Re: [xml-dev] [Shannon: information ~ uncertainty] Ramificationsto XML data exchange?
From: Rick Marshall <rjm@zenucom.com>
Date: Fri, 15 Oct 2004 08:34:07 +1000
Cc: "'sterling'" <sstouden@thelinks.com>, xml-dev@lists.xml.org
In-reply-to: <0I5K00347WEE91@mta6.srv.hcvlny.cv.net>
Organization: Zenucom Pty Ltd
References: <0I5K00347WEE91@mta6.srv.hcvlny.cv.net>
User-agent: Mozilla Thunderbird 0.6 (X11/20040502)

like a lot of these discussions, this one is in danger of slipping from 
the point and confusing the mathematics with orthogonal semantics problems.

information as described by shannon is a mathematical concept. the use 
of information is, if you like, a syntactic and semantic problem.

if you want to do an experiment on information (one that i hope to have 
time to do on the weekend) try this.

method:
1. generate 1,000,000 random numbers and write the same numbers out three 
ways: as an ascii text file, as a file of binary integers, and as an xml 
file of the form

<numbers><number>..</number>....</numbers>

2. compress all three files using your favourite compression program(s) 
and compare the resulting file sizes.

3. discuss.

now i haven't done the experiment yet, but if shannon is correct, they 
should all come out about the same size.
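
a minimal sketch of the experiment in python, for anyone who wants to try 
it. it is an illustration only, not a result: the function names, the 32-bit 
range, the fixed seed, the default count of 100,000 and the use of zlib are 
all assumptions made here, and "as integers" is read as packed binary 
integers. a general-purpose compressor only approximates the entropy limit, 
so expect the three sizes to be close rather than identical.

```python
import random
import struct
import zlib


def build_encodings(count=100_000, seed=0):
    """Encode the same random 32-bit integers as ASCII, binary and XML."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    values = [rng.getrandbits(32) for _ in range(count)]

    # One decimal number per line.
    ascii_bytes = "\n".join(str(v) for v in values).encode("ascii")
    # Packed little-endian unsigned 32-bit integers, 4 bytes each.
    binary_bytes = struct.pack(f"<{count}I", *values)
    # The <numbers><number>..</number>....</numbers> layout from the post.
    xml_body = "".join(f"<number>{v}</number>" for v in values)
    xml_bytes = f"<numbers>{xml_body}</numbers>".encode("ascii")

    return {"ascii": ascii_bytes, "binary": binary_bytes, "xml": xml_bytes}


def compressed_sizes(encodings, level=9):
    """Return {name: (raw_size, compressed_size)} using zlib (DEFLATE)."""
    return {
        name: (len(data), len(zlib.compress(data, level)))
        for name, data in encodings.items()
    }


if __name__ == "__main__":
    for name, (raw, packed) in compressed_sizes(build_encodings()).items():
        print(f"{name:7s} raw={raw:>9d} compressed={packed:>9d}")
```

swap zlib for bz2 or lzma (both in the standard library) to compare 
compressors, and raise count to 1,000,000 to match step 1.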

which btw goes a long way to explaining why you can't really compress a 
jpeg :)

rick

Irene Polikoff wrote:

>There are two sides here - sender and receiver. 
>
>The uncertainty is on the sender side. The more types of messages he is
>capable of sending, the more uncertain it is as to what specific message he
>will be asked to send. In this example the sender is the prison.
>
>The information is on the receiver side. This doesn't deal with any
>uncertainty the receiver may have because the information is wrong, or for
>whatever reason. The receiver presumably, needs to do something with the
>information it receives. The smaller the number of message types it could
>receive, the easier it is for "him" to react.
>
>I put "him" in quotes because in the context we are discussing, both sender
>and receiver are computer programs. Low uncertainty, low information makes
>it easier to create software to process it.
>
>For the humans/businesses behind the sender and receiver, this is another
>story: the reduction of information may have undesirable effects. I think
>Roger's question is legitimate - yes, in some ways XML schemas reduce
>information. Is it a bad thing? My guess is that the answer is "it depends".
>
>-----Original Message-----
>From: sterling [mailto:sstouden@thelinks.com] 
>Sent: Wednesday, October 13, 2004 11:26 PM
>To: Roger L. Costello
>Cc: xml-dev@lists.xml.org
>Subject: Re: [xml-dev] [Shannon: information ~ uncertainty] Ramifications to
>XML data exchange?
>
>
>I hate to show my stupidity, but if the message regarding the condition of 
>the prisoner [a constant] is irrelevant to the condition it is meant to 
>express to the intended receiver,  then the message is just as 
>uncertain to the receiver as is no information at all. The state of no 
>information is the ultimate in uncertainty. 
>
>Do you not mean that high uncertainty equates with low amounts of 
>reliable information because high amounts of reliable information are 
>positively related to greater certainty? 
>
>Hence, the reliability of information is a factor of the consequence of its 
>use and uncertainty approaches certainty as the predictable consequence of 
>reliance becomes more certain. 
>
>Furthermore, if I am trying to predict the weather, it does not matter 
>which message or how many messages are sent, since the message is neither 
>related to the condition of person whose condition is to be expressed or 
>to the weather conditions.
>
>I know nothing of Shannon, but information that has no foundation ( that 
>is random information unrelated to the circumstances of its source or use) 
>is not information, it is at its very best, data.  
>
>Information is an object with many modifiers and much utility!
>
>My two bits. 
>
>
>On Mon, 11 Oct 2004, Roger L. Costello wrote:
>
>>Hi Folks,
>>
>>I am trying to get an understanding of Claude Shannon's work on information
>>theory. Below I describe one small part of Shannon's work.  I would like to
>>hear your thoughts on its ramifications to information exchange using XML.
>>
>>INFORMATION
>>
>>Shannon defines information as follows: 
>>
>>    Information is proportional to uncertainty.  High uncertainty equates
>>    to a high amount of information.  Low uncertainty equates to a low
>>    amount of information.
>>
>>    More specifically, Shannon talks about a set of possible data.
>>    A set comprised of 10 possible choices of data has less information than
>>    a set comprised of a hundred possible choices.
>>
>>This may seem rather counterintuitive, but bear with me as I give an
>>example. 
>>
>>In a book I am reading[1] the author gives an example which provides a nice
>>intuition of Shannon's statement that information is proportional to
>>uncertainty.
>>
>>EXAMPLE
>>
>>Imagine that a man is in prison and wants to send a message to his wife.
>>Suppose that the prison only allows one message to be sent, "I am fine".
>>Even if the person is deathly ill all he can send is, "I am fine".  Clearly
>>there is no information in this message.
>>
>>Here the set of possible messages is one.  There is no uncertainty and there
>>is no information.
>>
>>Suppose that the prison allows one of two messages to be sent, "I am fine"
>>or "I am ill".  If the prisoner sends one of these messages then some
>>information will be passed to his wife.
>>
>>Here the set of possible messages is two.  There is uncertainty (of which
>>message will be sent).  When one of the two messages is selected by the
>>prisoner and sent to his wife some information is
>>passed.
>>
>>Suppose that the prison allows one of four messages to be sent:
>>
>>1. I am healthy and happy
>>2. I am healthy but not happy
>>3. I am happy but not healthy
>>4. I am not happy and not healthy
>>
>>If the person sends one of these messages then even more information will be
>>passed.
>>
>>Thus, the bigger the set of potential messages the more uncertainty. The
>>more uncertainty there is the more information there is.
>>
>>Interestingly, it doesn't matter what the messages are.  All that matters is
>>the "number" of messages in the set.  Thus, there is the same amount of
>>information in this set:
>>
>>   {"I am fine", "I am ill"}
>>
>>as there is in this set:
>>
>>   {A, B}
>>
>>SIDE NOTES
>>
>>a. Part of Shannon's goal was to measure the "amount" of information.
>>   In the example above where there are two possible messages the amount
>>   of information is 1 bit.  In the example where there are four
>>   possible messages the amount of information is 2 bits.
>>
>>b. Shannon refers to uncertainty as "entropy".  Thus, the higher the
>>   entropy (uncertainty) the higher the information.  The lower the
>>   entropy the lower the information.
>>
>>QUESTIONS
>>
>>1. How does this aspect (information ~ uncertainty) of Shannon's work relate
>>to data exchange using XML?  (I realize that this is a very broad question.
>>Its intent is to stimulate discussion on the application of Shannon's
>>information/uncertainty ideas to XML data exchange)
>>
>>2. A schema is used to restrict the allowable forms that an instance
>>document may take.  So doesn't a schema reduce information?  
>>
>>/Roger  
>> 
>>[1] An Introduction to Cybernetics by Ross Ashby
