like a lot of these discussions, this one is in danger of slipping from
the point and confusing the mathematics with orthogonal semantic problems.
information as described by shannon is a mathematical concept. the use
of information is, if you like, a syntactic and semantic problem.
if you want to do an experiment on information (one that i hope to have
time to do on the weekend), try this.
method:
1. generate three files from the same 1,000,000 random numbers: one as
ascii text, one as binary integers, and one as an xml file
<numbers><number>..</number>....</numbers>
2. compress all three files using your favourite compression program(s)
and compare the resulting file sizes.
3. discuss.
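something like this might do it - an untested python sketch; gzip, the
seed and the 0..999999 range are just my choices, use whatever you like:

import gzip
import random

random.seed(42)                               # reproducible run
nums = [random.randint(0, 999999) for _ in range(1_000_000)]

# 1. the numbers as ascii text, one per line
with open("nums.txt", "w") as f:
    f.write("\n".join(str(n) for n in nums))

# 2. the same numbers as raw 4-byte integers
with open("nums.bin", "wb") as f:
    for n in nums:
        f.write(n.to_bytes(4, "little"))

# 3. the same numbers wrapped in xml
with open("nums.xml", "w") as f:
    f.write("<numbers>"
            + "".join("<number>%d</number>" % n for n in nums)
            + "</numbers>")

# compress each file and compare raw vs compressed sizes
for name in ("nums.txt", "nums.bin", "nums.xml"):
    data = open(name, "rb").read()
    print(name, len(data), "->", len(gzip.compress(data)))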
now i haven't done the experiment yet, but if shannon is correct, they
should all come out at about the same compressed size.
which btw goes a long way to explaining why you can't really compress a
jpeg any further :)
rick
Irene Polikoff wrote:
>There are two sides here - sender and receiver.
>
>The uncertainty is on the sender side. The more types of messages he is
>capable of sending, the more uncertain it is as to what specific message he
>will be asked to send. In this example the sender is the prison.
>
>The information is on the receiver side. This doesn't deal with any
>uncertainty the receiver may have because the information is wrong, or for
>whatever reason. The receiver, presumably, needs to do something with the
>information it receives. The smaller the number of message types it could
>receive, the easier it is for "him" to react.
>
>I put "him" in quotes because in the context we are discussing, both sender
>and receiver are computer programs. Low uncertainty, low information makes
>it easier to create software to process it.
>
>For the humans/businesses behind the sender and receiver, this is another
>story: the reduction of information may have undesirable effects. I think
>Roger's question is legitimate - yes, in some ways XML schemas reduce
>information. Is it a bad thing? My guess is that the answer is "it depends".
>
>-----Original Message-----
>From: sterling [mailto:sstouden@thelinks.com]
>Sent: Wednesday, October 13, 2004 11:26 PM
>To: Roger L. Costello
>Cc: xml-dev@lists.xml.org
>Subject: Re: [xml-dev] [Shannon: information ~ uncertainty] Ramifications to
>XML data exchange?
>
>
>I hate to show my stupidity, but if the message regarding the condition of
>the prisoner [a constant] is irrelevant to the condition it is meant to
>express to the intended receiver, then the message is just as
>uncertain to the receiver as no information at all. The state of no
>information is the ultimate in uncertainty.
>
>Do you not mean that high uncertainty equates with low amounts of
>reliable information, because high amounts of reliable information are
>positively related to greater certainty?
>
>Hence, the reliability of information is a factor of the consequence of its
>use, and uncertainty approaches certainty as the predictable consequence of
>reliance becomes more certain.
>
>Furthermore, if I am trying to predict the weather, it does not matter
>which message or how many messages are sent, since the message is neither
>related to the condition of the person whose condition is to be expressed
>nor to the weather conditions.
>
>I know nothing of Shannon, but information that has no foundation (that
>is, random information unrelated to the circumstances of its source or use)
>is not information; it is, at its very best, data.
>
>Information is an object with many modifiers and much utility!
>
>My two bits.
>
>
>On Mon, 11 Oct 2004, Roger L. Costello wrote:
>
>>Hi Folks,
>>
>>I am trying to get an understanding of Claude Shannon's work on information
>>theory. Below I describe one small part of Shannon's work. I would like to
>>hear your thoughts on its ramifications to information exchange using XML.
>>
>>INFORMATION
>>
>>Shannon defines information as follows:
>>
>> Information is proportional to uncertainty. High uncertainty equates
>> to a high amount of information. Low uncertainty equates to a low
>> amount of information.
>>
>> More specifically, Shannon talks about a set of possible data.
>> A set comprised of 10 possible choices of data has less information than
>> a set comprised of a hundred possible choices.
>>
>>This may seem rather counterintuitive, but bear with me as I give an
>>example.
>>
>>In a book I am reading[1] the author gives an example which provides a nice
>>intuition of Shannon's statement that information is proportional to
>>uncertainty.
>>
>>EXAMPLE
>>
>>Imagine that a man is in prison and wants to send a message to his wife.
>>Suppose that the prison only allows one message to be sent, "I am fine".
>>Even if the prisoner is deathly ill, all he can
>>send is "I am fine". Clearly there is no information in this message.
>>
>>Here the set of possible messages is one. There is no uncertainty and there
>>is no information.
>>
>>Suppose that the prison allows one of two messages to be sent, "I am fine"
>>or "I am ill". If the prisoner sends one of these messages then some
>>information will be passed to his wife.
>>
>>Here the set of possible messages is two. There is uncertainty (as to which
>>message will be sent). When one of the two messages is selected by the
>>prisoner and sent to his wife, some information is
>>passed.
>>
>>Suppose that the prison allows one of four messages to be sent:
>>
>>1. I am healthy and happy
>>2. I am healthy but not happy
>>3. I am happy but not healthy
>>4. I am not happy and not healthy
>>
>>If the person sends one of these messages then even more information will be
>>passed.
>>
>>Thus, the bigger the set of potential messages, the more uncertainty. The
>>more uncertainty there is, the more information there is.
>>
>>Interestingly, it doesn't matter what the messages are. All that matters is
>>the "number" of messages in the set. Thus, there is the same amount of
>>information in this set:
>>
>> {"I am fine", "I am ill"}
>>
>>as there is in this set:
>>
>> {A, B}
>>
>>SIDE NOTES
>>
>>a. Part of Shannon's goal was to measure the "amount" of information.
>> In the example above where there are two possible messages the amount
>> of information is 1 bit. In the example where there are four
>> possible messages the amount of information is 2 bits.
>>
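>> (In general, a set of N equally likely messages carries log2(N) bits
>> per message, which is where these figures come from: log2(2) = 1 bit
>> and log2(4) = 2 bits.)
>>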
>>b. Shannon refers to uncertainty as "entropy". Thus, the higher the
>> entropy (uncertainty) the higher the information. The lower the
>> entropy the lower the information.
>>
>>QUESTIONS
>>
>>1. How does this aspect (information ~ uncertainty) of Shannon's work relate
>>to data exchange using XML? (I realize that this is a very broad question.
>>Its intent is to stimulate discussion on the application of Shannon's
>>information/uncertainty ideas to XML data exchange)
>>
>>2. A schema is used to restrict the allowable forms that an instance
>>document may take. So doesn't a schema reduce information?
>>
>>/Roger
>>
>>[1] W. Ross Ashby, An Introduction to Cybernetics