   Re: [xml-dev] [Shannon: information ~ uncertainty] Ramifications to XML d


hi roger

this is essential reading. although i hadn't read shannon's work before 
i wrote about binary exchange, it is an excellent discussion of the ideas.

the important concept is entropy (as in the second law of 
thermodynamics). basically it's a measure of messiness (i explain it to
non-scientists/engineers as "weeds grow"). in a closed system entropy 
increases. the only way to make entropy decrease is to add energy. that 
was the criticism of my paper on binary xml - entropy applies to energy.

well it's also used in information theory as a measure of order. 
something that is well ordered (has little information) has low entropy. 
something that has a lot of information (not well ordered) has high 
entropy. this then dictates the degree to which something can be 
compressed. if there is a lot of information you can't compress it very 
well. if there's not much information then it compresses very well. 
general algorithms for compression such as lzh do a very good job of 
identifying total information content without any special knowledge 
about the domain of the message. some algorithms such as jpeg/mpeg 
recognise that in some domains not all the information content is 
important and can do a better job by ignoring insignificant information. 
(a lot of maths and engineering works on a similar concept of factors so 
small they have no bearing on the result - but that is another topic)
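
a rough sketch of the point above, in python. it estimates entropy from byte frequencies and compares that with how well a general-purpose compressor does. assumptions: zlib (deflate) stands in for lzh here, and the two sample inputs ("ordered" and "noisy") are made up for illustration.

```python
import math
import random
import zlib
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte frequencies in `data`, in bits per byte."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def compressed_ratio(data: bytes) -> float:
    """Size after zlib (DEFLATE) compression divided by the original size."""
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical sample inputs, both 10000 bytes long.
ordered = b"ab" * 5000                                   # very regular
noisy = bytes(rng.getrandbits(8) for _ in range(10000))  # close to random

for name, data in (("ordered", ordered), ("noisy", noisy)):
    print(name,
          round(entropy_bits_per_byte(data), 3),
          round(compressed_ratio(data), 3))
```

the byte-frequency estimate ignores ordering, so the "ab" pattern still scores 1 bit per byte even though it carries almost nothing; the compressor sees the repetition and shrinks it to a tiny fraction. the noisy input sits near 8 bits per byte and should not shrink at all.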

xml messages compress well because the tags as a group in a message 
contain very little information. the message content however contains a 
lot of information and doesn't compress as well, hence it doesn't matter 
if it's binary or ascii - the information content is high and therefore 
the bits to represent it will be high. most arguments about binary/ascii 
representation of numbers look at specifics and ignore the principle.
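
a small sketch of this, assuming made-up tag names (`reading`, `value`) and random hex strings as a stand-in for high-information content, with zlib again as the general-purpose compressor. it compresses the same values with and without the tags around them.

```python
import random
import zlib

rng = random.Random(1)  # fixed seed so the run is repeatable


def random_value(n: int = 16) -> str:
    """A hypothetical high-information field value: n random hex digits."""
    return "".join(rng.choice("0123456789abcdef") for _ in range(n))


values = [random_value() for _ in range(500)]

# The same values, once wrapped in (made-up) XML tags and once bare.
as_xml = "".join(
    f"<reading><value>{v}</value></reading>" for v in values
).encode()
bare = "".join(values).encode()

xml_z = len(zlib.compress(as_xml, 9))
bare_z = len(zlib.compress(bare, 9))

print("with tags:   ", len(as_xml), "->", xml_z)
print("without tags:", len(bare), "->", bare_z)
```

the tags more than triple the raw size but should add comparatively little once compressed, because they repeat exactly. the content is what resists compression: each hex digit carries 4 bits, so the bare values cannot drop much below the roughly 4000 bytes the same data would take as raw binary.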

a schema doesn't of itself reduce information to any great extent 
because as i said the tags are the low information part of the message. 
they may reduce information by causing some messages to be rejected, but 
then those messages weren't in the problem space to begin with. that's a 
bit like measuring something and ignoring the rest of the universe - yes 
we have reduced the problem space, but the rest of the universe is 
irrelevant to the measurement and therefore can't add information to our 

this problem of entropy in information theory also goes to the heart of 
seti. basically seti is looking for complex messages with low entropy. 
we know about the simple oscillating messages from rotating things. we 
know about the white noise from space. so all the signals from space 
that we know about either have very low entropy - approaching 0 
(pulsating star remains etc) or virtually infinite entropy - white 
noise. there doesn't seem to be anything in between, except here on 
earth :( that we know of. or put another way, a million monkeys typing 
randomly for a million years on all those typewriters will never produce 
the works of shakespeare, or anyone else for that matter. but an alien......

my thoughts....


ps entropy is as important in information theory as friction is in 
physics. friction means you can never have a perpetual motion machine 
(and the us patent office will not consider them). entropy in 
information theory means you can't have infinite compression, and quite 
a few patent applications should have been ignored on that basis - 
millions of investment dollars could have been saved.
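
the "no infinite compression" point comes down to counting, sketched here in python (the function names are just for illustration). a lossless compressor has to give every input its own output, and there are always fewer short bit strings than long ones.

```python
def strings_of_length(n: int) -> int:
    """Number of distinct bit strings that are exactly n bits long."""
    return 2 ** n


def strings_shorter_than(n: int) -> int:
    """Number of distinct bit strings with length 0 .. n-1 bits."""
    return sum(2 ** k for k in range(n))  # equals 2**n - 1


for n in (1, 8, 16):
    inputs = strings_of_length(n)
    shorter = strings_shorter_than(n)
    # At least one input is always left without a shorter output of its own.
    print(n, inputs, shorter, inputs - shorter)
```

so whatever the scheme, it cannot shrink every n-bit message, and running it repeatedly cannot keep shrinking the result.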

Roger L. Costello wrote:

>Hi Folks,
>I am trying to get an understanding of Claude Shannon's work on information
>theory. Below I describe one small part of Shannon's work.  I would like to
>hear your thoughts on its ramifications to information exchange using XML. 
>Shannon defines information as follows: 
>    Information is proportional to uncertainty.  High uncertainty equates
>    to a high amount of information.  Low uncertainty equates to a low
>    amount of information.
>    More specifically, Shannon talks about a set of possible data.
>    A set comprised of 10 possible choices of data has less information than
>    a set comprised of a hundred possible choices.
>This may seem rather counterintuitive, but bear with me as I give an
>example.
>In a book I am reading[1] the author gives an example which provides a nice
>intuition of Shannon's statement that information is proportional to
>uncertainty.
>Imagine that a man is in prison and wants to send a message to his wife.
>Suppose that the prison only allows one message to be sent, "I am fine".
>Even if the person is deathly ill all he can
>send is, "I am fine".  Clearly there is no information in this message.  
>Here the set of possible messages is one.  There is no uncertainty and there
>is no information. 
>Suppose that the prison allows one of two messages to be sent, "I am fine"
>or "I am ill".  If the prisoner sends one of these messages then some
>information will be passed to his wife.
>Here the set of possible messages is two.  There is uncertainty (of which
>message will be sent).  When one of the two messages is selected by the
>prisoner and sent to his wife some information is passed.
>Suppose that the prison allows one of four messages to be sent:
>1. I am healthy and happy
>2. I am healthy but not happy
>3. I am happy but not healthy
>4. I am not happy and not healthy
>If the person sends one of these messages then even more information will be
>passed.
>Thus, the bigger the set of potential messages the more uncertainty. The
>more uncertainty there is the more information there is.
>Interestingly, it doesn't matter what the messages are.  All that matters is
>the "number" of messages in the set.  Thus, there is the same amount of
>information in this set:
>   {"I am fine", "I am ill"}
>as there is in this set:
>   {A, B}
>a. Part of Shannon's goal was to measure the "amount" of information.
>   In the example above where there are two possible messages the amount
>   of information is 1 bit.  In the example where there are four
>   possible messages the amount of information is 2 bits.
>b. Shannon refers to uncertainty as "entropy".  Thus, the higher the
>   entropy (uncertainty) the higher the information.  The lower the
>   entropy the lower the information.
>1. How does this aspect (information ~ uncertainty) of Shannon's work relate
>to data exchange using XML?  (I realize that this is a very broad question.
>Its intent is to stimulate discussion on the application of Shannon's
>information/uncertainty ideas to XML data exchange)
>2. A schema is used to restrict the allowable forms that an instance
>document may take.  So doesn't a schema reduce information?  
>[1] An Introduction to Cybernetics by Ross Ashby
>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>initiative of OASIS <http://www.oasis-open.org>
>The list archives are at http://lists.xml.org/archives/xml-dev/

Rick Marshall



Copyright 2001 XML.org. This site is hosted by OASIS