OASIS Mailing List Archives

 



 


 

   Re: [xml-dev] [Shannon: information ~ uncertainty] Ramifications to XML


The discussion of unequal probabilities is unnecessary
and only sidetracks the discussion.

Having lots of possible choices is not information; knowing
which particular choice was taken is information.

It is like lotto ticket numbers: knowing that the
number you picked is just one of many millions of
possibilities is not much information. Knowing the
actual winning number is information.

The more choices there are, the more information you get
when you learn which choice was made.

Without a schema, when you get the actual XML data,
you learn which choice it is out of an effectively
unlimited number of possibilities; that is a lot of
information.

With a schema that allows only 4 valid XML instances,
getting the XML data tells you which of the four it
is; that is not a lot of information (at most 2 bits).

So a schema does reduce the information an XML message
can carry.
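
To put rough numbers on this, here is a minimal Python sketch. It
assumes every allowed instance is equally likely, which is the same
assumption the quoted example makes; the function name
bits_for_choices is made up for illustration.

```python
import math

def bits_for_choices(n: int) -> float:
    """Information, in bits, gained by learning which of n equally
    likely messages was sent (assumes a uniform distribution)."""
    if n < 1:
        raise ValueError("need at least one possible message")
    return math.log2(n)

print(bits_for_choices(1))  # 0.0: only one allowed message, no information
print(bits_for_choices(2))  # 1.0
print(bits_for_choices(4))  # 2.0: a schema with four valid instances
```

The count grows without bound as the set of allowed instances grows,
which is the sense in which a schema-less message can carry "a lot".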

However, that reduction comes from knowing the schema.
If you know the schema, there is less information in
the message, but only because you already know a lot:
you know the schema. The schema carries a lot of
information, so once you know it there is not much
more left to learn.

Back to the lotto example: if you know the first 5
balls, you already have a lot of information, and the
rest of the lotto drawing carries very little.
Knowing the schema is like knowing the first 5 balls.
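
The lotto analogy can be sketched the same way. The numbers here are
assumptions chosen only for illustration: 6 balls drawn in order,
without replacement, from a drum of 49, with every ordered draw
equally likely. The names BALLS, DRAWN and draw_bits are hypothetical.

```python
import math

BALLS = 49  # assumed size of the drum (hypothetical)
DRAWN = 6   # assumed number of balls drawn, order counted

def draw_bits(drawn: int, already_known: int = 0) -> float:
    """Bits needed to learn the remaining balls of an ordered draw
    without replacement, given that `already_known` balls are known."""
    remaining_pool = BALLS - already_known
    remaining_draws = drawn - already_known
    return math.log2(math.perm(remaining_pool, remaining_draws))

total = draw_bits(DRAWN)                       # whole drawing, nothing known
first_five = math.log2(math.perm(BALLS, 5))    # carried by the first 5 balls
last_ball = draw_bits(DRAWN, already_known=5)  # what is left: log2(44)

print(f"whole drawing: {total:.2f} bits")
print(f"first 5 balls: {first_five:.2f} bits")
print(f"last ball:     {last_ball:.2f} bits")
```

Under these assumptions the first five balls and the last ball add up
to the whole drawing, and most of the total sits in the part that is
already known.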

So if someone knows the first 5 numbers of the next
lotto winning ticket, please tell me.

Ed Lai

--- TAN Kuan Hui <kuanhui@xemantics.com> wrote:

> > EXAMPLE
> >
> > Imagine that a man is in prison and wants to send a message to his
> > wife. Suppose that the prison only allows one message to be sent,
> > "I am fine". Even if the person is deathly ill all he can send is,
> > "I am fine".  Clearly there is no information in this message.
> >
> > Here the set of possible messages is one.  There is no uncertainty
> > and there is no information.
> >
> > Suppose that the prison allows one of two messages to be sent,
> > "I am fine" or "I am ill".  If the prisoner sends one of these
> > messages then some information will be passed to his wife.
> >
> > Here the set of possible messages is two.  There is uncertainty (of
> > which message will be sent).  When one of the two messages is
> > selected by the prisoner and sent to his wife some information is
> > passed.
> >
> > Suppose that the prison allows one of four messages to be sent:
> >
> > 1. I am healthy and happy
> > 2. I am healthy but not happy
> > 3. I am happy but not healthy
> > 4. I am not happy and not healthy
> >
> > If the person sends one of these messages then even more information
> > will be passed.
> >
> > Thus, the bigger the set of potential messages the more uncertainty.
> > The more uncertainty there is the more information there is.
> >
> > Interestingly, it doesn't matter what the messages are.  All that
> > matters is the "number" of messages in the set.  Thus, there is the
> > same amount of information in this set:
> >
> 
> You are making the assumption that all possibilities
> occur with equal
> probability. This is equivalent to white noise. If
> the probability
> distribution of your data set is close to white
> noise, then we can
> conclude that there is no redundancy in your encoded
> data. This
> would be ideal.
> 
> 
> >    {"I am fine", "I am ill"}
> >
> > as there is in this set:
> >
> >    {A, B}
> >
> > SIDE NOTES
> >
> > a. Part of Shannon's goal was to measure the "amount" of information.
> >    In the example above where there are two possible messages the
> >    amount of information is 1 bit.  In the example where there are
> >    four possible messages the amount of information is 2 bits.
> >
> > b. Shannon refers to uncertainty as "entropy".  Thus, the higher the
> >    entropy (uncertainty) the higher the information.  The lower the
> >    entropy the lower the information.
> >
> > QUESTIONS
> >
> > 1. How does this aspect (information ~ uncertainty) of Shannon's work
> > relate to data exchange using XML?  (I realize that this is a very
> > broad question. Its intent is to stimulate discussion on the
> > application of Shannon's information/uncertainty ideas to XML data
> > exchange)
> >
> > 2. A schema is used to restrict the allowable forms that an instance
> > document may take.  So doesn't a schema reduce information?
> >
> A schema will constrain the data to a conforming set, but that
> does not mean that every possible combination is useful; some
> permutations probably never occur and should not factor into your
> evaluation of the value of the information carried.
> In real data, a small set of values would occur with significantly
> higher probability than others; therefore you have to factor in
> the "probability distribution profile" (pdf) to make your
> discussion meaningful.
> 
> Let's say you have one XML schema that constrains your data to
> 1 of 10 possible combinations, each occurring with equal
> probability, versus another with 100 possibilities but with 1
> combination occurring 99% of the time. Which data feed then has
> greater information value?
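
One way to compare the two feeds in the question above is Shannon
entropy, H = -sum(p * log2(p)). The sketch below is only an
illustration: the question does not say how the remaining 1% is
distributed, so it is assumed here to be split evenly over the other
99 combinations. The names entropy_bits, uniform_10 and skewed_100
are hypothetical.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)) in bits; zero-probability
    outcomes contribute nothing."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform_10 = [1 / 10] * 10
# Assumption: the remaining 1% is split evenly over the other 99 outcomes.
skewed_100 = [0.99] + [0.01 / 99] * 99

print(f"10 equally likely: {entropy_bits(uniform_10):.3f} bits")
print(f"100, one at 99%:   {entropy_bits(skewed_100):.3f} bits")
```

Under that assumption the 10-way uniform feed averages roughly 3.3
bits per message, while the 100-way skewed feed averages well under
1 bit, even though it has ten times as many possibilities.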
> 
> The key to your discussion, in my humble opinion, is
> in
> "information predictability" rather than
> "information uncertainty".
> 
> rgds,
> Kuan Hui


=====
Ed Lai
data_mechanic@yahoo.com


		




 


Copyright 2001 XML.org. This site is hosted by OASIS