
   Re: [xml-dev] [Shannon: information ~ uncertainty] Ramifications to XML


You mean

<xs:element name="strikeNumber">
 <xs:complexType>
  <xs:sequence>
   <xs:element name="digit" type="xs:integer" minOccurs="16" maxOccurs="16"/>
  </xs:sequence>
 </xs:complexType>
</xs:element>

has more info value than

<xs:element name="strikeNumber">
 <xs:complexType>
  <xs:sequence>
   <xs:element name="digit" type="xs:integer" minOccurs="8" maxOccurs="8"/>
  </xs:sequence>
 </xs:complexType>
</xs:element>

?
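
As a rough way to put numbers on the two schemas above, here is a minimal
Python sketch. It assumes every valid instance is equally likely and that
each digit element holds a single decimal digit 0-9; neither assumption is
stated in the schemas (xs:integer by itself is unbounded). The function
names uniform_bits and digit_sequence_bits are made up for this illustration.

```python
import math


def uniform_bits(choices: int) -> float:
    """Bits needed to single out one of `choices` equally likely messages."""
    if choices < 1:
        raise ValueError("need at least one possible message")
    return math.log2(choices)


def digit_sequence_bits(length: int, symbols_per_digit: int = 10) -> float:
    """Bits carried by `length` independent, equally likely digits.

    Assumption for illustration: each digit takes one of
    `symbols_per_digit` values (0-9 by default).
    """
    return length * math.log2(symbols_per_digit)


if __name__ == "__main__":
    print(uniform_bits(2))   # two possible messages  -> 1.0
    print(uniform_bits(4))   # four possible messages -> 2.0
    print(round(digit_sequence_bits(8), 2))   # 8-digit strikeNumber
    print(round(digit_sequence_bits(16), 2))  # 16-digit strikeNumber
```

Under those assumptions the 16-digit form can carry exactly twice as many
bits per message as the 8-digit form.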

----- Original Message ----- 
From: "Ed Lai" <data_mechanic@yahoo.com>
To: "TAN Kuan Hui" <kuanhui@xemantics.com>; "Roger L. Costello"
<costello@mitre.org>; <xml-dev@lists.xml.org>
Sent: Wednesday, October 13, 2004 12:09 AM
Subject: Re: [xml-dev] [Shannon: information ~ uncertainty] Ramifications to
XML data exchange?


> The discussion of unequal probabilities is unnecessary
> and only sidetracks the discussion.
>
> Having lots of possible choices is not information; knowing
> which particular choice was taken is information.
>
> It is like lotto ticket numbers: knowing that the
> number you picked is just one of many millions of
> possibilities is not much information. Knowing the
> actual winning number is information.
>
> The more choices there are, the more information you get
> when you learn which choice was made.
>
> Without a schema, when you get the actual XML data,
> you know which choice it is out of infinitely many; that is
> a lot of information.
>
> With a schema that allows only 4 valid XML instances,
> getting the XML data tells you only which
> of the four it is; that is not a lot of information.
>
> So a schema does reduce the information an XML message
> can carry.
>
> However, that reduction comes from knowing the schema. If you
> know the schema, there is less information in the
> message, but only because you already
> know a lot: you know the schema. The schema carries a
> lot of information, so once you know the schema there is
> not much more to learn.
>
> Back to the lotto example: if you know the first 5
> balls, you already have a lot of information, and the
> lotto drawing carries very little information.
> Knowing the schema is like knowing the first 5 balls.
>
> So if someone knows the first 5 numbers of the next
> winning lotto ticket, please tell me.
>
> Ed Lai
>
> --- TAN Kuan Hui <kuanhui@xemantics.com> wrote:
>
> > > EXAMPLE
> > >
> > > Imagine that a man is in prison and wants to send a message to his wife.
> > > Suppose that the prison only allows one message to be sent, "I am fine".
> > > Even if the person is deathly ill all he can
> > > send is, "I am fine".  Clearly there is no information in this message.
> > >
> > > Here the set of possible messages is one.  There is no uncertainty and there
> > > is no information.
> > >
> > > Suppose that the prison allows one of two messages to be sent, "I am fine"
> > > or "I am ill".  If the prisoner sends one of these messages then some
> > > information will be passed to his wife.
> > >
> > > Here the set of possible messages is two.  There is uncertainty (of which
> > > message will be sent).  When one of the two messages is selected by the
> > > prisoner and sent to his wife some information is
> > > passed.
> > >
> > > Suppose that the prison allows one of four messages to be sent:
> > >
> > > 1. I am healthy and happy
> > > 2. I am healthy but not happy
> > > 3. I am happy but not healthy
> > > 4. I am not happy and not healthy
> > >
> > > If the person sends one of these messages then even more information will be
> > > passed.
> > >
> > > Thus, the bigger the set of potential messages the more uncertainty. The
> > > more uncertainty there is the more information there is.
> > >
> > > Interestingly, it doesn't matter what the messages are.  All that matters is
> > > the "number" of messages in the set.  Thus, there is the same amount of
> > > information in this set:
> > >
> >
> > You are making the assumption that all possibilities
> > occur with equal
> > probability. This is equivalent to white noise. If
> > the probability
> > distribution of your data set is close to white
> > noise, then we can
> > conclude that there is no redundancy in your encoded
> > data. This
> > would be ideal.
> >
> >
> > >    {"I am fine", "I am ill"}
> > >
> > > as there is in this set:
> > >
> > >    {A, B}
> > >
> > > SIDE NOTES
> > >
> > > a. Part of Shannon's goal was to measure the "amount" of information.
> > >    In the example above where there are two possible messages the amount
> > >    of information is 1 bit.  In the example where there are four
> > >    possible messages the amount of information is 2 bits.
> > >
> > > b. Shannon refers to uncertainty as "entropy".  Thus, the higher the
> > >    entropy (uncertainty) the higher the information.  The lower the
> > >    entropy the lower the information.
> > >
> > > QUESTIONS
> > >
> > > 1. How does this aspect (information ~ uncertainty) of Shannon's work relate
> > > to data exchange using XML?  (I realize that this is a very broad question.
> > > Its intent is to stimulate discussion on the application of Shannon's
> > > information/uncertainty ideas to XML data exchange)
> > >
> > > 2. A schema is used to restrict the allowable forms that an instance
> > > document may take.  So doesn't a schema reduce information?
> > >
> > A schema will constrain the data to a conforming set, but
> > that does not mean that every possible combination is useful;
> > some permutations probably never occur and cannot factor into
> > your evaluation of the value of the information carried.
> > In real data, a small set of values occurs with significantly higher
> > probability than the others; therefore you have to factor in the
> > "probability distribution profile" (pdf) to make
> > your discussion meaningful.
> >
> > Let's say you have an XML schema that constrains your data
> > to 1 of 10 possible combinations, each occurring with equal
> > probability, versus another with 100 possibilities but
> > with 1 combination occurring 99% of the time. Which data feed
> > then has greater information value?
> >
> > The key to your discussion, in my humble opinion, is
> > in
> > "information predictability" rather than
> > "information uncertainty".
> >
> > rgds,
> > Kuan Hui
> >
> >
> >
> >
> >
> >
> >
> -----------------------------------------------------------------
> > The xml-dev list is sponsored by XML.org
> > <http://www.xml.org>, an
> > initiative of OASIS <http://www.oasis-open.org>
> >
> > The list archives are at
> > http://lists.xml.org/archives/xml-dev/
> >
> > To subscribe or unsubscribe from this list use the
> > subscription
> > manager:
> > <http://www.oasis-open.org/mlmanage/index.php>
> >
> >
>
>
> =====
> Ed Lai
> data_mechanic@yahoo.com
>
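
For the 10-versus-100 comparison in the quoted text above, Shannon entropy
gives one way to compare the two feeds. This is a minimal Python sketch,
assuming the remaining 1% is spread evenly over the other 99 combinations
(the quoted message does not say how it is distributed); shannon_entropy is
a name chosen for this illustration.

```python
import math


def shannon_entropy(probabilities):
    """Average information per message, in bits, for a discrete distribution."""
    total = sum(probabilities)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Feed A: 10 combinations, all equally likely.
uniform_feed = [1 / 10] * 10

# Feed B: 100 combinations, one of them occurring 99% of the time.
# Assumption for illustration: the other 99 share the remaining 1% equally.
skewed_feed = [0.99] + [0.01 / 99] * 99

if __name__ == "__main__":
    print(round(shannon_entropy(uniform_feed), 2))
    print(round(shannon_entropy(skewed_feed), 2))
```

With these assumed numbers the uniform 10-way feed averages about 3.32 bits
per message and the skewed 100-way feed about 0.15 bits, so the feed with
fewer possibilities carries more information per message.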





 


Copyright 2001 XML.org. This site is hosted by OASIS