


   RE: [xml-dev] [Shannon: information ~ uncertainty] Ramifications to XML


Correct.  Discussions of Markov models are appropriate.   

Note that the systems are seldom isolates and that entropy 
requires an arrow of time.


Analysis of frequency of letters is applied to text categorization, 
and other pattern-based analysis used for prediction.  Imagine a 
tool that scans texts and based on this analysis, creates a 
schematic description of the frequencies of occurrence of 
some set of categorical types.  Would that output be close 
or equivalent to a DTD or Schema?  Is a DTD/Schema a pattern 
generated by a learning/negotiation process?    
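
As a rough illustration of the tool imagined above, here is a minimal
sketch that scans a few XML texts and tallies how often each child
element occurs under each parent.  The function names and the sample
documents are hypothetical, and the output is only a frequency table,
not a DTD or Schema.

```python
from collections import Counter, defaultdict
import xml.etree.ElementTree as ET


def child_frequencies(xml_texts):
    """Tally how often each child element name appears under each parent name."""
    counts = defaultdict(Counter)
    for text in xml_texts:
        root = ET.fromstring(text)
        # root.iter() visits the root and every descendant element.
        for parent in root.iter():
            for child in parent:
                counts[parent.tag][child.tag] += 1
    return counts


def relative_frequencies(counts):
    """Normalize the raw tallies into per-parent relative frequencies."""
    return {
        parent: {tag: n / sum(children.values()) for tag, n in children.items()}
        for parent, children in counts.items()
    }


# Hypothetical sample documents, invented for illustration only.
samples = [
    "<order><item/><item/><note/></order>",
    "<order><item/></order>",
]
freqs = relative_frequencies(child_frequencies(samples))
print(freqs)
```

Whether such a table is "close to" a content model is exactly the open
question; the sketch only shows that the frequencies are easy to collect.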

In the following, I will summarize from


The three principal applications of Markov modeling are:

o  Evaluation model: Discovering the probability of an observable sequence
   (apply the forward algorithm)
o  Decoding model: Discovering the sequence of hidden states most likely
   to have produced the observed sequence (apply the Viterbi algorithm)
o  Learning model:  Given observed sequences, estimating the model
   parameters that best explain them (apply the forward/backward algorithm)

In fact, the majority of texts we exchange are not random, and 
not all choices in the Shannon sense are equally probable. They 
are 'meaningful'.  Understanding how texts acquire the property 
of meaning implies understanding how multiple systems, even 
ones where within each system some choices are equally probable 
(non-deterministic) and some are not (relative determinism),
reduce or increase entropy when they interact.

Determinism varies system by system.   The arrow of time does not 
in and of itself produce steady increases in entropy.  Only
isolated systems fit that model.   A system interoperating 
with other systems and exchanging energy changes that outcome. 
A Markov model assumes we can predict a future state based 
on past states.

In the system we think of as the WWW, the URI = Energy. 
XML can be used to control the states of messages exchanged 
as identified by URIs.  You could assign probabilities to 
the URIs identifying transitions among states (first order 
Markov systems and so on).  If you have M states and the 
probabilities do not vary in time, you have M squared 
transitions.  This can be modeled as a transition matrix.  
This is a Markov process model triple: 

o States:  all possible states of system
o Initial Vector:  initial probability of each state at time zero
o Transition Matrix:  probability of state given previous state
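
A minimal sketch of that triple, assuming three made-up message states
and made-up probabilities (none of these names or numbers come from a
real system).  It computes the probability of one state sequence under
a first-order model.

```python
def sequence_probability(states, initial, transition, sequence):
    """Probability of a state sequence under a first-order Markov process."""
    index = {name: i for i, name in enumerate(states)}
    # Start with the initial-vector entry for the first state.
    p = initial[index[sequence[0]]]
    # Multiply in one transition probability per consecutive pair of states.
    for prev, cur in zip(sequence, sequence[1:]):
        p *= transition[index[prev]][index[cur]]
    return p


# Hypothetical states and probabilities, chosen only for illustration.
states = ["request", "response", "error"]
initial = [1.0, 0.0, 0.0]          # initial vector: always start in "request"
transition = [                     # M x M matrix: row = previous, column = next
    [0.0, 0.9, 0.1],
    [0.8, 0.2, 0.0],
    [0.5, 0.0, 0.5],
]

p = sequence_probability(
    states, initial, transition, ["request", "response", "request"]
)
print(p)  # 1.0 * 0.9 * 0.8
```

With M = 3 states the matrix has M squared = 9 entries, and each row
sums to 1 because some next state must follow.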

Note well:  this is a discrete system (not continuous so no 
infinities to worry about: discrete time steps, discrete state 
values, probabilities do not vary in time).

That is obviously not a model of the world as we observe it. 
System behavior may be determined by hidden processes that 
create the observable states.  The hidden states are modeled as 
a first-order Markov process, and the 'confusion matrix' gives 
the probability of each observable state given a hidden state.

The Hidden Markov model is modeled as a triple (RDFers Unite!):

o Vector of initial state probabilities
o State transition matrix
o Confusion matrix
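
To make the triple concrete, here is a small sketch of the forward
algorithm (evaluation) and the Viterbi algorithm (decoding) over it.
The two hidden states, two observable symbols, and all probabilities
are invented for illustration; states and observations are plain
integer indexes.

```python
def forward(initial, transition, confusion, observations):
    """Evaluation: total probability of the observation sequence."""
    n = len(initial)
    alpha = [initial[s] * confusion[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[r] * transition[r][s] for r in range(n)) * confusion[s][obs]
            for s in range(n)
        ]
    return sum(alpha)


def viterbi(initial, transition, confusion, observations):
    """Decoding: the single most likely sequence of hidden states."""
    n = len(initial)
    best = [initial[s] * confusion[s][observations[0]] for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        step, pointers = [], []
        for s in range(n):
            r = max(range(n), key=lambda r: best[r] * transition[r][s])
            pointers.append(r)
            step.append(best[r] * transition[r][s] * confusion[s][obs])
        best = step
        back.append(pointers)
    # Trace the best path backwards from the most probable final state.
    state = max(range(n), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))


# Hypothetical model: 2 hidden states, 2 observable symbols.
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
confusion = [[0.9, 0.1], [0.2, 0.8]]   # row = hidden state, column = observation

likelihood = forward(initial, transition, confusion, [0, 1])
hidden_path = viterbi(initial, transition, confusion, [0, 1])
print(likelihood, hidden_path)
```

The learning problem (re-estimating the two matrices from data) is left
out here; it builds on the same forward pass plus a backward pass.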

Again, the probabilities do not vary in time; that is an unrealistic 
assumption and a weakness of Markov models.


From: TAN Kuan Hui [mailto:kuanhui@xemantics.com]

You are making the assumption that all possibilities occur with equal
probability. This is equivalent to white noise. If the probability
distribution of your data set is close to white noise, then we can
conclude that there is no redundancy in your encoded data. This
would be ideal.
A schema will constrain the data into a conforming set of data but
that does not mean that every possible combination is useful;
some permutations probably never occur and cannot factor into
your evaluation of the value of the information carried.
In real data, a small set of data would occur with significantly higher
probability than others; therefore you have to factor in the
"probability distribution profile" (pdf) to make your discussion meaningful.

Let's say you have an XML schema that constrains your data
into 1 of 10 possible combinations, each occurring with equal
probability, versus another with 100 possibilities but with 1 combination
occurring 99% of the time. Which data feed then has greater information
value?
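
One way to put numbers on this question is Shannon entropy, the average
information per message in bits.  The sketch below assumes that, in the
second feed, the remaining 1% is spread evenly over the other 99
combinations; the question itself does not say so.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)), in bits per message."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Feed 1: 10 combinations, all equally likely.
uniform_feed = [1 / 10] * 10

# Feed 2: 100 combinations, one at 99%.
# Assumption: the other 99 share the remaining 1% equally.
skewed_feed = [0.99] + [0.01 / 99] * 99

h_uniform = entropy_bits(uniform_feed)
h_skewed = entropy_bits(skewed_feed)
print(f"uniform: {h_uniform:.3f} bits, skewed: {h_skewed:.3f} bits")
```

Under that assumption the 10-way uniform feed carries about 3.32 bits
per message and the skewed 100-way feed only about 0.15 bits, because
its dominant combination is almost entirely predictable.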

The key to your discussion, in my humble opinion, is in
"information predictability" rather than "information uncertainty".



Copyright 2001 XML.org. This site is hosted by OASIS