Correct. Discussions of Markov models are appropriate.
Note that the systems are seldom isolated and that entropy
requires an arrow of time.
Analysis of letter frequencies is applied to text categorization
and to other pattern-based analyses used for prediction. Imagine a
tool that scans texts and, based on this analysis, creates a
schematic description of the frequencies of occurrence of
some set of categorical types. Would that output be close
or equivalent to a DTD or Schema? Is a DTD/Schema a pattern
generated by a learning/negotiation process?
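As a rough sketch of the kind of tool imagined above, the following
counts how often each child element type occurs under each parent
element type in an XML text. Treating element names as the
"categorical types" is my assumption, and child_frequencies is a
hypothetical name; the output is a frequency table, not a DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def child_frequencies(xml_text):
    """Map each parent tag to the relative frequency of its child tags."""
    root = ET.fromstring(xml_text)
    counts = defaultdict(Counter)
    # root.iter() visits the root and every descendant element.
    for parent in root.iter():
        for child in parent:
            counts[parent.tag][child.tag] += 1
    return {
        parent: {tag: n / sum(kids.values()) for tag, n in kids.items()}
        for parent, kids in counts.items()
    }
```

A DTD would state which children are permitted; this table only
reports which children were observed and how often.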
In the following, I will summarize from
The three principal applications of Markov modeling are:
o Evaluation: finding the probability of an observed sequence
given a model (apply the forward algorithm)
o Decoding: finding the sequence of hidden states most likely to
have generated the observed sequence (apply the Viterbi algorithm)
o Learning: given observed sequences, estimating the parameters
of the model (apply the forward/backward algorithm)
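To make the decoding case concrete, here is a minimal sketch of the
Viterbi algorithm in plain Python. The function name, the argument
names, and the use of integer indices for states and observations
are assumptions made for illustration.

```python
def viterbi(obs, init, trans, emit):
    """Return (most likely hidden state path, its joint probability).

    obs   : list of observation indices
    init  : init[s] = probability of starting in hidden state s
    trans : trans[r][s] = probability of moving from state r to s
    emit  : emit[s][o] = probability of observing o in state s
    """
    n_states = len(init)
    # best[s] = probability of the best path that ends in state s
    best = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    back = []  # one list of predecessor states per later time step
    for o in obs[1:]:
        prev, best, pointers = best, [], []
        for s in range(n_states):
            p, r = max((prev[r] * trans[r][s], r) for r in range(n_states))
            best.append(p * emit[s][o])
            pointers.append(r)
        back.append(pointers)
    # Trace the best path backwards from the most probable final state.
    last = max(range(n_states), key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]
```

For long sequences the products underflow; a practical version would
work with log probabilities instead.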
In fact, the majority of texts we exchange are not random, and
not all choices in the Shannon sense are equally probable. They
are 'meaningful'. Understanding how texts acquire the property
of meaning implies understanding how multiple systems, when
interacting, reduce or increase entropy, even where within
each system some choices are equally probable (non-deterministic)
and some are not (relative determinism).
Determinism varies system by system. The arrow of time does not
in and of itself produce steady increases in entropy. Only
isolated systems fit that model. A system interoperating
with other systems and exchanging energy changes that outcome.
A Markov model assumes we can predict a future state based
on past states.
In the system we think of as the WWW, the URI = Energy.
XML can be used to control the states of messages exchanged
as identified by URIs. You could assign probabilities to
the URIs identifying transitions among states (first order
Markov systems and so on). If you have M states and the
probabilities do not vary in time, you have M squared
transitions. This can be modeled as a transition matrix.
This is a Markov process model triple:
o States: all possible states of system
o Initial Vector: initial probability of each state at time zero
o Transition Matrix: probability of state given previous state
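A small sketch of that triple, assuming states are numbered 0..M-1
and the transition matrix is a list of M rows with M entries each
(the M squared transitions). The function names are hypothetical.

```python
def step(dist, trans):
    """Advance a state distribution by one discrete time step.

    dist  : dist[i] = probability of being in state i now
    trans : trans[i][j] = probability of state j given previous state i
    """
    n = len(dist)
    return [sum(dist[i] * trans[i][j] for i in range(n)) for j in range(n)]


def distribution_after(init, trans, steps):
    """Apply the same (time-invariant) transition matrix `steps` times."""
    dist = list(init)
    for _ in range(steps):
        dist = step(dist, trans)
    return dist
```

Each row of the matrix must sum to 1, and the same matrix is reused
at every step because the probabilities do not vary in time.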
Note well: this is a discrete system (not continuous so no
infinities to worry about: discrete time steps, discrete state
values, probabilities do not vary in time).
That is obviously not a model of the world as we observe it.
System behavior may be determined by hidden processes that
create the observable states. A hidden Markov model relates
the two: the hidden states follow a first-order Markov process,
and the 'confusion matrix' gives the probability of each
observable state given a hidden state.
The hidden Markov model is modeled as a triple (RDFers Unite!):
o Vector of initial state probabilities
o State transition matrix
o Confusion matrix
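Given that triple, the evaluation problem (the probability of an
observed sequence) can be sketched with the forward algorithm. The
names and the integer encoding of states and observations are
assumptions for illustration.

```python
def forward(obs, init, trans, confusion):
    """Return the probability of the observation sequence `obs`.

    init      : init[s] = initial probability of hidden state s
    trans     : trans[r][s] = probability of hidden state s given r
    confusion : confusion[s][o] = probability of observing o in state s
    """
    n = len(init)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [init[s] * confusion[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n)) * confusion[s][o]
            for s in range(n)
        ]
    return sum(alpha)
```

Summing this value over every possible observation sequence of a
fixed length gives 1, which is a handy sanity check on the matrices.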
Again, the probabilities do not vary in time, which is an unrealistic
assumption and a weakness of Markov models.
From: TAN Kuan Hui [mailto:email@example.com]
You are making the assumption that all possibilities occur with equal
probability. This is equivalent to white noise. If the probability
distribution of your data set is close to white noise, then we can
conclude that there is no redundancy in your encoded data. This
would be ideal.
A schema will constrain the data into a conforming set of data but
that does not mean that every possible combination is useful;
some permutations probably never occur and cannot factor into
your evaluation of the value of the information carried.
In real data, a small set of data would occur with significantly higher
probability than others; therefore you have to factor in the
"probability distribution profile" (pdf) to make your discussion meaningful.
Let's say you have an XML schema that constrains your data
into 1 of 10 possible combinations, each occurring with equal
probability, versus another with 100 possibilities but with 1 combination
occurring 99% of the time. Which data feed then has greater information
content?
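One way to put numbers on that comparison is Shannon entropy in bits
per message. The sketch below assumes the remaining 1% in the second
feed is spread evenly over the other 99 combinations; that split is
my assumption and is not stated above.

```python
import math


def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Feed A: 10 combinations, all equally likely.
uniform_10 = [1 / 10] * 10
# Feed B: 100 combinations, one at 99%, the rest sharing 1% evenly
# (the even split is an assumption made for this example).
skewed_100 = [0.99] + [0.01 / 99] * 99

h_uniform = entropy_bits(uniform_10)
h_skewed = entropy_bits(skewed_100)
```

Under these assumptions the first feed carries log2(10), about 3.32
bits per message, while the second carries roughly 0.15 bits, even
though it has ten times as many possible combinations.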
The key to your discussion, in my humble opinion, is in
"information predictability" rather than "information uncertainty".