Roger L. Costello wrote:
> I am trying to get an understanding of Claude Shannon's work on
> information theory. Below I describe one small part of Shannon's
> work. I would like to hear your thoughts on its ramifications to
> information exchange using XML.
> Shannon defines information as follows:
> Information is proportional to uncertainty. High uncertainty equates
> to a high amount of information. Low uncertainty equates to a low
> amount of information.
> More specifically, Shannon talks about a set of possible data. A set
> comprised of 10 possible choices of data has less information than a
> set comprised of a hundred possible choices.
> ... QUESTIONS
> 1. How does this aspect (information ~ uncertainty) of Shannon's work
> relate to data exchange using XML? (I realize that this is a very
> broad question. Its intent is to stimulate discussion on the
> application of Shannon's information/uncertainty ideas to XML data
> 2. A schema is used to restrict the allowable forms that an instance
> document may take. So doesn't a schema reduce information?
I think you should be very cautious and thoughtful about trying to apply
Shannon to the sending of xml messages. I think there are some tricky
aspects that could make things non-obvious. Some examples -
1) A schema does not necessarily reduce the number of possible messages
if it is possible to send schema-invalid messages over the channel.
What conditions about restricting the message set need to be in place
for Shannon's work to apply directly?
2) Under most schemas, there are an infinite number of possible messages
(since most or all elements or attributes could hold content of
indefinite length). The usual measures of log N or log N/N aren't
useful in this circumstance.
3) Shannon's work is usually thought of in terms of whether tokens get
through the communications channel uncorrupted or not. Is it
technically correct to think of a single xml message as a token (I think
not)? If not, what if anything would play this role in an xml message?
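To make point 2 concrete, here is a rough sketch in Python. It assumes every message is equally likely (real XML traffic is not), and the function names and the 26-letter alphabet are made up purely for illustration. It shows the log N measure for the 10-choice and 100-choice sets from the original question, and then how log N keeps growing once content length is allowed to grow.

```python
import math


def bits_per_message(n_messages: int) -> float:
    """Bits conveyed by one choice among n equally likely messages (log2 N)."""
    return math.log2(n_messages)


def count_strings(alphabet_size: int, max_len: int) -> int:
    """Number of distinct strings of length 0..max_len over the alphabet."""
    return sum(alphabet_size ** k for k in range(max_len + 1))


# The two finite sets from the original question.
print(bits_per_message(10))    # roughly 3.32 bits
print(bits_per_message(100))   # roughly 6.64 bits

# Hypothetical element content: strings over a 26-letter alphabet.
# As the permitted length grows, N and log2 N grow without limit,
# so there is no single finite figure for "any content allowed".
for max_len in (1, 10, 100):
    n = count_strings(26, max_len)
    print(max_len, round(bits_per_message(n), 1))
```

Capping the content length (or otherwise making the message set finite) is what it would take to get a finite log N back.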
I am not well-versed in this area, so I will step back and let others
who are do the talking.
Thomas B. Passin
Explorer's Guide to the Semantic Web (Manning Books)