
   Re: [xml-dev] [Shannon: information ~ uncertainty] Ramifications to XML d


Roger L. Costello wrote:
> I am trying to get an understanding of Claude Shannon's work on
> information theory. Below I describe one small part of Shannon's
> work.  I would like to hear your thoughts on its ramifications to
> information exchange using XML.
> Shannon defines information as follows:
> Information is proportional to uncertainty.  High uncertainty equates
>  to a high amount of information.  Low uncertainty equates to a low 
> amount of information.
> More specifically, Shannon talks about a set of possible data. A set
> comprised of 10 possible choices of data has less information than a
> set comprised of a hundred possible choices.

> 1. How does this aspect (information ~ uncertainty) of Shannon's work
> relate to data exchange using XML?  (I realize that this is a very
> broad question. Its intent is to stimulate discussion on the
> application of Shannon's information/uncertainty ideas to XML data
> exchange)
> 2. A schema is used to restrict the allowable forms that an instance 
> document may take.  So doesn't a schema reduce information?

I think you should be very cautious and thoughtful about trying to apply
Shannon to the sending of XML messages.  There are some tricky aspects
that could make things non-obvious.  Some examples:

1) A schema does not necessarily reduce the number of possible messages 
if it is possible to send schema-invalid messages over the channel. 
What conditions about restricting the message set need to be in place 
for Shannon's work to apply directly?

2) Under most schemas, there are an infinite number of possible messages 
(since most or all elements or attributes could hold content of 
indefinite length).  The usual measures of log N or log N/N aren't 
useful in this circumstance.

3) Shannon's work is usually thought of in terms of whether tokens get 
through the communications channel uncorrupted or not.  Is it 
technically correct to think of a single xml message as a token (I think 
not)?  If not, what if anything would play this role in an xml message?
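
To make point 2 concrete, here is a minimal sketch in Python.  It rests
on assumptions that are not part of the original discussion: all N
messages are taken to be equally likely, so one message carries log2(N)
bits, and an element's content is modeled as a string over a
hypothetical 26-letter alphabet.  The function names are made up for
illustration.

```python
import math


def bits_for_equally_likely(n: int) -> float:
    """Bits carried by one choice among n equally likely messages."""
    return math.log2(n)


def count_strings_up_to(alphabet_size: int, max_len: int) -> int:
    """Count the distinct strings of length 0..max_len over an alphabet."""
    return sum(alphabet_size ** k for k in range(max_len + 1))


# The 10-versus-100 comparison from the quoted question.
print(bits_for_equally_likely(10))
print(bits_for_equally_likely(100))

# Assumed model: one element whose content may be any string of up to
# max_len letters.  As the length cap rises, log2(N) keeps growing, and
# with no cap at all N is infinite and log2(N) is undefined.
for max_len in (1, 10, 100):
    n = count_strings_up_to(26, max_len)
    print(max_len, round(math.log2(n), 1))
```

Under these assumptions the bit count rises roughly in proportion to
the length cap, which is why a schema that allows content of indefinite
length gives no finite N to plug into log N.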

I am not well versed in this area, so I will step back and let others 
who are do the talking.


Tom P

Thomas B. Passin
Explorer's Guide to the Semantic Web (Manning Books)



Copyright 2001 XML.org. This site is hosted by OASIS