Re: [xml-dev] [Watchers of the Web] The evolving form of informationon t

To: "Costello, Roger L." <costello@mitre.org>
Subject: Re: [xml-dev] [Watchers of the Web] The evolving form of information on the Web?
From: Greg Hunt <greg@firmansyah.com>
Date: Mon, 08 May 2006 09:13:16 +1000
Cc: xml-dev@lists.xml.org
In-reply-to: <B8415163A689094689542C617ECA0366B1D91A@IMCSRV5.MITRE.ORG>
References: <B8415163A689094689542C617ECA0366B1D91A@IMCSRV5.MITRE.ORG>
User-agent: Mozilla Thunderbird 1.0.6 (Windows/20050716)

Roger,
I assume that you mean information that is usable by people?  If you 
mean machine-oriented information then counting bytes will work.  If you 
mean people-oriented information, that is much harder to get a grip on. 
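
As a rough sketch of the machine-oriented case, the snippet below computes both the file-count share and the byte-count share per content type. The function name `shares_by_type` and the sample records are illustrative assumptions, not measurements of the Web.

```python
from collections import defaultdict

def shares_by_type(resources):
    """Return ({type: % of files}, {type: % of bytes}) for (type, size) pairs."""
    file_counts = defaultdict(int)
    byte_counts = defaultdict(int)
    for content_type, size_bytes in resources:
        file_counts[content_type] += 1
        byte_counts[content_type] += size_bytes
    total_files = sum(file_counts.values())
    total_bytes = sum(byte_counts.values())
    # Guard against empty input or all-zero sizes to avoid dividing by zero.
    file_share = ({t: 100.0 * n / total_files for t, n in file_counts.items()}
                  if total_files else {})
    byte_share = ({t: 100.0 * n / total_bytes for t, n in byte_counts.items()}
                  if total_bytes else {})
    return file_share, byte_share

# Hypothetical sample: one HTML page, three GIFs and one MP3 (made-up sizes).
sample = [
    ("text/html", 20_000),
    ("image/gif", 5_000),
    ("image/gif", 5_000),
    ("image/gif", 5_000),
    ("audio/mpeg", 4_000_000),
]
file_share, byte_share = shares_by_type(sample)
for content_type in sorted(file_share):
    print(content_type, round(file_share[content_type], 1),
          round(byte_share[content_type], 1))
```

With this made-up sample the GIFs dominate by file count while the single MP3 dominates by byte count, which is why the two measures can give very different percentages.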

For example, I think that for image content you are attempting the 
impossible.  Given three JPEGs of the same size (a Rothko painting, a 
Vermeer landscape and an aerial photograph of a city), the three 
arguably have different amounts of people-oriented information embedded 
in them.  I have never heard anyone suggest that Rothko's paintings 
should be decomposed and looked at in pieces.  The Vermeer is arguably 
interesting to look at in pieces, but you always want the whole as 
well.  The main use of the aerial photo is in looking at pieces of it 
(the whole photo may not be interesting at all).  The ability and 
desire to decompose an image suggests the amount of identifiable, 
separable and technical information elements in it.  (I know Rothko 
provokes complex emotional responses, but for the sake of this 
argument I am pretending that emotional responses are not information.)

The image problem described above applies to all other types of 
information.  Otherwise my son listening to a Rammstein MP3 would be 
getting the same quantity of information as he would from a podcast on 
health of the same length from the Australian Broadcasting Corporation 
(a result that I have to reject both intellectually and emotionally).

To get a handle on information on the web more broadly, what you may 
need to consider is consumption by types of consumers (information 
that is available but never accessed might not be so interesting) and 
changing patterns of access/consumption (are people substituting audio 
for HTML as podcasting becomes more important?).  Are there different 
classes of consumers?  Perhaps the answer to the quantity issue is the 
bandwidth of the consumers.  Perhaps people can only extract 
information at a fixed rate from specific content types (image, HTML, 
audio, PDF).  If so, measuring time spent consuming, by content type, 
would give an indication of the quantum, and making those measurements 
over time would give an indication of the trend.
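
A minimal sketch of that time-based measure, assuming a log of consumption sessions is available as (period, content type, seconds) records. The record layout, the function names and the sample figures are all hypothetical.

```python
from collections import defaultdict

def time_share_by_period(sessions):
    """Map each period to {content type: % of consumption time in that period}."""
    seconds = defaultdict(lambda: defaultdict(float))
    for period, content_type, duration_s in sessions:
        seconds[period][content_type] += duration_s
    shares = {}
    for period, by_type in seconds.items():
        total = sum(by_type.values())
        if total > 0:  # skip periods with no recorded time
            shares[period] = {t: 100.0 * s / total for t, s in by_type.items()}
    return shares

def trend(shares, content_type):
    """Share of one content type per period, in sorted period order."""
    return [(p, shares[p].get(content_type, 0.0)) for p in sorted(shares)]

# Hypothetical sessions: (period, content type, seconds spent consuming).
sessions = [
    ("2005", "text/html", 900),
    ("2005", "audio/mpeg", 100),
    ("2006", "text/html", 750),
    ("2006", "audio/mpeg", 250),
]
shares = time_share_by_period(sessions)
audio_trend = trend(shares, "audio/mpeg")
print(audio_trend)
```

Weighting by time rather than bytes sidesteps the size difference between audio and HTML, but it still treats every second of attention as equal, so it is only a proxy for people-oriented information.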

Greg

Costello, Roger L. wrote:

>Bryan asks an excellent question: 
>
>What data should be collected to provide meaningful results?
>
>Bryan notes two kinds of data that could be collected:
>
>(1) File Count Data: count the number of files, i.e., count the number
>of HTML files, count the number of MP3 files, count the number of MPEG
>files, and so forth.
>
>A problem to be resolved is: suppose that an HTML file contains, say,
>three GIF images.  Do you count that as: 
>
>1 for HTML
>3 for GIF
>
>Will file count yield the best data?
>
>(2) Byte Count Data: count the number of bytes of the information on
>the Web that is in HTML form, count the number of bytes of the
>information on the Web that is in MPEG form, and so forth.
>
>Would byte count yield more meaningful data than file count?
>
>Is there other data besides file count data and byte count data?
>
>If you were to design an experiment to determine the percentage of
>information per content type, what data would you measure?  (I am not
>asking "how" to measure the data; I am asking "what" data you would
>measure)
>
>Any ideas?  /Roger
> 
>
>-----Original Message-----
>From: bryan rasmussen [mailto:rasmussen.bryan@gmail.com] 
>Sent: Sunday, May 07, 2006 10:32 AM
>To: Costello, Roger L.
>Cc: xml-dev@lists.xml.org
>Subject: Re: [xml-dev] [Watchers of the Web] The evolving form of
>information on the Web?
>
>How exactly are you defining percentage?  For example, percentage of
>actual data size would probably be quite a bit different from
>percentage of number of files.
>
>Cheers,
>Bryan Rasmussen
>
>On 5/7/06, Costello, Roger L. <costello@mitre.org> wrote:
>  
>
>>Hi Folks,
>>
>>There are over 350 different content (MIME) types.  Some common content
>>types include HTML, XML, GIF, JPG, JPEG, MP3, MPEG, RSS, SVG.
>>
>>Information exchanged on the Web is in the form of one of these content
>>types.  (Sometimes an information exchange contains a collection of
>>items, each item with different content type.)
>>
>>I would like to know:
>>
>>Of all the information being exchanged on the Web:
>>
>>what percentage of the information is in the form of the HTML content
>>type, what percentage of the information is in the form of the XML
>>content type, what percentage of the information is in the form of the
>>GIF content type, what percentage of the information is in the form of
>>the MP3 content type, what percentage of the information is in the form
>>of the MPEG content type, what percentage of the information is in the
>>form of the JPG content type, and so forth, for all the content types.
>>
>>I speculate that the percentages are something like this:
>>
>>Content type   Percentage
>>---------------------------
>>HTML           90%
>>JPG             2%
>>JPEG            2%
>>GIF             2%
>>MP3             2%
>>XML             1%
>>...
>>
>>However, that's purely my guess.  (What is your guess?)
>>
>>In addition, I am interested in seeing how the percentage is changing
>>over time - I am interested in seeing the evolving form of information
>>on the Web.
>>
>>Has anyone done such an investigation?
>>
>>/Roger
>>
>>-----------------------------------------------------------------
>>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>initiative of OASIS <http://www.oasis-open.org>
>>
>>The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>>To subscribe or unsubscribe from this list use the subscription
>>manager: <http://www.oasis-open.org/mlmanage/index.php>
