Roger,
I assume that you mean information that is usable by people? If you
mean machine-oriented information then counting bytes will work. If you
mean people-oriented information, that is much harder to get a grip on.
For example, I think that for image content you are attempting the
impossible. Given three JPEGs of the same size (a Rothko painting, a
Vermeer landscape and an aerial photograph of a city), the three
arguably have different amounts of people-oriented information embedded
in them. I have never heard anyone suggest that Rothko's paintings
should be decomposed and looked at in pieces. The Vermeer is arguably
interesting to look at in pieces, but you always want the whole as well.
The main use of the aerial photo is in looking at pieces of it (the
whole photo may not be interesting at all). The ability and desire to
decompose an image suggests the number of identifiable, separable,
technical information elements in it. (I know Rothko provokes complex
emotional responses, but for the sake of this argument I am pretending
that emotional responses are not information.)
The image problem described above applies to all other types of
information; otherwise my son listening to a Rammstein MP3 is getting
the same quantity of information as he would from a podcast on health
of the same length from the Australian Broadcasting Corporation (a
result that I have to reject both intellectually and emotionally).
To get a handle on information on the web more broadly, what you may
need to consider is consumption by types of consumers (information that
is available but unaccessed might not be so interesting) and changing
patterns of access/consumption (are people substituting audio for HTML
as podcasting becomes more important?). Are there different classes of
consumers? Perhaps the answer to the quantity issue is the bandwidth of
the consumers. Perhaps people can only extract information at a fixed
rate from specific content types (image, HTML, audio, PDF), so that
measuring time spent consuming, by content type, would give an
indication of the quantum. Making those measurements over time would
give an indication of trend.
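
A minimal sketch of that comparison, assuming a hypothetical log of
(content type, bytes transferred, seconds spent) records. The sample
values and the shares() helper are invented for illustration; they are
not measurements of anything.

```python
from collections import defaultdict

# Hypothetical access records: (content_type, bytes_transferred, seconds_spent).
# These numbers are made up for illustration only.
SAMPLE_LOG = [
    ("text/html", 40_000, 120),
    ("image/jpeg", 300_000, 15),
    ("audio/mpeg", 4_000_000, 240),
    ("text/html", 60_000, 180),
    ("application/pdf", 600_000, 45),
]


def shares(records, field):
    """Return each content type's fraction of the total for one field.

    field is 1 for bytes transferred, 2 for seconds spent consuming.
    Assumes records is non-empty and the field total is non-zero.
    """
    totals = defaultdict(float)
    for record in records:
        totals[record[0]] += record[field]
    grand_total = sum(totals.values())
    return {ctype: value / grand_total for ctype, value in totals.items()}


byte_share = shares(SAMPLE_LOG, 1)  # the machine-oriented measure
time_share = shares(SAMPLE_LOG, 2)  # a rough proxy for the people-oriented one

for ctype in sorted(byte_share):
    print(f"{ctype:16s} bytes {byte_share[ctype]:6.1%}  time {time_share[ctype]:6.1%}")
```

In this made-up sample the audio dominates by bytes while the HTML
dominates by time spent, which is the kind of divergence I have in mind.
Running the same calculation on logs from successive periods would give
the trend.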
Greg
Costello, Roger L. wrote:
>Bryan asks an excellent question:
>
>What data should be collected to provide meaningful results?
>
>Bryan notes two kinds of data that could be collected:
>
>(1) File Count Data: count the number of files, i.e., count the number
>of HTML files, count the number of MP3 files, count the number of MPEG
>files, and so forth.
>
>A problem to be resolved is: suppose that an HTML file contains, say,
>three GIF images. Do you count that as:
>
>1 for HTML
>3 for GIF
>
>Will file count yield the best data?
>
>(2) Byte Count Data: count the number of bytes of the information on
>the Web that is in HTML form, count the number of bytes of the
>information on the Web that is in MPEG form, and so forth.
>
>Would byte count yield more meaningful data than file count?
>
>Is there other data besides file count data and byte count data?
>
>If you were to design an experiment to determine the percentage of
>information per content type, what data would you measure? (I am not
>asking "how" to measure the data; I am asking "what" data you would
>measure)
>
>Any ideas? /Roger
>
>
>-----Original Message-----
>From: bryan rasmussen [mailto:rasmussen.bryan@gmail.com]
>Sent: Sunday, May 07, 2006 10:32 AM
>To: Costello, Roger L.
>Cc: xml-dev@lists.xml.org
>Subject: Re: [xml-dev] [Watchers of the Web] The evolving form of
>information on the Web?
>
>How exactly are you defining percentage? For example, percentage of
>actual data size would probably be quite a bit different from
>percentage of number of files.
>
>Cheers,
>Bryan Rasmussen
>
>On 5/7/06, Costello, Roger L. <costello@mitre.org> wrote:
>
>
>>Hi Folks,
>>
>>There are over 350 different content (MIME) types. Some common content
>>types include HTML, XML, GIF, JPG, JPEG, MP3, MPEG, RSS, SVG.
>>
>>Information exchanged on the Web is in the form of one of these content
>>types. (Sometimes an information exchange contains a collection of
>>items, each item with different content type.)
>>
>>I would like to know:
>>
>>Of all the information being exchanged on the Web:
>>
>>what percentage of the information is in the form of the HTML content
>>type, what percentage of the information is in the form of the XML
>>content type, what percentage of the information is in the form of the
>>GIF content type, what percentage of the information is in the form of
>>the MP3 content type, what percentage of the information is in the form
>>of the MPEG content type, what percentage of the information is in the
>>form of the JPG content type, and so forth, for all the content types.
>>
>>I speculate that the percentages are something like this:
>>
>>Content type   Percentage
>>---------------------------
>>HTML           90%
>>JPG            2%
>>JPEG           2%
>>GIF            2%
>>MP3            2%
>>XML            1%
>>...
>>
>>However, that's purely my guess. (What is your guess?)
>>
>>In addition, I am interested in seeing how the percentage is changing
>>over time - I am interested in seeing the evolving form of information
>>on the Web.
>>
>>Has anyone done such an investigation?
>>
>>/Roger
>>
>>-----------------------------------------------------------------
>>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>initiative of OASIS <http://www.oasis-open.org>
>>
>>The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>>To subscribe or unsubscribe from this list use the subscription
>>manager: <http://www.oasis-open.org/mlmanage/index.php>