xml-dev - RE: [xml-dev] [Watchers of the Web] The evolving form of information on the Web?

To: "Costello, Roger L." <costello@mitre.org>
Subject: RE: [xml-dev] [Watchers of the Web] The evolving form of informationon the Web?
From: Chris Gray <cpgray@library.uwaterloo.ca>
Date: Mon, 8 May 2006 10:16:48 -0400 (EDT)
Cc: xml-dev@lists.xml.org
In-reply-to: <B8415163A689094689542C617ECA0366B1D944@IMCSRV5.MITRE.ORG>
References: <B8415163A689094689542C617ECA0366B1D944@IMCSRV5.MITRE.ORG>

Roger,

Given the behavior you seem interested in (which format users prefer when the 
"same" information is offered in alternate formats), you may find some of the 
insight you're looking for in the following article:

<http://www.emeraldinsight.com/Insight/ViewContentServlet?Filename=/published/emeraldfulltextarticle/pdf/2640290502.pdf>

It is a study of image vs. text queries rather than traffic.

Chris

On Mon, 8 May 2006, Costello, Roger L. wrote:

> Hi Folks,
>
> Excellent discussion!
>
> I would like to give an example that demonstrates the data that I think
> is needed to make a meaningful statement about the form of information
> on the Web today.
>
> Example: Today cnn.com is running a news story about using quantum
> science to determine the best way to score a goal in soccer. CNN allows
> you to consume the news story in any of these forms:
>
> - HTML
> - audio (MP3)
> - video (MPEG)
> - RSS
>
> Suppose that we monitor all the requests for that news story.  By the
> end of the day, here are the numbers:
>
> - 50 clients have consumed the news story in HTML form
> - 20 clients have consumed the news story in audio (MP3) form
> - 10 clients have consumed the news story in video (MPEG) form
> - 20 clients have consumed the news story in RSS form
>
> If these numbers came from a representative sample of the Web, then
> we could state:
>
> "On May 8, 2006, the information on the Web took this form:"
>
> Content Type    Percentage
> ---------------------------
> HTML            50%
> MP3             20%
> MPEG            10%
> RSS             20%
>
> Obviously this isn't a representative sample, but I think it does show
> the data that is needed.
>
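For anyone who wants to reproduce the table, each percentage is just that type's request count divided by the total number of requests. A minimal Python sketch using the example's end-of-day counts; `percentage_shares` is a hypothetical helper name, not part of any library:

```python
def percentage_shares(counts):
    """Return each content type's share of all requests, as a percentage."""
    total = sum(counts.values())
    return {ctype: 100 * n / total for ctype, n in counts.items()}


# End-of-day request counts for the CNN story, taken from the example.
story_requests = {"HTML": 50, "MP3": 20, "MPEG": 10, "RSS": 20}

for ctype, pct in percentage_shares(story_requests).items():
    print(f"{ctype:<15} {pct:.0f}%")
```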
> So, let me try to state the experiment more precisely:
>
> Of all the documents/files that are exchanged on the Web over a 24-hour
> period,
> - what percentage of them are in the form of HTML,
> - what percentage of them are in the form of XML,
> - what percentage of them are in the form of MP3,
> - what percentage of them are in the form of MPEG,
> - what percentage of them are in the form of RSS,
> - and so forth for all the content types.
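One way to operationalize this experiment, assuming access to a log of HTTP responses, is to bucket responses by their Content-Type header within a 24-hour window. The sketch below assumes a hypothetical log of (timestamp, Content-Type) pairs; real server or proxy logs would have to be parsed into that shape first. Media types don't map one-to-one onto the categories above: MP3 is usually served as `audio/mpeg`, and RSS may arrive as `application/rss+xml` or as generic `application/xml` or `text/xml`.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical log of (response time, Content-Type header) pairs.
# Real server or proxy logs would have to be parsed into this shape first.
request_log = [
    (datetime(2006, 5, 8, 9, 15), "text/html; charset=utf-8"),
    (datetime(2006, 5, 8, 9, 16), "audio/mpeg"),
    (datetime(2006, 5, 8, 13, 2), "application/rss+xml"),
    (datetime(2006, 5, 8, 18, 40), "text/html"),
    (datetime(2006, 5, 9, 0, 5), "video/mpeg"),  # next day: outside the window
]


def shares_in_window(log, start, hours=24):
    """Percentage of responses per media type in [start, start + hours)."""
    end = start + timedelta(hours=hours)
    # Drop parameters such as "; charset=utf-8" so each media type is counted once.
    counts = Counter(
        ctype.split(";")[0].strip().lower()
        for when, ctype in log
        if start <= when < end
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # no responses in the window
    return {ctype: 100 * n / total for ctype, n in counts.items()}


print(shares_in_window(request_log, datetime(2006, 5, 8)))
```

Grouping by bare media type keeps `text/html` and `text/html; charset=utf-8` in one bucket; deciding whether `application/rss+xml` belongs under "RSS" or "XML" is a separate classification choice.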
>
> Now let me introduce a slight complexity to the above example: suppose
> that the HTML form of the above news story contains two GIF images.
>
> Since 50 clients consumed the HTML form, 100 GIF images were consumed.
> If we just count documents/files, then in the 24-hour period:
>
> - 50 HTML documents were consumed
> - 100 GIF files were consumed
> - 20 MP3 files were consumed
> - 10 MPEG files were consumed
> - 20 RSS documents were consumed
>
> Now the percentages are these (out of 200 files in total):
>
> Content Type    Percentage
> ---------------------------
> HTML            25%
> GIF             50%
> MP3             10%
> MPEG            5%
> RSS             10%
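The jump from the first table to this one comes from counting every file fetched, including the images each HTML page pulls in. A small sketch of that expansion, assuming (as in the example) that every HTML view fetches exactly two GIFs; `file_counts` and the `embedded_per_page` mapping are hypothetical names:

```python
def file_counts(page_views, embedded_per_page):
    """Count every file transferred, including files embedded in other files.

    embedded_per_page maps (container type, embedded type) to the number of
    embedded files fetched per view of the container.
    """
    counts = dict(page_views)
    for (container, embedded), per_view in embedded_per_page.items():
        extra = page_views[container] * per_view
        counts[embedded] = counts.get(embedded, 0) + extra
    return counts


page_views = {"HTML": 50, "MP3": 20, "MPEG": 10, "RSS": 20}
# Assumption from the example: each HTML view fetches two GIF images.
embedded_per_page = {("HTML", "GIF"): 2}

counts = file_counts(page_views, embedded_per_page)
total = sum(counts.values())
for ctype, n in counts.items():
    print(f"{ctype:<15} {100 * n / total:.0f}%")
```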
>
> Somehow this doesn't "feel" right.  The news story is being carried by
> the HTML document; the GIF images were just add-ons.  It doesn't seem
> right to say:
>
> "On May 8, 2006, 50% of the information on the Web took the form of GIF
> images."
>
> Perhaps a content type that is used within another content type should
> be weighted differently?  For example, perhaps we should give a
> "dependent content type" a weight of 0.1.
>
> Now the data looks like this:
>
> - 50 HTML documents were consumed
> - 10 weighted GIF files were consumed (100 * 0.1 = 10)
> - 20 MP3 files were consumed
> - 10 MPEG files were consumed
> - 20 RSS documents were consumed
>
> And then the percentages (out of a weighted total of 110, rounded to the
> nearest percent) are:
>
> Content Type    Percentage
> ---------------------------
> HTML            45%
> GIF             9%
> MP3             18%
> MPEG            9%
> RSS             18%
>
> This feels more correct to me.  What do you suggest?  /Roger
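The weighting scheme above can be expressed as a single parameter applied to dependent files. The sketch below, again with hypothetical names, multiplies embedded GIF counts by `dependent_weight`; 0.1 is the value proposed in the message, not a derived constant. The rounded percentages sum to 99% rather than 100% only because of rounding.

```python
def weighted_shares(page_views, embedded_per_page, dependent_weight=0.1):
    """Percentage share per content type, with embedded files down-weighted.

    embedded_per_page maps (container type, embedded type) to the number of
    embedded files fetched per view of the container.
    """
    weighted = {ctype: float(n) for ctype, n in page_views.items()}
    for (container, embedded), per_view in embedded_per_page.items():
        files = page_views[container] * per_view
        weighted[embedded] = weighted.get(embedded, 0.0) + files * dependent_weight
    total = sum(weighted.values())
    return {ctype: 100 * w / total for ctype, w in weighted.items()}


page_views = {"HTML": 50, "MP3": 20, "MPEG": 10, "RSS": 20}
embedded_per_page = {("HTML", "GIF"): 2}  # two GIFs per HTML view, as above

for ctype, pct in weighted_shares(page_views, embedded_per_page).items():
    print(f"{ctype:<15} {pct:.0f}%")
```

Setting `dependent_weight=1.0` reproduces the unweighted file-count table (GIF at 50%), and `dependent_weight=0.0` reproduces the original four-way split with GIF at 0%, so the parameter moves between the two ways of counting.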
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>
>
