Roger,
Given the question you seem interested in (when the "same" info is offered
in alternate formats, which do users prefer?), you might gain some of the
insight you're looking for from the following article:
<http://www.emeraldinsight.com/Insight/ViewContentServlet?Filename=/published/emeraldfulltextarticle/pdf/2640290502.pdf>
Note that it is a study of image vs. text queries rather than of traffic.
Chris
On Mon, 8 May 2006, Costello, Roger L. wrote:
> Hi Folks,
>
> Excellent discussion!
>
> I would like to give an example that demonstrates the data that I think
> is needed to make a meaningful statement about the form of information
> on the Web today.
>
> Example: Today cnn.com is running a news story about using quantum
> science to determine the best way to score a goal in soccer. CNN allows
> you to consume the news story in any of these forms:
>
> - HTML
> - audio (MP3)
> - video (MPEG)
> - RSS
>
> Suppose that we monitor all the requests for that news story. By the
> end of the day, here are the numbers:
>
> - 50 clients have consumed the news story in HTML form
> - 20 clients have consumed the news story in audio (MP3) form
> - 10 clients have consumed the news story in video (MPEG) form
> - 20 clients have consumed the news story in RSS form
>
> If these numbers represented a statistically significant sampling of
> the Web, then we could state:
>
> "On May 8, 2006 the information on the Web took this form:"
>
> Content Type    Percentage
> ---------------------------
> HTML            50%
> MP3             20%
> MPEG            10%
> RSS             20%
>
> Obviously this isn't a statistically significant sample, but I think
> that it does show the data that is needed.
>
> So, let me try to state the experiment more precisely:
>
> Of all the documents/files that are exchanged on the Web over a 24-hour
> period,
> - what percentage of them are in the form of HTML,
> - what percentage of them are in the form of XML,
> - what percentage of them are in the form of MP3,
> - what percentage of them are in the form of MPEG,
> - what percentage of them are in the form of RSS,
> - and so forth for all the content types.
>
> Now let me introduce a slight complexity to the above example: suppose
> that the HTML form of the above news story contains two GIF images.
>
> Since 50 clients consumed the HTML form, 100 GIF images were consumed.
> If we just count documents/files, then in the 24-hour period:
>
> - 50 HTML documents were consumed
> - 100 GIF files were consumed
> - 20 MP3 files were consumed
> - 10 MPEG files were consumed
> - 20 RSS documents were consumed
>
> Now the percentages are these:
>
> Content Type    Percentage
> ---------------------------
> HTML            25%
> GIF             50%
> MP3             10%
> MPEG             5%
> RSS             10%
>
> Somehow this doesn't "feel" right. The news story is being carried by
> the HTML document; the GIF images are just add-ons. It doesn't seem
> right to say:
>
> "On May 8, 2006 50% of the information on the Web took the form of GIF
> images."
>
> Perhaps a content type that is used within another content type should
> be weighted differently? For example, perhaps we should give a
> "dependent content type" a weight of 0.1.
>
> Now the data looks like this:
>
> - 50 HTML documents were consumed
> - 10 GIF files were consumed (100 * 0.1 = 10)
> - 20 MP3 files were consumed
> - 10 MPEG files were consumed
> - 20 RSS documents were consumed
>
> And then the percentages (rounded) are these:
>
> Content Type    Percentage
> ---------------------------
> HTML            45%
> GIF              9%
> MP3             18%
> MPEG             9%
> RSS             18%
>
> This feels more correct to me. What do you suggest? /Roger
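For anyone who wants to play with the numbers, here is a minimal sketch of the weighting arithmetic described above. It is only an illustration: the function name `content_type_shares` and the `weights` mapping are made up for this example, the counts are the ones from the cnn.com scenario, and 0.1 is the suggested weight for a "dependent content type", not an established figure.

```python
def content_type_shares(counts, weights=None):
    """Return each content type's share (in percent) of the weighted total.

    counts  -- mapping of content type to number of files consumed
    weights -- optional mapping of content type to weight; any type
               not listed gets a weight of 1.0 (hypothetical scheme)
    """
    weights = weights or {}
    weighted = {ctype: n * weights.get(ctype, 1.0) for ctype, n in counts.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no weighted requests to compute shares from")
    return {ctype: 100.0 * value / total for ctype, value in weighted.items()}


# Counts from the example: 50 HTML pages, each pulling in two GIF images.
counts = {"HTML": 50, "GIF": 100, "MP3": 20, "MPEG": 10, "RSS": 20}

unweighted = content_type_shares(counts)
weighted = content_type_shares(counts, weights={"GIF": 0.1})

print(f"{'Type':<6}{'Unweighted':>12}{'Weighted':>12}")
for ctype in counts:
    print(f"{ctype:<6}{unweighted[ctype]:>11.1f}%{weighted[ctype]:>11.1f}%")
```

Changing the value in `weights` shows how sensitive the final percentages are to the choice of weight, which is probably the first thing to settle before collecting real traffic data.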
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>
>