

  • To: <xml-dev@lists.xml.org>
  • Subject: RE: [xml-dev] [Watchers of the Web] The evolving form of information on the Web?
  • From: "Costello, Roger L." <costello@mitre.org>
  • Date: Mon, 8 May 2006 07:35:35 -0400
  • Thread-index: AcZx8h5GznihMSfkQMuBHm3B1CvspAAmM16A
  • Thread-topic: [xml-dev] [Watchers of the Web] The evolving form of information on the Web?

Hi Folks,

Excellent discussion! 

I would like to give an example that demonstrates the data that I think
is needed to make a meaningful statement about the form of information
on the Web today.

Example: Today cnn.com is running a news story about using quantum
science to determine the best way to score a goal in soccer. CNN allows
you to consume the news story in any of these forms:

- HTML
- audio (MP3)
- video (MPEG)
- RSS

Suppose that we monitor all the requests for that news story.  By the
end of the day, here are the numbers:

- 50 clients have consumed the news story in HTML form
- 20 clients have consumed the news story in audio (MP3) form
- 10 clients have consumed the news story in video (MPEG) form
- 20 clients have consumed the news story in RSS form

If these numbers represented a statistically significant sampling of
the Web, then we could state:

"On May 8, 2006 the information on the Web took this form:"

Content Type    Percentage
---------------------------
HTML            50%
MP3             20%
MPEG            10%
RSS             20%
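
As a minimal sketch of the arithmetic behind this table (the names `requests` and `percentages` are only illustrative), the shares can be computed from the request counts like this:

```python
from collections import Counter

# Request counts from the example above, one entry per content type.
requests = Counter({"HTML": 50, "MP3": 20, "MPEG": 10, "RSS": 20})

def percentages(counts):
    """Return each content type's share of the total, in percent."""
    total = sum(counts.values())
    return {ctype: 100 * n / total for ctype, n in counts.items()}

shares = percentages(requests)
for ctype, share in shares.items():
    print(f"{ctype:<15} {share:.0f}%")
```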

Obviously this isn't a statistically significant sample, but I think
that it does show the data that is needed.

So, let me try to state the experiment more precisely:

Of all the documents/files that are exchanged on the Web over a 24 hour
period, 
- what percentage of them are in the form of HTML, 
- what percentage of them are in the form of XML, 
- what percentage of them are in the form of MP3, 
- what percentage of them are in the form of MPEG, 
- what percentage of them are in the form of RSS, 
- and so forth for all the content types.
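
One way such a tally might be gathered, assuming we could observe the Content-Type header of each response, is sketched below. The `observed` list is made-up sample data, not a real measurement, and `application/rss+xml` is just the label commonly used for RSS feeds:

```python
from collections import Counter

# Hypothetical sample of Content-Type header values seen over the period.
observed = [
    "text/html; charset=utf-8",
    "text/html",
    "audio/mpeg",
    "video/mpeg",
    "application/rss+xml",
    "text/html",
]

def media_type(header_value):
    """Drop parameters such as charset and normalise the case."""
    return header_value.split(";", 1)[0].strip().lower()

# Count responses per media type, then convert the counts to percentages.
tally = Counter(media_type(value) for value in observed)
total = sum(tally.values())
report = {mtype: round(100 * n / total, 1) for mtype, n in tally.items()}

for mtype, share in report.items():
    print(f"{mtype:<22} {share}%")
```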

Now let me introduce a slight complexity to the above example: suppose
that the HTML form of the above news story contains two GIF images. 

Since 50 clients consumed the HTML form, 100 GIF images were consumed.
If we just count documents/files then in the 24 hour period:

- 50 HTML documents were consumed
- 100 GIF files were consumed
- 20 MP3 files were consumed
- 10 MPEG files were consumed
- 20 RSS documents were consumed

Now the percentages (out of 200 files in total) are:

Content Type    Percentage
---------------------------
HTML            25%
GIF             50%
MP3             10%
MPEG            5%
RSS             10%
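
A small sketch of how the embedded images change the count, using the figures from this example (the variable names are illustrative only):

```python
html_requests = 50
gifs_per_page = 2  # the HTML form of the story embeds two GIF images

# Every HTML request also pulls in the page's GIF images.
counts = {
    "HTML": html_requests,
    "GIF": html_requests * gifs_per_page,
    "MP3": 20,
    "MPEG": 10,
    "RSS": 20,
}

total = sum(counts.values())
shares = {ctype: 100 * n / total for ctype, n in counts.items()}

for ctype, share in shares.items():
    print(f"{ctype:<15} {share:.0f}%")
```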

Somehow this doesn't "feel" right.  The news story is carried by the
HTML document; the GIF images are just add-ons.  It doesn't seem
right to say:

"On May 8, 2006 50% of the information on the Web took the form of GIF
images."

Perhaps a content type that is used within another content type should
be weighted differently?  For example, perhaps we should give a
"dependent content type" a weight of 0.1.

Now the weighted data looks like this:

- 50 HTML documents were consumed
- 10 GIF files were consumed (100 * 0.1 = 10)
- 20 MP3 files were consumed
- 10 MPEG files were consumed
- 20 RSS documents were consumed

And then the percentages (out of a weighted total of 110, rounded to
whole numbers) are:

Content Type    Percentage
---------------------------
HTML            45%
GIF             9%
MP3             18%
MPEG            9%
RSS             18%
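
Here is a rough sketch of the weighting idea. The 0.1 weight is the arbitrary value suggested above, and treating GIF as the only "dependent" type is an assumption of this example:

```python
DEPENDENT_WEIGHT = 0.1   # arbitrary weight for content used inside other content
dependent = {"GIF"}      # assumption: only GIF is a dependent type here

raw = {"HTML": 50, "GIF": 100, "MP3": 20, "MPEG": 10, "RSS": 20}

# Scale down dependent types; standalone types keep a weight of 1.
weighted = {
    ctype: n * (DEPENDENT_WEIGHT if ctype in dependent else 1)
    for ctype, n in raw.items()
}

total = sum(weighted.values())
shares = {ctype: 100 * w / total for ctype, w in weighted.items()}

for ctype, share in shares.items():
    print(f"{ctype:<15} {share:.0f}%")
```

Because each share is rounded to a whole number, the printed column adds up to 99% rather than 100%.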

This feels more correct to me.  What do you suggest?  /Roger




 
