- To: "Costello, Roger L." <costello@mitre.org>,<xml-dev@lists.xml.org>
- Subject: RE: [xml-dev] [Watchers of the Web] The evolving form of information on the Web?
- From: "Bullard, Claude L (Len)" <len.bullard@intergraph.com>
- Date: Mon, 8 May 2006 09:24:23 -0500
The data leaks across the transforms in that model. Is the dependent
relationship one of information to format, or one of choice of format
to author at a point in time and space?
What is the coupling relationship of format to information?
len
From: Costello, Roger L. [mailto:costello@mitre.org]
Hi Folks,
Excellent discussion!
I would like to give an example that demonstrates the data that I think
is needed to make a meaningful statement about the form of information
on the Web today.
Example: Today cnn.com is running a news story about using quantum
science to determine the best way to score a goal in soccer. CNN allows
you to consume the news story in any of these forms:
- HTML
- audio (MP3)
- video (MPEG)
- RSS
Suppose that we monitor all the requests for that news story. By the
end of the day, here are the numbers:
- 50 clients have consumed the news story in HTML form
- 20 clients have consumed the news story in audio (MP3) form
- 10 clients have consumed the news story in video (MPEG) form
- 20 clients have consumed the news story in RSS form
If these numbers represented a statistically significant sampling of
the Web, then we could state:
"On May 8, 2006 the information on the Web took this form:"
Content Type   Percentage
---------------------------
HTML           50%
MP3            20%
MPEG           10%
RSS            20%
Obviously this isn't a statistically significant sample, but I think
that it does show the data that is needed.
So, let me try to state the experiment more precisely:
Of all the documents/files that are exchanged on the Web over a 24 hour
period,
- what percentage of them are in the form of HTML,
- what percentage of them are in the form of XML,
- what percentage of them are in the form of MP3,
- what percentage of them are in the form of MPEG,
- what percentage of them are in the form of RSS,
- and so forth for all the content types.
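A minimal sketch of that tally, assuming every exchange has already been
reduced to a content-type label. The function name content_type_shares
and the log list are illustrative only; the log simply reproduces the
cnn.com numbers above rather than any real measurement.

```python
from collections import Counter


def content_type_shares(content_types):
    """Return {content type: percentage of all exchanges observed}."""
    counts = Counter(content_types)
    total = sum(counts.values())
    return {ctype: 100.0 * n / total for ctype, n in counts.items()}


# Hypothetical one-day log: one label per document/file exchanged.
log = ["HTML"] * 50 + ["MP3"] * 20 + ["MPEG"] * 10 + ["RSS"] * 20

shares = content_type_shares(log)
# Print the largest share first.
for ctype, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{ctype:<6}{pct:5.1f}%")
```

In a real experiment the labels would presumably come from the
Content-Type header of each response, which is an assumption about the
monitoring setup and not something specified here.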
Now let me introduce a slight complexity to the above example: suppose
that the HTML form of the above news story contains two GIF images.
Since 50 clients consumed the HTML form, 100 GIF images were consumed.
If we just count documents/files then in the 24 hour period:
- 50 HTML documents were consumed
- 100 GIF files were consumed
- 20 MP3 files were consumed
- 10 MPEG files were consumed
- 20 RSS documents were consumed
Now the percentages are:
Content Type   Percentage
---------------------------
HTML           25%
GIF            50%
MP3            10%
MPEG            5%
RSS            10%
Somehow this doesn't "feel" right. The news story is being carried by
the HTML document; the GIF images are just add-ons. It doesn't seem
right to say:
"On May 8, 2006 50% of the information on the Web took the form of GIF
images."
Perhaps a content type that is used within another content type should
be weighted differently? For example, perhaps we should give a
"dependent content type" a weight of 0.1.
Now the data looks like this:
- 50 HTML documents were consumed
- 10 GIF files were consumed (100 * 0.1 = 10)
- 20 MP3 files were consumed
- 10 MPEG files were consumed
- 20 RSS documents were consumed
And then the percentages (rounded, out of a weighted total of 110) are:
Content Type   Percentage
---------------------------
HTML           45%
GIF             9%
MP3            18%
MPEG            9%
RSS            18%
This feels more correct to me. What do you suggest? /Roger
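The weighting proposed above can be sketched as follows. This is only an
illustration under the assumptions in the message: the 0.1 weight is the
value suggested there, and the function name weighted_shares and the
dependent set are hypothetical names introduced for the example.

```python
def weighted_shares(counts, dependent, weight=0.1):
    """Return rounded percentages after down-weighting dependent types.

    counts    -- {content type: number of files consumed}
    dependent -- content types embedded in another type (e.g. GIF in HTML)
    weight    -- multiplier applied to dependent types (assumed 0.1)
    """
    weighted = {
        ctype: n * (weight if ctype in dependent else 1.0)
        for ctype, n in counts.items()
    }
    total = sum(weighted.values())
    return {ctype: round(100.0 * w / total) for ctype, w in weighted.items()}


# Raw counts from the example: 50 HTML pages, each pulling in two GIFs.
counts = {"HTML": 50, "GIF": 100, "MP3": 20, "MPEG": 10, "RSS": 20}

unweighted = weighted_shares(counts, dependent=set())
weighted = weighted_shares(counts, dependent={"GIF"})
print(unweighted)
print(weighted)
```

Because each share is rounded separately, the weighted percentages sum
to 99 rather than 100.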
-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>
The list archives are at http://lists.xml.org/archives/xml-dev/
To subscribe or unsubscribe from this list use the subscription
manager: <http://www.oasis-open.org/mlmanage/index.php>