Michael Champion wrote:
> OK, I'll defend it -- Tell me which of these you disagree with :-)
>
> ◦ 2.1 People lie
> ◦ 2.2 People are lazy
> ◦ 2.3 People are stupid
> ◦ 2.4 Mission: Impossible -- know thyself
> ◦ 2.5 Schemas aren't neutral
> ◦ 2.6 Metrics influence results
> ◦ 2.7 There's more than one way to describe something
I can just as easily ask you which of those are not relevant to
Google or any other statistical approach (answer: none).
> What is there to disagree with here?
The title. But specifically - "reliable".
About all you can say about statistically accrued metadata is that
it's inherently more statistical - beyond that you have to get into
specifics.
> I personally find this the least
> plausible part of the semantic web vision -- I won't even begin to
> believe it until it has survived the onslaughts of the meta-spammers and
> the semantic-bombers who will go after the semantic web the way they've
> gone after data in meta tags and the links that Google harvests.
> Dare said something the other day about having second thoughts about
> Doctorow's argument because RSS feeds are an existence proof that useful
> metadata is practical.
One can't completely buy into either of those positions without
being ignorant about a number of AI and ex-AI technologies. They're
both wrong because they're both polarized in a Jerry Springer kind
of way.
> I'm not sure which of the straw men that
> demolishes -- I'd agree that people are less likely to lie or act lazy
> and stupid when they know that people (like the boss, or colleagues, or
> potential employers) are watching. And anyway, RSS *is* mostly
> observational metadata extracted from an article or post, or at least
> generated from the same inputs used to generate the content it syndicates.
If you look at RSS data for long enough you realize it's
information-rich and that information is being produced almost totally
as a side-effect of blogging. Then maybe you go and read Metacrap, but
with your eyes opened.
Here are the RDF triples taken from an Atom feed with two entries, and
excluding the content and the summary.
1
http://www.dehora.net/journal/
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://purl.org/atom/ns#feed
2
http://www.dehora.net/journal/
http://purl.org/atom/ns#title
"Bill de hÓra"
3
http://www.dehora.net/journal/
http://purl.org/atom/ns#rel
"alternate"
4
http://www.dehora.net/journal/
http://purl.org/atom/ns#type
"text/html"
5
http://www.dehora.net/journal/
http://purl.org/atom/ns#href
"http://www.dehora.net/journal/"
6
http://www.dehora.net/journal/
http://purl.org/atom/ns#link
http://www.dehora.net/journal/
7
http://www.dehora.net/journal/
http://purl.org/atom/ns#modified
"2004-05-23T01:45:36Z"
8
http://www.dehora.net/journal/
http://purl.org/atom/ns#tagline
"FD85 1117 1888 1681 7689 B5DF E696 885C 20D8 21F8"
9
http://www.dehora.net/journal/
http://purl.org/atom/ns#id
tag:www.dehora.net,2004:/journal?id
10
http://www.dehora.net/journal/
http://purl.org/atom/ns#generator
"Movable Type"
11
http://www.dehora.net/journal/
http://purl.org/atom/ns#copyright
"Copyright (c) 2004, dehora"
12
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://purl.org/atom/ns#entry
13
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
http://purl.org/atom/ns#feed
http://www.dehora.net/journal/
14
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
http://purl.org/atom/ns#title
"Thus sprach metadata"
15
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html
http://purl.org/atom/ns#rel
"alternate"
16
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html
http://purl.org/atom/ns#type
"text/html"
17
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html
http://purl.org/atom/ns#href
"http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html"
18
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
http://purl.org/atom/ns#link
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html
19
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
http://purl.org/atom/ns#modified
"2004-05-23T01:45:36Z"
20
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
http://purl.org/atom/ns#issued
"2004-05-23T01:45:36+00:00"
21
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
http://purl.org/atom/ns#id
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
22
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
http://purl.org/atom/ns#created
"2004-05-23T01:45:36Z"
23
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
http://purl.org/atom/ns#summary
"Seairth Jacobs"
24
mailto:bill@dehora.net
http://purl.org/atom/ns#name
"dehora"
25
mailto:bill@dehora.net
http://purl.org/atom/ns#url
http://www.dehora.net/journal
26
mailto:bill@dehora.net
http://purl.org/atom/ns#email
"bill@dehora.net"
27
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
http://purl.org/atom/ns#author
mailto:bill@dehora.net
28
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
http://purl.org/dc/elements/1.1/subject
"SemanticWeb"
29
genid:ARP58526
http://purl.org/atom/ns#type
"text/html"
30
genid:ARP58526
http://purl.org/atom/ns#mode
"escaped"
31
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
http://purl.org/atom/ns#content
genid:ARP58526
32
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://purl.org/atom/ns#entry
33
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#feed
http://www.dehora.net/journal/
34
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#title
"MT3: are you not entertained?"
35
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html
http://purl.org/atom/ns#rel
"alternate"
36
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html
http://purl.org/atom/ns#type
"text/html"
37
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html
http://purl.org/atom/ns#href
"http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html"
38
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#link
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html
39
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#modified
"2004-05-21T20:57:11Z"
40
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#issued
"2004-05-21T20:57:11+00:00"
41
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#id
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
42
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#created
"2004-05-21T20:57:11Z"
43
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#summary
"foo"
44
mailto:bill@dehora.net
http://purl.org/atom/ns#name
"dehora"
45
mailto:bill@dehora.net
http://purl.org/atom/ns#url
http://www.dehora.net/journal
46
mailto:bill@dehora.net
http://purl.org/atom/ns#email
"bill@dehora.net"
47
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#author
mailto:bill@dehora.net
48
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?
http://purl.org/dc/elements/1.1/subject
"CuzIDontLikeToDreamAboutGettinPaid"
49
genid:ARP58533
http://purl.org/atom/ns#type
"text/html"
50
genid:ARP58533
http://purl.org/atom/ns#mode
"escaped"
51
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#content
genid:ARP58533
I agree it ain't pretty, but it is a perfectly good
machine-processable metadata graph of 3-tuples (at its most basic,
property-value pairs bound to a named entity). Forget RDF, you can
usefully run SQL or grep pipelines against this stuff; a small script
will do the job most of the time. But as you build these datasets
over time you get to do all kinds of things (such as type inference
and joins across arbitrary XML vocabularies and database schemas) if
you don't mind using RDF-aware tools.
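To make the "run SQL against this stuff" point concrete, here is a rough sketch rather than anything I actually run: it loads a handful of the triples from the listing into an in-memory SQLite table and asks for the titles of everything typed as an Atom entry. The table layout (columns s, p, o) and the function names are my own assumptions for illustration.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ATOM = "http://purl.org/atom/ns#"
ENTRY_1 = "http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id"
ENTRY_2 = "http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id"

# A few of the triples from the listing, as (subject, predicate, object) rows.
TRIPLES = [
    ("http://www.dehora.net/journal/", RDF_TYPE, ATOM + "feed"),
    (ENTRY_1, RDF_TYPE, ATOM + "entry"),
    (ENTRY_1, ATOM + "title", "Thus sprach metadata"),
    (ENTRY_1, ATOM + "modified", "2004-05-23T01:45:36Z"),
    (ENTRY_2, RDF_TYPE, ATOM + "entry"),
    (ENTRY_2, ATOM + "title", "MT3: are you not entertained?"),
    (ENTRY_2, ATOM + "modified", "2004-05-21T20:57:11Z"),
]


def load(triples):
    """Put the triples in a single three-column table (assumed layout)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    db.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return db


def entry_titles(db):
    """Titles of every subject typed as an Atom entry, oldest first."""
    rows = db.execute(
        """
        SELECT title.o
        FROM triples AS kind
        JOIN triples AS title
          ON title.s = kind.s AND title.p = ?
        JOIN triples AS modified
          ON modified.s = kind.s AND modified.p = ?
        WHERE kind.p = ? AND kind.o = ?
        ORDER BY modified.o
        """,
        (ATOM + "title", ATOM + "modified", RDF_TYPE, ATOM + "entry"),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    for title in entry_titles(load(TRIPLES)):
        print(title)
```

The self-joins are the whole trick: each property you care about is one more join on the subject column, and the query neither knows nor cares which feed or vocabulary the rows came from.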
Moreover an RDF engine couldn't care less that the metadata was
sourced from an RSS feed. I can append the windows event log or
/var/log/messages to that lot without changing my scripts or RDF
queries.
Yes, you may not have the schema to hand for the URIs if you want to
do more heavy lifting (but this problem is not restricted to RDF).
As a down-to-earth example, I tend more and more to generate logs
designed to be loaded up as RDF triples. This is extremely useful
for systems management, server operations and message tracking, or
anything which doesn't (and shouldn't, and simply can't) care about
the details of a plethora of application suites, grammars, log formats,
protocols, server topologies, data-centers and so on, but does have
to care about finding out what the heck is going on. And no, you
can't do this with XML+Namespaces+HTTP, not to the same extent and
at the same cost.
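For what "logs designed to be loaded up as RDF triples" might look like, here is a minimal sketch, not my production code. It writes each field of a log event as one N-Triples line. The http://example.org/log# vocabulary, the event URI and the field names are all made up for illustration; substitute whatever URIs you actually control.

```python
LOG_NS = "http://example.org/log#"  # hypothetical vocabulary, not a real schema


def escape_literal(value):
    """Escape a string for use inside an N-Triples quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )


def event_to_ntriples(event_uri, fields):
    """Return one '<subject> <predicate> "literal" .' line per field.

    Keys are sorted so the output is stable from run to run.
    """
    lines = []
    for key in sorted(fields):
        literal = escape_literal(str(fields[key]))
        lines.append('<%s> <%s%s> "%s" .' % (event_uri, LOG_NS, key, literal))
    return lines


if __name__ == "__main__":
    event = {
        "host": "web01",
        "level": "ERROR",
        "message": 'disk "sda" full',
    }
    for line in event_to_ntriples("http://example.org/events/1", event):
        print(line)
```

One statement per line means grep still works on the raw file, and anything that reads N-Triples can load it next to the feed data without a format-specific parser.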
> On the other hand, Doctorow's "screed" does call into question the
> WinFS vision, or am I missing something here? To what extent does WinFS
> not presuppose honest, energetic, intelligent, and self-aware humans to
> create the metadata it will manage and query?
Not much. Most of it is flying about and never captured; you just
have to know how to grab it. Huge amounts of useful metadata can be
captured without ever asking users to do anything extra - witness the
RSS above; but the residue produced by your blogging activity is
going to be a fraction of the residue produced by your operating
system activity.
cheers
Bill
--
Propylon
http://www.propylon.com