OASIS Mailing List Archives

 



 


 

   Re: [xml-dev] Meta-somethingorother (was the semantic web mega-permathre


Michael Champion wrote:


> OK, I'll defend it -- Tell me which of these you disagree with :-)
> 
>     ◦     2.1 People lie
>     ◦     2.2 People are lazy
>     ◦     2.3 People are stupid
>     ◦     2.4 Mission: Impossible -- know thyself
>     ◦     2.5 Schemas aren't neutral
>     ◦     2.6 Metrics influence results
>     ◦     2.7 There's more than one way to describe something


I can just as easily ask you which of those are not relevant to 
Google or any other statistical approach (answer: none).


> What is there to disagree with here?  

The title. But specifically - "reliable".

About all you can say about statistically accrued metadata is that 
it's inherently more statistical - beyond that you have to get into 
specifics.


> I personally find this the least 
> plausible part of the semantic web vision -- I won't even begin to 
> believe it until it has survived the onslaughts of the meta-spammers and 
> the semantic-bombers who will go after the semantic web the way they've 
> gone after data in meta tags and the links that Google harvests.
> Dare said something the other day about having second thoughts about 
> Doctorow's argument because RSS feeds are an existence proof that useful 
> metadata is practical.  

One can't completely buy into either of those positions without 
being ignorant about a number of AI and ex-AI technologies. They're 
both wrong because they're both polarized in a Jerry Springer kind 
of way.


> I'm not sure which of the straw men that 
> demolishes -- I'd agree that people are less likely to lie or act lazy 
> and stupid when they know that people (like the boss, or colleagues, or 
> potential employers) are watching. And anyway, RSS *is* mostly 
> observational metadata extracted from an article or post, or at least 
> generated from the same inputs used to generate the content it syndicates.

If you look at RSS data for long enough you realize it's information 
rich and that information is being produced almost totally as a 
side-effect of blogging. Then maybe you go and read Metacrap, but 
with your eyes opened.

Here are the RDF triples taken from an Atom feed with two entries, 
excluding the content and the summary.

1 	
http://www.dehora.net/journal/ 
http://www.w3.org/1999/02/22-rdf-syntax-ns#type 
http://purl.org/atom/ns#feed
2 	
http://www.dehora.net/journal/ 	
http://purl.org/atom/ns#title 	
"Bill de hÓra"
3 	
http://www.dehora.net/journal/ 	
http://purl.org/atom/ns#rel 	
"alternate"
4 	
http://www.dehora.net/journal/ 	
http://purl.org/atom/ns#type 	
"text/html"
5 	
http://www.dehora.net/journal/ 	
http://purl.org/atom/ns#href 	
"http://www.dehora.net/journal/"
6 	
http://www.dehora.net/journal/ 	
http://purl.org/atom/ns#link 	
http://www.dehora.net/journal/
7 	
http://www.dehora.net/journal/ 	
http://purl.org/atom/ns#modified 	
"2004-05-23T01:45:36Z"
8 	
http://www.dehora.net/journal/ 	
http://purl.org/atom/ns#tagline 	
"FD85 1117 1888 1681 7689 B5DF E696 885C 20D8 21F8"
9 	
http://www.dehora.net/journal/ 	
http://purl.org/atom/ns#id 	
tag:www.dehora.net,2004:/journal?id
10 	
http://www.dehora.net/journal/ 	
http://purl.org/atom/ns#generator 	
"Movable Type"
11 	
http://www.dehora.net/journal/ 	
http://purl.org/atom/ns#copyright 	
"Copyright (c) 2004, dehora"
12 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id 
http://www.w3.org/1999/02/22-rdf-syntax-ns#type 
http://purl.org/atom/ns#entry
13 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id 
http://purl.org/atom/ns#feed 	
http://www.dehora.net/journal/
14 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id 
http://purl.org/atom/ns#title 	
"Thus sprach metadata"
15 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html 
http://purl.org/atom/ns#rel 	
"alternate"
16 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html 
http://purl.org/atom/ns#type 	
"text/html"
17 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html 
http://purl.org/atom/ns#href 
"http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html"
18 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id 
http://purl.org/atom/ns#link 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html
19 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id 
http://purl.org/atom/ns#modified 	
"2004-05-23T01:45:36Z"
20 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id 
http://purl.org/atom/ns#issued 	
"2004-05-23T01:45:36+00:00"
21 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id 
http://purl.org/atom/ns#id 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id
22 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id 
http://purl.org/atom/ns#created 	
"2004-05-23T01:45:36Z"
23 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id 
http://purl.org/atom/ns#summary 	
"Seairth Jacobs"
24 	
mailto:bill@dehora.net 	
http://purl.org/atom/ns#name 	
"dehora"
25 	
mailto:bill@dehora.net 	
http://purl.org/atom/ns#url 	
http://www.dehora.net/journal
26 	
mailto:bill@dehora.net 	
http://purl.org/atom/ns#email 	
"bill@dehora.net"
27 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id 
http://purl.org/atom/ns#author 	
mailto:bill@dehora.net
28 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id 
http://purl.org/dc/elements/1.1/subject 	
"SemanticWeb"
29 	
genid:ARP58526 	
http://purl.org/atom/ns#type 	
"text/html"
30 	
genid:ARP58526 	
http://purl.org/atom/ns#mode 	
"escaped"
31 
http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id 
http://purl.org/atom/ns#content 	
genid:ARP58526
32 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id 
http://www.w3.org/1999/02/22-rdf-syntax-ns#type 
http://purl.org/atom/ns#entry
33 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id 
http://purl.org/atom/ns#feed 	
http://www.dehora.net/journal/
34 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id 
http://purl.org/atom/ns#title 	
"MT3: are you not entertained?"
35 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html 
http://purl.org/atom/ns#rel 	
"alternate"
36 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html 
http://purl.org/atom/ns#type 	
"text/html"
37 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html
http://purl.org/atom/ns#href 
"http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html"
38 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#link 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html
39 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id 
http://purl.org/atom/ns#modified 	
"2004-05-21T20:57:11Z"
40 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id 
http://purl.org/atom/ns#issued 	
"2004-05-21T20:57:11+00:00"
41 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#id 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
42 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#created 	
"2004-05-21T20:57:11Z"
43 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#summary 	
"foo"
44 	
mailto:bill@dehora.net 	
http://purl.org/atom/ns#name 	
"dehora"
45 	
mailto:bill@dehora.net 	
http://purl.org/atom/ns#url 	
http://www.dehora.net/journal
46 	
mailto:bill@dehora.net 	
http://purl.org/atom/ns#email 	
"bill@dehora.net"
47 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#author 	
mailto:bill@dehora.net
48 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?
http://purl.org/dc/elements/1.1/subject 
"CuzIDontLikeToDreamAboutGettinPaid"
49 	
genid:ARP58533 	
http://purl.org/atom/ns#type 	
"text/html"
50 	
genid:ARP58533 	
http://purl.org/atom/ns#mode 	
"escaped"
51 
http://www.dehora.net/journal/2004/05/mt3_are_you_not_entertained.html?id
http://purl.org/atom/ns#content 	
genid:ARP58533


I agree it ain't pretty, but it is a perfectly good 
machine-processable metadata graph of 3-tuples (at its most basic, 
property-value pairs bound to a named entity). Forget RDF, you can 
usefully run SQL or grep pipelines against this stuff; a small script 
will do the job most of the time. But as you build these datasets 
over time you get to do all kinds of things (such as type inference 
and joins across arbitrary XML vocabularies and database schemas) if 
you don't mind using RDF-aware tools.
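
A minimal sketch of the SQL route, assuming the triples are loaded into 
a single three-column table. The table and column names (triples, s, p, 
o) and the helper functions are illustrative choices, not anything from 
the feed or from a particular RDF toolkit; the four rows are copied from 
the listing above.

```python
import sqlite3

ATOM = "http://purl.org/atom/ns#"
ENTRY = "http://www.dehora.net/journal/2004/05/thus_sprach_metadata.html?id"

# A handful of the triples listed above, as (subject, predicate, object).
TRIPLES = [
    (ENTRY, ATOM + "title", "Thus sprach metadata"),
    (ENTRY, ATOM + "modified", "2004-05-23T01:45:36Z"),
    (ENTRY, ATOM + "author", "mailto:bill@dehora.net"),
    ("mailto:bill@dehora.net", ATOM + "name", "dehora"),
]


def load(triples):
    """Load triples into an in-memory SQLite table (assumed layout: s, p, o)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    db.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return db


def titles_by_author_name(db, name):
    """Self-join: entry -> author -> author name, returning entry titles."""
    rows = db.execute(
        """
        SELECT title.o
        FROM triples AS title
        JOIN triples AS author ON author.s = title.s
        JOIN triples AS who ON who.s = author.o
        WHERE title.p = ? AND author.p = ? AND who.p = ? AND who.o = ?
        """,
        (ATOM + "title", ATOM + "author", ATOM + "name", name),
    )
    return [row[0] for row in rows]


db = load(TRIPLES)
print(titles_by_author_name(db, "dehora"))  # ['Thus sprach metadata']
```

The join never mentions Atom-specific structure beyond the predicate 
URIs, which is why the same query shape keeps working as triples from 
other sources are added to the table.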

Moreover an RDF engine couldn't care less that the metadata was 
sourced from an RSS feed. I can append the Windows event log or 
/var/log/messages to that lot without changing my scripts or RDF 
queries.
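
As a rough illustration of appending log data to the same pile, here is 
a sketch that turns one syslog-style line into triples. It assumes the 
traditional "Mon DD HH:MM:SS host program[pid]: message" layout; the 
http://example.org/log# vocabulary, the genid:log subject naming and 
the sample line are all made up for the example.

```python
import re

# Hypothetical vocabulary for log records; not a published schema.
LOG = "http://example.org/log#"

# Assumes the traditional "Mon DD HH:MM:SS host program[pid]: message" layout.
SYSLOG_LINE = re.compile(
    r"^(?P<time>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$"
)


def syslog_to_triples(line, line_no):
    """Turn one syslog line into (subject, predicate, object) triples."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        return []  # unrecognized layout: contribute nothing rather than guess
    subject = f"genid:log{line_no}"
    return [
        (subject, LOG + key, value)
        for key, value in match.groupdict().items()
        if value is not None  # pid is optional in this layout
    ]


# Invented sample line, for illustration only.
line = "May 23 01:45:36 web1 sshd[4242]: Accepted publickey for bill"
for triple in syslog_to_triples(line, 1):
    print(triple)
```

The output has the same (subject, predicate, object) shape as the feed 
triples, so it can go into the same store without touching existing 
queries.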

Yes, you may not have the schema to hand for the URIs if you want to 
do more heavy lifting (but this problem is not restricted to RDF).


As a down-to-earth example, I tend more and more to generate logs 
designed to be loaded up as RDF triples. This is extremely useful 
for systems management, server operations, message tracking, or 
anything else that doesn't (and shouldn't, and simply can't) care 
about the details of a plethora of application suites, grammars, log 
formats, protocols, server topologies, data centers and so on, but 
does have to care about finding out what the heck is going on. And 
no, you can't do this with XML+Namespaces+HTTP, not to the same 
extent and at the same cost.


> On the other hand,  Doctorow's "screed" does call into question the 
> WinFS vision, or am I missing something here?  To what extent does WinFS 
> not presuppose honest, energetic, intelligent, and self-aware humans to 
> create the metadata it will manage and query?

Not much. Most of it is flying about and never captured; you just 
have to know how to grab it. Huge amounts of useful metadata can be 
captured without ever asking users to do anything extra - witness 
the RSS above; but the residue produced by your blogging activity is 
going to be a fraction of the residue produced by your operating 
system activity.
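
To make the "residue" point concrete, here is a small sketch that 
harvests metadata the file system already holds about a file, with no 
input from the user. The http://example.org/fs# vocabulary and the 
file_triples helper are hypothetical, and this says nothing about how 
WinFS itself works.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

FS = "http://example.org/fs#"  # hypothetical vocabulary


def file_triples(path):
    """Describe a file using only what the file system already records."""
    path = Path(path).resolve()
    info = path.stat()
    subject = path.as_uri()  # file:// URI used as the subject
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return [
        (subject, FS + "name", path.name),
        (subject, FS + "extension", path.suffix),
        (subject, FS + "sizeBytes", str(info.st_size)),
        (subject, FS + "modified", modified.strftime("%Y-%m-%dT%H:%M:%SZ")),
    ]


# Create a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "notes.txt"
    sample.write_text("hello", encoding="utf-8")
    for triple in file_triples(sample):
        print(triple)
```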

cheers
Bill
-- 
Propylon
http://www.propylon.com




 


Copyright 2001 XML.org. This site is hosted by OASIS