xml-dev - RE: [xml-dev] Statistical vs "semantic web" approaches to making sense of

To: "'martin@hack.org'" <martin@hack.org>, Danny Ayers <danny666@virgilio.it>
Subject: RE: [xml-dev] Statistical vs "semantic web" approaches to making sense of the Net
From: "Hunsberger, Peter" <Peter.Hunsberger@stjude.org>
Date: Thu, 24 Apr 2003 13:28:11 -0500
Cc: Mike Champion <mc@xegesis.org>, xml-dev@lists.xml.org

martin@hack.org <martin@hack.org> wrote:

> 
> On Thu, 24 Apr 2003, Danny Ayers wrote:
> 
> > By coincidence I've been writing up a semi-refutation of Cory's 
> > 'metacrap' piece, hopefully ready in a day or so.
> 
> i'd be interested to see that. my initial reaction to this 
> piece was 'crap'! can't help it, but i think it should be 
> obvious that all his arguments apply equally well to data as 
> it does to metadata.
> 
> there seems to be an underlying view that anything done by a 
> machine - set-top boxes for TV stats or google for metadata - 
> is almost by definition better and more reliable than 
> anything produced by a human.

I realize I'm close to slipping down the rabbit hole here, but given the way
you posed this statement I can't resist playing devil's advocate for a
moment: if you're dealing with a random representative of the masses, then
it's probably true that Google-type information is more reliable.  Just
because Martha down the street tells you that the best TV on the market is a
SuperSuchAndSuch "because her cousin Freddy has a 60" in his double wide"
doesn't make it true, any more than finding the same opinion on some random
Web site would.  However, Google finding what it considers the best
authority on TVs is a lot more likely to get you a true evaluation of which
TV is the best.  The fact is that the Web is the first searchable and
cross-referenced repository of 1,000,000s of opinions, and as such it is
reasonable to expect that there are ways to reliably sample those opinions
and weight them.  This has little to do with semantics or XML (but certainly
something to do with linking); it's more a case of finding algorithms that
can judge authority in some way or other (more on that in a moment).  It
almost seems that all the metadata in the world won't really change the way
something like Google works, except that the types of links one can make
will become greater in number, and perhaps some types of links may prove
better than others (e.g. RSS vs. XPointer, to stretch things a little).

> "Google can derive statistics about the number of Web-authors 
> who believe that that page is important enough to link to, 
> and hence make extremely reliable guesses about how reputable 
> the information on that page is." really? my friend freddy's 
> got a website with links to the most unreliable sites on the 
> web. how does that affect google's 'reputability' scoring?

Probably not one iota; your friend Freddy is likely not going to be
considered an authority by Google.  The evaluation of links is recursive and
global.  So, someone else has to also value Freddy's opinion before Google is
going to let him influence things.  Yes, Freddy and Martha could collude to
point to each other's sites, but a local island of links still won't have
much effect on the global evaluation.
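
To make "recursive and global" a bit more concrete, here is a minimal sketch
of a PageRank-style link score.  It is not Google's actual code, just an
illustration of the general idea: a page's score is fed by the scores of the
pages that link to it, recomputed over the whole graph until it settles.  The
function name link_rank, the damping value of 0.85, the iteration count, and
every page name in the example graph are assumptions made up for this sketch.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages so that a link counts for more when the linker scores well.

    links maps each page to the list of pages it links to (hypothetical data).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally weighted.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: a cluster of sites that point at one TV review page,
# plus an "island" where Freddy and Martha only link to each other and
# Freddy also links to an unreliable site.
links = {
    "news": ["tv_review", "blog_a"],
    "blog_a": ["tv_review", "news"],
    "blog_b": ["tv_review", "news"],
    "forum": ["tv_review", "blog_a"],
    "tv_review": ["news"],
    "freddy": ["martha", "junk_site"],
    "martha": ["freddy"],
    "junk_site": [],
}

ranks = link_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:10s} {ranks[page]:.3f}")
```

In this toy graph the review page that several independent sites link to ends
up scoring well above both Freddy and the site he links to, even though
Freddy and Martha link to each other.  The island does not drop to zero, but
nobody outside it vouches for it, so its links carry little weight.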

> maybe the number of links to a page is a measure of exactly 
> that and nothing else - but do feel free make any assumptions 
> you want about why those links are there. personally i don't 
> tend to see googles search results as a reputability grading 
> at all, and i wouldn't recommend that anyone does ("it's 
> true, i found it on google!").

Of course not, TV news anchors and newspaper editors are the true font of
all knowledge...

> ultimately, if you care about the information that you 
> publish, then you care about the metainformation. and yes, 
> it's generally much easier to find web pages that have 
> meaningful titles.
>
