Mike Champion wrote:
> I can see the utility of ontology building in domains
> where things more or less sit still while we examine
> them, e.g. the assumptions about human anatomy and
> physiology built into SNOMED (although I suppose that
> it evolves fairly quickly as disease organisms evolve
> and as the etiology of known diseases is better
> understood). It's just not clear to me how that is
> going to help us find stuff on the Web better than we
> can with heuristic / statistical approaches. For
> example, Google doesn't know a stinkin' thing about
> "cameras" except that the word appears on a lot of
> pages with words such as "picture" in it (and its
> synonyms, equivalents in other languages, etc.), so it
> has no trouble with the idea that a cellphone can also
> be a camera. So, we can do useful things with these
> statistically useful "attractors" of one term for
> another in the space of actual documents that would
> utterly defeat a reasoning agent with an out-of-date
> ontology that is trying to figure out why anyone would
> object to people bringing cellphones into a locker
> room.
In fairness, technologies like OWL don't know anything about
"cameras" either. And unfairly, I could twist your argument as being
equally against relational data, though I'm sure that's not your
intention :)
But think about FOAF, or calendaring - search engines may be good at
determining the relative importance of some chunk of data, but they
just couldn't begin to provide the sort of information a naive graph
walker or inference engine could, given a set of FOAF graphs, iCal data,
and a party to organize.
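
To make the graph-walker point concrete, here is a minimal sketch in Python. It assumes the foaf:knows links and the iCal busy dates have already been parsed into plain dictionaries; the names, dates, and the `guests` helper are all hypothetical, and no real FOAF or iCal parsing is shown.

```python
from collections import deque

# Hypothetical foaf:knows edges, as they might look after merging several FOAF files.
KNOWS = {
    "bill": ["mike", "dana"],
    "mike": ["erin"],
    "dana": ["erin", "frank"],
    "erin": [],
    "frank": [],
}

# Hypothetical busy dates, as they might be pulled out of each person's iCal data.
BUSY = {
    "mike": {"2004-06-12"},
    "frank": {"2004-06-19"},
}


def guests(host, date, max_hops=2):
    """Walk the knows graph breadth-first and keep people who are free on `date`."""
    seen = {host}
    queue = deque([(host, 0)])
    invited = []
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not walk further out than max_hops from the host
        for friend in KNOWS.get(person, []):
            if friend in seen:
                continue
            seen.add(friend)
            queue.append((friend, hops + 1))
            if date not in BUSY.get(friend, set()):
                invited.append(friend)
    return invited


print(guests("bill", "2004-06-12"))
```

Nothing here ranks anything; it just follows links and joins them against calendar data, which is the sort of answer a relevance-ordered list of pages doesn't give you.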
There's a place for webtastic meatydata, and Google will doubtless
leverage it, perhaps by warping PageRank to provide trust
metrics about data sources.
> Sure, the approach Google uses is beginning to fall
> apart under the various strains on it, and clearly the
> world needs to keep working on this problem. There
> may be some way to leverage relatively static
> ontologies to steer one away from "false attractors",
> but the only practical way I see to keep up with
> evolving language is to continuously sample real
> communications.
Google's approach to query (PageRank) is fine; their approach to
search (download the web into a refrigerated uber-cluster) has real
issues. Just because you're good with a shovel doesn't mean you're
digging the right hole. Now, if something like JXTA Search ever
caught on...
Bill de hÓra