Mike Champion wrote:
>
> This raises a question, for me anyway: If it will take a "better Google
> than Google" (or perhaps an "Autonomy meets RDF") that uses Bayesian or
> similar statistical techniques to create the markup that the Semantic Web
> will exploit, what's the point of the semantic markup? Why won't people
> just use the "intelligent" software directly? Wearing my "XML database
> guy" hat, I hope that the answer is that it will be much more efficient
> and programmer-friendly to query databases generated by the 'bots
> containing markup and metadata to find the information one needs. But I
> must admit that 5-6 years ago I thought the world would need
> standardized, widely deployed XML markup before we could get the quality
> of searches that Google allows today using only raw HTML and the
> PageRank heuristic algorithm.
>
> So, anyone care to pick holes in my assumptions, or reasoning? If one
> does accept the hypothesis that it will take smart software to produce
> the markup that the Semantic Web will exploit, what *is* the case for
> believing that it will be ontology-based logical inference engines
> rather than statistically-based heuristic search engines that people
> will be using in 5-10 years? Or is this a false dichotomy?
Yes, this is an entirely false dichotomy, but you've asked an extremely
important question.
Forget all the hype we've been hearing about the SW/AI etc. and let's look
at what the current reality is: OWL is *fundamentally* about
classifications. OWL "reasoners" are rightly termed "classifiers", but OWL
doesn't employ statistics -- a thing either is or isn't a member of a class.
To link OWL-type classifiers with real-world data, there must be a leap that
puts something into a class in the first place, and this is where
statistical-type processors might function. Let's use the following example:
Suppose we have a bunch of noisy binary data about a group of people, some
of whom, let's say, have SARS; some of the data might be audio, some video,
some text, etc.
Now suppose we have a statistical process that is able to cluster
individuals into groups. This processor might emit the following class:
<owl:Class rdf:ID="Foo">
  <owl:oneOf rdf:parseType="Collection">
    <ex:person rdf:about="#Bill"/>
    <ex:person rdf:about="#Dave"/>
    <ex:person rdf:about="#Sue"/>
    <ex:person rdf:about="#Nancy"/>
    <ex:person rdf:about="#Freddy"/>
  </owl:oneOf>
</owl:Class>
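To make the "leap" concrete, here is a minimal, purely hypothetical sketch in Python of the statistical step: a crude nearest-neighbour clustering over noisy binary symptom features, which then serializes the resulting cluster as an OWL enumerated class. All names, feature vectors, and the `hamming`/`cluster`/`emit_oneof_class` helpers are invented for illustration; a real system would use a proper Bayesian or clustering model.

```python
def hamming(a, b):
    """Number of differing bits between two equal-length feature vectors."""
    return sum(x != y for x, y in zip(a, b))

def cluster(features, seed, radius):
    """Names whose feature vector lies within `radius` bits of the seed's."""
    return [name for name, vec in features.items()
            if hamming(vec, features[seed]) <= radius]

def emit_oneof_class(class_id, members):
    """Serialize a cluster as an OWL enumerated class in RDF/XML."""
    lines = ['<owl:Class rdf:ID="%s">' % class_id,
             '  <owl:oneOf rdf:parseType="Collection">']
    lines += ['    <ex:person rdf:about="#%s"/>' % m for m in members]
    lines += ['  </owl:oneOf>', '</owl:Class>']
    return '\n'.join(lines)

# Noisy binary observations: (cough, fever, virus x233444 detected)
features = {
    "Bill":   (1, 1, 1),
    "Dave":   (1, 1, 1),
    "Sue":    (1, 0, 1),   # noisy: fever reading missed
    "Nancy":  (1, 1, 1),
    "Freddy": (0, 1, 1),   # noisy: cough reading missed
    "Alice":  (0, 0, 0),   # healthy control, falls outside the cluster
}

members = cluster(features, seed="Bill", radius=1)
print(emit_oneof_class("Foo", members))
```

The statistical layer tolerates the noise (Sue and Freddy each have one corrupted bit but still land in the cluster), while the emitted class itself is crisp -- which is exactly the division of labour being argued for.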
Our reasoner might be able to derive that
<owl:Class rdf:ID="Bar">
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#hasCough"/>
    <owl:Class rdf:about="#hasFever"/>
    <owl:Class rdf:about="#hasVirus.x233444"/>
    ...
#Foo rdfs:subClassOf #Bar
and even, in the proper circumstances, that...
#Bar owl:equivalentClass #SARS
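The derivation above can be sketched as an extensional check: since Foo is an enumerated class, it is a subclass of the intersection exactly when every named member falls inside every class being intersected. The membership sets below are invented to match the example; a real reasoner works over axioms, not Python sets.

```python
# Hypothetical membership data for the three symptom classes.
has_cough = {"Bill", "Dave", "Sue", "Nancy", "Freddy", "Alice"}
has_fever = {"Bill", "Dave", "Sue", "Nancy", "Freddy"}
has_virus_x233444 = {"Bill", "Dave", "Sue", "Nancy", "Freddy"}

# Bar is the owl:intersectionOf the three classes.
bar = has_cough & has_fever & has_virus_x233444

# Foo is the enumerated class (owl:oneOf) emitted by the statistical step.
foo = {"Bill", "Dave", "Sue", "Nancy", "Freddy"}

# Foo is a subclass of Bar iff every member of Foo is also in Bar.
print(foo <= bar)  # prints True
```

Note that Alice has a cough but not the fever or the virus, so she is excluded from Bar -- the crisp intersection does the filtering that the noisy statistical layer could not guarantee on its own.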
So Bayesian/statistical processes might very well be used to jumpstart a
logical classification process that tells us something quite useful.
Jonathan