   Re: [xml-dev] Statistical vs "semantic web" approaches to making sense o


Mike Champion wrote:
>
> This raises a question, for me anyway:  If it will take a "better Google
> than Google" (or perhaps an "Autonomy meets RDF") that uses Bayesian or
> similar statistical techniques to create the markup that the Semantic Web
> will exploit, what's the point of the semantic markup?  Why won't people
> just use the "intelligent" software directly?  Wearing my "XML database
> guy" hat, I hope that the answer is that it will be much more efficient
> and programmer-friendly to query databases generated by the 'bots
> containing markup and metadata to find the information one needs.  But I
> must admit that 5-6 years ago I thought the world would need standardized,
> widely deployed XML markup before we could get the quality of searches
> that Google allows today using only raw HTML and PageRank heuristic
> algorithm.
>
> So, anyone care to pick holes in my assumptions, or reasoning?  If one
> does accept the hypothesis that it will take smart software to produce the
> markup that the Semantic Web will exploit, what *is* the case for
> believing that it will be ontology-based logical inference engines rather
> than statistically-based heuristic search engines that people will be
> using in 5-10 years?  Or is this a false dichotomy?

Yes, this is an entirely false dichotomy, but you've asked an extremely
important question.

Forget all the hype that we've been hearing about the SW/AI, etc., and let's
look at what the current reality is -- OWL is *fundamentally* about
classifications. OWL "reasoners" are rightly termed "classifiers", but OWL
doesn't employ statistics: a thing either is or isn't a member of a class.

To link OWL-type classifiers with real-world data, there must be a leap that
puts something into a class in the first place, and this is where
statistical-type processors might function. Let's use the following example:
suppose we have a bunch of noisy binary data about a group of people, some of
whom, let's say, have SARS. Some of the data might be audio, some video, some
text, etc.

Now suppose we have a statistical process that is able to cluster
individuals together in groups. This processor might emit the following
class:

<owl:Class rdf:ID="Foo">
    <owl:oneOf rdf:parseType="Collection">
        <ex:person rdf:about="#Bill"/>
        <ex:person rdf:about="#Dave"/>
        <ex:person rdf:about="#Sue"/>
        <ex:person rdf:about="#Nancy"/>
        <ex:person rdf:about="#Freddy"/>
    </owl:oneOf>
</owl:Class>

Given a second class such as the following, our reasoner might be able to
derive a relationship between the two:

<owl:Class rdf:ID="Bar">
    <owl:intersectionOf rdf:parseType="Collection">
         <owl:Class rdf:about="#hasCough"/>
         <owl:Class rdf:about="#hasFever"/>
         <owl:Class rdf:about="#hasVirus.x233444"/>
...

#Foo rdfs:subClassOf #Bar

and even, in the proper circumstances that...

#Bar owl:equivalentClass #SARS

So the Bayesian/statistical processes might very well be used to jump-start a
logical classification process that tells us something quite useful.
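
To make the two steps concrete, here is a minimal Python sketch of the idea, not a real OWL reasoner. Everything in it is an assumption for illustration: the `score` field stands in for whatever a Bayesian or clustering process would output, the 0.8 threshold is arbitrary, the extra individuals Tom and Alice are made up, and the subclass test is a plain closed-world set comparison over known individuals rather than OWL's open-world inference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    score: float         # hypothetical output of the statistical step, 0..1
    findings: frozenset  # crisp facts already asserted about the person


def statistical_cluster(people, threshold=0.8):
    """Stand-in for the statistical step: emit an enumerated class (like oneOf)."""
    return frozenset(p.name for p in people if p.score >= threshold)


def intersection_class(people, required):
    """Crisp step: members are exactly those with every required finding."""
    return frozenset(p.name for p in people if required <= p.findings)


def is_subclass(sub, sup):
    """Closed-world approximation of subClassOf: every member of sub is in sup."""
    return sub <= sup


ALL = frozenset({"hasCough", "hasFever", "hasVirus.x233444"})

people = [
    Person("Bill", 0.93, ALL),
    Person("Dave", 0.88, ALL),
    Person("Sue", 0.91, ALL),
    Person("Nancy", 0.85, ALL),
    Person("Freddy", 0.97, ALL),
    Person("Tom", 0.40, ALL),                       # missed by the clustering
    Person("Alice", 0.10, frozenset({"hasCough"})),
]

foo = statistical_cluster(people)      # the "leap" that puts people in a class
bar = intersection_class(people, ALL)  # the logically defined class

print("Foo:", sorted(foo))
print("Bar:", sorted(bar))
print("Foo subClassOf Bar:", is_subclass(foo, bar))
```

The statistical step only decides who goes into Foo; from there on, membership is all-or-nothing, which is the part a classifier can reason over. In this toy data Foo is a subclass of Bar but not the other way around, because Tom satisfies Bar's definition without having been clustered into Foo.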

Jonathan





 


Copyright 2001 XML.org. This site is hosted by OASIS