xml-dev - RE: [xml-dev] Is google a conceptual graph engine?


To: "'Murali Mani'" <mani@CS.UCLA.EDU>,<xml-dev@lists.xml.org>
Subject: RE: [xml-dev] Is google a conceptual graph engine?
From: "Didier PH Martin" <martind@netfolder.com>
Date: Mon, 6 Oct 2003 14:17:49 -0400
Importance: Normal
In-reply-to: <Pine.SOL.4.33.0310060854570.20596-100000@panther.cs.ucla.edu>

Hi Murali,

Murali said:
Note also that we need to use English words in role, otherwise whatever
little semantics we have is lost. Your partOf will probably be considered
by a search engine as one special keyword..

Didier replies:
Of course, Murali, and in fact most search engines do not process XLink. I
gave that as an example of an "explicit" ontology created from typed links
between documents, as opposed to a "tacit" ontology that has to be inferred
from untyped links between documents.

If we add some semantics to a link (i.e. through a role or relationship
typing) then we add more information to the link. This is something we do
not have today in currently published documents. Relationships are not
typed. 

Let's say that we live in a totally honest world. You publish a document on
the web; I publish a document on the web. We start to build a certain view
of the world by linking our documents. However, for an external observer,
the kind of relationship between these two documents is not that obvious.
Yes indeed, a human reader with average intelligence can infer the type of
relationship between the two documents but a totally dumb machine named a
computer will struggle to figure out the type of relationship. However, if
we add extra information about the type of link/relationship we then add
some additional information that could help build an ontology from these two
documents, especially if we can associate these two documents to a theme.
Again, let's suppose that these two documents help our dumb machine to
figure out what is going on by including some statements like:
Document 1:
<rdf:description about="self">
   <author>Didier PH Martin</author>
   <theme>ontology</theme>
</rdf:description>

Document 2:
<rdf:description about="self">
   <author>Murali Mani</author>
   <theme>model theory</theme>
</rdf:description>

If you now include a link in your document such as this one:

<a xlink:type="simple" xlink:href="http://adomain.com/ontology.html"
xlink:role="is_partOf"> more info...</a>

Then you add a new statement to your document, one that relates the two
themes:

[ontology]->(is_partOf)->[model theory]

That assertion may be true or false but nonetheless will reflect a view of
the world or an assertion about a world.
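
As a rough illustration of how a machine could derive that statement, here is
a minimal Python sketch. It assumes the typed link sits in document 2 and
points at document 1, and it reads the role as "theme of the link target,
role, theme of the linking document". The names Document, TypedLink,
link_to_statement and format_statement, as well as the model-theory.html URL,
are hypothetical and exist only for this example.

```python
from typing import Dict, NamedTuple, Tuple


class Document(NamedTuple):
    """Metadata a document declares about itself (author and theme)."""
    url: str
    author: str
    theme: str


class TypedLink(NamedTuple):
    """A link carrying a role, as in the xlink:role example."""
    source_url: str
    target_url: str
    role: str


def link_to_statement(link: TypedLink,
                      docs: Dict[str, Document]) -> Tuple[str, str, str]:
    """Turn a typed link into a (subject, relation, object) triple over themes.

    Assumption: the target's theme is the subject and the linking
    document's theme is the object.
    """
    source = docs[link.source_url]
    target = docs[link.target_url]
    return (target.theme, link.role, source.theme)


def format_statement(statement: Tuple[str, str, str]) -> str:
    """Render a triple in the bracket notation used in this message."""
    subject, relation, obj = statement
    return f"[{subject}]->({relation})->[{obj}]"


docs = {
    "http://adomain.com/ontology.html": Document(
        "http://adomain.com/ontology.html", "Didier PH Martin", "ontology"),
    # Hypothetical URL for document 2; the original message does not give one.
    "http://adomain.com/model-theory.html": Document(
        "http://adomain.com/model-theory.html", "Murali Mani", "model theory"),
}

link = TypedLink(
    source_url="http://adomain.com/model-theory.html",
    target_url="http://adomain.com/ontology.html",
    role="is_partOf",
)

statement = link_to_statement(link, docs)
print(format_statement(statement))
```

Without the role, the same link would only tell the machine that the two
documents are connected, not how.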

The relationship is explicit and specified in your document. Otherwise it is
tacit and deduced from the content and the algorithm used to classify the
content.

Today, we can say that the ontology created from links between documents
encoded either in HTML (SGML based) or XHTML (XML based) represents a tacit
ontology. This tacit ontology cannot easily be discovered from the tags or
the content of the marked up documents.

The more I think about this whole issue, the more I think that ontologies as
specified by W3C can work only in some domains, and only if tools that
make them simple and easy are available; for example, for some types of
transactions or international transactions where more formal definitions are
required to facilitate the exchanges. This won't be the internet per se, but
more intra or extranets, more limited networks than what we call the "web".

You also mentioned something about page rank. You are right, the series is
converging quite rapidly (ref: http://www.iprcom.com/papers/pagerank/).

I totally disagree with your claim that Google's page rank cannot be
cheated. Just go to:
http://www2.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=weapon+of+mass+destruction
and look at the first link. Now try to figure out how a joke about such a
serious subject can be ranked so high (yes, the trick is no secret).
The actual page rank algorithm, without theme or concept clusters (a la
Teoma), is simply a political statement corresponding to a vote. It's a
little bit like the highgrade sausage ads of a couple of years ago: everybody
likes it because everybody eats it, and everybody eats it because everybody
likes it. Said differently, a document is considered important if a lot of
important people say so. That doesn't tell us whether the assertion made by
a document is true, false, serious or a joke, just that a lot of important
people voted that it is important. The previous example shows that this
algorithm can be fooled by a group.
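
To illustrate the voting effect, here is a small Python sketch of a
simplified page rank computed by repeated iteration. It is only a toy model
under stated assumptions, not Google's actual implementation: the damping
factor of 0.85, the even redistribution of rank from pages without outgoing
links, and the page names (a, b, serious, joke, c0..c9) are all assumptions
made for the example.

```python
from typing import Dict, List, Tuple


def pagerank(links: Dict[str, List[str]],
             damping: float = 0.85,
             tol: float = 1e-9,
             max_iter: int = 200) -> Tuple[Dict[str, float], int]:
    """Iterate a simplified page rank until the ranks stop changing.

    Returns the ranks and the number of iterations performed.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Every page starts each round with the same small base score.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Assumption: a page with no links spreads its vote evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank, iteration


# Two pages vote for the serious page; nobody links to the joke.
base_links = {"a": ["serious"], "b": ["serious"], "joke": []}
base_rank, base_iterations = pagerank(base_links)

# A group of ten hypothetical pages decides to link to the joke.
gamed_links = dict(base_links)
for i in range(10):
    gamed_links[f"c{i}"] = ["joke"]
gamed_rank, gamed_iterations = pagerank(gamed_links)

print("before:", base_rank["serious"] > base_rank["joke"])
print("after: ", gamed_rank["joke"] > gamed_rank["serious"])
```

In this toy graph the serious page outranks the joke until the group adds
its links, after which the joke comes out on top. Nothing in the computation
looks at what the pages actually say.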

Cheers
Didier PH Martin

References:
- Re: [xml-dev] Is google a conceptual graph engine?
  - From: Murali Mani <mani@CS.UCLA.EDU>
