   Re: [xml-dev] Is google a conceptual graph engine?

My comments:

On Mon, 6 Oct 2003, Didier PH Martin wrote:

> Now if we are using xlink, some additional information can be added
> <a xlink:type="simple" xlink:href="index.html" xlink:role="partOf">XML
> Guide</a>
> Since the source is currentDoc and the destination index.html, then the
> conceptual graph for this statement is:
> [currentDoc]->(partOf)->[index.html] --- currentDoc is part of index.html
> Which could make sense if we consider that the first document represents the
> cover page or that it is the domain's table of contents (most of the time,
> the document associated to the domain is also a table of contents linking to
> the other documents). Based on this premise, documents are organized as a
> hierarchy and the document associated to the domain is the root.
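To make the quoted mapping concrete, here is a minimal sketch in Python of how a link's xlink:role and xlink:href could be turned into the conceptual graph statement above. The function names, the "currentDoc" default, and the explicit xmlns:xlink declaration on the fragment are assumptions added for illustration; they are not part of the quoted message.

```python
import xml.etree.ElementTree as ET

# The XLink namespace URI; the fragment below declares it explicitly so
# that it parses as a standalone piece of XML (an assumption of this sketch).
XLINK_NS = "http://www.w3.org/1999/xlink"


def link_to_triple(xml_fragment, source="currentDoc"):
    """Return (source, role, destination) for a simple XLink element."""
    elem = ET.fromstring(xml_fragment)
    role = elem.get("{%s}role" % XLINK_NS)
    href = elem.get("{%s}href" % XLINK_NS)
    return (source, role, href)


def format_graph(triple):
    """Render a triple in the [concept]->(relation)->[concept] notation."""
    return "[%s]->(%s)->[%s]" % triple


fragment = (
    '<a xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:type="simple" xlink:href="index.html" '
    'xlink:role="partOf">XML Guide</a>'
)

triple = link_to_triple(fragment)
statement = format_graph(triple)
print(statement)
```

The sketch only reads the attributes; whether the role is reported honestly is a separate matter, as discussed further down.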

Note also that we need to use English words in the role; otherwise whatever
little semantics we have is lost. Your partOf will probably be treated
by a search engine as a single special keyword.

For example, consider a document that Google indexes as:

I am partOf xml-dev

If you search for "part of xml-dev", the above document will not be returned.
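A small Python sketch of why the phrase search misses. The tokenizer here (lowercase, split on anything that is not a letter or digit) is an assumption for illustration only, not a description of how Google actually tokenizes text.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(query, document):
    """True if the query tokens occur as a contiguous run in the document."""
    q = tokenize(query)
    d = tokenize(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


indexed = "I am partOf xml-dev"
query = "part of xml-dev"

indexed_tokens = tokenize(indexed)   # 'partOf' survives as one token, 'partof'
query_tokens = tokenize(query)       # 'part' and 'of' are two separate tokens

found = phrase_match(query, indexed)
found_if_spelled_out = phrase_match(query, "I am part of xml-dev")
print(indexed_tokens, query_tokens, found, found_if_spelled_out)
```

Under this tokenizer the camel-case role becomes the single keyword "partof", which never lines up with the two query tokens "part" and "of".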

Google also uses "anchor context" to determine the importance of pages.

> Now the problem, for any classification agent, is that in order to satisfy
> mercantile appetites (or simply to pay the monthly bills) some people,
> knowing that agents are using the role to establish relationships between two
> documents, will play with the system in order to get a good ranking. Some
> would reply, let's then get rid of these search engines and let's create
> autonomous agents that will travel the web to collect relevant documents. No
> problem, but how long will it take for such an agent to cover enough of the web to
> collect significant documents? What are your guarantees that all links will
> honestly report (by will or simply by error) their relationship with other
> documents to your agent? Your agent's travel agenda may be dependent on these
> relationship types....
> Hummm, definitely, the semantic web is not a simple affair... As some of
> our social problems are rooted in our nature or prehistoric times, some
> problems which could potentially be a plague to the semantic web are rooted
> in today's web.

Cheating on the web to gain false importance is one of the things that
Google cleverly avoids. This is how Google assigns importance to pages; it
is the PageRank algorithm.

Every web page is initially given the same rank, say 1.

Google then does the following for several iterations. In every iteration,
the rank of every page is recomputed as the sum of the ranks of the pages
that point to it (in the published algorithm, each pointing page's rank is
first divided by its number of outgoing links, and a damping factor is
applied). For example, if your page is pointed to by Yahoo, you get a high
rank, but if it is pointed to only by, say, my home page, you get a lower
rank.

They repeat this for several iterations until it converges (I think
experiments as well as theoretical results show that the number of
iterations needed is < 10, if I remember correctly, irrespective of the
starting rank of every page).
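The iteration can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's implementation: the division of each page's rank by its number of outgoing links and the damping factor come from the published PageRank formulation rather than from the description above, and the damping value 0.85, the iteration count, the handling of pages without outgoing links, and the toy link graph are all choices made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute ranks for a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 for p in pages}  # every page starts with the same rank of 1

    for _ in range(iterations):
        # Each page gets a small base rank regardless of who links to it.
        new_rank = {p: 1.0 - damping for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank
                # evenly over all pages, so the total rank stays constant.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "yahoo" is linked to by three pages, "myhome" by none.
toy_links = {
    "a": ["yahoo"],
    "b": ["yahoo"],
    "c": ["yahoo"],
    "yahoo": ["yourpage"],
    "myhome": ["otherpage"],
}

ranks = pagerank(toy_links)
total = sum(ranks.values())
print(sorted(ranks.items(), key=lambda item: -item[1]))
```

In this toy graph the page linked from "yahoo" ends up ranked above the page linked from "myhome", because "yahoo" itself collects rank from three pages while "myhome" collects none.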

Main thing is: it is difficult to get your page ranked highly by doing

Note: Has anyone seen a case where even PageRank can be fooled? I do
not remember having seen one.

best regards,


