I gave some thought to what has been said about Google and the fact that
its classification schema is link based.
Are there any issues with that statement?
"Google is a tacit conceptual graph engine".
When two documents are linked together, a statement, a logical relationship,
is established between these two documents. XHTML and HTML don't allow us to
explicitly specify the "role" played by this link. Said differently, the
only clue about this relationship is contained in the <a> element's data
content. However, this can be tremendously ambiguous, and any automated
agent will struggle to infer the logical relationship from the contained
information. Take as an example the following link (reconstructed here from
the conceptual graph below):
<a href="index.html">home</a>
Expressed as a conceptual graph:
[currentDoc]->(home)->[index.html] ---- CurrentDoc is index.html's home
Even if the link meant the inverse relationship, I doubt that it makes more
sense:
[index.html]->(home)->[currentDoc] ----- index.html is currentDoc's home.
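To make the ambiguity concrete, here is a minimal sketch of how an automated
agent might naively turn an <a> element into a conceptual-graph triple, using
the anchor text as the relation label. The class name and the choice of
Python are mine, purely for illustration; no real engine is claimed to work
this way.

```python
from html.parser import HTMLParser

class LinkRelationExtractor(HTMLParser):
    """Collect (source, relation, target) triples from <a> elements,
    using the anchor text as the (ambiguous) relation label."""

    def __init__(self, source_doc):
        super().__init__()
        self.source_doc = source_doc
        self.triples = []
        self._href = None   # href of the <a> currently being parsed
        self._text = []     # accumulated anchor text

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            relation = "".join(self._text).strip()
            self.triples.append((self.source_doc, relation, self._href))
            self._href = None

p = LinkRelationExtractor("currentDoc")
p.feed('<a href="index.html">home</a>')
print(p.triples)  # [('currentDoc', 'home', 'index.html')]
```

Note that fed a keyword-stuffed label such as <a href="index.html">XML
Guide</a>, the same code would just as happily emit (currentDoc, XML Guide,
index.html): the agent has no way to tell an honest label from an
SEO-optimized one.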
Hummmm... Is that true? Do documents have a home? Maybe, if we have the
tacit knowledge that a web document's home is the domain name. Now the
problem is how to interpret these links: from destination to source, or vice
versa? Sometimes the relationship makes sense in one direction and sometimes
in the other. If we map the relationship to the <a> element's direction, then
we get that currentDoc is index.html's home, which is contrary to what is
really meant.
Some SEOs (Search Engine Optimizers) use optimization techniques and
label their links with key phrases. The key phrase contains targeted
keywords, for the sole purpose of being well ranked. As an example:
<a href="index.html">XML Guide</a>
[currentDoc]->(guide)->[index.html] ---- currentDoc is a guide to index.html
Even taking the reverse doesn't make more sense:
[index.html]->(guide)->[currentDoc] ---- index.html is a guide to
currentDoc.
Now, if we are using XLink, some additional information can be added:
<a xlink:type="simple" xlink:href="index.html" xlink:role="partOf">XML
Guide</a>
Since the source is currentDoc and the destination index.html, then the
conceptual graph for this statement is:
[currentDoc]->(partOf)->[index.html] --- currentDoc is part of index.html
Which could make sense if we consider that the first document represents the
cover page, or that it is the domain's table of contents (most of the time,
the document associated with the domain is also a table of contents linking
to the other documents). Based on this premise, documents are organized as a
hierarchy and the document associated with the domain is the root.
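For an agent that trusts xlink:role, the mapping to a triple becomes
mechanical instead of guesswork. Here is a sketch under that assumption (the
function name is mine; note also that the XLink spec actually wants role
values to be URI references, so a bare "partOf" is a simplification of the
example above):

```python
import xml.etree.ElementTree as ET

# The XLink namespace, used as an attribute prefix in Clark notation.
XLINK = "http://www.w3.org/1999/xlink"

def triples_from_xlinks(source_doc, xml_text):
    """Extract (source, role, target) triples from simple XLinks,
    falling back to the element text when no xlink:role is given."""
    root = ET.fromstring(xml_text)
    triples = []
    for el in root.iter():
        href = el.get(f"{{{XLINK}}}href")
        if href is None:
            continue  # not a link
        role = el.get(f"{{{XLINK}}}role") or (el.text or "").strip()
        triples.append((source_doc, role, href))
    return triples

doc = ('<doc xmlns:xlink="http://www.w3.org/1999/xlink">'
       '<a xlink:type="simple" xlink:href="index.html" '
       'xlink:role="partOf">XML Guide</a></doc>')
print(triples_from_xlinks("currentDoc", doc))
# [('currentDoc', 'partOf', 'index.html')]
```

The fallback to element text is exactly the ambiguous case discussed
earlier; only the explicit role gives the agent something it doesn't have to
guess at.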
Now the problem, for any classification agent, is this: in order to satisfy
mercantile appetites (or simply to pay the monthly bills), some people,
knowing that the agent uses the role to establish the relationship between
two documents, will play with the system in order to get a good ranking.
Some would reply: let's then get rid of these search engines and create
autonomous agents that will travel the web to collect relevant documents. No
problem. But how long will it take for such an agent to cover enough of the
web to collect significant documents? What are your guarantees that all
links will honestly report (by will or simply by error) their relationship
with other documents to your agent? Your agent's travel agenda may be
dependent on these links.
Hummm, definitely, the semantic web is not a simple affair... As some of
our social problems are rooted in our nature or prehistoric times, some
problems which could potentially be a plague to the semantic web are rooted
in today's web.
Didier PH Martin
For those of us who don't know what conceptual graphs are: go to
http://www.hum.auc.dk/cg for an online course on the subject.