
   RE: [xml-dev] Is google a conceptual graph engine?


Hi,

Still trying to find an answer to the question, I came to the following
conclusions:

a) yes, Google is a conceptual graph engine
b) yes, Google presents us with a fuzzy set logic

I made the following request to Google:
Query: dsssl

I got back a page containing the first 10 URIs from about 220 000 documents
having something related to the concept "dsssl". Let's consider that an item
listed in the page represents the following statement:

<rdf:description about="http://www.netfolder.com/DSSSL">
   <theme>dsssl</theme>
</rdf:description>

In fact this URI is listed in fourth position (bloggers, help me get a better
position :-) just joking). What, then, does Google say about this URI
compared to the one listed at position 220 000? That the former URI is
closer to the predicate "theme" than the latter. This is quite intuitive and
is even related to the cost of the transaction. To get to the 220 000th URI I
need 220 000/10 clicks (if I do not know how to increase the number of URIs
per page), thus 22 000 clicks to reach the last URI. We can say that the URI
mentioned in the RDF statement is closer in terms of the number of clicks, or
the effort needed to reach it. We can infer that Google publishes a
statement of value by saying that the first URI has a higher membership
value in the set "theme" than URIs positioned further down. Thus, Google is
in effect using a fuzzy set relationship between a URI and a predicate. We
could, for instance, translate that into:

<rdf:description about="http://www.netfolder.com/DSSSL">
   <theme value="4">dsssl</theme>
</rdf:description>

The smaller the value, the more strongly the URI belongs to the "theme" set.
The first one is the perfect match (not always :-)
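The arithmetic above, and one possible way to turn a ranking position into a
fuzzy membership value, can be sketched in Python. This is only an
illustration: the function names and the linear formula are my assumptions,
since all that is claimed here is that the value can be derived from the
ranking position. Nothing below reflects how Google actually scores pages.

```python
import math


def pages_to_reach(rank, per_page=10):
    """Number of result pages needed to reach the item at `rank`.

    Mirrors the 220 000 / 10 = 22 000 estimate in the message.
    """
    return math.ceil(rank / per_page)


def membership(rank, total):
    """Hypothetical linear membership function in (0, 1].

    Rank 1 maps to 1.0 (full membership); the value decreases
    steadily as the rank grows. The linear shape is an assumption.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 1.0 - (rank - 1) / total


print(pages_to_reach(220_000))   # 22000
print(membership(1, 220_000))    # 1.0
print(membership(4, 220_000) > membership(220_000, 220_000))  # True
```

Note that this inverts the convention of the value="4" attribute: here a
larger number means stronger membership, as is usual for fuzzy sets.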

We can also say that the result page from a query is a statement: a
statement saying that a URI (i.e. an RDF resource) is related to a
predicate. Checking my logs, I noticed that Google stated that:

<rdf:description about="http://www.netfolder.com/DSSSL">
   <rdf:bag>
      <theme value="4">dsssl</theme>
      <theme value="3">openJade</theme>
   </rdf:bag>
</rdf:description>

Google also states that:

The DSSSL class is part of this facet:
  Computers > Data Formats > Markup Languages > SGML > Style Sheets > DSSSL

and that, for example, the facet "XSL" is also related to other facets:
  Computers > Programming > Internet > CSS
  Computers > Data Formats > Markup Languages > SGML > Style Sheets > DSSSL
  Computers > Data Formats > Markup Languages > XML > Tools > Servers
  Computers > Programming > Languages > Java > XML > Class Libraries > XSL

These facets represent a view of the world, a model or ontology. Google can
reach some conclusions about how a particular theme (e.g. dsssl) can be
related to XML or SGML even if nothing about XML or SGML is mentioned in the
document, just by using the DMOZ ontology.
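To make that inference concrete, here is a minimal Python sketch that treats
a facet as a path of categories and checks whether a theme sits under a given
ancestor. The helper names are hypothetical, and the facet string is simply
copied from the listing above; a real directory would of course be a much
larger graph.

```python
def parse_facet(path):
    """Split a 'A > B > C' facet string into a list of category names."""
    return [part.strip() for part in path.split(">")]


def is_under(path, ancestor):
    """True if `ancestor` appears above the last category in the path."""
    return ancestor in parse_facet(path)[:-1]


DSSSL_FACET = (
    "Computers > Data Formats > Markup Languages > SGML > Style Sheets > DSSSL"
)

# The document itself never has to mention SGML: the facet path alone
# is enough to relate the theme to it.
print(is_under(DSSSL_FACET, "SGML"))  # True
print(is_under(DSSSL_FACET, "XML"))   # False
```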

In fact, I could have just described Google's ontology using OWL. Moreover,
I could write a Perl or Python script that would parse the returned page and
translate it into RDF/OWL and therefore make Google's tacit ontology
explicit in the XML world.
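As a rough idea of what the translation half of such a script could look
like, here is a Python sketch using only the standard library. It assumes the
result page has already been parsed into an ordered list of URIs (the parsing
itself is not shown). The "ex" namespace and the function name are invented
for the example, and the plain value attribute follows the informal notation
used earlier in this message rather than strict RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/search#"  # hypothetical vocabulary for "theme"


def results_to_rdf(query, ranked_uris):
    """Turn an ordered list of result URIs into an RDF-like XML string.

    Each URI becomes one description carrying a <theme> element whose
    `value` attribute is the 1-based ranking position.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ex", EX_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for rank, uri in enumerate(ranked_uris, start=1):
        desc = ET.SubElement(
            root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": uri}
        )
        theme = ET.SubElement(desc, f"{{{EX_NS}}}theme", {"value": str(rank)})
        theme.text = query
    return ET.tostring(root, encoding="unicode")


# Placeholder result list; only the second URI comes from this message.
print(results_to_rdf("dsssl", [
    "http://example.org/first-hit",
    "http://www.netfolder.com/DSSSL",
]))
```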

Conclusion: Google is a conceptual graph engine, and Google provides fuzzy
set membership (URI to predicate). The fuzzy set membership function's value
can be set from the ranking position. To make Google's ontology explicit,
what is needed are some scripts to translate the tacit ontology into an
XML-based explicit one.

So far so good. Now what do I do if, after having translated this model or
view of the world into RDF/OWL, I try to merge it with another,
conflicting view of the world? This is a real problem we will have to face
in the semantic web. The real issue is not to encode the ontology, nor is it
to create a web of trust to validate the ownership of these views; it is
rather: how can we reconcile conflicting views of the world? What we have
done up to now is to use a social process to resolve the issue.

Can someone give some references to ideas proposed to resolve these
issues (I mean with the W3C semantic web technologies, of course)?

Cheers
Didier PH Martin







 


Copyright 2001 XML.org. This site is hosted by OASIS