
   RE: [xml-dev] When Searching With Google


It is the fact of page ranking and other metrics not 
reflected in the GUI that is at the heart of the 
question.   

It is possible that some do not search 
by global popularity but by other means of 
assessing relevance.  It is too easy to game some 
relevance indicators, so a user might wish to 
set filters in accordance with their own mental
models or in accordance with policy as in the 
example of the scholastic researcher.  They 
may wish to use means of visualizing results that 
change depending on the filters applied.  
The link one is looking for may not be at the 
top of the list, or the person may be looking 
not for one link but for a set of links. 

A subscription search engine might be preferred because 
it offers:

o  Enhanced visualization

o  Superior search interface with filters enabled by 
   selection, not fiat.

o  Customizable interface

o  Superior source vetting

o  Report generator support

o  Domain selectors, say WAN/LAN or trusted-site, of course.

and so on.  This does not need to be a domain specific 
engine, but it might indeed be an executive service and 
it may return results not just to the human, but to a 
decision support service.  One way to implement 
this is to use the Google web service and layer another 
set of filters on top of it.  But again, for this to work, 
the Google algorithms would need to be transparent. So 
it is possible that one might not want to build this over 
Google given that the algorithms are not transparent.
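As a rough illustration of the layering idea, here is a minimal sketch 
in Python of user-selected filters applied on top of whatever ranked 
list an underlying engine returns.  Everything in it is hypothetical: 
the Result record, the trusted_sites and title_contains filters, and 
apply_filters are names made up for this example, and no real Google 
web service call is shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List
from urllib.parse import urlparse


@dataclass
class Result:
    """Hypothetical record for one hit returned by an underlying engine."""
    url: str
    title: str
    rank: int  # position assigned by the underlying engine


# A filter is any predicate over a Result; the user picks which ones apply.
Filter = Callable[[Result], bool]


def trusted_sites(allowed: Iterable[str]) -> Filter:
    """Keep only results whose host is one of the allowed domains or a subdomain."""
    allowed_set = {d.lower() for d in allowed}

    def keep(r: Result) -> bool:
        host = (urlparse(r.url).hostname or "").lower()
        return any(host == d or host.endswith("." + d) for d in allowed_set)

    return keep


def title_contains(term: str) -> Filter:
    """Keep only results whose title mentions the term (case-insensitive)."""
    needle = term.lower()
    return lambda r: needle in r.title.lower()


def apply_filters(results: List[Result], filters: List[Filter]) -> List[Result]:
    """Return results passing every selected filter, in the engine's order."""
    return [r for r in results if all(f(r) for f in filters)]


results = [
    Result("http://www.w3.org/2001/sw/", "Semantic Web Activity", 1),
    Result("http://example.com/sw", "Semantic Web hype", 2),
    Result("http://lists.xml.org/archives/xml-dev/", "xml-dev archive", 3),
]
vetted = apply_filters(results, [trusted_sites(["w3.org", "xml.org"])])
```

Note that the filters can only drop or keep what the engine already 
returned in the order it chose, which is the transparency problem 
again: the layer cannot correct for ranking decisions it cannot see.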

Google doesn't do badly, but this is a domain of hot 
interest and one should ask questions and speculate on 
possible better systems.   The closer one gets to the 
report generator interface, the more one needs to 
understand the supporting search engine. Also, 
it is likely that subscription costs would increase 
as that would be an enhanced service.

len


From: Irene Polikoff [mailto:Irene@topquadrant.com]

Google's current differentiation comes not from their ability to discern
meaning or provide a user interface that is better than that of the
other search engines. Instead, it is in the algorithms that figure out
the 'popularity' of the page based on how many other pages (and what
kind of pages) link to it.

By doing this, Google effectively incorporates opinions of a large set
of people. The most popular pages percolate to the top of the result
list. It is the fact that the link one is looking for is right at the
top of the list (as opposed to being buried on page 17), that creates
the perception of higher relevancy of Google search results.  
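To make the "popularity" idea concrete, here is a minimal sketch of a
link-based score in the style of the published PageRank idea. It is not
Google's actual algorithm, which is not public; the function name
link_popularity, the damping value of 0.85, and the toy link graph are
assumptions chosen for illustration.

```python
def link_popularity(links, damping=0.85, iterations=50):
    """Iteratively score pages by the scores of the pages linking to them.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes its score on, split among its outgoing links.
                share = damping * score[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * score[p] / n
                for q in pages:
                    new[q] += share
        score = new
    return score


# Toy graph: "a" and "b" both link to "c"; "c" links back to "a".
scores = link_popularity({"a": ["c"], "b": ["c"], "c": ["a"]})
ranked = sorted(scores, key=scores.get, reverse=True)
```

In the toy graph "c" comes out on top because two pages link to it, and
"a" outranks "b" because its single inbound link comes from the
highest-scoring page. That is the "what kind of pages" part of the
weighting.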

I say "current" because they also experiment with other stuff, for
example using certain taxonomies like the Open Directory Project index.
In fact, with their recent acquisition of Applied Semantics, they seem
to be very much into knowledge representation, Semantic Web approaches
to search. One piece of evidence can be seen if you search on Google for
"Semantic Web". Notice that one of the ads served on the right is their
own "Work at Google" advertisement.

Getting back to the original question, I think subscription search
engines that contract for the quality of their results would be more
viable within specific specialized domains than in the general
search areas.

Regards,

Irene Polikoff
Executive Partner
TopQuadrant

Main office: 724-846-9300x212
Direct line:  914-777-0888
Cell:           914-329-8576
www.topquadrant.com

-----Original Message-----
From: Bullard, Claude L (Len) [mailto:clbullar@ingr.com] 
Sent: Monday, December 08, 2003 10:30 AM
To: 'michael.h.kay@ntlworld.com'; xml-dev@lists.xml.org
Subject: RE: [xml-dev] When Searching With Google


Right.  And that is why I am asking.  Should the GUI 
give clues to the filtering?  If yes, it gets harder 
to use.  If no, its reliability vis a vis a common 
mental model is lowered.

One should be sure what those Google numbers are 
saying.  One should know about the phrase trade. 
One should understand blogging keiretsu.  One should 
be able to set a search based on the credentials 
of the sources.  One should be able to pick the 
types of credentials, not let the bot do that.

Amplified acceptance of unverified assumptions 
is the very essence of robot wisdom.  I am 
wondering about the viability of subscription 
search engines that contract for the quality 
of their results.  

len


From: Michael Kay [mailto:michael.h.kay@ntlworld.com]

Most modern search engines give greater weight to a term the more
infrequent it is in the corpus. Most also weight terms according to
where and how often they appear in the source document, and some also
recognize when adjacent words in the query constitute a noun phrase.
What Google does is anyone's guess.
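
The first two weightings can be sketched as a simple TF-IDF score. This
is a minimal illustration, not any particular engine's formula: the
smoothed logarithm used for idf is one common variant, the function
names and the tiny corpus are made up for the example, and weighting by
position in the document and noun-phrase recognition are left out.

```python
import math
from collections import Counter


def idf(term, docs):
    """Inverse document frequency: rarer terms in the corpus weigh more.

    Uses a smoothed variant, log((1 + N) / (1 + df)) + 1, so a term
    found in every document still gets a small positive weight.
    """
    df = sum(1 for d in docs if term in d)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def score(query, doc, docs):
    """Sum, over query terms, of (occurrences in doc) * (rarity in corpus)."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t, docs) for t in query)


# Tiny corpus of pre-tokenized documents (illustrative only).
docs = [
    ["xml", "schema", "xml"],
    ["xml", "search"],
    ["xml", "google", "search"],
]
query = ["google", "xml"]
best = max(docs, key=lambda d: score(query, d, docs))
```

Here "xml" appears in every document and so contributes little, while
"google" appears in only one and dominates the score for that document.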

-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>




 
