   RE: [xml-dev] When Searching With Google


None of which does the average user understand. 
The question is what would they pay in terms 
of learning curve or subscription costs for a 
search engine that behaves exactly as they 
think it should.

The model with which they insert terms and the 
results are at variance; two systems are contending 
for the same resource.  The engine makes its best 
guess, and then the user starts searching in 
the results.  Ok.  The human is smart enough, 
one assumes, to recognize what they are looking 
for if they find it in the results.  On the 
other hand, ascribing importance to the order 
of the results, the Google numbers, or the 
negative space (results not returned) is, at 
best, a superstitious endeavor as long as 
the model they used to pick the initial 
terms and the model by which those terms 
are used to select results are not the same. 

Multiple systems contending for the same resource 
is a working definition of non-linearity, or 
unpredictable correlation.  This is the well-known 
mental ontology contending with the search ontology 
problem.

Then there is the further problem of source vetting.
Are authors doing high quality credentialed work?

Note that Michael Kay did not write that first 
bit below.  I did.  You removed my name and 
left Michael's.   Now what does Google do with 
that?  Possibly nothing, but a human might and 
it is likely to be wrong unless they follow 
the thread back to pick up the source.  Now we have not 
only the mystery of Google's algorithms, but 
the vagaries of human authoring habits.  That 
is why credentialed sources would be of value 
as part of a search filter.  Let's say I am a 
university professor and I want my students to 
use the web to do research.  How should I interpret 
their results if their sources are uncredentialed?

The simple interface can lead to amplified error.  
The complex interface can lead to high costs and reduce the scale 
of use.  But is it better to swap scale for reliable results?

len

Also, I heard recently that Google is making the search results adaptive
to the user, based on some heuristics (probably the user's domain or
something similar).

In short, I heard that if I search for the key words "w1 w2 ..." and
someone else searches for the same set of key words, google might give
different ranked results - in other words, user perceives the results
ranking as non-deterministic.

I am not sure whether that is actually true. Can someone confirm this?

Google uses a lot of proprietary heuristics for fine-tuning the search
results ranking, alongside techniques such as tf-idf (which gives greater
weight to a term that occurs infrequently in the corpus) that are well
known in the literature.
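
For anyone who has not seen tf-idf written out, here is a minimal sketch
in Python. It is only the textbook form (term frequency times the log of
inverse document frequency), with whitespace tokenizing and a made-up
three-document corpus as assumptions for illustration. It says nothing
about what Google actually does.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (textbook tf-idf)."""
    # Naive tokenizer: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # tf = share of the document taken up by the term;
            # idf = log(N / df), so rarer terms get a larger factor.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, invented for this example.
docs = [
    "xml schema validation",
    "xml search engine ranking",
    "xml query search",
]
weights = tf_idf(docs)
```

In this toy corpus "xml" occurs in every document, so its weight is zero
everywhere, while a word found in only one document, such as "schema" or
"ranking", gets the largest weight in that document.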

anyways, best, murali.

On Mon, 8 Dec 2003, Michael Kay wrote:

> > That is why I wondered if it picked up on the topic
> > word or phrase.  That is likely what they are after.
> > The other words are qualifiers, at least, that is
> > how I use it.  I was questioning the Google strategy
> > because I realized I have a mental model of how it
> > works, and that is how I select and enter search
> > terms.  It is probably not the right mental model
> > but the interface doesn't make it clear, and as a
> > result, its filtering strategy is opaque.  The user
> > does the best they can.
>
> Most modern search engines give greater weight to a term the more
> infrequent it is in the corpus. Most also weight terms according to
> where and how often they appear in the source document, and some also
> recognize when adjacent words in the query constitute a noun phrase.
> What google does is anyone's guess.
>
> Michael Kay
>
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>
>


Copyright 2001 XML.org. This site is hosted by OASIS