None of which does the average user understand.
The question is what they would pay in terms
of learning curve or subscription costs for a
search engine that behaves exactly as they
think it should.
The model by which users choose their terms and the
model by which the engine produces results are at
variance; two systems are contending for the same
resource. The engine makes its best guess, and then
the user starts searching in the results. OK. The
human is smart enough, one assumes, to recognize what
they are looking for if they find it in the results.
On the other hand, ascribing importance to the order
of the results, the Google numbers, or the
negative space (results not returned) is, at
best, a superstitious endeavor as long as
the model used to pick the initial
terms and the model by which those terms
are used to select results are not the same.
Multiple systems contending for the same resource
is a working definition of non-linearity, or
unpredictable correlation. This is the well-known
problem of the mental ontology contending with the
search ontology.
Then there is the further problem of source vetting.
Are authors doing high-quality, credentialed work?
Note that Michael Kay did not write that first
bit below. I did. You removed my name and
left Michael's. Now what does Google do with
that? Possibly nothing, but a human might and
it is likely to be wrong unless they follow
the thread back to pick up the source. Now we have not
only the mystery of Google's algorithms, but
the vagaries of human authoring habits. That
is why credentialed sources would be of value
as part of a search filter. Let's say I am a
university professor and I want my students to
use the web to do research. How should I interpret
their results if their sources are uncredentialed?
The simple interface can lead to amplified error.
The complex interface can lead to high costs and reduce the scale
of use. But is it better to swap scale for reliable results?
len
Also, I heard recently that Google is making its search results adaptive
to the user, using some heuristics (probably the user's domain or something similar).
In short, I heard that if I search for the keywords "w1 w2 ..." and
someone else searches for the same set of keywords, Google might give
differently ranked results; in other words, the user perceives the
ranking as non-deterministic.
I am not sure that is actually true. Can someone confirm this?
Google uses a lot of heuristics for fine-tuning its search results
ranking, some proprietary and some well known in the literature, such as
tf-idf (which gives greater weight to a term that occurs infrequently).
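The tf-idf weighting mentioned above (and in Michael Kay's quoted reply, where a term gets more weight the rarer it is in the corpus) can be sketched minimally in Python. This assumes documents arrive already tokenized into lowercase word lists; the function name `tf_idf_scores` and the toy corpus are hypothetical, this is only the textbook formula, and nothing here describes what Google actually does.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, documents):
    """Rank documents by a plain tf-idf sum over the query terms.

    documents: list of token lists (hypothetical pre-tokenized corpus).
    Returns (doc_index, score) pairs, highest score first.
    """
    n_docs = len(documents)
    # Document frequency: number of documents containing each term at least once.
    df = Counter()
    for doc in documents:
        df.update(set(doc))

    scores = []
    for i, doc in enumerate(documents):
        tf = Counter(doc)  # raw term counts within this document
        score = 0.0
        for term in query_terms:
            if df[term] == 0:
                continue  # term never appears in the corpus
            # Rarer terms (smaller df) get a larger idf, hence more weight.
            idf = math.log(n_docs / df[term])
            score += tf[term] * idf
        scores.append((i, score))
    # Python's sort is stable, so tied documents keep their original order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores

# Toy corpus for illustration only.
docs = [
    "xml schema validation with xml tools".split(),
    "xml namespaces explained".split(),
    "google search ranking and xml".split(),
]

ranked = tf_idf_scores(["xml", "ranking"], docs)
print(ranked)  # document 2 comes first; the other two score 0.0
```

Here "xml" occurs in every document, so its idf is log(3/3) = 0 and it contributes nothing, while "ranking" occurs in only one document and decides the order. Real engines layer positional weighting, phrase detection, and other signals on top of this, which is exactly why the resulting order is hard for a user to predict.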
Anyway, best,
murali.
On Mon, 8 Dec 2003, Michael Kay wrote:
> > That is why I wondered if it picked up on the topic
> > word or phrase. That is likely what they are after.
> > The other words are qualifiers, at least, that is
> > how I use it. I was questioning the Google strategy
> > because I realized I have a mental model of how it
> > works, and that is how I select and enter search
> > terms. It is probably not the right mental model
> > but the interface doesn't make it clear, and as a
> > result, its filtering strategy is opaque. The user
> > does the best they can.
>
> Most modern search engines give greater weight to a term the more
> infrequent it is in the corpus. Most also weight terms according to
> where and how often they appear in the source document, and some also
> recognize when adjacent words in the query constitute a noun phrase.
> What Google does is anyone's guess.
>
> Michael Kay
>
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>
>