> I believe we can use vector-space model only when the document collection
> is "homogeneous" in some manner.. and has repetitive words etc.
> Also note -- vector space model, you have to obtain rank of documents in
> real-time given a query.
Cohen's '99 WHIRL paper discusses the ranking heuristics, the storing of
similarities instead of computing them in real-time, and the use of views to
persist information about the highest-scoring answers:
"Fortunately, in most cases, it is not necessary to compute all answers to a
query, as only the high-scoring answers will be of interest. WHIRL's inference
algorithms are thus designed to find a few good answers to a query, without
generating all possible answers. The operations most commonly performed by a
user (or program) interacting with WHIRL are to define and r-materialize views.
To r-materialize a view, WHIRL finds the "r" highest-scoring ground atoms "a"
associated with a view, and stores those facts in the EDB (extensional database)
for later use."
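
To make the "only the r best answers" idea concrete, here is a minimal sketch of vector-space ranking that keeps just the r highest-scoring documents for a query. It is a brute-force stand-in and not WHIRL's inference algorithm; the TF-IDF weighting and the names idf_table, vectorize, cosine and top_r are my own assumptions for illustration. The returned list is what one would store for later use instead of re-scoring on every query.

```python
import heapq
import math
from collections import Counter


def idf_table(docs):
    """Inverse document frequency for every term in the tokenised collection."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def vectorize(tokens, idf):
    """Unit-length TF-IDF vector as a dict; terms unknown to the collection are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: (1 + math.log(c)) * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def top_r(query, docs, r):
    """Return the r best (score, doc_index) pairs, highest score first."""
    idf = idf_table(docs)
    q = vectorize(query, idf)
    scored = ((cosine(q, vectorize(d, idf)), i) for i, d in enumerate(docs))
    return heapq.nlargest(r, scored)


docs = [
    ["whirl", "ranks", "text"],
    ["music", "web", "crawl"],
    ["whirl", "text", "similarity", "text"],
]
best = top_r(["whirl", "text"], docs, 2)
```

heapq.nlargest keeps only r candidates in memory at a time, so the cost of remembering the winners does not grow with the collection, although every document is still scored once here.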
> For other metrics such as say pagerank, rank of documents can be
> pre-computed, and we can use better algorithms based on this property.
In the "Recommending Music by Crawling The Web" paper, Cohen and Fan researched
music preferences by spidering the web and using four different scoring
algorithms: popularity, K-nearest neighbor, weighted majority and an extended
direct Bayesian prediction.
In a 1998 paper, Cohen, Schapire and Singer discussed the use of a preference
function when determining ranking (excerpt below):
Learning to Order Things
There are many applications in which it is desirable to order rather than
classify instances. Here we consider the problem of learning how to order, given
feedback in the form of preference judgments, i.e., statements to the effect that
one instance should be ranked ahead of another. We outline a two-stage approach
in which one
first learns by conventional means a preference function, of the form PREF
... Nevertheless, we describe a simple greedy algorithm that is guaranteed to
find a good approximation. We then discuss an on-line learning algorithm, based
on the
"Hedge" algorithm, for finding a good linear combination of ranking "experts."
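
As I understand the greedy step mentioned in the excerpt, each item gets a "potential" (how strongly it is preferred over the remaining items, minus how strongly they are preferred over it), and the item with the highest potential is repeatedly placed next. The sketch below is my own illustration of that idea, not code from the paper; greedy_order and the toy pref function are hypothetical names, pref(u, v) is assumed to return a value in [0, 1], and items are assumed to be unique and hashable.

```python
def greedy_order(items, pref):
    """Order items to approximately agree with pref(u, v), where a value
    near 1 means "rank u ahead of v"."""
    remaining = list(items)
    # potential[v] = sum over other u of pref(v, u) - pref(u, v)
    potential = {
        v: sum(pref(v, u) - pref(u, v) for u in remaining if u != v)
        for v in remaining
    }
    ordering = []
    while remaining:
        # Place the item most strongly preferred over what is left.
        best = max(remaining, key=potential.__getitem__)
        ordering.append(best)
        remaining.remove(best)
        # Remove best's contribution from the potentials of the rest.
        for v in remaining:
            potential[v] += pref(best, v) - pref(v, best)
    return ordering


# Toy preference function derived from made-up scores.
scores = {"a": 3, "b": 1, "c": 2}


def pref(u, v):
    return 1.0 if scores[u] > scores[v] else 0.0


order = greedy_order(["b", "c", "a"], pref)
```

With a consistent preference function like this toy one the result is simply the sorted order; the approach becomes interesting when pref is learned and contains cycles, where no ordering can agree with every pairwise judgment.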