One disadvantage of term-based weighting, i.e. the vector space model, is the
well-known example cited in Google's original paper (or rather sales
pitch?): a document containing only the words "Bill Clinton sucks" was
considered more important for the query "Bill Clinton" than the actual
White House page (back when Clinton was president).
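To make that failure mode concrete, here is a minimal sketch assuming plain term-frequency vectors scored by cosine similarity, with no IDF weighting and no link information. The two document texts are made up for illustration and are not taken from the paper. Because the short rant spends almost all of its vector on the query terms, it outscores the longer, more informative page.

```python
import math
from collections import Counter


def cosine(query: str, doc: str) -> float:
    """Cosine similarity between raw term-frequency vectors (no IDF)."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    # Dot product over query terms; Counter returns 0 for absent terms.
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical documents, for illustration only.
rant = "Bill Clinton sucks"
whitehouse = ("Welcome to the White House official site of President "
              "Bill Clinton with news press briefings and speeches")

query = "Bill Clinton"
print("rant:      ", round(cosine(query, rant), 3))
print("whitehouse:", round(cosine(query, whitehouse), 3))
```

The extra words on the real page only lengthen its vector, so under this scoring they dilute its similarity to the query.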
I believe the vector space model is usable only when the document collection
is "homogeneous" in some way and has a lot of repeated vocabulary.
Also note that with the vector space model, you have to compute the rank of
documents in real time for each query.
For other metrics, such as PageRank, the rank of documents can be
precomputed, and we can use better algorithms that exploit this property.
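A minimal sketch of the precompute-then-look-up idea, assuming a basic power-iteration PageRank with damping factor 0.85 and uniform redistribution of rank from pages that have no outlinks. The link graph, page names, toy inverted index and the `answer` helper are all hypothetical. The expensive part (`pagerank`) runs offline once; at query time we only intersect posting sets and sort by the stored scores.

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank over a dict: page -> list of outlinked pages."""
    nodes = sorted(set(links) | {t for outs in links.values() for t in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Teleportation term: every page gets (1 - d) / n.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            outs = links.get(v, [])
            if outs:
                # Split this page's rank evenly among its outlinks.
                share = damping * rank[v] / len(outs)
                for t in outs:
                    new[t] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


def answer(query_terms, index, rank):
    """Query time: match documents, then order them by precomputed rank."""
    postings = [index.get(t, set()) for t in query_terms]
    matches = set.intersection(*postings) if postings else set()
    return sorted(matches, key=lambda d: rank[d], reverse=True)


# Offline step (hypothetical link graph).
links = {
    "news": ["whitehouse"],
    "blog": ["whitehouse", "news"],
    "whitehouse": ["news"],
    "rant": [],
}
rank = pagerank(links)

# Online step (hypothetical inverted index: term -> set of pages).
index = {
    "bill": {"whitehouse", "rant"},
    "clinton": {"whitehouse", "rant"},
    "sucks": {"rant"},
}
print(answer(["bill", "clinton"], index, rank))
```

The design choice is that the ranking signal is query-independent, so it can be computed once over the whole link graph; only the cheap matching and sorting happen per query.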
best, murali.