Murali Mani wrote:
> One disadvantage of term-based weighting, or the vector space model, is the
> well-known example cited in Google's original paper (or rather, sales
> pitch?):
> a document containing only the words "Bill Clinton sucks" was considered
> more important for the query "Bill Clinton" than the actual White House
> page (when Clinton was president).
> I believe we can use the vector space model only when the document
> collection is "homogeneous" in some manner, with repeated words and so on.
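To make the quoted complaint concrete, here is a rough sketch of how a plain vector space ranking can prefer the three-word page. It assumes raw term counts and cosine similarity, which is only one of many weighting schemes; the "White House" text is an invented stand-in, not the real page, and the function names are mine.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = tf_vector("Bill Clinton")

# The short page from the quoted example.
short_doc = tf_vector("Bill Clinton sucks")

# Invented stand-in for a longer, more authoritative page.
long_doc = tf_vector(
    "Welcome to the White House, home of President Bill Clinton and the "
    "First Family. Tour the rooms and read the latest briefings."
)

short_score = cosine(query, short_doc)
long_score = cosine(query, long_doc)

print("short page:", round(short_score, 3))
print("long page: ", round(long_score, 3))
```

Two of the short page's three words match the query, so its vector points almost the same way as the query vector. The longer page matches the same two words but is penalized for everything else it says, which is the length effect the quote is complaining about.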
Google is apparently looking at a noun clustering scheme.
Norvig highlighted a research paper written by a Google employee last year
regarding a classification engine the company is testing. The technology can
parse proper nouns or compound nouns into several categories in order to
deliver clustered results. For example, for a query on "ATM," or asynchronous
transfer mode, the engine would be able to use the terms "such as" on Web pages
indexed with the term to discover that it can be linked to the expression
"high-speed networks." As a result, a search for high-speed networks might pull
up a cluster on ATM.
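
The "such as" trick described above can be sketched in a few lines. This is only my guess at the general idea, not Google's engine: the function name, the sample pages, and the crude "take the last two words before 'such as'" heuristic are all assumptions, and a real system would presumably use a proper noun-phrase parser.

```python
import re
from collections import defaultdict


def such_as_categories(term, pages, max_words=2):
    """Collect phrases X from text of the form 'X such as <term>'.

    X is naively taken to be the last `max_words` words before 'such as'.
    """
    head = r"\b((?:[\w-]+\s+){0,%d}[\w-]+)" % (max_words - 1)
    pattern = re.compile(
        head + r",?\s+such as\s+" + re.escape(term) + r"\b",
        re.IGNORECASE,
    )
    found = set()
    for page in pages:
        for match in pattern.finditer(page):
            found.add(match.group(1).lower())
    return found


# Invented sample pages that were indexed under the term "ATM".
pages = [
    "Carriers deploy high-speed networks such as ATM in their backbones.",
    "Insert your card into the ATM to withdraw cash.",
]

# Map each discovered category phrase to the terms that belong to it.
clusters = defaultdict(set)
for category in such_as_categories("ATM", pages):
    clusters[category].add("ATM")

print(dict(clusters))
```

With a mapping like this, a search for "high-speed networks" could look up the phrase in `clusters` and offer ATM as a cluster, while the cash-machine page contributes nothing because it never uses the "such as" construction.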