OASIS Mailing List Archives





   Re: [xml-dev] Something altogether different?


One disadvantage of term-based weighting, or the vector space model, is
the well-known example cited in Google's original paper (or rather, sales
pitch??):

A document containing only the words "Bill Clinton sucks" was considered
more important for the query "Bill Clinton" than the actual White House
page (when Clinton was the president).
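
To make the effect concrete, here is a minimal sketch, assuming plain
term-frequency vectors and cosine similarity (one common form of the vector
space model, not necessarily the exact scheme the paper had in mind). The
page texts and the names short_page and official_page are made up for
illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a raw term-frequency vector from whitespace-separated words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical page texts, invented for this example.
docs = {
    "short_page": "bill clinton sucks",
    "official_page": (
        "welcome to the white house home of president "
        "bill clinton and the first family"
    ),
}

query = tf_vector("bill clinton")

# The score has to be computed against the query, at query time.
scores = {name: cosine(query, tf_vector(text)) for name, text in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

The three-word page wins because the query terms make up most of its
vector, while the longer page spreads its weight over many other words.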

I believe we can use the vector-space model only when the document
collection is "homogeneous" in some manner, with repetitive words etc.

Also note: with the vector space model, you have to compute the rank of
documents in real time, given a query.

For other metrics such as, say, PageRank, the rank of documents can be
pre-computed, and we can use better algorithms based on this property.
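
As a rough sketch of what "pre-computed" means here, the following runs a
simple power iteration over a tiny link graph once, offline, and then only
sorts at query time. The link graph, the damping factor of 0.85, the
iteration count, and the helper rank_matches are all assumptions made for
illustration, not a description of any real engine.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outgoing links.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph, invented for this example.
links = {
    "white_house": ["news"],
    "news": ["white_house"],
    "short_page": ["white_house"],
    "blog": ["white_house", "news"],
}

# Computed once, independent of any query.
PRECOMPUTED = pagerank(links)


def rank_matches(matching_pages):
    """Order pages that match a query using only the stored scores."""
    return sorted(matching_pages, key=PRECOMPUTED.get, reverse=True)


print(rank_matches(["short_page", "white_house"]))
```

At query time the only work left is a lookup and a sort, since the scores
do not depend on the query terms.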

best, murali.



Copyright 2001 XML.org. This site is hosted by OASIS