xml-dev - Re: [xml-dev] Something altogether different?

To: "Bullard, Claude L (Len)" <len.bullard@intergraph.com>
Subject: Re: [xml-dev] Something altogether different?
From: Liam Quin <liam@w3.org>
Date: Mon, 25 Apr 2005 19:37:50 -0400
Cc: 'XML Developers List' <xml-dev@lists.xml.org>
In-reply-to: <15725CF6AFE2F34DB8A5B4770B7334EE07206EA6@hq1.pcmail.ingr.com>
References: <15725CF6AFE2F34DB8A5B4770B7334EE07206EA6@hq1.pcmail.ingr.com>
User-agent: Mutt/1.5.6+20040907i

On Mon, Apr 25, 2005 at 03:49:35PM -0500, Bullard, Claude L (Len) wrote:
[...]
> So where we do understand how the vector model 
> works for text analysis,
If you mean the cosine vector similarity model espoused by
the late Dr. Gerald Salton and others, I think what we know
is that it was an interesting theory that supported a lot of
useful research, but has a number of practical difficulties.

I don't know how Dr. Cohen (cited earlier by Steve DeRose) has
dealt with them.  Difficulties include the fact that humans
attribute significance (in English) to word order, and also
use collocation of terms to help with sense disambiguation.
Another difficulty with earlier systems like SMART was that
sufficiently large documents contained all the terms -- use of
markup to do term weighting for individual sections (or even
paragraphs) can be a significant win in some environments.
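
As a rough illustration of the section-level idea, here is a minimal
sketch in Python. Everything in it is an assumption made for the
example: the sample document, the element names (doc, section, p), the
whitespace tokenizer, and the use of raw term counts as weights. It is
not SMART's method or Cohen's. It shows that a document taken as a
whole can contain every query term, while scoring each section
separately still tells the sections apart.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document; the element names are invented for this sketch.
DOC = """<doc>
  <section id="s1"><p>salmon fishing on the river</p></section>
  <section id="s2"><p>river bank erosion and flood control</p></section>
  <section id="s3"><p>bank loans and interest rates</p></section>
</doc>"""


def term_vector(text):
    """Bag of words: lowercase whitespace tokens mapped to raw counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def section_scores(xml_text, query):
    """Score the query against each <section> instead of the whole document."""
    root = ET.fromstring(xml_text)
    q = term_vector(query)
    return {
        sec.get("id"): cosine(q, term_vector(" ".join(sec.itertext())))
        for sec in root.iter("section")
    }


query = "bank interest"
whole = term_vector(" ".join(ET.fromstring(DOC).itertext()))

# The whole document contains every query term, so a plain
# "does it contain the terms" test cannot separate its parts.
print(all(term in whole for term in term_vector(query)))

# Per-section scores separate the finance section from the river ones.
scores = section_scores(DOC, query)
print(max(scores, key=scores.get))
```

In a real collection the weights would more likely be something like
tf-idf computed per section, and the markup would decide which
elements count as a unit; both choices are left open here.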

In the extract, Cohen mentions that term weighting can be
"surprisingly effective" and goes on to say that
> One advantage of this "vector space" representation is that the
> similarity of two documents can be easily computed.
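
For concreteness, a minimal sketch of that computation in Python,
assuming raw term counts as weights and a whitespace tokenizer (both
are simplifications chosen for the example, not anything taken from
Cohen's extract). It also shows the word-order difficulty mentioned
above: two texts with the same words in a different order come out
as, in effect, identical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag of words: lowercase whitespace tokens mapped to raw counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


d1 = term_vector("dog bites man")
d2 = term_vector("man bites dog")
d3 = term_vector("xml markup weights terms")

# Same terms, different order: the score is 1 up to rounding error,
# because the model never sees the order.
print(cosine(d1, d2))

# No shared terms: the dot product, and so the score, is zero.
print(cosine(d1, d3))
```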

Sometimes the thing that's easy to implement gets far enough
along that it doesn't seem worth implementing anything better.

The use of fuzzy logic (is this derived from Zadeh's work?) is also
interesting.

Liam

-- 
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/
