xml-dev - Re: XML Search Engine


From: <david@megginson.com>
To: <xml-dev@ic.ac.uk>
Date: Thu, 5 Nov 1998 13:50:52 -0500 (EST)

Tim Bray writes:

 > What I said was:
 > 1. I have not seen any research which demonstrates that word proximity
 >    achieves better results than character proximity based on any
 >    well-known IR metric.
 > 2. Doing word proximity at all is a *very* hard problem in the languages
 >    used by a large majority of the world's population.

I think that there might be a disconnect here.  What we're talking
about is minimal-semantic-unit proximity -- for some
languages/contexts, the minimal semantic unit will always be a single
grapheme, and for others, it will be a cluster of one or more
graphemes.

This type of clustering is critical for search engines, which often
(usually?) provide inverted indexes only for minimal semantic units,
not for all graphemes.  The argument, then, is that proximity testing
should be done by counting the units that were indexed, which may or
may not be single graphemes.
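
To make the idea concrete, here is a minimal sketch of proximity testing
over indexed units.  It is only an illustration under stated
assumptions: the names tokenize, build_index and within are
hypothetical, whitespace splitting stands in for real word
segmentation, and a single code point stands in for a grapheme.  The
point is only that distance is measured in positions of whatever unit
was indexed, not in characters.

```python
from collections import defaultdict


def tokenize(text, unit="word"):
    """Split text into the units that will be indexed.

    Assumption: "word" means whitespace-delimited tokens; any other
    value means one unit per non-whitespace code point (a rough
    stand-in for a grapheme).
    """
    if unit == "word":
        return text.lower().split()
    return [ch for ch in text if not ch.isspace()]


def build_index(text, unit="word"):
    """Build a positional inverted index: unit -> list of unit positions."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokenize(text, unit)):
        index[tok].append(pos)
    return index


def within(index, a, b, max_units):
    """True if some occurrence of a lies within max_units indexed units of b."""
    return any(
        abs(i - j) <= max_units
        for i in index.get(a, [])
        for j in index.get(b, [])
    )


# Units are words here, so distance counts words.
words = build_index("the quick brown fox jumps over the lazy dog")
print(within(words, "quick", "fox", 2))

# Units are single characters here, so distance counts characters.
chars = build_index("東京都に住む", unit="char")
print(within(chars, "東", "住", 4))
```

The same within() test serves both cases; only the unit chosen at
indexing time changes what "near" means.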

All the best,

David

-- 
David Megginson                 david@megginson.com
           http://www.megginson.com/
