OASIS Mailing List Archives





   Re: XML Search Engine

  • From: <david@megginson.com>
  • To: <xml-dev@ic.ac.uk>
  • Date: Thu, 5 Nov 1998 13:50:52 -0500 (EST)

Tim Bray writes:

 > What I said was:
 > 1. I have not seen any research which demonstrates that word proximity
 >    achieves better results than character proximity based on any
 >    well-known IR metric.
 > 2. Doing word proximity at all is a *very* hard problem in the languages
 >    used by a large majority of the world's population.

I think that there might be a disconnect here.  What we're talking
about is minimal-semantic-unit proximity -- for some
languages/contexts, the minimal semantic unit will always be a single
grapheme, and for others, it will be a cluster of one or more
graphemes.

This type of clustering is critical for search engines, which often
(usually?) provide inverted indexes only for minimal semantic units,
not for all graphemes.  The argument, then, is that proximity testing
should be done by counting the units that were indexed, which may or
may not be single graphemes.
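As a rough illustration of counting in indexed units rather than characters, here is a minimal Python sketch. It is not how any particular search engine works. The segmenter is a hypothetical stand-in: it assumes a run of letters or digits forms one unit (a word), while each CJK unified ideograph (U+4E00 to U+9FFF only) is a unit of its own. The names `segment`, `build_index` and `within` are made up for this example. Each unit's ordinal position is recorded in an inverted index, and proximity is the difference between unit positions.

```python
from collections import defaultdict


def segment(text):
    """Hypothetical segmenter: split text into minimal semantic units.

    Assumption for illustration only: a run of letters/digits is one unit,
    and each CJK unified ideograph (U+4E00..U+9FFF) is a unit by itself.
    """
    units, current = [], []
    for ch in text:
        if 0x4E00 <= ord(ch) <= 0x9FFF:
            # Ideograph: close any pending run, then emit it as its own unit.
            if current:
                units.append("".join(current))
                current = []
            units.append(ch)
        elif ch.isalnum():
            current.append(ch)
        elif current:
            # Whitespace/punctuation ends the current run.
            units.append("".join(current))
            current = []
    if current:
        units.append("".join(current))
    return units


def build_index(doc_id, text, index):
    """Record (doc_id, unit position) postings; positions count units, not characters."""
    for pos, unit in enumerate(segment(text)):
        index[unit.lower()].append((doc_id, pos))


def within(index, a, b, max_gap):
    """Return ids of documents where units a and b occur within max_gap indexed units."""
    hits = set()
    for doc_a, pos_a in index.get(a, []):
        for doc_b, pos_b in index.get(b, []):
            if doc_a == doc_b and abs(pos_a - pos_b) <= max_gap:
                hits.add(doc_a)
    return hits


index = defaultdict(list)
build_index(1, "XML search engines index words", index)
build_index(2, "XML is a markup language used by search engines", index)
build_index(3, "全文検索", index)
```

With this index, `within(index, "xml", "search", 1)` matches only document 1, because "search" is seven units away from "XML" in document 2. For the CJK text, each ideograph occupies one position, so the same proximity test counts single graphemes.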

All the best,


David Megginson                 david@megginson.com

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)



Copyright 2001 XML.org. This site is hosted by OASIS