
   Re: XML Search Engine

  • From: Fernando Cabral <fernando@pix.com.br>
  • To: xml-dev@ic.ac.uk
  • Date: Thu, 05 Nov 1998 18:13:30 +0200

Borden, Jonathan wrote:

> For example, suppose I am searching for big apples:
>
> "This is a little green apple. Big deal."
>
> will "Big near apple" match?
> how about "Big applied to apple"

This will not be a problem with any "decent" text retrieval engine because:

a) Proximity search can be performed either "ordered" or "non-ordered".
    This is quite powerful because it allows you to search for
    "big near potato" in the sentence

        "This is a small potato, big brother"

    so that the query matches either order ("potato, big" as well as
    "big, potato") or only one of the two. Some search engines, like
    Stairs (the grandfather of all text-retrieval engines) and BRS,
    have two operators, "near" (or "prox") and "ADJacent", the first
    one being unordered, the second one being ordered.

b) Search engines usually know what sentences and paragraphs are. I
     don't think proximity should go beyond a period or any other
     punctuation that ends a sentence. If you want to search in larger
     units, like a paragraph, then you could always define something
     like "apple SAME PARAGRAPH big" or "apple SAME SENTENCE big",
     both of which extend the idea of "nearness", providing a more
     logical view of the terms.

c) Finally, growing from the very close vicinity (near/adjacent) to a
    little further (same sentence/same paragraph), you can go to the
    whole "universe" with AND, OR, XOR, etc. What this means is that
    you have very good control not only over which words you want,
    but also over where they appear, how far apart they can be, which
    one comes first...

d) XML allows you to use all the above operators, adding a very
    useful feature: tag-qualification (restricting a term to the
    content of a given element).
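
To make point (a) concrete, here is a minimal Python sketch of ordered
versus unordered proximity matching. The function names and the
max_distance window are my own assumptions for illustration; this is
not how Stairs or BRS actually implement their operators.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near(text, first, second, max_distance=3, ordered=False):
    """Return True if the two words occur within max_distance tokens of each other.

    With ordered=True, `first` must appear before `second`.
    max_distance is a hypothetical window size, not a value from any real engine.
    """
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    for i in first_positions:
        for j in second_positions:
            gap = j - i  # positive when `first` precedes `second`
            if ordered and 0 < gap <= max_distance:
                return True
            if not ordered and 0 < abs(gap) <= max_distance:
                return True
    return False

sentence = "This is a small potato, big brother"
print(near(sentence, "big", "potato"))                # True: either order is accepted
print(near(sentence, "big", "potato", ordered=True))  # False: "big" follows "potato"
print(near(sentence, "potato", "big", ordered=True))  # True
```

The unordered query matches because the two words are adjacent in
either direction, while the ordered query only matches when the terms
appear in the order given.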
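
Point (b) can be sketched the same way. The splitters below are
deliberately naive (a sentence ends at ".", "!" or "?" followed by
whitespace, a paragraph at a blank line), and same_unit is a
hypothetical helper, not an operator from any real engine.

```python
import re

def split_sentences(text):
    """Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_paragraphs(text):
    """Naive splitter: paragraphs are separated by one or more blank lines."""
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p]

def same_unit(text, word_a, word_b, splitter):
    """Return True if both words occur inside one unit produced by `splitter`."""
    for unit in splitter(text):
        tokens = set(re.findall(r"[a-z0-9]+", unit.lower()))
        if word_a.lower() in tokens and word_b.lower() in tokens:
            return True
    return False

text = "This is a little green apple. Big deal."
print(same_unit(text, "apple", "big", split_sentences))   # False: the period separates them
print(same_unit(text, "apple", "big", split_paragraphs))  # True: same paragraph
```

With the example from the original question, "apple SAME SENTENCE big"
does not match, because the period ends the sentence before "Big",
while "apple SAME PARAGRAPH big" does.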
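
For the "whole universe" operators in point (c), a small sketch using
an inverted index and Python set operations may help. The three
documents and the build_index helper are made up for illustration.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index

# Hypothetical mini-collection, keyed by document id.
docs = {
    1: "This is a little green apple. Big deal.",
    2: "A big red apple fell from the tree.",
    3: "Big brother is watching.",
}

index = build_index(docs)
big, apple = index["big"], index["apple"]

print(sorted(big & apple))  # AND: [1, 2]
print(sorted(big | apple))  # OR:  [1, 2, 3]
print(sorted(big ^ apple))  # XOR: [3]
```

Note that document 1 satisfies "big AND apple" even though the two
words sit in different sentences, which is exactly why the narrower
proximity operators are useful.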
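
As for point (d), tag-qualification could look roughly like the sketch
below, which uses Python's standard xml.etree.ElementTree module. The
element names (article, title, para) and the search_in_tag helper are
assumptions of mine, not part of any particular search engine.

```python
import re
import xml.etree.ElementTree as ET

def search_in_tag(xml_text, tag, word):
    """Return the text of every <tag> element whose content contains `word`."""
    root = ET.fromstring(xml_text)
    hits = []
    for element in root.iter(tag):
        content = "".join(element.itertext())
        if word.lower() in re.findall(r"[a-z0-9]+", content.lower()):
            hits.append(content)
    return hits

# Hypothetical document; the element names are illustrative only.
doc = """<article>
  <title>Big apples</title>
  <para>This is a little green apple. Big deal.</para>
</article>"""

print(search_in_tag(doc, "title", "big"))    # ['Big apples']
print(search_in_tag(doc, "title", "green"))  # []
print(search_in_tag(doc, "para", "green"))   # ['This is a little green apple. Big deal.']
```

The same word can therefore match or not match depending on the
element it is qualified with, and the proximity and Boolean operators
can then be applied within that element's text.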

- fernando

--
Fernando Cabral                         Padrao iX Sistemas Abertos
mailto:fernando@pix.com.br              http://www.pix.com.br
                                        mailto:Pix@Pix.com.br
Fone: +55 61 321-2433                   Fax: +55 61 225-3082
15º 45' 04.9" S                         47º 49' 58.6" W
19º 37' 57.0" S                         45º 17' 13.6" W








 
