   Re: xml search engine?

  • From: David Megginson <david@megginson.com>
  • To: xml-dev@XML.ORG
  • Date: 01 Apr 2000 10:04:22 -0400

Walter Underwood <wunder@infoseek.com> writes:

> >I think that anyone who lived through the excitement, hope, and
> >disappointment of the AI craze in the 1980's (academic) and early
> >1990's (commercial) would have to be very foolish to make any
> >different claim unless they could back it up with running,
> >production-grade software.
> Ultraseek Server is definitely production-grade. We added XML
> as a supported document type in September 1998. It's a modern,
> high-quality IR engine, that scales to millions of documents
> and millions of queries per day. And you can buy it now.

I might have clipped Tim Bray's original posting a little too
aggressively by removing Michael Kay's comment.  Here's Tim's original 
message (minus the asbestos-underwear comment):

  At 10:46 AM 3/30/00 +0100, Kay Michael wrote:

  >I really think you need to distinguish between a query engine and a
  >search engine. Query engines answer questions like "find me all
  >documents that have an <xyz> element as the third grandchild of an
  ><abc> element". Search engines answer questions like "have you got
  >anything about the causes of hyperinflation in inter-war Germany?"

  I think what you're talking about would normally be called an
  Information-Retrieval (IR) system.  Such a system is distinguished
  from traditional search engines in the general case in that nobody
  has ever successfully built one that, in the general case, works.

He wasn't talking about recognizing XML markup as context or anything
like that, but about actually understanding the information well
enough to answer general questions.  
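
To make the contrast concrete, here is a minimal sketch of the kind of
structural question Michael Kay assigns to a query engine. It assumes
"third grandchild" means the third element, in document order, among
all children of an <abc> element's children. The function name and its
parameters are made up for illustration and do not come from any
product mentioned in this thread.

```python
import xml.etree.ElementTree as ET


def has_third_grandchild(xml_text, ancestor="abc", target="xyz"):
    """Return True if some <ancestor> element has <target> as its
    third grandchild, counting grandchildren in document order."""
    root = ET.fromstring(xml_text)
    # iter() visits the root itself as well as all of its descendants.
    for elem in root.iter(ancestor):
        # Iterating an Element yields its direct children, so the
        # nested loop collects the children of each child.
        grandchildren = [gc for child in elem for gc in child]
        if len(grandchildren) >= 3 and grandchildren[2].tag == target:
            return True
    return False


doc_match = "<abc><p><a/><b/></p><q><xyz/></q></abc>"
doc_no_match = "<abc><p><xyz/><b/></p><q><c/></q></abc>"
print(has_third_grandchild(doc_match))
print(has_third_grandchild(doc_no_match))
```

A question like this has an exact yes/no answer that follows from the
markup alone. The hyperinflation question has no such predicate, which
is the distinction being drawn.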

Admittedly, with clever weighting, search/query engines can make good
guesses a lot of the time.  For example, I tried posing Michael Kay's
question verbatim to infoseek.go.com, and got the following top five results:

1. A page defining "hyperinflation"
2. A paper on the breakdown of democracy in interwar Europe
3. A paper on economics in interwar France
4. A paper on A.J.P. Taylor and the origins of WWII
5. The outline of a book chapter on hyperinflation

That's pretty good, but it's also not really fair, because the terms
"inter-war" and "hyperinflation" make it easy to narrow things down.
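
One way to see why rare terms help is a smoothed inverse document
frequency (IDF) weight, a common ingredient in term weighting. This is
only a sketch under assumptions: the four-document collection is
invented, the tokenizer is a bare whitespace split, and nothing here
reflects how Infoseek actually ranks results.

```python
import math


def idf(term, docs):
    """Smoothed inverse document frequency of `term` over `docs`.

    Terms that occur in few documents get a higher weight.
    """
    # Document frequency: number of documents containing the term.
    df = sum(1 for d in docs if term in d.lower().split())
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


# Hypothetical toy collection, for illustration only.
docs = [
    "free operating systems for the desktop",
    "hyperinflation in inter-war germany",
    "operating systems with free mp3 support",
    "free software and operating systems",
]

for term in ("hyperinflation", "free", "operating"):
    print(term, round(idf(term, docs), 3))
```

In this toy collection "hyperinflation" occurs in one document while
"free" and "operating" occur in three, so the rare term receives the
larger weight and does more to narrow the result set.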

Try something like "what free operating systems have mp3 support?" and
you have to wade through many more hits before you find useful ones.

All the best,


David Megginson                 david@megginson.com

This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/

