- From: Walter Underwood <wunder@infoseek.com>
- To: Paul <prescod@prescod.net>
- Date: Fri, 29 Oct 1999 09:15:55 -0800
At 08:14 AM 10/29/99 -0500, Paul wrote:
>On Thu, 28 Oct 1999, Walter Underwood wrote:
>> It may be that markup is not the right hammer for this problem.
>> Our search engine handles multiple DTDs by mapping the elements
>> into common search meta data elements.
>>
>> DC:Creator -> author
>> GILS:Originator -> author
>> TEI:docAuthor -> author
>
>That's relatively easy for a flat model, but what about a deeply
>hierarchical one? Can you do a search for "address 1" vs. "Street" but
>only in "Publisher"? Even more sophisticated, can you recognize that
>"name in publisher" is "publisher name"?
Nope. To do that, you need an XQL-like engine or a repository.
We're aimed at the other 99% of the market.
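
For readers who want to see what the flat mapping quoted above amounts to, here is a minimal sketch in Python. It is not our product's code; the names FIELD_MAP, to_search_field and index_record are made up for illustration, and only the three mappings from the quoted message are taken from the thread.

```python
# Hypothetical lookup table: qualified element name -> common search field.
# Only these three entries come from the thread; everything else is illustrative.
FIELD_MAP = {
    "DC:Creator": "author",
    "GILS:Originator": "author",
    "TEI:docAuthor": "author",
}


def to_search_field(element_name):
    """Return the common search field for an element, or None if unmapped."""
    return FIELD_MAP.get(element_name)


def index_record(pairs):
    """Group (element_name, text) pairs under their common search fields.

    Elements with no mapping are ignored, and element context is never
    consulted, which is exactly why this only works for a flat model.
    """
    fields = {}
    for name, text in pairs:
        field = to_search_field(name)
        if field is not None:
            fields.setdefault(field, []).append(text)
    return fields
```

Because the lookup is keyed on the element name alone, a query like "name in publisher" cannot be expressed here; that would need the XQL-like engine mentioned above.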
Also, when I was researching published DTDs, nearly all of them
qualified the sub-elements or used entirely different names, so
that context wasn't necessary: <docAuthor>, <bibAuthor>, <byline>,
whatever. The only tag that was occasionally reused in different
contexts was <title>. There is a heuristic (hack?) to use the
first occurrence as the title for the results page. That is a better
solution than expecting customers to know XPath, then trying
to teach them over the phone.
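
The first-occurrence heuristic can be sketched roughly as follows, assuming a well-formed document and Python's standard xml.etree.ElementTree parser. The function name first_title is hypothetical, and this is only an illustration of the idea, not the engine's implementation.

```python
import xml.etree.ElementTree as ET


def first_title(xml_text):
    """Return the text of the first <title> in document order, or None.

    No XPath knowledge is needed by the caller: whichever <title> appears
    first wins, regardless of where it sits in the hierarchy.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter("title"):
        # iter() walks the tree in document order, so the first hit is enough.
        return "".join(elem.itertext()).strip()
    return None
```

For a document with a <title> in the header and another in a later section, this picks the header one simply because it comes first.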
Our house style is to err on the side of simplicity and ease of
use, because it is almost impossible to remove features, even if they
confuse almost everyone and benefit almost no one.
I actually spent more time making sure that sentences were extracted
properly from things like this (with multiple mappings possible):
<title>The <hi type="italic">Ghastly</hi> Happenings at
<event><trademark>Infoseek</trademark>'s Halloween
Party</event></title>
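
As a rough illustration of the extraction problem, the sketch below flattens that title into a single sentence using Python's standard xml.etree.ElementTree. The name flat_text is hypothetical, and the whitespace handling (collapsing line breaks and runs of spaces) is an assumption about what "extracted properly" should mean here.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<title>The <hi type="italic">Ghastly</hi> Happenings at
<event><trademark>Infoseek</trademark>'s Halloween
Party</event></title>"""


def flat_text(xml_text):
    """Join all text inside an element, ignoring inline markup.

    itertext() yields the text and tail pieces in document order; joining
    them without a separator keeps "Infoseek" and "'s" together, and the
    split/join pass collapses line breaks into single spaces.
    """
    elem = ET.fromstring(xml_text)
    return " ".join("".join(elem.itertext()).split())


print(flat_text(SAMPLE))
# prints: The Ghastly Happenings at Infoseek's Halloween Party
```

Joining the pieces with a space instead of an empty string would be the easy mistake: it would turn "Infoseek's" into "Infoseek 's".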
I've got nothing against complex searches, but they don't benefit
our users. In the internet search world, people who type two-word
queries are power users. Really.
wunder
--
Walter R. Underwood
wunder@infoseek.com
wunder@best.com (home)
http://software.infoseek.com/cce/ (my product)
http://www.best.com/~wunder/
1-408-543-6946
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)