   Re: "Multiple" Namespaces? (but NOT for HTML)

  • From: Walter Underwood <wunder@infoseek.com>
  • To: Paul <prescod@prescod.net>
  • Date: Fri, 29 Oct 1999 09:15:55 -0800

At 08:14 AM 10/29/99 -0500, Paul wrote:
>On Thu, 28 Oct 1999, Walter Underwood wrote:
>> It may be that markup is not the right hammer for this problem.
>> Our search engine handles multiple DTDs by mapping the elements
>> into common search meta data elements.
>> 
>>    DC:Creator      -> author
>>    GILS:Originator -> author
>>    TEI:docAuthor   -> author
>
>That's relatively easy for a flat model, but what about a deeply 
>hierarchical one? Can you do a search for "address 1" vs. "Street" but 
>only in "Publisher"? Even more sophisticated, can you recognize that 
>"name in publisher" is "publisher name"?

Nope. To do that, you need an XQL-like engine or a repository.
We're aimed at the other 99% of the market.
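
A minimal Python sketch of the flat mapping quoted above, for illustration only. It is not the engine's actual code: FIELD_MAP, local_name, and extract_fields are hypothetical names, and matching on the lowercased local element name (ignoring the DTD or namespace prefix) is an assumption made to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table: source element name -> common search field.
FIELD_MAP = {
    "creator": "author",     # DC:Creator
    "originator": "author",  # GILS:Originator
    "docauthor": "author",   # TEI:docAuthor
}

def local_name(tag):
    """Drop any '{namespace}' prefix ElementTree adds, then lowercase."""
    return tag.rsplit("}", 1)[-1].lower()

def extract_fields(xml_text):
    """Collect the text of every mapped element under its common field name."""
    fields = {}
    for elem in ET.fromstring(xml_text).iter():
        field = FIELD_MAP.get(local_name(elem.tag))
        if field:
            # Flatten nested markup and collapse runs of whitespace.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                fields.setdefault(field, []).append(text)
    return fields
```

Note that the lookup is deliberately context-free: an element maps to the same field wherever it appears, which is exactly why "name in publisher" is out of reach for this approach.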

Also, when I was researching published DTDs, nearly all of them 
qualified the sub-elements or used entirely different names, so 
that context wasn't necessary: <docAuthor>, <bibAuthor>, <byline>, 
whatever. The only tag that was occasionally reused in different 
contexts was <title>. There is a heuristic (hack?) to use the 
first occurrence as the title for the results page. That is a better 
solution than expecting customers to know XPath, then trying
to teach them over the phone.
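
The first-occurrence heuristic can be sketched in a few lines of Python. This is an illustration under assumptions, not the product's implementation: first_title is a hypothetical name, and "first" is taken to mean first in document order.

```python
import xml.etree.ElementTree as ET

def first_title(xml_text):
    """Return the text of the first <title> in document order, or None."""
    root = ET.fromstring(xml_text)
    # iter() walks the tree depth-first in document order, root included.
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1].lower() == "title":
            # Flatten nested markup and collapse runs of whitespace.
            return " ".join("".join(elem.itertext()).split())
    return None
```

For a document whose front matter and sections both carry a <title>, this picks the front-matter one simply because it comes first; no XPath or context rule is involved.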

Our house style is to err on the side of simplicity and ease of 
use, because it is almost impossible to remove features, even if they
confuse almost everyone and benefit almost no one.

I actually spent more time making sure that sentences were extracted
properly from things like this (with multiple mappings possible):

   <title>The <hi type="italic">Ghastly</hi> Happenings at 
      <event><trademark>Infoseek</trademark>'s Halloween
      Party</event></title>
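
As a rough illustration of what "extracted properly" means for the fragment above, here is a Python sketch that flattens the mixed content of each element into one clean string. It is only a sketch under assumptions: flat_text and text_by_element are hypothetical names, and treating every element as a possible search field is a guess at what "multiple mappings" involves.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<title>The <hi type="italic">Ghastly</hi> Happenings at
      <event><trademark>Infoseek</trademark>'s Halloween
      Party</event></title>"""

def flat_text(elem):
    """Join all text inside elem, ignoring inline tags, and collapse whitespace."""
    return " ".join("".join(elem.itertext()).split())

def text_by_element(xml_text):
    """Map each element name to its flattened text (last one wins if a tag repeats)."""
    root = ET.fromstring(xml_text)
    return {elem.tag: flat_text(elem) for elem in root.iter()}

fields = text_by_element(SNIPPET)
print(fields["title"])      # The Ghastly Happenings at Infoseek's Halloween Party
print(fields["event"])      # Infoseek's Halloween Party
print(fields["trademark"])  # Infoseek
```

The point is that inline markup and line breaks must not split words or sentences: "Infoseek" and "'s" end up adjacent, and the same run of text can feed more than one field.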

I've got nothing against complex searches, but they don't benefit
our users. In the internet search world, people who type two-word
queries are power users. Really.

wunder
--
Walter R. Underwood
wunder@infoseek.com
wunder@best.com (home)
http://software.infoseek.com/cce/ (my product)
http://www.best.com/~wunder/
1-408-543-6946

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)






 
