xml-dev - Re: xml search engine?

From: Jerome McDonough <jmcdonou@library.berkeley.edu>
To: xml-dev@xml.org
Date: Wed, 29 Mar 2000 11:43:21 -0500

At 11:43 AM 03/29/2000 +0200, Reinout van Rees wrote:
>There is a problem I see for xml search engines. How are they going to
>cope with all the various DTD's? They ARE going to cope, but what will
>be the result? Will we have lots of small search engines searching for
>information in all reinforced_concrete_supplier.dtd xml files it can
>find and another for all medicine.dtd info? Will there be a few
>standard elements in most DTD's to comply to some emerging behaviour
>of all search engines? There are so many ways this could work out. Any
>opinions? 
>

I suspect you will see both small, topic/discipline-specific search engines
and a variety of efforts to build more encompassing systems that allow
cross-domain searching, either by explicit agreements on mappings between
different DTD vocabularies or by use of probabilistic techniques to try to
translate users' queries into appropriate terminology for use in searching
against different vocabularies (Profs. Buckland and Larson of my program have
been doing work on this form of bridging between entry vocabularies; see
http://sims.berkeley.edu/research/metadata if you want more details).  Which
is to say, there are times when I would want a search engine that only
searched medicine.dtd documents and provided unique features that exploited
knowledge of the DTD to assist me in searching, and other times when I'd be
willing to settle for somewhat less sophistication in order to achieve a
wider search.  One of the first things I learned in library school was that
it is a mistake to assume that there is one best type of access/search
mechanism, and that in fact it is usually better to provide several different
ways of getting at data so that users can employ a search mechanism more
tailored to their current needs.  I think the future is likely to hold a mix
of small, custom-tailored search engines and larger, more generic systems.
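
To make the "probabilistic techniques" idea a little more concrete, here is
a minimal sketch in Python.  It is not the method used in the Buckland and
Larson work; it is a toy co-occurrence estimate, and the training pairs, the
vocabulary terms, and the function names train() and translate() are all
invented for illustration.  It assumes you have some records where free text
has already been paired with a controlled vocabulary term.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (free text, controlled-vocabulary term).
EXAMPLES = [
    ("heart attack treatment outcomes", "myocardial_infarction"),
    ("heart attack risk in smokers", "myocardial_infarction"),
    ("heart valve surgery", "cardiac_surgery"),
    ("steel bars for concrete beams", "reinforcement"),
]


def train(examples):
    """Count how often each word co-occurs with each vocabulary term."""
    cooc = defaultdict(Counter)
    for text, term in examples:
        # Count each word once per record so long records do not dominate.
        for word in set(text.lower().split()):
            cooc[word][term] += 1
    return cooc


def translate(cooc, query, top_n=3):
    """Rank vocabulary terms for a free-text query.

    Each query word votes for a term with weight count(word, term) /
    count(word), a crude estimate of P(term | word).  Unknown words are
    ignored.
    """
    scores = Counter()
    for word in query.lower().split():
        counts = cooc.get(word)
        if not counts:
            continue
        total = sum(counts.values())
        for term, n in counts.items():
            scores[term] += n / total
    return [term for term, _ in scores.most_common(top_n)]


MODEL = train(EXAMPLES)
print(translate(MODEL, "heart attack"))
```

A real system would need far more training data and some smoothing, but the
shape is the same: the user's words go in, ranked candidate terms from the
target vocabulary come out.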

In terms of allowing the more global search engines to assist users in
exploiting XML's potential, I'm hoping that HTTP will eventually be abandoned
in favor of a protocol which provides better support for information
retrieval.  I don't expect the world to make a mad rush to adopt Z39.50, but
its concept of use attributes, which allow you to specify, for example, that
you specifically want to search for a corporate author or a geographic name
associated with a resource, provides a useful common language for expressing
searches that can then be mapped to the elements within a particular DTD by
those making the information available.  It would be nice if search engines
which were indexing a data repository could start by asking the repository
for information on how the XML elements used in documents within the
repository map to some standard set of search attributes.  With any luck, as
XML becomes increasingly available over the WWW, we'll see some movement
towards adopting communication protocols which allow us to exploit its
potential more fully.
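
As a rough illustration of the mapping idea, the sketch below (Python,
standard library only) shows a search that is expressed against abstract
search attributes and resolved to DTD-specific elements through a map the
repository supplies.  Everything specific here is an assumption: the
attribute labels "corporate-author" and "geographic-name" are stand-ins
rather than real Z39.50 attribute identifiers, and the element paths and
sample documents are invented for the two DTD names from the original
question.

```python
import xml.etree.ElementTree as ET

# Hypothetical map a repository might publish:
# DTD name -> abstract search attribute -> element paths under the root.
ATTRIBUTE_MAP = {
    "medicine.dtd": {
        "corporate-author": ["./sponsor/organization"],
        "geographic-name": ["./study/site"],
    },
    "reinforced_concrete_supplier.dtd": {
        "corporate-author": ["./supplier/name"],
        "geographic-name": ["./supplier/region"],
    },
}

# Invented sample documents: (document id, DTD name, XML text).
DOCS = [
    ("trial-1", "medicine.dtd",
     "<record><sponsor><organization>Acme Pharma</organization></sponsor>"
     "<study><site>Delft</site></study></record>"),
    ("vendor-7", "reinforced_concrete_supplier.dtd",
     "<entry><supplier><name>Delft Beton</name>"
     "<region>Zuid-Holland</region></supplier></entry>"),
]


def search(documents, attribute, term):
    """Return ids of documents whose mapped elements contain the term."""
    needle = term.lower()
    hits = []
    for doc_id, dtd, xml_text in documents:
        # Look up which elements play this role in this document's DTD.
        paths = ATTRIBUTE_MAP.get(dtd, {}).get(attribute, [])
        root = ET.fromstring(xml_text)
        for path in paths:
            if any(needle in (el.text or "").lower()
                   for el in root.findall(path)):
                hits.append(doc_id)
                break
    return hits


print(search(DOCS, "geographic-name", "delft"))   # matches the study site
print(search(DOCS, "corporate-author", "delft"))  # matches the supplier name
```

The same word, "delft", finds different documents depending on which
attribute the user asked for, and the search engine never needs to know the
element names in advance; it only needs the repository's map.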

Jerome McDonough -- jmcdonou@library.Berkeley.EDU  |  (......)
Library Systems Office, 386 Doe, U.C. Berkeley     |  \ *  * /
Berkeley, CA 94720-6000    (510) 642-5168          |  \  <>  /
"Well, it looks easy enough...."                   |   \ -- /  SGNORMPF!!!
         -- From the Famous Last Words file        |    ||||
