xml-dev - Advanced text searching vs XML??? (was Re: [xml-dev] Note from theTroll)


To: xml-dev@lists.xml.org
Subject: Advanced text searching vs XML??? (was Re: [xml-dev] Note from theTroll)
From: Mike Champion <mc@xegesis.org>
Date: Mon, 28 Oct 2002 15:07:03 -0500
In-reply-to: <CFE22634-EAAC-11D6-BCF1-0030657E2F34@mac.com>

10/28/2002 2:38:19 PM, tblanchard@mac.com wrote:

>
>The current solution is to use a free form text indexer like verity, 
>autonomy, or the google appliance to handle resumes and other 
>documents, and relational db for structured info.  Text indexers based 
>on interesting fuzzy match and bayesian techniques are rapidly reducing 
>the requirement for markup in document management I think.  Google is 
>an excellent example (and now you can get it in a box).

Hmm, this sounds like a more interesting topic than sectarian squabbles
over the hermeneutics of the XSLT spec.  <hint, hint>

On one hand, one major value of XML to me (an employee of an XML DBMS vendor)
is to avoid the necessity of separating the "text" from the "structured
info."  While XML DBs don't have the advanced fuzzy/Bayesian capabilities
of high-end text DBs (yet!), they do have the ability to query for
text matches IN THE CONTEXT OF the structure.  Given a certain amount
of predictability about the tagging of a resume, one could look for 
people with actual EXPERIENCE with some technology combination (Java
on Linux, for example) rather than just "Java" and "Linux" mentioned
somewhere near each other or whatever.  

(I don't know if Verity can do this too with whatever knowledge of tags
that it has, but I sure can't figure out how to do it with Google!)
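
To make the "in the context of the structure" point concrete, here is a
minimal sketch in Python using the standard library's ElementTree.  The
resume markup (resume, experience, job, skill, platform, interests) and
the two sample people are made up for illustration; they are not a real
resume standard, and a real XML DBMS would use its own query language
rather than this hand-rolled loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical resume markup, invented for this sketch.
RESUMES = """
<resumes>
  <resume name="Alice">
    <experience>
      <job><skill>Java</skill><platform>Linux</platform></job>
    </experience>
    <interests>Windows gaming</interests>
  </resume>
  <resume name="Bob">
    <experience>
      <job><skill>Java</skill><platform>Windows</platform></job>
    </experience>
    <interests>Tinkering with Linux at home</interests>
  </resume>
</resumes>
"""


def flat_text_matches(root, terms):
    """Names of resumes whose text mentions every term somewhere."""
    hits = []
    for resume in root.findall("resume"):
        text = " ".join(resume.itertext()).lower()
        if all(term.lower() in text for term in terms):
            hits.append(resume.get("name"))
    return hits


def structured_matches(root, skill, platform):
    """Names of resumes with a single job combining skill and platform."""
    hits = []
    for resume in root.findall("resume"):
        for job in resume.findall("./experience/job"):
            if (job.findtext("skill") == skill
                    and job.findtext("platform") == platform):
                hits.append(resume.get("name"))
                break  # one qualifying job is enough
    return hits


root = ET.fromstring(RESUMES)
print(flat_text_matches(root, ["Java", "Linux"]))   # both resumes mention both words
print(structured_matches(root, "Java", "Linux"))    # only the resume with a Java-on-Linux job
```

The flat search returns both people because Bob's resume mentions Linux
under interests; the structured search returns only Alice, whose job
entry actually pairs Java with Linux.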

On the other hand, I must say that for me in daily life, Google allows
all sorts of useful queries that 5 years ago I thought would
require the widespread adoption of XML and XML-based format standards
(e.g., for resumes).  Certainly many of the claims/proposals of metadata
advocates 5 years ago look a bit shopworn in hindsight now that we
see how well Google does by ignoring all (most?) metadata other than the
linking patterns.  Likewise (playing troll and jumping out from 
under one of my favorite bridges) the Semantic Web vision seems a lot
less compelling after experiencing Google for a few years than it might
in a Google-less world.  Why invest in all that metadata when Google 
a) will ignore it anyway and b) does 80% of what the metadata would
allow with ZERO additional effort by web authors/developers?

Do others think that this trend will continue (for the Web, not for
aircraft maintenance manuals or public safety agencies, please!)?
To what extent does putting heuristic smarts in the indexing/search
engine rather than structured tags in the text take us where we want to
go?  Or are we headed toward a local maximum that merely distracts
ordinary users from the need to learn markup/metadata and Google
from the need to support XML and/or RDF to achieve a more global
optimum?

References:
- Re: [xml-dev] Note from the Troll
  - From: tblanchard@mac.com
