Advanced text searching vs XML??? (was Re: [xml-dev] Note from the Troll)


10/28/2002 2:38:19 PM, tblanchard@mac.com wrote:

>The current solution is to use a free form text indexer like verity, 
>autonomy, or the google appliance to handle resumes and other 
>documents, and relational db for structured info.  Text indexers based 
>on interesting fuzzy match and bayesian techniques are rapidly reducing 
>the requirement for markup in document management I think.  Google is 
>an excellent example (and now you can get it in a box).

Hmm, this sounds like a more interesting topic than sectarian squabbles
over the hermeneutics of the XSLT spec.  <hint, hint>

On one hand, one major value of XML to me (employee of XML DBMS vendor)
is to avoid the necessity of separating the "text" from the "structured
info."  While XML DBs don't have the advanced fuzzy/Bayesian capabilities
of high-end text DBs (yet!), they do have the ability to query for
text matches IN THE CONTEXT OF the structure.  Given a certain amount
of predictability about the tagging of a resume, one could look for 
people with actual EXPERIENCE with some technology combination (Java
on Linux, for example) rather than just "Java" and "Linux" mentioned
somewhere near each other or whatever.  
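
To make the distinction concrete, here is a minimal sketch in Python using
the standard library's xml.etree.ElementTree.  The resume markup (resume,
experience, job, skill, platform, interests) and both function names are
hypothetical, invented for illustration only; no real resume format or
product API is implied.  It contrasts a flat "both words appear somewhere"
match with a match that requires both terms inside the same job entry:

```python
import xml.etree.ElementTree as ET

# Hypothetical resume markup; every element name here is an assumption.
RESUMES = """
<resumes>
  <resume name="A">
    <experience>
      <job><skill>Java</skill><platform>Linux</platform></job>
    </experience>
    <interests>Reading</interests>
  </resume>
  <resume name="B">
    <experience>
      <job><skill>COBOL</skill><platform>MVS</platform></job>
    </experience>
    <interests>Learning Java; running Linux at home</interests>
  </resume>
</resumes>
"""


def flat_text_matches(root, words):
    """Names of resumes whose text mentions every word, anywhere at all."""
    hits = []
    for resume in root.findall("resume"):
        text = " ".join(resume.itertext()).lower()
        if all(word.lower() in text for word in words):
            hits.append(resume.get("name"))
    return hits


def structured_matches(root, skill, platform):
    """Names of resumes with one job listing both the skill and the platform."""
    hits = []
    for resume in root.findall("resume"):
        for job in resume.findall("experience/job"):
            skills = {s.text for s in job.findall("skill")}
            platforms = {p.text for p in job.findall("platform")}
            if skill in skills and platform in platforms:
                hits.append(resume.get("name"))
                break  # one qualifying job is enough for this resume
    return hits


root = ET.fromstring(RESUMES)
print(flat_text_matches(root, ["Java", "Linux"]))   # both resumes mention the words
print(structured_matches(root, "Java", "Linux"))    # only the resume with the job entry
```

The flat match returns both resumes because B mentions the words under
interests; the structured match returns only A.  It only works to the
extent that the tagging really is predictable.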

(I don't know if Verity can do this too with whatever knowledge of tags
it has, but I sure can't figure out how to do it with Google!)

On the other hand, I must say that for me in daily life, Google allows
all sorts of useful queries that 5 years ago I thought would
require the widespread adoption of XML and XML-based format standards
(e.g., for resumes).  Certainly many of the claims/proposals of metadata
advocates 5 years ago look a bit shopworn in hindsight now that we
see how well Google does by ignoring all (most?) metadata other than the
linking patterns.  Likewise (playing troll and jumping out from 
under one of my favorite bridges) the Semantic Web vision seems a lot
less compelling after experiencing Google for a few years than it might
in a Google-less world.  Why invest in all that metadata when Google 
a) will ignore it anyway and b) does 80% of what the metadata would
allow with ZERO additional effort by web authors/developers?

Do others think that this trend will continue (for the Web, not for
aircraft maintenance manuals or public safety agencies, please!) ?
To what extent does putting heuristic smarts in the indexing/search
engine rather than structured tags in the text take us where we want to
go?  Or are we headed toward a local maximum that merely distracts
ordinary users from the need to learn markup/metadata and Google
from the need to support XML and/or RDF to achieve a more global

