On Aug 23, 2005, at 5:53 PM, Jason Hunter wrote:
> Wolfgang Hoschek wrote:
>> If all you need is to index the document's flat text instead of
>> XML documents with their structure, that's straightforward, yes.
>> But presumably you'll want to combine structured search (e.g.
>> XPath navigation and predicates) with unstructured fulltext
>> search, and now you're in database terrain, wrt. choosing cost
>> effective persistent index data structures and execution plans
>> for a mix of the main expected queries/data types/access patterns/
>> read&write frequency, plus static and/or dynamic query
>> optimization, including materialized view maintenance,
>> transactional updates, etc. All well-known problems, now in the
>> context of XML and fulltext search, but without easy solutions.
> In case people aren't aware, Mark Logic is doing exactly this.
> They (er, we, as they're my current employer) combine indexed XPath
> evaluation with full text search, scale beyond the limits of
> memory, do the management of transactional updates, and so on. I'm
> glad to see discussion about this idea here because it's a very
> cool one and something that I hope catches on widely as a meme.
> More info and a free low-end version at http://xqzone.marklogic.com.
Jason, one question is how far "beyond the limits of [main] memory",
and under what circumstances? How expressive is the fulltext search
language? How *exactly* is it doing it? I find it difficult to
believe that scheme xyz (including MarkLogic's) cannot easily be
driven into resource exhaustion given the huge parameter space,
unless working within a very carefully planned set of simplifying
constraints and assumptions. In other words, an easy general-purpose
solution doesn't exist in this area.
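To make the distinction concrete, here is a minimal sketch of only the "straightforward" part: a toy in-memory index over hypothetical documents that answers both flat fulltext queries and queries restricted to a simple element path (e.g. `/article/title`). The class name `PathTextIndex` and its methods are made up for illustration and do not reflect how MarkLogic or any other product works. It deliberately ignores everything listed above as hard: persistence beyond main memory, cost-based execution plans, predicates and other XPath axes, materialized views, and transactional updates.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def tokenize(text):
    # Lowercased alphanumeric tokens; no stemming, stop words, or phrase support.
    return re.findall(r"[a-z0-9]+", text.lower())


class PathTextIndex:
    """Hypothetical toy index, held entirely in memory."""

    def __init__(self):
        self.flat = defaultdict(set)     # token -> doc ids (flat-text index)
        self.by_path = defaultdict(set)  # (element path, token) -> doc ids

    def add(self, doc_id, xml_text):
        root = ET.fromstring(xml_text)
        self._walk(doc_id, root, "/" + root.tag)

    def _post(self, doc_id, path, text):
        # Record each token both in the flat index and under its element path.
        for tok in tokenize(text or ""):
            self.flat[tok].add(doc_id)
            self.by_path[(path, tok)].add(doc_id)

    def _walk(self, doc_id, elem, path):
        self._post(doc_id, path, elem.text)
        for child in elem:
            self._walk(doc_id, child, path + "/" + child.tag)
            # Text following a child's end tag belongs to the parent element.
            self._post(doc_id, path, child.tail)

    def search(self, words, path=None):
        """Doc ids containing all words; with a path, only text at that path.

        Note: words may match different element instances at the same path.
        """
        postings = [
            self.flat.get(w.lower(), set()) if path is None
            else self.by_path.get((path, w.lower()), set())
            for w in words
        ]
        return set.intersection(*postings) if postings else set()


if __name__ == "__main__":
    idx = PathTextIndex()
    idx.add(1, "<article><title>XML databases</title><body>fulltext search</body></article>")
    idx.add(2, "<article><title>Fulltext search</title><body>XML is mentioned here</body></article>")
    print(idx.search(["xml"]))                                   # {1, 2}
    print(idx.search(["xml"], path="/article/title"))            # {1}
    print(idx.search(["fulltext", "search"], path="/article/title"))  # {2}
```

Everything here fits in a Python dict; the open questions above (how far beyond main memory, how expressive the query language, which access patterns stay cheap) are exactly what such a toy leaves unanswered.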