   Re: [xml-dev] indexing and querying XML (not XQuery)

  • To: Michael Kay <mike@saxonica.com>
  • Subject: Re: [xml-dev] indexing and querying XML (not XQuery)
  • From: Wolfgang Hoschek <whoschek@lbl.gov>
  • Date: Wed, 24 Aug 2005 12:09:36 -0700
  • Cc: "'Robert Koberg'" <rob@koberg.com>, "'XML Developers List'" <xml-dev@lists.xml.org>
  • In-reply-to: <200508241855.j7OItRDs018779@postala.lbl.gov>
  • References: <200508241855.j7OItRDs018779@postala.lbl.gov>

Yes, they're different in nature. XQuery is for structured querying;
fulltext search is for fuzzy similarity search.

However, interesting synergies are emerging when the two are combined
in compound queries: "Find me the facts (structured), where some
aspects of the (textual) criteria are inherently fuzzy and blurred
(fulltext)". Much of the data integrated from diverse, autonomous
sources with unknown schemas turns out to be blurred, which makes
classic structured search difficult to apply.

I see structured and unstructured search as complementary
capabilities rather than mutually exclusive ones.
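A compound query of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product implements it: the `<records>` data, the `compound_query` and `fuzzy_score` helpers, and the 0.8 similarity cutoff are all hypothetical, and `difflib` string similarity stands in for a real fulltext engine's scoring. The predicate on `@year` is exact (yes/no); the predicate on the description text is graded and tolerates spelling variants.

```python
import difflib
import re
import xml.etree.ElementTree as ET

# Hypothetical records merged from several sources with inconsistent spelling.
RECORDS = """
<records>
  <record year="2005"><desc>Tomatos, cherry, organic</desc><qty>120</qty></record>
  <record year="2005"><desc>Potatoes, red</desc><qty>300</qty></record>
  <record year="2003"><desc>Tomatoes (plum)</desc><qty>80</qty></record>
  <record year="2005"><desc>tomato paste</desc><qty>40</qty></record>
</records>
"""

def tokens(text):
    """Lower-case alphabetic tokens of a string."""
    return re.findall(r"[a-z]+", text.lower())

def fuzzy_score(query, text, cutoff=0.8):
    """Fraction of query terms that approximately match some token of `text`."""
    words = tokens(text)
    terms = tokens(query)
    if not terms:
        return 0.0
    hits = sum(1 for t in terms
               if difflib.get_close_matches(t, words, n=1, cutoff=cutoff))
    return hits / len(terms)

def compound_query(xml_text, min_year, query, threshold=0.5):
    """Exact structured filter on @year, then fuzzy ranking on <desc> text."""
    results = []
    for rec in ET.fromstring(xml_text.strip()).iter("record"):
        if int(rec.get("year")) < min_year:                   # structured: exact
            continue
        score = fuzzy_score(query, rec.findtext("desc", ""))  # fulltext: graded
        if score >= threshold:
            results.append((score, rec.findtext("desc"), int(rec.findtext("qty"))))
    # Highest score first; ties keep document order (sort is stable).
    return sorted(results, key=lambda r: r[0], reverse=True)

print(compound_query(RECORDS, 2005, "cherry tomatoes"))
```

With `min_year=2005` the 2003 record is excluded even though its text matches perfectly, while the misspelled "Tomatos" record still ranks first.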

Jason can probably tell you what their customers are looking for when  
they combine the two capabilities.

Wolfgang.

On Aug 24, 2005, at 11:50 AM, Michael Kay wrote:

> I've only been skimming this thread, but I think a point that needs to be
> made is that XQuery and full-text are trying to do very different things.
> Full-text queries take the form "find me documents about sales of tomatoes";
> database queries take the form "how many tomatoes did we sell last month?".
> They are thus quite different animals: "find me what's been written on the
> subject" versus "tell me the facts".
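To make the distinction concrete, here is a small Python sketch with hypothetical data (`REPORTS`, `SALES`) and hypothetical helper names. `find_documents_about` returns a relevance-ranked list of documents using naive term counting (an assumption standing in for real fulltext ranking), while `how_many_sold` returns one exact number computed from structured records.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical data: free-text reports and structured sales records.
REPORTS = [
    ("r1", "Tomato sales rose sharply; tomatoes from the south sold out."),
    ("r2", "Quarterly review of lettuce and cucumber sales."),
    ("r3", "Notes on greenhouse tomatoes."),
]

SALES = """
<sales>
  <sale month="2005-07" product="tomato" qty="120"/>
  <sale month="2005-07" product="lettuce" qty="60"/>
  <sale month="2005-06" product="tomato" qty="90"/>
  <sale month="2005-07" product="tomato" qty="30"/>
</sales>
"""

STOPWORDS = {"a", "and", "of", "on", "the"}

def find_documents_about(query, docs):
    """Full-text style: rank documents by query-term frequency; no exact answer."""
    terms = {t for t in re.findall(r"[a-z]+", query.lower()) if t not in STOPWORDS}
    ranked = []
    for doc_id, text in docs:
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        score = sum(counts[t] for t in terms)
        if score:
            ranked.append((doc_id, score))
    return sorted(ranked, key=lambda d: d[1], reverse=True)

def how_many_sold(product, month, xml_text):
    """Database style: an exact aggregate over structured records."""
    root = ET.fromstring(xml_text.strip())
    return sum(int(s.get("qty")) for s in root.iter("sale")
               if s.get("product") == product and s.get("month") == month)

print(find_documents_about("sales of tomatoes", REPORTS))  # ranked (id, score) pairs
print(how_many_sold("tomato", "2005-07", SALES))           # a single number
```

Report r2 is about lettuce yet still ranks because it mentions sales: the fulltext answer is a list of candidates to read, whereas the structured query yields exactly one figure.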
>
> Michael Kay
> http://www.saxonica.com/
>
>
>
>
>> -----Original Message-----
>> From: Wolfgang Hoschek [mailto:whoschek@lbl.gov]
>> Sent: 23 August 2005 19:59
>> To: Robert Koberg
>> Cc: XML Developers List
>> Subject: Re: [xml-dev] indexing and querying XML (not XQuery)
>>
>> A starting point may be MemoryIndex (http://dsd.lbl.gov/nux/api/index.html),
>> which can be used for Lucene fulltext search over comparatively small
>> *transient main memory* XML documents, for example as in Nux XQuery. Note,
>> however, that it is not straightforward to extend XML fulltext querying
>> over transient main memory to XML fulltext querying over huge persistent
>> XML document collections; the underlying technology is bound to be vastly
>> different with respect to data management, indexing and transactional
>> properties, though the high-level search API may indeed remain identical
>> or similar.
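The transient, main-memory case can be illustrated with a toy index over a single document. This is a hypothetical Python sketch, not the Nux or Lucene MemoryIndex API: `TransientXmlIndex` is an invented name, fields are simply element names, and the score is raw term frequency rather than Lucene's relevance scoring. It shows why the small-document case is simple: everything lives in one dictionary that is discarded after the query, with none of the storage, update or transaction concerns a persistent collection raises.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

class TransientXmlIndex:
    """Hypothetical sketch: index ONE small XML document in main memory.

    Maps element name -> term frequencies. Built per document, queried,
    then discarded; nothing is persisted, locked or shared.
    """

    def __init__(self, xml_text):
        self.fields = defaultdict(Counter)
        for elem in ET.fromstring(xml_text).iter():
            # Direct text of each element only (tail text ignored for brevity).
            terms = re.findall(r"[a-z0-9]+", (elem.text or "").lower())
            self.fields[elem.tag].update(terms)

    def score(self, field, *terms):
        """Plain term-frequency score of the terms within one field; 0.0 = no match."""
        counts = self.fields.get(field, Counter())
        return float(sum(counts[t.lower()] for t in terms))

doc = ("<book><title>XML Fulltext Search</title>"
       "<abstract>Fulltext search over XML with Lucene. Search is fuzzy.</abstract></book>")
idx = TransientXmlIndex(doc)
print(idx.score("abstract", "search"))
print(idx.score("author", "kay"))
```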
>>
>> Wolfgang.
>>
>> On Aug 23, 2005, at 6:05 AM, Robert Koberg wrote:
>>
>>
>>
>>> Hi,
>>>
>>> Someone on the Lucene user's list posted a link to this paper:
>>>
>>> http://www.idealliance.org/papers/xmle02/dx_xmle02/papers/03-02-08/03-02-08.html
>>>
>>> that talks about indexing and searching XML documents. I have been
>>> doing something similar for a while (3 years, I think), but it is
>>> specific to our configuration/content, which probably doesn't have
>>> wider applicability. I have also found it to be:
>>>
>>> "a fast, reliable XML search engine, which has exceeded our
>>> expectations in terms of flexibility and low development cost."
>>>
>>> I was thinking the article would be of interest to many people
>>> here. I was also wondering about your thoughts on this method of
>>> dealing with XML. I have not looked in depth at XQuery, and I am
>>> wondering what strengths/benefits XQuery would have over using
>>> something like Lucene to index/query XML.
>>>
>>> It would be interesting to see what folk from this list would come
>>> up with if they put their brains to work on ways to handle
>>> indexing/searching with something like Lucene.
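One possible way to keep some XML structure in a flat fulltext index is to qualify each term with the element path it occurs under. The Python sketch below assumes that approach; it is not taken from the paper above or from Lucene's API, and `index_documents` and `search` are hypothetical names. Each posting is keyed by (path, term), so a query can ask for a term inside `/article/title` rather than anywhere in the document.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_documents(docs):
    """Map (element path, term) -> set of ids of documents containing that term there."""
    postings = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        stack = [(root, "/" + root.tag)]        # depth-first walk, tracking the path
        while stack:
            elem, path = stack.pop()
            # Direct text only; a fuller version would also handle tails and attributes.
            for term in re.findall(r"[a-z0-9]+", (elem.text or "").lower()):
                postings[(path, term)].add(doc_id)
            for child in elem:
                stack.append((child, f"{path}/{child.tag}"))
    return postings

def search(postings, path, *terms):
    """Ids of documents containing ALL terms inside elements at `path`."""
    hits = [postings.get((path, t.lower()), set()) for t in terms]
    return set.intersection(*hits) if hits else set()

DOCS = {
    "d1": "<article><title>Indexing XML</title><body>Lucene indexes text</body></article>",
    "d2": "<article><title>Gardening</title><body>Indexing XML with Lucene</body></article>",
}
postings = index_documents(DOCS)
print(search(postings, "/article/title", "xml"))  # only d1: the term is in its title
print(search(postings, "/article/body", "xml"))   # only d2: the term is in its body
```

The same term can thus be found or ignored depending on where it occurs, which is the kind of structural restriction a plain document-level fulltext index cannot express.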
>>>
>>> best,
>>> -Rob
>>>
>>>
>>> -----------------------------------------------------------------
>>> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>> initiative of OASIS <http://www.oasis-open.org>
>>>
>>> The list archives are at http://lists.xml.org/archives/xml-dev/
>>>
>>> To subscribe or unsubscribe from this list use the subscription
>>> manager: <http://www.oasis-open.org/mlmanage/index.php>

Copyright 2001 XML.org. This site is hosted by OASIS