OASIS Mailing List Archives

   Re: [xml-dev] indexing and querying XML (not XQuery)

* Robert Koberg <rob@koberg.com> [2005-08-23 09:06]:
> Hi,
> 
> Someone on the Lucene user's list posted a link to this paper:
>
> http://www.idealliance.org/papers/xmle02/dx_xmle02/papers/03-02-08/03-02-08.html
>
> that talks about indexing and searching XML documents. I have been doing
> something similar for a while (3 years, I think) but it is specific to
> our configuration/content which probably doesn't have wider
> applicability. I have also found it to be:
>
> "a fast, reliable XML search engine, which has exceeded our expectations
> in terms of flexibility and low development cost."
>
> I was thinking the article would be of interest to many people here. I
> was also wondering about your thoughts on this method of dealing with
> XML. I have not looked in depth at XQuery, and I am wondering what
> strengths/benefits XQuery would have over using something like Lucene to
> index/query XML.
>
> It would be interesting to see what folk from this list would come up
> with if they put their brains to work on ways to handle
> indexing/searching with something like Lucene.

    Len was in a thread a while back, on Web 2.0, where I posited
    the notion of a REST interface to full text search of syndicated
    feeds, or blogs.

    While we're at it, Len, did you think about that any further?

    Reading through the article, the thing that strikes me is that
    full-text search of an XML document depends so much on the
    structure of the document. If that document can be divided
    into chapters, messages, articles, pages, etc., then it's best to
    create a full-text index with application-specific documents.
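
    As a rough illustration of dividing a document into
    application-specific units, here is a minimal Python sketch. The
    sample markup, the element names (book, chapter, title, para) and
    the helper split_into_units are all made up for this example; each
    returned string stands in for what would be handed to a full-text
    engine as one searchable document.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are assumptions.
SAMPLE = """<book>
  <chapter><title>Indexing</title><para>Lucene builds an inverted index.</para></chapter>
  <chapter><title>Querying</title><para>XPath selects nodes by structure.</para></chapter>
</book>"""


def split_into_units(xml_text, unit_tag):
    """Return one whitespace-normalized text string per unit_tag element."""
    root = ET.fromstring(xml_text)
    units = []
    for elem in root.iter(unit_tag):
        # itertext() walks all descendant text; split/join collapses whitespace.
        text = " ".join(" ".join(elem.itertext()).split())
        units.append(text)
    return units


# Each chapter becomes its own searchable unit.
units = split_into_units(SAMPLE, "chapter")
for number, text in enumerate(units):
    print(number, text)
```

    Choosing "chapter" as the unit here is the application-specific
    part: a feed would more likely split on items or entries.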

    So, perhaps, the scalable solution is a full-text engine that
    is fed XML documents and a full-text indexing schema.

    The existing schema languages like to atomize documents, while a
    full-text indexing schema might group their elements into
    concepts, like paths, links, articles, and clues for ranking
    articles based on conditions specified in XPath.
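
    To make the idea of a full-text indexing schema a little more
    concrete, here is a toy sketch in Python. It is not Lucene and not
    the method from the paper: the SCHEMA layout, the sample feed, and
    the functions build_index and search are all hypothetical, and the
    paths are limited to the XPath subset that Python's ElementTree
    supports. The schema names the unit to index, the fields within it,
    and a ranking clue expressed as a path condition.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical indexing schema: which elements form a unit, which child
# paths become fields, and which path conditions raise a unit's rank.
SCHEMA = {
    "unit": ".//article",
    "fields": {"title": "title", "body": "body"},
    "boost": [("category[@name='xml']", 2.0)],
}

# Hypothetical sample feed.
FEED = """<feed>
  <article><title>Lucene and XML</title><body>Indexing XML with Lucene.</body></article>
  <article><title>XQuery notes</title><category name="xml"/><body>Querying XML by structure.</body></article>
</feed>"""


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(xml_text, schema):
    """Build (field, term) -> unit ids postings plus a boost per unit."""
    root = ET.fromstring(xml_text)
    postings = defaultdict(set)
    boosts = []
    for unit_id, unit in enumerate(root.findall(schema["unit"])):
        for field, path in schema["fields"].items():
            for node in unit.findall(path):
                for term in tokenize("".join(node.itertext())):
                    postings[(field, term)].add(unit_id)
        boost = 1.0
        for condition, factor in schema["boost"]:
            # A condition matches if the path selects at least one node.
            if unit.find(condition) is not None:
                boost *= factor
        boosts.append(boost)
    return postings, boosts


def search(postings, boosts, field, term):
    """Return matching unit ids, highest boost first, ties by unit id."""
    hits = postings.get((field, term.lower()), set())
    return sorted(hits, key=lambda unit_id: (-boosts[unit_id], unit_id))


postings, boosts = build_index(FEED, SCHEMA)
ranked = search(postings, boosts, "body", "xml")
print(ranked)
```

    Both articles mention "xml" in the body, but the second one matches
    the category condition, so it is ranked first. A real engine would
    combine such a clue with term-frequency scoring rather than sort on
    it alone.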

    I've wanted to explore the use of Lucene in my document object
    model, so I'd like to hear more about this.

--
Alan Gutierrez - alan@engrm.com
    - http://engrm.com/blogometer/index.html
    - http://engrm.com/blogometer/rss.2.0.xml


Copyright 2001 XML.org. This site is hosted by OASIS