OASIS Mailing List Archives

Re: XML-Based Search Engine

> I plan to design an XML-Based Search Engine. Anyone can provide me some
> resources about that? Is it better to search information based on XML than
> DB? Why?
I could go on for days on this one, but I will try to be brief.

Building an XML search engine is trivial if you ignore the context
provided by XML. On the other hand, if you plan to make use of the
contextual structure of XML, you are looking at one of the most difficult
tasks one could ever encounter.
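To make the distinction above concrete, here is a minimal Python sketch. It is not how GoXML works (its internals are not described here); it only contrasts the two approaches. A flat index keeps term -> documents and throws the markup away. A contextual index also records the element path each term appeared under, so a query can be restricted to, say, `/book/title`. The function names and sample documents are hypothetical, and attributes, namespaces and CDATA sections get no special treatment (ElementTree reports CDATA content as ordinary text).

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

WORD = re.compile(r"\w+")


def tokens(text):
    """Lower-cased word tokens; tolerates None (elements with no text)."""
    return [w.lower() for w in WORD.findall(text or "")]


def flat_index(docs):
    """Context-free index: term -> set of doc ids. All markup is discarded."""
    index = defaultdict(set)
    for doc_id, xml in docs.items():
        root = ET.fromstring(xml)
        # itertext() yields every text node; joining with spaces keeps
        # words from adjacent elements apart.
        for term in tokens(" ".join(root.itertext())):
            index[term].add(doc_id)
    return index


def _walk(elem, path, doc_id, index):
    """Record each term with the path of the element that directly holds it."""
    path = f"{path}/{elem.tag}"
    for term in tokens(elem.text):
        index[term].add((doc_id, path))
    for child in elem:
        _walk(child, path, doc_id, index)
        # A child's tail text sits inside the parent, so it gets the parent's path.
        for term in tokens(child.tail):
            index[term].add((doc_id, path))


def contextual_index(docs):
    """Context-aware index: term -> set of (doc id, element path)."""
    index = defaultdict(set)
    for doc_id, xml in docs.items():
        _walk(ET.fromstring(xml), "", doc_id, index)
    return index


def search_within(index, term, path):
    """Doc ids where `term` occurs in the element at `path` or any descendant."""
    return {
        doc_id
        for doc_id, p in index.get(term.lower(), set())
        if p == path or p.startswith(path + "/")
    }


if __name__ == "__main__":
    docs = {
        "d1": "<book><title>XML Search</title><author>Smith</author></book>",
        "d2": "<book><title>Databases</title><note>search with XML</note></book>",
    }
    flat = flat_index(docs)
    ctx = contextual_index(docs)
    print(sorted(flat["search"]))                               # ['d1', 'd2']
    print(sorted(search_within(ctx, "search", "/book/title")))  # ['d1']
```

The flat index cannot tell that "search" is a title word in `d1` but only appears in a note in `d2`; the contextual one can. Even in this toy, every contextual posting carries a path string in addition to the document id, which hints at why preserving structure makes the index so much larger than a flat one.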

When we built www.goxml.com starting in 1998, we preserved all the CDATA,
the structure, and the relationships between the two. The result was a true
contextual search engine, but its index grew by 400% of the size of each
document indexed, which was highly undesirable.

The current incarnation (GoXML v3.01) is written in C and C++ and uses a
completely different internal mechanism for storing the information. Unlike
the first version, it no longer depends on any third-party database product.
Accordingly, we can index up to 2 terabytes of XML data, and the index grows
by as little as 3/4 of the size of the original document indexed.

My advice: take a look at the user interface


and use that as a starting point. A lot of information on Version 2.0 is
available on


Cheers and good luck!!!!

Duane Nickull