   RE: [xml-dev] XML indexing/search engine


If an XML-enabled RDBMS that supports XML import and export is an option, then the combination of SQL Server and SQLXML does the job very well. More info at:
 
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnexxml/html/xml07162001.asp

	-----Original Message----- 
	From: Tim Bray [mailto:tbray@textuality.com] 
	Sent: Fri 8/23/2002 11:04 AM 
	To: Roth, Scott (ITD) 
	Cc: xml-dev@lists.xml.org 
	Subject: Re: [xml-dev] XML indexing/search engine
	
	

	Roth, Scott (ITD) wrote:
	> Hi -
	>
	> I am starting to design an application that will be a calendaring/event engine for the State of Massachusetts and all of its agencies (Department of Public Health, Registry of Motor Vehicles, etc...).  We plan on putting an appropriate calendar event schema in place, and then starting to generate 1 XML file per event (public hearing, course, forum, workshop, whatever...).  This will build up quite a large amount of small XML files quickly.  My question is this - what is the best way to store these files for easy indexing and searching?  The actual files will be stored in our content management system, so I am not worried about updating the information - merely being able to efficiently query the collection.  Apache's Xindice seems to be the frontrunner so far.  I am envisioning storing the collection in Xindice and returning a nodeset to my XSL that contains file names that match whatever the query was.  The XSL is then free to iterate through each matching file using the document function and grab whatever information for display that the current page requires.  Is there other software that I should be considering?  Other approaches?
	
	The idea of making this information available in XML is a good one and I
	salute Massachusetts for this progressive and sensible move.  Publishing
	the schema is smart too.  Of course, just because you're going to make
	it available in XML doesn't mean you have to store/maintain the data in
	XML.  Could you put an output filter on your content-management system
	and hook it up to the web with one of the many gateway products?
	
	Of course, many CM systems don't take kindly to a high volume of queries
	& exports (as in choke, fall over, die, lock up)... maybe you could
	batch-dump this stuff into a simple RDBMS (Oracle, MySQL, whatever) and
	gateway to that, XMLifying the export; these things tend to search
	well and hold up under query loads.  Does the retrieval really need to
	be full-text, or could a fielded query against an RDBMS handle it?
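
A minimal sketch of the batch-dump idea above, not a definitive implementation. It uses Python's built-in SQLite as a stand-in for the "simple RDBMS" (the thread names Oracle and MySQL, not SQLite). The element names (title, agency, date, type), the file names, and the helper functions load_events and query_events_as_xml are all hypothetical; the real event schema is not given in the thread. The sketch extracts a few fields from each small XML file into one table, runs a fielded query, and serializes the matches back to XML on the way out.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical event documents. The element names are assumptions,
# not the schema discussed in the thread.
EVENT_FILES = {
    "event-001.xml": "<event><title>Public hearing on water quality</title>"
                     "<agency>Department of Public Health</agency>"
                     "<date>2002-09-12</date><type>hearing</type></event>",
    "event-002.xml": "<event><title>Food safety workshop</title>"
                     "<agency>Department of Public Health</agency>"
                     "<date>2002-10-03</date><type>workshop</type></event>",
    "event-003.xml": "<event><title>License renewal forum</title>"
                     "<agency>Registry of Motor Vehicles</agency>"
                     "<date>2002-09-20</date><type>forum</type></event>",
}


def load_events(conn, files):
    """Batch-dump: copy a few fields from each XML file into one table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "filename TEXT PRIMARY KEY, title TEXT, agency TEXT, "
        "event_date TEXT, event_type TEXT)"
    )
    rows = []
    for filename, xml_text in files.items():
        root = ET.fromstring(xml_text)
        rows.append((
            filename,
            root.findtext("title"),
            root.findtext("agency"),
            root.findtext("date"),
            root.findtext("type"),
        ))
    conn.executemany(
        "INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()


def query_events_as_xml(conn, agency, start, end):
    """Fielded query; the result set is serialized back to XML for export."""
    cur = conn.execute(
        "SELECT filename, title, event_date, event_type FROM events "
        "WHERE agency = ? AND event_date BETWEEN ? AND ? "
        "ORDER BY event_date",
        (agency, start, end),
    )
    results = ET.Element("events")
    for filename, title, event_date, event_type in cur:
        ev = ET.SubElement(
            results, "event", {"source": filename, "type": event_type}
        )
        ET.SubElement(ev, "title").text = title
        ET.SubElement(ev, "date").text = event_date
    return ET.tostring(results, encoding="unicode")


conn = sqlite3.connect(":memory:")
load_events(conn, EVENT_FILES)
xml_out = query_events_as_xml(
    conn, "Department of Public Health", "2002-09-01", "2002-09-30"
)
print(xml_out)
```

Storing dates as ISO 8601 strings lets a plain BETWEEN comparison do the date-range filtering. If the retrieval did turn out to need full-text search rather than fielded queries, this sketch would not cover it.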
	
	Summary: XML for export and interchange is totally the way to go.  How
	you get there?  Acronyms that begin with X aren't that relevant.
	
	Now all the XDBMS vendors are going to complain about my lack of
	fidelity to the religion of the XML data model, oh well.
	
	> I am anxious to get this right, as this will be the model for other statewide templatizing applications - for example, press releases.
	
	It shouldn't be *that* hard.  Once you do it, let us know how it went,
	or submit a paper to one of the conferences or something.  -Tim
	
	
	
	





 


Copyright 2001 XML.org. This site is hosted by OASIS