
Indexing XML


Thanks for the help with my question on searching. I am now the 
proud owner of a copy of fgrep and the dbxml source, which makes 
for great bedtime reading ;).

Having looked into things in more detail, I see that, beyond just 
searching through documents for instances of specific text strings, I 
would like to index documents with keywords. Furthermore, the 
keywords need to be hierarchical and the indexes 
importable/exportable between systems.

For example:
Suppose I have a large number of cooking recipes in the form of 
XML documents (it makes a change from the ubiquitous 'customer 
orders' example :)) and I want to categorise them according to 
place of origin. This would allow a user to search for all Italian 
recipes or all British recipes.
A particular user may be a bit of a connoisseur and want to be 
more specific with his indexing, so he adds categories for Naples, 
Rome and Sicily. He can now search for Neapolitan recipes 
specifically, but if he searches on Italy the Naples recipes will still be 
found (i.e. the categories/keywords are hierarchical).
Furthermore, his friend wants to borrow his recipe for Neapolitan 
ice-cream (bad example but I can't think of any other Neapolitan 
dishes!) but he doesn't have the category/keyword for Naples, so 
the indexing information needs to be exported as well. Another 
possibility is that the friend doesn't have the keyword Naples but 
does have the keyword/category Southern Italy, so there is a 
question about merging user-defined categories (although I see 
this as requiring user input).
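
To make the hierarchical lookup concrete, here is a minimal sketch in 
Python. It assumes each keyword records a single parent and that a 
search on a keyword should also match documents tagged with any 
narrower keyword. The names PARENT, INDEX and search, and the recipe 
file names, are made up for illustration only.

```python
# Hypothetical category tree: child -> parent (None marks a top-level category).
PARENT = {
    "Italy": None,
    "Britain": None,
    "Naples": "Italy",
    "Rome": "Italy",
    "Sicily": "Italy",
}

# Hypothetical index: document name -> keywords assigned to it.
INDEX = {
    "neapolitan-ice-cream.xml": {"Naples"},
    "carbonara.xml": {"Rome"},
    "minestrone.xml": {"Italy"},
    "trifle.xml": {"Britain"},
}


def ancestors_and_self(keyword, parent=PARENT):
    """Yield the keyword followed by each of its broader categories."""
    while keyword is not None:
        yield keyword
        keyword = parent.get(keyword)


def search(keyword, index=INDEX, parent=PARENT):
    """Return documents tagged with the keyword or any narrower keyword."""
    return sorted(
        doc
        for doc, keywords in index.items()
        if any(keyword in ancestors_and_self(k, parent) for k in keywords)
    )


print(search("Italy"))   # includes the Naples and Rome recipes
print(search("Naples"))  # only the Neapolitan recipe
```

A single-parent map keeps the sketch short; a category that belongs 
under two parents would need a different structure.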

There are a number of issues here, including whether to store the 
indexing information with the document or in a separate file (and 
recombine them for exporting) and whether to use single or multiple 
elements to implement the 'hierarchical' indexing scheme.
Any tips/hints/ideas/resources/etc. for implementation schemes will 
be gratefully received.
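
As one possible starting point, the sketch below takes the "separate 
file, multiple nested elements" option: the indexing information for 
one document is exported as a small XML fragment in which each 
keyword carries its full chain of broader categories, so a recipient 
who lacks "Naples" can still place it under "Italy". The element and 
attribute names (index, category, name, assigned) and the functions 
export_index and import_index are assumptions made for illustration, 
not an existing format. It uses Python's standard 
xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def export_index(doc, keywords, parent):
    """Build an <index> element holding each keyword's full category path."""
    root = ET.Element("index", {"document": doc})
    for keyword in sorted(keywords):
        # Collect the path from the keyword up to its top-level category.
        path = []
        k = keyword
        while k is not None:
            path.append(k)
            k = parent.get(k)
        node = root
        for name in reversed(path):  # broadest category first
            child = next(
                (c for c in node.findall("category") if c.get("name") == name),
                None,
            )
            if child is None:
                child = ET.SubElement(node, "category", {"name": name})
            node = child
        node.set("assigned", "yes")  # mark the keyword actually assigned
    return root


def import_index(root, parent, index):
    """Merge an exported <index> into the recipient's categories and index."""
    doc = root.get("document")

    def walk(node, parent_name):
        for child in node.findall("category"):
            name = child.get("name")
            # Keep the recipient's own placement if the category is known.
            parent.setdefault(name, parent_name)
            if child.get("assigned") == "yes":
                index.setdefault(doc, set()).add(name)
            walk(child, name)

    walk(root, None)


# The sender knows Naples; the friend only knows Italy.
sender_parent = {"Italy": None, "Naples": "Italy"}
exported = ET.tostring(
    export_index("neapolitan-ice-cream.xml", {"Naples"}, sender_parent),
    encoding="unicode",
)

friend_parent = {"Italy": None}
friend_index = {}
import_index(ET.fromstring(exported), friend_parent, friend_index)
print(friend_parent)
print(friend_index)
```

This only adds categories the recipient lacks. It does not attempt 
the harder merge, such as reconciling "Naples" with an existing 
"Southern Italy", which would still need user input.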

Many thanks,

Phil Ruelle