OASIS Mailing List Archives




RE: Indexing XML

I would recommend having a look at the XML Topic Maps specification [1] and
various topic map tutorials [2][3]. Topic maps were designed from the get-go
to enable the merging of indexes, and they also support hierarchical
indexing with ease.
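To give a feel for the interchange syntax in [1], here is a minimal sketch that writes a tiny XTM 1.0 document with Python's standard xml.etree.ElementTree: a few topics plus one "is in" association. The element names (topic, baseName, baseNameString, association, instanceOf, member, roleSpec, topicRef) follow a reading of the XTM 1.0 DTD; the topic ids, including the "container"/"containee" role topics, are hypothetical and chosen only for illustration, so check the details against the spec before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespaces used by XTM 1.0 (topic references are XLink hrefs).
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM_NS)        # serialize XTM elements unprefixed
ET.register_namespace("xlink", XLINK_NS)
HREF = f"{{{XLINK_NS}}}href"

def q(tag):
    """Qualify an element name with the XTM namespace."""
    return f"{{{XTM_NS}}}{tag}"

def add_topic(tm, topic_id, name):
    """Add <topic id=...><baseName><baseNameString>name</...></...></topic>."""
    topic = ET.SubElement(tm, q("topic"), id=topic_id)
    base_name = ET.SubElement(topic, q("baseName"))
    ET.SubElement(base_name, q("baseNameString")).text = name
    return topic

def add_association(tm, assoc_type, members):
    """Add an association typed by topic `assoc_type`.
    `members` is a list of (role_topic_id, player_topic_id) pairs."""
    assoc = ET.SubElement(tm, q("association"))
    instance_of = ET.SubElement(assoc, q("instanceOf"))
    ET.SubElement(instance_of, q("topicRef"), {HREF: "#" + assoc_type})
    for role, player in members:
        member = ET.SubElement(assoc, q("member"))
        role_spec = ET.SubElement(member, q("roleSpec"))
        ET.SubElement(role_spec, q("topicRef"), {HREF: "#" + role})
        ET.SubElement(member, q("topicRef"), {HREF: "#" + player})
    return assoc

# Hypothetical ids: the association type and the role types are topics too.
tm = ET.Element(q("topicMap"))
for topic_id, name in [("italy", "Italy"), ("sicily", "Sicily"),
                       ("is-in", "is in"), ("container", "container"),
                       ("containee", "containee")]:
    add_topic(tm, topic_id, name)
add_association(tm, "is-in", [("containee", "sicily"), ("container", "italy")])

xtm = ET.tostring(tm, encoding="unicode")
print(xtm)
```

Because the index is an ordinary XML document, exporting it to another system is just a matter of shipping this file alongside (or instead of) the recipes themselves.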

Basically, you could create topics for Italy, Southern Italy, Naples, Rome
and Sicily, and associations such as "Southern Italy is in Italy",
"Sicily is in Southern Italy" and "<your recipe here> is regional recipe of
Sicily". A query for "recipes from Italy" then becomes "topics playing the
role of recipe in an 'is regional recipe of' relationship with a topic P,
where P is in a transitive 'is in' relationship with the topic Italy".
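That query can be sketched in plain Python, assuming a toy in-memory model in which topics are string ids and each association is a type plus a role-to-player mapping. This is not the API of any topic map engine; the ids, the role names ("containee", "container", "recipe", "region") and the recipes are all hypothetical.

```python
# Toy in-memory model (hypothetical, not an XTM or engine API):
# each association is (association_type, {role: player_topic_id}).
associations = []

def associate(assoc_type, **roles):
    associations.append((assoc_type, roles))

def containers_of(topic):
    """Topics that `topic` is directly 'in'."""
    return {roles["container"] for kind, roles in associations
            if kind == "is-in" and roles["containee"] == topic}

def is_in(topic, place):
    """True if `topic` is `place` or reaches it via a chain of 'is in'
    associations (transitive and reflexive, so Naples counts as in Naples)."""
    seen, stack = set(), [topic]
    while stack:
        current = stack.pop()
        if current == place:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(containers_of(current))
    return False

def recipes_from(place):
    """Topics playing 'recipe' in an 'is regional recipe of' association
    with some region P where P is (transitively) in `place`."""
    return sorted(roles["recipe"] for kind, roles in associations
                  if kind == "is-regional-recipe-of"
                  and is_in(roles["region"], place))

associate("is-in", containee="southern-italy", container="italy")
associate("is-in", containee="sicily", container="southern-italy")
associate("is-in", containee="naples", container="southern-italy")
associate("is-in", containee="rome", container="italy")
associate("is-regional-recipe-of", recipe="sicilian-recipe", region="sicily")
associate("is-regional-recipe-of", recipe="neapolitan-ice-cream", region="naples")
associate("is-regional-recipe-of", recipe="roman-recipe", region="rome")
associate("is-regional-recipe-of", recipe="british-recipe", region="britain")

print(recipes_from("italy"))
# → ['neapolitan-ice-cream', 'roman-recipe', 'sicilian-recipe']
```

Searching on "naples" returns only the Neapolitan recipe, while "southern-italy" also picks up the Sicilian one. The hierarchy lives entirely in the associations, so adding a finer category is just another `associate` call.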

Using the Published Subject Indicators feature of XTM would give your users
a way of establishing that the topic named Italy is in fact the European
country and not the international football team, for example.
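Subject indicators are also what make merging mechanical: if two users' maps point their topics at the same indicator, the topics can be merged even when their names differ, and two topics that merely share a name stay apart. A minimal sketch, assuming topics are plain dicts with `names` and `psis` sets; the example.org PSI URIs are hypothetical, and it models only the shared-subject-indicator case, leaving out any other merge rules the spec defines.

```python
# Hypothetical PSI URIs for illustration only (not real published subjects).
ITALY_COUNTRY = "http://example.org/psi/country/italy"
ITALY_TEAM = "http://example.org/psi/football/italy-national-team"

def merge_by_psi(*topic_maps):
    """Merge topic maps, each a list of topics shaped like
    {"names": set_of_strings, "psis": set_of_uris}.
    Topics sharing at least one PSI become one topic; topics with no PSI in
    common stay separate even if their names are identical."""
    merged = []
    for topic_map in topic_maps:
        for topic in topic_map:
            names, psis = set(topic["names"]), set(topic["psis"])
            # This topic may share PSIs with several already-merged topics,
            # so fold all of them into a single topic.
            overlapping = [m for m in merged if m["psis"] & psis]
            merged = [m for m in merged if not (m["psis"] & psis)]
            for m in overlapping:
                names |= m["names"]
                psis |= m["psis"]
            merged.append({"names": names, "psis": psis})
    return merged

mine = [{"names": {"Italy"}, "psis": {ITALY_COUNTRY}},
        {"names": {"Italy"}, "psis": {ITALY_TEAM}}]
friends = [{"names": {"Italia"}, "psis": {ITALY_COUNTRY}}]

for topic in merge_by_psi(mine, friends):
    print(sorted(topic["names"]), sorted(topic["psis"]))
```

Here the two "Italy" topics stay separate because they indicate different subjects, while the friend's "Italia" folds into the country topic.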

[1] http://www.topicmaps.org/xtm/1.0/
[2] http://topicmaps.bond.edu.au/tutorial1/topics.mc
[3] http://www.ontopia.net/topicmaps/learn_more.html



> -----Original Message-----
> From: Phil Ruelle [mailto:philr@iplbath.com]
> Sent: 21 May 2001 09:28
> To: xml-dev@lists.xml.org
> Subject: Indexing XML
> All,
> Thanks for the help with my question on searching, I am now the
> proud owner of a copy of fgrep and the dbxml source which makes
> for great bedtime reading ;).
> Having looked into things in more detail, I see that, beyond just
> searching through documents for instances of specific text strings, I
> would like to index documents with keywords. More than that, the
> keywords need to be hierarchical and the indexes
> importable/exportable between systems.
> For example:
> Suppose I have a large number of cooking recipes in the form of
> XML documents (it makes a change from the ubiquitous 'customer
> orders' example :)) and I want to categorise them according to
> place of origin. This would allow a user to search for all Italian
> recipes or all British recipes.
> A particular user may be a bit of a connoisseur and want to be
> more specific with his indexing, so he adds categories for Naples,
> Rome and Sicily. He can now search for Neapolitan recipes
> specifically, but if he searches on Italy the Naples recipes will still be
> found (i.e. the categories/keywords are hierarchical).
> Furthermore, his friend wants to borrow his recipe for Neapolitan ice
> cream (bad example, but I can't think of any other Neapolitan
> dishes!) but he doesn't have the category/keyword for Naples, so
> the indexing information needs to be exported as well. Another
> possibility is that the friend doesn't have the keyword Naples but
> does have the keyword/category Southern Italy, so there is a
> question about merging user-defined categories (although I see
> this as requiring user input).
> There are a number of issues here, including whether to store the
> indexing information with the document or in a separate file (and
> recombine them for exporting) and whether to use single or multiple
> elements to implement the 'hierarchical' indexing scheme.
> Any tips/hints/ideas/resources/etc for implementation schemes will
> be gratefully received.
> Many thanks,
> Phil Ruelle
> ------------------------------------------------------------------
> The xml-dev list is sponsored by XML.org, an initiative of OASIS
> <http://www.oasis-open.org>
> The list archives are at http://lists.xml.org/archives/xml-dev/
> To unsubscribe from this elist send a message with the single word
> "unsubscribe" in the body to: xml-dev-request@lists.xml.org