xml-dev - RE: [xml-dev] indexing and querying XML (not XQuery)


To: 'Robert Koberg' <rob@koberg.com>
Subject: RE: [xml-dev] indexing and querying XML (not XQuery)
From: "Bullard, Claude L (Len)" <len.bullard@intergraph.com>
Date: Tue, 23 Aug 2005 14:01:34 -0500
Cc: 'XML Developers List' <xml-dev@lists.xml.org>

That's an implementation issue for searching the docs. 
I have to look at Lucene.  I assumed you meant loading 
the schema itself.  This is the tricky bit: if the 
schema is not really informative about the content of 
text nodes, you have to use full-text searching and 
that is a very large job for any sizable collection. 
DeRose is the expert on this topic, though.
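
As a rough illustration of what "loading the schema itself" can
and cannot tell you, here is a minimal Python sketch that walks an
XSD and lists the named element and attribute declarations a search
GUI could offer. The sample schema and the queryable_fields helper
are invented for this example; they are not from any system
mentioned in this thread.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema, invented purely for illustration.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="report">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="filed" type="xs:date"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def queryable_fields(xsd_text):
    """Return (kind, name, type) for each named element/attribute declaration."""
    root = ET.fromstring(xsd_text)
    fields = []
    for node in root.iter():
        if node.tag in (XS + "element", XS + "attribute") and "name" in node.attrib:
            kind = node.tag[len(XS):]  # "element" or "attribute"
            # Declarations with an inline type have no type attribute.
            fields.append((kind, node.get("name"), node.get("type", "(anonymous)")))
    return fields


fields = queryable_fields(SAMPLE_XSD)
for kind, name, type_name in fields:
    print(f"{kind:9} {name:8} {type_name}")
```

In this sample, "filed" is typed as a date and can get a proper
date control, but "title" and "body" are plain xs:string, so the
schema says nothing about what is inside them. Those are the
fields that push you back to full-text searching.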

I'm referring to a system that loads the Schema(s) in 
order to create a search GUI, then lets you save 
the search criteria.  Ad hoc search requirements for 
relational systems do this.  Typical implementations 
scour the table objects, get the table and field 
names, then load the GUI with those.   An application
specific version of that preloads those and might 
attach conditions based on other rules such as security, 
need to know, role-based stuff, etc.  In most systems 
I've worked on, one has sorted out the tables by roles 
into directories, so it is easy to load Crystal by 
pointing it at the directory.   I've written apps 
that do the same thing by querying the table objects 
and creating table tables and field tables with 
relationships (e.g., use the table names and field names 
plus their keys to do the sorting and relating). I 
did it in the context of creating a schema document generator 
for a relational system, that is, the output is a 
hypertext document in multiple formats for the table 
schemas to enable someone to write queries and do 
conversion work.  Taking those same metatables and 
feeding a query generator is easy enough and adding 
a topic table to that to save the queries to and feeding 
a treeview object from that is a piece of cake.
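
The metatable approach described above can be sketched roughly as
follows, using Python's built-in sqlite3 module as a stand-in for
whatever relational system is in play. All table names here
(incident, officer, meta_table, meta_field, topic) and both helper
functions are hypothetical, and the catalog lookups (sqlite_master,
PRAGMA table_info) are SQLite-specific; other engines expose the
same information through their own system tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical application tables standing in for a real database.
conn.executescript("""
CREATE TABLE incident (id INTEGER PRIMARY KEY, reported TEXT, officer_id INTEGER);
CREATE TABLE officer (id INTEGER PRIMARY KEY, name TEXT);
""")

# Metatables: a "table table", a "field table", and a topic table for saved queries.
conn.executescript("""
CREATE TABLE meta_table (table_name TEXT PRIMARY KEY);
CREATE TABLE meta_field (table_name TEXT, field_name TEXT, field_type TEXT, is_key INTEGER);
CREATE TABLE topic (topic TEXT, query_sql TEXT);
""")

META = {"meta_table", "meta_field", "topic"}


def load_metatables(conn):
    """Scour the catalog for table and field names and record them."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in names:
        if table in META:
            continue
        conn.execute("INSERT INTO meta_table VALUES (?)", (table,))
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _cid, field, ftype, _notnull, _default, pk in columns:
            conn.execute("INSERT INTO meta_field VALUES (?, ?, ?, ?)",
                         (table, field, ftype, int(pk > 0)))


def build_query(conn, table, where_field):
    """Generate a parameterized SELECT from the metatables alone."""
    fields = [r[0] for r in conn.execute(
        "SELECT field_name FROM meta_field WHERE table_name = ? ORDER BY rowid",
        (table,))]
    if where_field not in fields:
        raise ValueError(f"{where_field!r} is not a field of {table!r}")
    return f"SELECT {', '.join(fields)} FROM {table} WHERE {where_field} = ?"


load_metatables(conn)
tables = [r[0] for r in conn.execute(
    "SELECT table_name FROM meta_table ORDER BY table_name")]

# Build a query, then save it under a topic so it can be reloaded later.
sql = build_query(conn, "incident", "officer_id")
conn.execute("INSERT INTO topic VALUES (?, ?)", ("incidents by officer", sql))

# Run the saved query with a parameter supplied at search time.
conn.execute("INSERT INTO incident VALUES (1, '2005-08-23', 7)")
saved_sql = conn.execute(
    "SELECT query_sql FROM topic WHERE topic = ?",
    ("incidents by officer",)).fetchone()[0]
rows = conn.execute(saved_sql, (7,)).fetchall()
print(saved_sql)
print(rows)
```

The rows of the topic table are what would feed the treeview; the
GUI side is left out here. Role- or security-based conditions
could be added as extra columns on meta_table or meta_field and
applied when the field list is loaded.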

Loading all of that from all of the docs is like searching 
all of the available tables to get that info.  Doable but 
not for the faint of resources.  Add full-text to that and it 
becomes a job for Google farms.  How well would Google 
work if they weren't caching the web?

len

From: Robert Koberg [mailto:rob@koberg.com]

Bullard, Claude L (Len) wrote:
> Then what you want are the equivalent of 
> parameterizable stored queries, somewhat 
> the equivalent of the way engines like 
> Crystal Reports work.  Feed them a schema 
> and they will give you back a UI with all of 
> the queriable values from which you select, 
> parameterize and store the queries.  It's 
> a report generator for XML documents.

But wouldn't that require loading *all* the XML docs to be searched into 
memory? (I don't know Crystal Reports) (my work is in a webapp 
environment). Using Lucene provides an extremely fast result set using 
minimal memory.
