This is getting off-topic from what I originally hoped for :)
I was trying to start a conversation about using an XML indexing system
in a particular problem domain - not something as grand as what you and
Alan are talking about.
To me, using something like Lucene provides a much faster and much
less memory-intensive way to search large collections of XML
documents or relational databases (whether they store XML or not).
Bullard, Claude L (Len) wrote:
> IF it is really a full-text indexing system,
> it scans and infers topics the same way a human
> scans and tags. It would require a rule base
> perhaps similar to a Schematron assertion engine.
> Past attempts made several passes over content
> to create a series of tagged documents that are
> successively refined. However, as in memory-based
> patterning systems, the more abstract the links,
> the more opinionated the system. Such systems
> can become very superstitious in exactly the same
> way people do. How was it phrased: "A schema is
> an opinion about a document..."
> From: Robert Koberg [mailto:firstname.lastname@example.org]
>> Which is why I'd propose defining a full-text schema language,
>> so XML content can be described to a full-text search engine.
> It does sound very interesting. How would it work? What would it look
> like? I have tried doing this with XML Schema but gave up. I had tried
> to use annotations to give weight to different things, then I tried to
> make a type system. For me, it was just easier to write Java to handle
> it. Now I write org.xml.sax.ext.DefaultHandler2 subclasses that suit my needs. I
> know, not very scalable or user friendly.
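
For anyone following along, here is a rough sketch of the kind of handler described above. It is only an illustration, not Robert's actual code: the class name WeightedFieldHandler, the extract helper, the element names, and the weight values are all made up for the example. It uses only the JDK's SAX classes, collects the text of leaf elements that appear in a weight table, and ignores everything else. Passing the fields and weights on to an indexer such as Lucene is left out.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: gathers text from elements listed in a weight table.
// Assumes the weighted elements are leaves (no child elements inside them).
class WeightedFieldHandler extends DefaultHandler2 {
    private final Map<String, Float> weights;          // element name -> assumed weight
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private String current;                            // weighted element being read, or null

    WeightedFieldHandler(Map<String, Float> weights) {
        this.weights = weights;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Only start collecting text if this element has a weight.
        current = weights.containsKey(qName) ? qName : null;
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver text in several chunks, so append rather than assign.
        if (current != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(current)) {
            // Repeated elements with the same name are joined with a space.
            fields.merge(qName, text.toString().trim(), (a, b) -> a + " " + b);
        }
        current = null;
    }

    // Parses an XML string and returns the collected field text by element name.
    static Map<String, String> extract(String xml, Map<String, Float> weights) {
        WeightedFieldHandler handler = new WeightedFieldHandler(weights);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.fields;
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("title", 2.0f);   // made-up weights for illustration
        weights.put("body", 1.0f);
        String xml = "<doc><title>Indexing XML</title><meta>skip</meta>"
                + "<body>Some text to search.</body></doc>";
        // Print each collected field next to the weight an indexer could apply.
        extract(xml, weights).forEach((name, value) ->
                System.out.println(name + " (weight " + weights.get(name) + "): " + value));
    }
}
```

The weight table is the part a "full-text schema language" would presumably replace: instead of hard-coding which elements matter and by how much, that mapping would live in a declarative description of the content.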