'Alan Gutierrez' wrote:
> * Robert Koberg <rob@koberg.com> [2005-08-23 10:42]:
>
>>Bullard, Claude L (Len) wrote:
>>
>>>Index what? Ideas, ideas emerging from conversations, the conversations?
>>>So far, what you are describing seems to be Google. Can you out Google
>>>Google?
>>
>>It is not like Google. Google indexes HTML and gives better rankings
>>to HTML that is well marked up (by Google's standards), which is why
>>small companies like us can get page rankings as high as or higher than
>>much larger companies.
>>
>>With an XML indexer, you can index glossentries, FAQs, quizzes, whatever
>>and keep them separate, so if you want to run a query against just FAQs,
>>you can.
>>
>>You can do a search to get all external links (we distinguish between
>>external, internal and whatever other kind of links there might be) and
>>validate them.
>>
>>You can also use the searches to do things you might do with XQuery
>>(again, I don't know XQuery...). For example, in our CMS we have the
>>concept of page regions. Content pieces are assigned to folder/page
>>regions. Say I want to find out where a content piece has been assigned.
>>I can run a query on all assignments to return references to the
>>pages/folders where it has been assigned. You can do searches for all
>>users in a particular group, all projects that a user has access to,
>>etc.
>
>
> Which is why I'd propose defining a full-text schema language,
> so XML content can be described to a full-text search engine.
It does sound very interesting. How would it work? What would it look
like? I have tried doing this with XML Schema but gave up. First I tried
to use annotations to give weight to different things, then I tried to
make a type system. For me, it was just easier to write Java to handle
it. Now I write org.xml.sax.ext.DefaultHandler2 subclasses that suit my
needs. I know, not very scalable or user-friendly.
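
To make the handler approach concrete, here is a minimal sketch of what
such a class might look like. It is an illustration only, not the code
from our CMS: the class name FieldCollector, the nested Field type, and
the element-name-to-weight map are all hypothetical, and it assumes a
non-namespace-aware parser so that qName is the plain element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical handler: collects the text of selected elements with a weight. */
public class FieldCollector extends DefaultHandler2 {

    /** One indexable unit: source element name, its weight, its text. */
    public static final class Field {
        public final String name;
        public final double weight;
        public final String text;

        Field(String name, double weight, String text) {
            this.name = name;
            this.weight = weight;
            this.text = text;
        }
    }

    private final Map<String, Double> weights;   // element name -> weight (assumed config)
    private final List<Field> fields = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String current;                      // indexed element we are inside, or null
    private int depth;                           // nesting depth inside the current element

    public FieldCollector(Map<String, Double> weights) {
        this.weights = weights;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (current != null) {
            depth++;                             // a child of an element being collected
        } else if (weights.containsKey(qName)) {
            current = qName;                     // start collecting a new field
            depth = 0;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current == null) {
            return;
        }
        if (depth > 0) {
            depth--;                             // closing a child element
            return;
        }
        fields.add(new Field(current, weights.get(current), text.toString().trim()));
        current = null;
    }

    public List<Field> fields() {
        return fields;
    }

    /** Convenience wrapper: parse an XML string and return the collected fields. */
    public static List<Field> collect(String xml, Map<String, Double> weights) throws Exception {
        FieldCollector handler = new FieldCollector(weights);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.fields();
    }
}
```

Each Field keeps its element name, so FAQs and glossentries stay separate
and a query can be restricted to one kind. The weights map is the part a
declarative full-text schema could replace; here it is hard-wired by the
caller, which is exactly why this does not scale.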
best,
-Rob
>
> The language would permit ranking based on markup, define what
> constitutes a document, what constitutes a document collection, etc.
>
> --
> Alan Gutierrez - alan@engrm.com
> - http://engrm.com/blogometer/index.html
> - http://engrm.com/blogometer/rss.2.0.xml
>