- From: Walter Underwood <wunder@infoseek.com>
- To: Paul Prescod <paul@prescod.net>, xml-dev <xml-dev@ic.ac.uk>
- Date: Wed, 01 Sep 1999 10:52:42 -0700
At 07:31 AM 9/1/99 -0400, Paul Prescod wrote:
>David Megginson wrote:
>>
>> Paul Prescod writes:
>>
>> > What is the virtue in discovering XHTML data in an arbitrary
>> > document if there are *no rules* about what that information will
>> > look like? Are you really going to write processors that do not
>> > care whether images occur within titles or tables within images?
>>
>> Sure -- a search engine is a very good example of one.
>
>Really? Search engines don't care whether <title>s have images in them?
>Or whether <h1>'s have <table>'s in them? I'm sure that there are some
>that don't but I'm equally sure that there are some that do.
Ours doesn't. It recognizes some tags as a place to break sentences
for natural language processing, and it looks for the first undecorated
text in the document to use as a summary. It also saves text from
inside an <a> tag to index with the referenced document (no, Google
didn't do it first).
But it doesn't care whether <title> has an image, or which kind of
sentence-breaking tag is used (<p>, <blockquote>, <td>, ...).
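
A rough sketch of the behaviour described above, not Infoseek's actual code. It uses Python's standard html.parser. The class name SketchIndexer, the DECORATING_TAGS set, and the reading of "undecorated" as "not inside a heading, title, emphasis, or link tag" are all assumptions made for illustration. Only <p>, <blockquote>, and <td> are taken from the message as sentence-breaking tags.

```python
from html.parser import HTMLParser

# Sentence-breaking tags named in the message; a real engine likely knows more.
BREAK_TAGS = {"p", "blockquote", "td"}
# Assumption: text inside any of these counts as "decorated".
DECORATING_TAGS = {"title", "h1", "h2", "h3", "b", "i", "em", "strong", "a"}


class SketchIndexer(HTMLParser):
    """Hypothetical indexer that ignores which tags nest inside which."""

    def __init__(self):
        super().__init__()
        self.open_tags = []    # stack of currently open tag names
        self.segments = [[]]   # text chunks grouped between sentence breaks
        self.summary = None    # first undecorated text seen
        self.anchor_text = {}  # href -> link texts, indexed with the target
        self._href = None      # href of the <a> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag in BREAK_TAGS:
            self._break()
        if tag == "a":
            self._href = dict(attrs).get("href")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in BREAK_TAGS:
            self._break()
        if tag == "a":
            self._href = None
        # Pop back to the matching open tag; tolerate sloppy nesting.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return
        self.segments[-1].append(text)
        if self._href is not None:
            self.anchor_text.setdefault(self._href, []).append(text)
        if self.summary is None and not DECORATING_TAGS.intersection(self.open_tags):
            self.summary = text

    def _break(self):
        # Start a new segment unless the current one is still empty.
        if self.segments[-1]:
            self.segments.append([])


doc = (
    '<html><head><title>Foo <img src="x.gif"></title></head><body>'
    "<h1>Heading</h1><p>First plain text.</p>"
    '<blockquote>See <a href="http://example.com/">the example page</a>.'
    "</blockquote></body></html>"
)
indexer = SketchIndexer()
indexer.feed(doc)
print(indexer.summary)
print(indexer.anchor_text)
```

Note that the image inside <title> is accepted without complaint: nothing in the sketch checks content models, which is the point being made.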
Hmm, the "strict" variant makes looking for undecorated text
more difficult. I doubt that we'll interpret stylesheets in
order to index text. So anybody who wants to use "strict" had
better be ready to put in "description" meta tags.
wunder
--
Walter R. Underwood
wunder@infoseek.com
wunder@best.com (home)
http://software.infoseek.com/cce/ (my product)
http://www.best.com/~wunder/
1-408-543-6946