   Re: Who needs XHTML Namespace?

  • From: Walter Underwood <wunder@infoseek.com>
  • To: Paul Prescod <paul@prescod.net>, xml-dev <xml-dev@ic.ac.uk>
  • Date: Wed, 01 Sep 1999 10:52:42 -0700

At 07:31 AM 9/1/99 -0400, Paul Prescod wrote:
>David Megginson wrote:
>> 
>> Paul Prescod writes:
>> 
>>  > What is the virtue in discovering XHTML data in an arbitrary
>>  > document if there are *no rules* about what that information will
>>  > look like? Are you really going to write processors that do not
>>  > care whether images occur within titles or tables within images?
>> 
>> Sure -- a search engine is a very good example of one.
>
>Really? Search engines don't care whether <title>s have images in them?
>Or whether <h1>'s have <table>'s in them? I'm sure that there are some
>that don't but I'm equally sure that there are some that do.

Ours doesn't. It recognizes some tags as a place to break sentences
for natural language processing, and it looks for the first undecorated
text in the document to use as a summary. It also saves text from
inside an <a> tag to index with the referenced document (no, Google
didn't do it first).

But it doesn't care whether <title> has an image, or which kind of
sentence-breaking tag is used (<p>, <blockquote>, <td>, ...).
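For illustration, the behaviour described above can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not Infoseek's code: the tag sets BREAK_TAGS and DECORATION_TAGS, the class TagAgnosticIndexer, and the helper index_page are all hypothetical names, and "undecorated" is assumed to mean "not inside a title, heading, emphasis, or link tag".

```python
from html.parser import HTMLParser

# Hypothetical tag sets, chosen only for illustration.
BREAK_TAGS = {"p", "blockquote", "td", "li", "br", "div", "h1", "h2", "h3"}
DECORATION_TAGS = {"title", "h1", "h2", "h3", "b", "i", "em", "strong", "a"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TagAgnosticIndexer(HTMLParser):
    """Collects sentence segments, a summary, and link text.

    It never checks whether the nesting is valid, only which tags
    happen to be open when text arrives.
    """

    def __init__(self):
        super().__init__()
        self.open_tags = []     # stack of currently open tags
        self.segments = [[]]    # text pieces, split at break tags
        self.summary = None     # first undecorated text seen
        self.anchor_text = {}   # href -> list of link texts
        self.current_href = None

    def handle_starttag(self, tag, attrs):
        if tag in BREAK_TAGS:
            self.segments.append([])   # any break tag ends the segment
        if tag == "a":
            self.current_href = dict(attrs).get("href")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in BREAK_TAGS:
            self.segments.append([])
        if tag == "a":
            self.current_href = None
        if tag in self.open_tags:
            # Pop back to the matching open tag, tolerating sloppy markup.
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        self.segments[-1].append(text)
        if self.current_href is not None:
            # Saved so it can be indexed with the referenced document.
            self.anchor_text.setdefault(self.current_href, []).append(text)
        decorated = any(t in DECORATION_TAGS for t in self.open_tags)
        if self.summary is None and not decorated:
            self.summary = text

    def sentences(self):
        return [" ".join(parts) for parts in self.segments if parts]


def index_page(html):
    indexer = TagAgnosticIndexer()
    indexer.feed(html)
    indexer.close()
    return indexer
```

Feeding it the same text wrapped in <p>, <blockquote>, or <td> gives the same summary and the same segments, which is the point: the processor reacts to a handful of tags and ignores the content model entirely.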

Hmm, the "strict" variant makes looking for undecorated text
more difficult. I doubt that we'll interpret stylesheets in
order to index text. So anybody who wants to use "strict" had
better be ready to put in "description" meta tags.
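As a rough sketch of that fallback (again hypothetical, not any particular engine's code): an indexer could prefer an explicit "description" meta tag and only fall back to its own guess when none is present. The names MetaDescriptionParser and choose_summary are made up for this example, and heuristic_summary stands for whatever the engine's own first-undecorated-text guess would have been.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Records the first non-empty <meta name="description"> content."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "description":
            content = (attr.get("content") or "").strip()
            if content:
                self.description = content


def choose_summary(html, heuristic_summary=None):
    # Prefer the author's own description; otherwise use the guess.
    parser = MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.description or heuristic_summary
```

With this, a "strict" page whose headings are plain elements styled by a stylesheet still gets the summary its author intended, without the indexer ever reading the stylesheet.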

wunder

--
Walter R. Underwood
wunder@infoseek.com
wunder@best.com (home)
http://software.infoseek.com/cce/ (my product)
http://www.best.com/~wunder/
1-408-543-6946

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)






 
