OASIS Mailing List Archives

Re: [xml-dev] Can Searchbots Find Web Pages That Aren't Linked To?

>  From: "Costello, Roger L." <costello@mitre.org>
>  I was interested in knowing if searchbots can find web pages that
>  aren't linked to.
>  So, I conducted a simple experiment:
You asked a simple question and you got a simple answer which is
correct but doesn't really help. The discussion about sitemaps reveals
a more complex picture:

a) Rather than asking "if searchbots can find" a web page, I would ask
whether the searchbot has actually found and crawled a web page. This
can be answered by looking at the server log files. If a searchbot
_has_ crawled a web page it may include it in the index (sooner or
later) _or_ not.
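
The log check in (a) can be sketched roughly as follows. This is only a minimal sketch, and it rests on assumptions that are not from the experiment above: that the server writes the common "combined" access log format, that the crawlers of interest identify themselves with the user-agent substrings listed in BOT_TOKENS, and that the page lives at a path like /hidden.html (a hypothetical name). The function name bot_hits is likewise made up for illustration.

```python
import re

# Assumed layout (combined log format):
# host ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Assumed user-agent substrings; adjust to the crawlers you care about.
BOT_TOKENS = ("googlebot", "bingbot", "slurp")


def bot_hits(lines, path):
    """Return (time, status, user-agent) for each bot request to `path`."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in the assumed format
        agent = m.group("agent").lower()
        if m.group("path") == path and any(t in agent for t in BOT_TOKENS):
            hits.append((m.group("time"), m.group("status"), m.group("agent")))
    return hits
```

Calling bot_hits(open("access.log"), "/hidden.html") would then list the crawler requests for that page, if any (the file name is again hypothetical). Note that a user-agent string can be spoofed, so a hit only suggests that a searchbot fetched the page, and, as said above, a crawled page may or may not end up in the index.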

b) You are offering two pages with duplicate content, one XHTML and one
XML. The best choice a search engine can make is to index both but
show only the XHTML page to the user. An option to also show the second
page would be perfect ("repeat the search with the omitted results

Hope this helps,



Copyright 1993-2007 XML.org. This site is hosted by OASIS