Re: [xml-dev] Can Searchbots Find Web Pages That Aren't Linked To?


From: Liam Quin <liam@w3.org>
To: "Costello, Roger L." <costello@mitre.org>
Date: Sun, 9 Mar 2008 19:58:27 -0400

On Sun, Mar 09, 2008 at 07:45:03AM -0400, Costello, Roger L. wrote:
> Hi Folks,
> 
> I was interested in knowing if searchbots can find web pages that
> aren't linked to.
> 
> So, I conducted a simple experiment:
> 
> http://www.xfront.com/can-searchbots-find-unlinked-web-pages/index.html

The conclusion doesn't entirely follow.

First, as others have said, check your server logs.

Second, Web search engines have several ways to discover pages.
Following links is one of them, and of course the best known,
but others include:

* if there are ads on the page, the URL becomes known to the ad
  network when the adverts are shown

* if the "hidden" page links to Web pages on other servers and you
  follow those links, most Web browsers send the linked-to server
  the URI of the page the link was followed from, in an HTTP
  request header (the Referer [sic] header, a misspelling of
  "referrer").

* if you mention the URI in email it can get found :-)

* if the files are in a directory that can be listed, search
  engines will find them -- e.g. if you make
      http://www.example.org/people/friendly/david.html
  public, the search engines might well look for
      http://www.example.org/people/friendly/
  and
      http://www.example.org/people/
  so if any of those gives a listing of directory contents, the
  search engines will explore.

* some search engines (and many viruses) probe for common filenames.
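
To make the directory case concrete, here is a rough sketch in Python
of how a crawler might derive the enclosing directory URLs from one
known page URL. This is only an illustration: the function name
parent_directory_urls is made up for this example, and I am not
claiming any particular search engine works exactly this way.

```python
from urllib.parse import urlsplit, urlunsplit
import posixpath


def parent_directory_urls(url):
    """Return the directory URLs enclosing `url`, nearest first,
    ending at the site root. Hypothetical helper for illustration."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Start from the directory that contains the resource; a path that
    # already ends in "/" is treated as a directory itself.
    directory = path if path.endswith("/") else posixpath.dirname(path)
    candidates = []
    while True:
        normalized = directory if directory.endswith("/") else directory + "/"
        # Drop any query string and fragment; only the path matters here.
        candidates.append(
            urlunsplit((parts.scheme, parts.netloc, normalized, "", ""))
        )
        if normalized == "/":
            break
        # Step up one level.
        directory = posixpath.dirname(normalized.rstrip("/"))
    return candidates


if __name__ == "__main__":
    for candidate in parent_directory_urls(
        "http://www.example.org/people/friendly/david.html"
    ):
        print(candidate)
```

A crawler would then request each candidate and see whether the
server answers with a directory listing. Besides the two directories
mentioned above, this sketch also yields the site root.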

Without such hints, though, neither the search engines nor anyone
else will find the file, regardless of format.

Liam

-- 
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/

Follow-Ups:
- Re: [xml-dev] Can Searchbots Find Web Pages That Aren't Linked To?
  - From: "bryan rasmussen" <rasmussen.bryan@gmail.com>

References:
- Can Searchbots Find Web Pages That Aren't Linked To?
  - From: "Costello, Roger L." <costello@mitre.org>
