Paul T wrote:
> > > Google does *exactly* this (and also Google
> > > provides a cached copy of the original content)
> > >
> > > That means:
> > >
> > > Either both HGRAB and Google should be sued,
> > > because they both sell the content
> > > *which does not belong to them*, or both
> > > HGRAB and Google should be considered
> > > 'just a service'.
> > Have a look at http://www.google.com/robots.txt
> I don't understand your point. Could you please
> Because HGRAB, for example, is
> usually polling only the home page of the website,
> they are all allowed for polling.
Not all. Some sites "Disallow: /".
> Also, I'm not sure if search engines
> really care about robots.txt, but that's another
Googlebot does, and that answers your question about the difference
between it and HGRAB.
> Also, the interesting twist is that when the
> robot encounters the website with *no*
> robots.txt ( most of the sites have no robots.txt )
> the robot assumes that it is *safe* for him to
> 'steal' the content.
No twist here; "if it [robots.txt] was not present [then] all robots
will consider themselves welcome".
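
For what it's worth, here is a rough sketch of how a well-behaved robot
might apply that rule, using the urllib.robotparser module from Python's
standard library. The helper may_fetch, the agent name "ExampleBot" and
the example.com URLs are made up for illustration; I am not claiming this
is how Googlebot or HGRAB is implemented. An empty rule list stands in
for a site that has no robots.txt at all.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical rule sets, for illustration only.
NO_RULES = []                                       # stands in for a missing robots.txt
BLOCK_ALL = ["User-agent: *", "Disallow: /"]        # the "Disallow: /" case
BLOCK_PRIVATE = ["User-agent: *", "Disallow: /private/"]

if __name__ == "__main__":
    home = "http://example.com/"
    print(may_fetch(NO_RULES, "ExampleBot", home))       # no rules: welcome
    print(may_fetch(BLOCK_ALL, "ExampleBot", home))      # whole site is off limits
    print(may_fetch(BLOCK_PRIVATE, "ExampleBot", home))  # home page is still allowed
    print(may_fetch(BLOCK_PRIVATE, "ExampleBot",
                    "http://example.com/private/page.html"))
```

So a robot that polls only the home page is still shut out by a site
that says "Disallow: /", and is allowed in wherever the file is absent or
silent about that path.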
> I think this is really a gray area and
> robots.txt is not a solution.
> At the moment, at least.
It isn't. It is just a machine-readable version of , kindly provided
by Google for your crawling convenience. robots.txt has no legal meaning;
you probably can't be sued for disregarding it. But you can be sued for
breaking a site's TOS agreement.