Wednesday, September 20, 2006
Lately I've heard a couple of smart people ask that search engines provide a way to know that a
bot is authentic. After all, any spammer could name their bot "Googlebot" and claim to be Google,
so which bots do you trust and which do you block?
The common request we hear is to post a list of Googlebot IP addresses in some public place. The
problem with that is that if/when the IP ranges of our crawlers change, not everyone will know
to check. In fact, the crawl team migrated Googlebot IPs a couple years ago and it was a real
hassle alerting webmasters who had hard-coded an IP range. So the crawl folks have provided
another way to authenticate Googlebot. Here's an answer from one of the crawl people (quoted
with their permission):
Telling webmasters to use DNS to verify on a case-by-case basis seems like the best way to go. I
think the recommended technique would be to do a reverse DNS lookup, verify that the name is in
the googlebot.com domain, and then do a corresponding forward DNS->IP lookup using that
googlebot.com name. For example:
```
> host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.
> host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1
```
I don't think just doing a reverse DNS lookup is sufficient, because a spoofer could set up
reverse DNS to point to crawl-a-b-c-d.googlebot.com.
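If you want to automate the check described above, here is a minimal Python sketch of the same three steps (reverse lookup, domain check, forward lookup), written only as an illustration. It uses the standard-library calls `socket.gethostbyaddr` and `socket.gethostbyname_ex`; the function name `is_googlebot` and the `ALLOWED_SUFFIXES` constant are hypothetical, and it assumes IPv4 addresses, since `gethostbyname_ex` only returns IPv4 results. The resolver functions are parameters so the logic can be exercised without live DNS.

```python
import socket
from typing import Callable, List, Tuple

# Assumption: accept only names under googlebot.com, as this post describes.
# The leading dot rejects look-alike domains such as "evilgooglebot.com".
ALLOWED_SUFFIXES = (".googlebot.com",)

Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def is_googlebot(ip: str,
                 reverse: Lookup = socket.gethostbyaddr,
                 forward: Lookup = socket.gethostbyname_ex) -> bool:
    """Hypothetical helper: forward-confirmed reverse DNS check for an IPv4 address."""
    # Step 1: reverse DNS lookup (IP -> hostname via the PTR record).
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False

    # Step 2: the hostname must be in the googlebot.com domain.
    name = hostname.rstrip(".").lower()
    if not name.endswith(ALLOWED_SUFFIXES):
        return False

    # Step 3: forward DNS lookup of that name must map back to the same IP.
    # A spoofer can set the PTR record for their own IP, but cannot make
    # googlebot.com's forward records point at it.
    try:
        _name, _aliases, addresses = forward(name)
    except OSError:
        return False
    return ip in addresses
```

Calling `is_googlebot("66.249.66.1")` performs two live DNS queries, so its result depends on current DNS data. Because each check costs network round trips, you would probably want to cache the answer per IP rather than repeat it on every request.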
This answer has also been provided to our help-desk, so I'd consider it an official way to
authenticate Googlebot. In order to fetch from the "official" Googlebot IP range, the bot has to
respect robots.txt and our internal hostload conventions so that Google doesn't crawl you too
hard.
(Thanks to N. and J. for help on this answer from the crawl side of things.)