How Google crawls locale-adaptive pages

If your site has locale-adaptive pages (that is, your site returns different content based on the perceived country or preferred language of the visitor), Google might not crawl, index, or rank all your content for different locales. This is because the default IP addresses of the Googlebot crawler appear to be based in the USA. In addition, the crawler sends HTTP requests without setting Accept-Language in the request header, so a server that adapts content to that header typically serves Googlebot only its default-language version.
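To illustrate why a missing Accept-Language header matters, here is a minimal sketch of how a locale-adaptive server might choose a language. The function name pick_language, the supported-language set, and the fallback behavior are assumptions made for illustration, not a description of any particular server or framework.

```python
def pick_language(accept_language, supported, default):
    """Return the best supported language for an Accept-Language value.

    accept_language may be None or empty, which is what a request
    without the header (such as a typical Googlebot request) looks like.
    supported is a set of lowercase language tags (hypothetical site config).
    """
    if not accept_language:
        # No header at all: the server can only fall back to its default.
        return default

    candidates = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        tag = tag.strip().lower()
        params = params.strip()
        quality = 1.0  # a tag without q= has an implicit weight of 1
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat an unparsable weight as "not acceptable"
        candidates.append((quality, tag))

    # Highest weight first; sorted() is stable, so header order breaks ties.
    for quality, tag in sorted(candidates, key=lambda c: -c[0]):
        if quality <= 0:
            continue
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "de-DE" falls back to "de"
        if primary in supported:
            return primary
    return default
```

With this sketch, pick_language(None, {"en", "de"}, "en") returns the default "en", so a crawler that sends no header never receives the "de" variant from this logic alone.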

Geo-distributed crawling

Googlebot crawls with IP addresses based outside the USA, in addition to the US-based IP addresses.

As we have always recommended, when Googlebot appears to come from a certain country, treat it as you would treat any other user from that country. For example, if you block USA-based users from accessing your content but allow visitors from Australia to see it, your server should block Googlebot when it appears to come from the USA and allow it when it appears to come from Australia.
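The recommendation above can be sketched as an access check that looks only at the request's apparent country and never at the user agent. The names BLOCKED_COUNTRIES and is_allowed are hypothetical, and the sketch assumes the country code has already been derived from the client IP address by whatever geolocation method your server uses.

```python
# Hypothetical policy matching the example in the text: block the USA.
BLOCKED_COUNTRIES = {"US"}

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def is_allowed(country_code, user_agent):
    """Decide access from the request's apparent country only.

    user_agent is accepted but deliberately ignored, so Googlebot gets
    exactly the same answer as any other visitor from that country.
    """
    return country_code.upper() not in BLOCKED_COUNTRIES
```

Under this policy, is_allowed("US", GOOGLEBOT_UA) is False and is_allowed("AU", GOOGLEBOT_UA) is True, the same results a regular browser would get from those countries.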

Other considerations

  • Googlebot uses the same user agent string for all crawling configurations. Learn more about the user agent strings used by Google crawlers.
  • You can verify Googlebot geo-distributed crawls using reverse DNS lookups.
  • Make sure that your site applies the robots exclusion protocol consistently for every locale. This means that robots meta tags and the robots.txt file must specify the same rules in each locale.
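The reverse DNS verification mentioned in the list above can be sketched as a forward-confirmed reverse lookup: resolve the IP address to a hostname, check the hostname's domain, then resolve the hostname back and confirm it maps to the same IP address. The function name is_verified_googlebot and the injectable resolver parameters are illustrative assumptions, and the domain suffixes are an assumption based on commonly documented Googlebot hostnames, so check them against Google's current verification guidance.

```python
import socket

# Assumed hostname suffixes for Google crawlers; verify against current docs.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip):
    # PTR lookup: IP address to primary hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    # Forward lookup: hostname to the set of its IPv4 and IPv6 addresses.
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip passes a forward-confirmed reverse DNS check.

    The resolver arguments are hypothetical hooks so the logic can be
    exercised without network access; by default the socket module is used.
    """
    reverse_lookup = reverse_lookup or _reverse
    forward_lookup = forward_lookup or _forward
    try:
        hostname = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False
    try:
        # The hostname must resolve back to the original IP address.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

Because Googlebot uses the same user agent string everywhere, a check like this relies on DNS rather than on the user agent or on the apparent country of the IP address.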