To infinity and beyond? No!

Tuesday, August 05, 2008

When Googlebot crawls the web, it often finds what we call an "infinite space": a very large number of links that usually provide little or no new content for Googlebot to index. If this happens on your site, crawling those URLs can use unnecessary bandwidth and may result in Googlebot failing to completely index the real content on your site.

Recently, we started notifying site owners when we discover this problem on their websites. Like most messages we send, these notifications appear in the Message Center in Webmaster Tools. You'll probably want to know right away if Googlebot is running into this problem, or any other problem, while crawling your sites. So verify your site with Webmaster Tools, and check the Message Center every now and then.

[Screenshot: Webmaster Tools message informing the owner of a site about an infinite space]

Examples of an infinite space

The classic example of an "infinite space" is a calendar with a "Next Month" link. It may be possible to keep following those "Next Month" links forever! Of course, that's not what you want Googlebot to do. Googlebot is smart enough to figure out some of those on its own, but there are a lot of ways to create an infinite space and we may not detect all of them.

[Screenshot: example calendar page showing no event for a specific day]
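
To illustrate, a hypothetical calendar URL pattern that creates an infinite space might look like this (the site and parameter names here are made up):

  http://www.example.com/calendar?month=2008-08
  http://www.example.com/calendar?month=2008-09
  http://www.example.com/calendar?month=2008-10

Each page links to the next month, so Googlebot could keep requesting new URLs indefinitely without ever finding new content.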

Another common scenario is a website that lets visitors filter a set of search results in many ways. A shopping site might allow visitors to find clothing items by filtering on category, price, color, brand, style, and so on. The number of possible combinations of filters can grow exponentially, producing thousands of URLs that all find some subset of the items sold. This may be convenient for your users, but it's not so helpful for Googlebot, which just wants to find everything, once!
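
As a made-up example, even a handful of filters on a shopping site can generate URLs like these, and the same items can show up under many different combinations and orderings of parameters:

  http://www.example.com/shoes?color=red
  http://www.example.com/shoes?color=red&brand=acme
  http://www.example.com/shoes?brand=acme&color=red&price=under-50

With just ten optional filters there are already over a thousand possible combinations, each with its own URL.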

Correcting infinite space issues

Our Webmaster Tools Help article describes more ways infinite spaces can arise and provides recommendations on how to avoid the problem. One fix is to block whole categories of dynamically generated links using your robots.txt file; the Help Center has lots of information on how to use robots.txt. If you do that, don't forget to verify that Googlebot can still find all of your content some other way. Another option is to mark the problematic links with the rel="nofollow" link attribute. If you'd like more information on nofollow links, check out the Webmaster Help Center.
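
For example, if all of your calendar pages lived under a /calendar/ directory, a robots.txt rule along these lines (the path is just a placeholder; adjust it to match your own site) would keep Googlebot out of that whole space:

  User-agent: Googlebot
  Disallow: /calendar/

And to mark an individual link as nofollow, you'd add the rel attribute to the anchor, for instance:

  <a href="/calendar?month=2008-09" rel="nofollow">Next Month</a>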