Tuesday, November 01, 2011
As the web evolves, Google's crawling and indexing capabilities also need to progress. We
improved our indexing of Flash, built
a more robust
infrastructure called Caffeine,
and we even started
crawling forms where it makes
sense. Now, especially with the growing popularity of JavaScript and, with it, AJAX, we're finding more web pages requiring POST requests, either for the entire content of the page or because the pages are missing information and/or look completely broken without the resources returned from POST. For Google Search this is less than ideal, because when we're not properly discovering and indexing content, searchers may not have access to the most comprehensive and relevant results.
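For instance, a page of this kind might look something like the sketch below (the /content endpoint and the page=home parameter are invented purely for illustration): the HTML itself is an empty shell, and the visible article text only arrives through an XMLHttpRequest POST once the page loads. Crawling such a page with GET alone would index little more than an empty div.

<html>
  <body>
    <div id="main"><!-- empty until the POST response arrives --></div>
    <script>
      // Hypothetical endpoint and parameter, shown only to illustrate the pattern.
      var xhr = new XMLHttpRequest();
      xhr.open("POST", "/content", true);
      xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
      xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
          // Without this response the page looks empty or broken.
          document.getElementById("main").innerHTML = xhr.responseText;
        }
      };
      xhr.send("page=home");
    </script>
  </body>
</html>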
We generally advise using GET for fetching resources a page needs, and this is by far our preferred method of crawling. We've started experiments to rewrite POST requests to GET, and while this remains a valid strategy in some cases, the contents returned by a web server for GET vs. POST are often completely different. Additionally, there are legitimate reasons to use POST (for example, you can attach more data to a POST request than to a GET request). So, while GET requests remain far more common, to surface more content on the web, Googlebot may now perform POST requests when we believe it's safe and appropriate.
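As a rough sketch of why rewriting POST to GET isn't always enough, here is a minimal, hypothetical Node.js handler (the port and responses are made up) in which a GET and a POST to the same URL return entirely different content; a crawler that only issued the GET would never see the article body.

var http = require("http");

http.createServer(function (req, res) {
  if (req.method === "POST") {
    // The POST body determines which content fragment is returned.
    var body = "";
    req.on("data", function (chunk) { body += chunk; });
    req.on("end", function () {
      res.writeHead(200, {"Content-Type": "text/html"});
      res.end("<p>Article body selected by: " + body + "</p>");
    });
  } else {
    // A GET for the same URL returns only an empty page shell.
    res.writeHead(200, {"Content-Type": "text/html"});
    res.end("<div id=\"main\"></div>");
  }
}).listen(8080);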
We take precautions to avoid performing any task on a site that could result in executing an
unintended user action. Our POST
requests are primarily for crawling resources that
a page requests automatically, mimicking what a typical user would see when they open the URL in
their browser. This will evolve over time as we find better heuristics, but that's our current
approach.
Let's run through a few POST
request scenarios that demonstrate how we're improving
our crawling and indexing to evolve with the web.
Examples of Googlebot's POST requests
- Crawling a page via a POST redirect
<html> <body onload="document.foo.submit();"> <form name="foo" action="request.php" method=