How HTTP status codes, and network and DNS errors affect Google Search
This page describes how different HTTP status codes, network errors, and DNS errors affect Google Search. We cover the top 20 status codes that Googlebot encountered on the web, and the most prominent network and DNS errors. More exotic status codes, such as 418 (I'm a teapot), aren't covered. All issues mentioned on this page generate a corresponding error or warning in Search Console's Page Indexing report.
HTTP status codes
HTTP status codes are generated by the server that's hosting the site when it responds to a request made by a client, for example a browser or a crawler. Every HTTP status code has a different meaning, but the outcome of the request is often the same. For example, multiple status codes signal redirection, and Google handles them all in effectively the same way.
Search Console generates error messages for status codes in the 4xx–5xx range, and for failed redirections (3xx). If the server responded with a 2xx status code, the content received in the response may be considered for indexing.
The following table contains the HTTP status codes Googlebot encounters most often, with an explanation of how Google handles each one.
| HTTP status code | How Google handles it |
|---|---|
| 2xx (success) | Google considers the content for indexing. If the content suggests an error, for example an empty page or an error message, Search Console will show a soft 404 error in the site's Page Indexing report. |
| 3xx (redirection) | Googlebot follows up to 10 redirect hops. If the crawler doesn't receive content within 10 hops, Search Console will show a redirect error in the site's Page Indexing report. The number of hops Googlebot follows is user agent dependent; for example, Googlebot Smartphone may have a different value than Googlebot Image. In case of robots.txt, Googlebot follows at least five redirect hops as defined by RFC 1945 and then stops and treats it as a 404 for the robots.txt file. Any content Googlebot received from the redirecting URL is ignored, and the final target URL's content is considered for indexing. |
| 4xx (client error) | Google's indexing pipeline doesn't consider URLs that return a 4xx status code for indexing. Any content Googlebot received from URLs that return a 4xx status code is ignored. |
| 5xx (server error) | 5xx server errors prompt Google's crawlers to temporarily slow down crawling. If the robots.txt file returns a server error status code for more than 30 days, Google will use the last cached copy of the robots.txt file. If unavailable, Google assumes that there are no crawl restrictions. Any content Googlebot received from URLs that return a 5xx status code is ignored. |
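To see the status code and redirect chain that a URL actually returns, you can fetch it yourself. Here's a minimal sketch using Python's requests library (an assumption; any HTTP client works) that prints each redirect hop and the final status code; the URL is a placeholder:

```python
import requests

# Fetch the URL, following redirects like a crawler would.
# requests follows up to 30 redirect hops by default.
resp = requests.get("https://www.example.com/old-page", timeout=10)

# resp.history holds one response per redirect hop, in order.
for hop in resp.history:
    print(f"{hop.status_code} -> {hop.headers.get('Location')}")

# The final response, after all redirects were followed.
print(f"Final URL {resp.url} returned {resp.status_code}")
```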
soft 404 errors
A soft 404 error is when a URL returns a page telling the user that the page doesn't exist, along with a 200 (success) status code. In some cases, it might be a page with no main content, or an empty page.
Such pages may be generated for various reasons by your website's web server or content management system, or the user's browser. For example:
- A missing server-side include file.
- A broken connection to the database.
- An empty internal search result page.
- An unloaded or otherwise missing JavaScript file.
It's a bad user experience to return a 200 (success) status code but then display or suggest an error on the page. Users may think the page is a live, working page, but are then presented with some kind of error. Such pages are excluded from Search.
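As a hypothetical illustration, here is what a soft 404 looks like in code. The Flask route and messages below are invented for this sketch; the key point is that the body reports an error while the status code still says success:

```python
from flask import Flask

app = Flask(__name__)

ARTICLES = {"hello-world": "My first post."}

@app.route("/articles/<slug>")
def article(slug):
    if slug not in ARTICLES:
        # Anti-pattern: the body says "not found", but Flask returns
        # an implicit 200 (success), so this page is a soft 404.
        return "<h1>Sorry, this article doesn't exist.</h1>"
    return ARTICLES[slug]
```

Returning the same body with an explicit 404 status code, as shown later on this page, avoids the problem.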
When Google's algorithms detect that the page is actually an error page based on its content, Search Console will show a soft 404 error in the site's Page Indexing report.
Fix soft 404 errors
Depending on the state of the page and the desired outcome, you can solve soft 404 errors in multiple ways:
- The page and content are no longer available.
- The page or content is now somewhere else.
- The page and content still exist.
Try to determine which solution would be the best for your users.
The page and content are no longer available
If you removed the page and there's no replacement page on your site with similar content, return a 404 (not found) or 410 (gone) status code for the page. These status codes indicate to search engines that the page doesn't exist and that its content should not be indexed.
If you have access to your server's configuration files, you can make these error pages useful to users by customizing them. A good custom 404 page helps people find the information they're looking for, and also provides other helpful content that encourages people to explore your site further. Here are some tips for designing a useful custom 404 page:
- Tell visitors clearly that the page they're looking for can't be found. Use language that is friendly and inviting.
- Make sure your 404 page has the same look and feel (including navigation) as the rest of your site.
- Consider adding links to your most popular articles or posts, as well as a link to your site's home page.
- Think about providing a way for users to report a broken link.
Custom 404 pages are created solely for users. Since these pages are useless from a search engine's perspective, make sure the server returns a 404 HTTP status code, to prevent the pages from being indexed.
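If your site runs on an application server, this is typically configured in an error handler. A minimal sketch using Flask (an assumption; other frameworks have equivalents):

```python
from flask import Flask

app = Flask(__name__)

# Serve a helpful custom error page, but keep the 404 status code
# so search engines don't index the page.
@app.errorhandler(404)
def page_not_found(e):
    body = (
        "<h1>Page not found</h1>"
        "<p>The page you're looking for can't be found. "
        "Try the <a href='/'>home page</a> or report the broken link.</p>"
    )
    return body, 404  # the second element sets the HTTP status code
```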
The page or content is now somewhere else
If your page has moved or has a clear replacement on your site, return a 301 (permanent redirect) to redirect the user. This won't interrupt their browsing experience, and it's also a great way to tell search engines about the new location of the page. Use the URL Inspection tool to verify whether your URL is actually returning the correct code.
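For example, a sketch of a permanent redirect in Flask (the paths are hypothetical):

```python
from flask import Flask, redirect

app = Flask(__name__)

# Permanently redirect the old URL to its replacement.
# code=301 tells browsers and crawlers that the move is permanent.
@app.route("/old-page")
def old_page():
    return redirect("/new-page", code=301)
```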
The page and content still exist
If an otherwise good page was flagged with a soft 404 error, it's likely that it didn't load properly for Googlebot, that it was missing critical resources, or that it displayed a prominent error message during rendering. Use the URL Inspection tool to examine the rendered content and the returned HTTP code. If the rendered page is blank, nearly blank, or the content has an error message, it could be that your page references many resources that can't be loaded (images, scripts, and other non-textual elements), which can be interpreted as a soft 404.
Reasons that resources can't be loaded include blocked resources (blocked by robots.txt), having too many resources on a page, various server errors, and slow-loading or very large resources.
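To check whether robots.txt is blocking a critical resource for Googlebot, you can test it programmatically. A small sketch using Python's standard urllib.robotparser (the resource URL is a placeholder):

```python
from urllib import robotparser

# Fetch and parse the live robots.txt file.
rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# Check whether Googlebot may fetch a critical resource.
resource = "https://www.example.com/static/app.js"
if not rp.can_fetch("Googlebot", resource):
    print(f"Blocked for Googlebot: {resource}")
```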
Network and DNS errors
Network and DNS errors have quick, negative effects on a URL's presence in Google Search. Googlebot treats network timeouts, connection resets, and DNS errors similarly to 5xx server errors. In case of network errors, crawling immediately starts slowing down, as a network error is a sign that the server may not be able to handle the serving load. Since Googlebot couldn't reach the server hosting the site, Google also hasn't received any content from the server. The lack of content means that Google can't index the crawled URLs, and already indexed URLs that are unreachable will be removed from Google's index within days. Search Console may report an error for each respective issue.
Debug network errors
These errors happen before Google starts crawling a URL or while Google is crawling it. Because the errors may occur before the server can respond, there's no status code to hint at the issue, which makes diagnosis more challenging. To debug timeout and connection reset errors:
- Look at your firewall settings and logs. There may be an overly broad blocking rule set. Make sure that Googlebot IP addresses are not blocked by any firewall rule.
- Look at the network traffic. Use tools like tcpdump and Wireshark to capture and analyze TCP packets, and look for anomalies that point to a specific network component or server module.
- If you can't find anything suspicious, contact your hosting company.
The error may be in any server component that handles network traffic. For example, overloaded network interfaces may drop packets, leading to timeouts (inability to establish a connection) and reset connections (an RST packet sent because a port was mistakenly closed).
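A quick first check is whether a plain TCP connection to your server succeeds at all, and how failures manifest. A minimal sketch using Python's standard socket module (the hostname is a placeholder):

```python
import socket

HOST, PORT = "www.example.com", 443  # placeholder; use your own host

try:
    # Attempt a TCP handshake with a short timeout.
    with socket.create_connection((HOST, PORT), timeout=5):
        print("TCP connection established")
except socket.timeout:
    print("Timed out: no handshake response (packets may be dropped)")
except ConnectionResetError:
    print("Connection reset: an RST packet was received")
except OSError as e:
    print(f"Other network error: {e}")
```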
Debug DNS errors
DNS errors are most commonly caused by misconfiguration, but they may also be caused by a firewall rule that's blocking Googlebot's DNS queries. To debug DNS errors, do the following:
- Inspect your firewall rules. Make sure that none of Google's IPs are blocked by any firewall rule, and that both UDP and TCP requests are allowed.
- Look at your DNS records. Double-check that your A and CNAME records are pointing to the right IP addresses and hostname, respectively. For example:

  ```
  dig +nocmd example.com a +noall +answer
  dig +nocmd www.example.com cname +noall +answer
  ```
- Check that all your name servers are pointing to the correct IP addresses of your site. For example:

  ```
  dig +nocmd example.com ns +noall +answer
  example.com.   86400  IN  NS  a.iana-servers.net.
  example.com.   86400  IN  NS  b.iana-servers.net.

  dig +nocmd @a.iana-servers.net example.com +noall +answer
  example.com.   86400  IN  A  93.184.216.34

  dig +nocmd @b.iana-servers.net example.com +noall +answer
  ...
  ```
... - If you've made changes to your DNS configuration within the last 72 hours, you may need to wait for your changes to propagate across the global DNS network. To speed up propagation, you can flush Google's Public DNS cache.
- If you're running your own DNS server, make sure it's healthy and that it's not overloaded.
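To confirm what your resolver currently returns for your hostname, for example after a configuration change, here's a small sketch using Python's standard socket module (the hostname is a placeholder):

```python
import socket

# Ask the system resolver for the host's A/AAAA records.
# Note: this reflects your local resolver's view, which may be cached,
# not necessarily the authoritative data on your name servers.
addresses = {
    sockaddr[0]
    for (_, _, _, _, sockaddr) in socket.getaddrinfo("example.com", None)
}
print(addresses)
```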