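In-depth guide to how Google Search works
Google Search is a fully automated search engine that uses software known as web crawlers to explore the web regularly and find pages to add to the Google index. In fact, the vast majority of pages listed in Google Search results aren't manually submitted for inclusion. Instead, our web crawlers find and add them automatically as they explore the web. This document explains the stages of how Search works in the context of your website. With this base knowledge, you can fix crawling issues, get your pages indexed, and learn how to optimize how your site appears in Google Search.
Looking for something less technical? Check out our How Search Works site (https://www.google.com/search/howsearchworks/), which explains how Search works from a searcher's perspective.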
A few notes before we get started
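Before we get into the details of how Search works, it's important to note that Google doesn't accept payment to crawl a site more frequently or to rank it higher. If anyone tells you otherwise, they're wrong.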
Google doesn't guarantee that it will crawl, index, or serve your page, even if your page follows the Google Search Essentials.
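Introducing the three stages of Google Search
Google Search works in three stages, and not all pages make it through each stage:
1. Crawling: Google uses automated programs called crawlers to download text, images, and videos from pages it found on the internet.
2. Indexing: Google analyzes the text, images, and video files on the page and stores the information in the Google index, which is a large database.
3. Serving search results: When a user searches on Google, Google returns information that's relevant to the user's query.
Crawling
The first stage is finding out what pages exist on the web. There isn't a central registry of all web pages, so Google must constantly look for new and updated pages and add them to its list of known pages. This process is called "URL discovery". Some pages are known because Google has already visited them. Google discovers other pages by extracting links from known pages to new pages. For example, a hub page such as a category page might link to a new blog post. Still other pages are discovered when you submit a list of pages (a sitemap) for Google to crawl.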
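To make the two discovery paths concrete, here is a minimal Python sketch of a set of known URLs that grows from two sources. The first source is links extracted from a page that is already known. The second is the loc entries of a sitemap. This is an illustration, not how Googlebot is implemented. The function and class names (discover_from_page, discover_from_sitemap, LinkExtractor) are hypothetical. The HTML and sitemap XML are passed in as strings, so no network access is needed.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
import xml.etree.ElementTree as ET

# XML namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on one page as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop #fragments,
        # so /post and /post#comments count as the same URL.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        # Keep only web URLs (skips mailto:, javascript:, and similar links).
        if absolute.startswith(("http://", "https://")):
            self.links.append(absolute)


def _add_new(known, candidates):
    """Adds unseen URLs to `known` (order kept, duplicates dropped); returns them."""
    new = [url for url in dict.fromkeys(candidates) if url not in known]
    known.update(new)
    return new


def discover_from_page(known, page_url, html):
    """Link-based discovery: collect links found on an already known page."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    return _add_new(known, parser.links)


def discover_from_sitemap(known, sitemap_xml):
    """Sitemap-based discovery: read the <loc> entries of a submitted sitemap."""
    root = ET.fromstring(sitemap_xml)
    locs = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
    return _add_new(known, locs)
```

Suppose https://example.com/blog/ is already known and its hub page links to /blog/new-post and back to /blog/. In that case, only https://example.com/blog/new-post is added to the known set. Fragments are stripped so that the same page isn't discovered twice under different #anchors.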
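Once Google discovers a page's URL, it may visit (or "crawl") the page to find out what's on it. We use a huge set of computers to crawl billions of pages on the web. The program that does the fetching is called Googlebot (also known as a crawler, robot, bot, or spider). Googlebot uses an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site. Google's crawlers are also programmed to avoid crawling a site too fast, so that they don't overload it. This mechanism is based on the site's responses (for example, HTTP 500 errors mean "slow down").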
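The "slow down" behavior can be illustrated with a per-host delay that grows when a server returns 5xx errors and recovers slowly when responses are healthy. This is a minimal sketch of one common approach (multiplicative backoff), not Googlebot's actual algorithm. The class name HostCrawlRate and every constant in it are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class HostCrawlRate:
    """Per-host delay between fetches, adjusted from server responses.

    All default values are illustrative assumptions, not Googlebot's settings.
    """

    delay: float = 1.0       # seconds to wait before the next fetch to this host
    min_delay: float = 0.5   # never crawl faster than this
    max_delay: float = 60.0  # never back off further than this
    backoff: float = 2.0     # multiply the delay by this after a server error
    recovery: float = 0.9    # multiply the delay by this after a healthy response

    def record(self, status: int) -> float:
        """Updates the delay from one HTTP status code and returns the new delay."""
        if 500 <= status <= 599:
            # Server errors are treated as "slow down": back off quickly.
            self.delay = min(self.delay * self.backoff, self.max_delay)
        elif 200 <= status <= 299:
            # Healthy responses let the crawl rate creep back up slowly.
            self.delay = max(self.delay * self.recovery, self.min_delay)
        # Other statuses (for example 404) leave the delay unchanged.
        return self.delay
```

Starting from a 1-second delay, the status sequence 200, 500, 503, 200 yields delays of 0.9, 1.8, 3.6, and about 3.24 seconds. Errors trigger a fast backoff, while recovery is deliberately slower so that a struggling server isn't immediately hit at full speed again.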
However, Googlebot doesn't crawl all the pages it discovers. The site owner may disallow crawling for some pages, and other pages may not be accessible without logging in to the site.
During the crawl, Google renders the page and runs any JavaScript it finds using a recent version of Chrome, similar to how your browser renders pages you visit. Rendering is important because websites often rely on JavaScript to bring content to the page, and without rendering Google might not see that content.
Crawling depends on whether Google's crawlers can access the site. Some common issues with Googlebot accessing sites include:
- Problems with the server handling the site
- Network issues
- robots.txt rules preventing Googlebot's access to the page
Indexing
After a page is crawled, Google tries to understand what the page is about. This stage is called indexing. It includes processing and analyzing the textual content and key content tags and attributes, such as <title> elements and alt attributes, images, videos, and more.
During the indexing process, Google determines whether a page is a duplicate of another page on the internet or is canonical. The canonical is the page that may be shown in search results. To select the canonical, we first group together (also known as clustering) the pages we found on the internet that have similar content. Then we select the one that's most representative of the group. The other pages in the group are alternate versions that may be served in different contexts, for example if the user is searching from a mobile device or is looking for a very specific page from that cluster.
Google also collects signals about the canonical page and its contents, which may be used in the next stage, where we serve the page in search results. Some signals include the language of the page, the country the content is local to, and the usability of the page.
The collected information about the canonical page and its cluster may be stored in the Google index, a large database hosted on thousands of computers. Indexing isn't guaranteed; not every page that Google processes will be indexed.
Indexing also depends on the content of the page and its metadata. Some common indexing issues can include:
- The quality of the content on the page is low
- Robots meta rules disallow indexing
- The design of the website might make indexing difficult
Serving search results
Google doesn't accept payment to rank pages higher, and ranking is done programmatically. Learn more about ads on Google Search: https://www.google.com/search/howsearchworks/our-approach/ads-on-search/
When a user enters a query, our machines search the index for matching pages and return the results we believe are the highest quality and most relevant to the user's query. Relevance is determined by hundreds of factors, which could include information such as the user's location, language, and device (desktop or phone). For example, searching for "bicycle repair shops" would show different results to a user in Paris than to a user in Hong Kong.
The search features that appear on the search results page also change based on the user's query. For example, searching for "bicycle repair shops" will likely show local results and no image results. Searching for "modern bicycle", however, is more likely to show image results but not local results. You can explore the most common UI elements of Google web search in our Visual Element gallery.
Search Console might report that a page is indexed even though you don't see it in search results. This might be because:
- The content on the page is irrelevant to users' queries
- The quality of the content is low
- Robots meta rules prevent serving
While this guide explains how Search works, we are always working on improving our algorithms. You can keep track of these changes by following the Google Search Central blog.
Last updated (UTC): 2025-08-04.
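The crawling and indexing issue lists above mention robots.txt rules and robots meta rules. The following Python sketch checks both for a single page. It uses the standard library's urllib.robotparser for robots.txt and a small HTML parser for the robots meta tag. It is an approximation, not Google's implementation. For example, Python's robotparser applies the first matching rule in a group rather than the most specific one. The names check_page and RobotsMetaParser are hypothetical, and the robots.txt text and page HTML are passed in as strings.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and crawler-specific meta tags."""

    def __init__(self, user_agent):
        super().__init__()
        # Meta names that apply: the generic "robots" plus one for this crawler.
        self.names = {"robots", user_agent.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in self.names:
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def check_page(robots_txt, url, html, user_agent="Googlebot"):
    """Reports which robots mechanism, if any, blocks this page."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        # A blocked fetch means the crawler never sees the page's meta tags.
        return "blocked by robots.txt"

    meta = RobotsMetaParser(user_agent)
    meta.feed(html)
    meta.close()
    # "none" is shorthand for "noindex, nofollow".
    if meta.directives & {"noindex", "none"}:
        return "crawlable, but robots meta blocks indexing"
    return "crawlable and indexable"
```

The meta check runs only when robots.txt allows the fetch. A crawler that is blocked from fetching a page never sees the meta tags on it.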