Last updated (UTC): 2009-08-01.

- Googlebot has limited resources and can only crawl and index a portion of the web's content, so site architecture is crucial for efficient crawling.
- Well-structured URLs help search engines easily access and understand website content, while disorganized URLs waste crawl resources.
- Removing unnecessary URL parameters, managing infinite crawl spaces, and disallowing irrelevant actions for Googlebot improves crawl efficiency.
- Ensure each unique piece of content has one corresponding URL, using canonicalization if needed, to optimize crawling and indexing.
- Optimizing your website's crawlability allows Googlebot to discover and index valuable content more effectively.

Search engine crawlers navigate websites via URLs; simplifying these URLs is crucial for efficient crawling. Key actions include removing irrelevant URL parameters, managing infinite crawl spaces like calendars or excessive pagination, and disallowing non-functional pages (e.g., login pages) in `robots.txt`. Ideally, each URL should lead to unique content. Using cookies for session data, employing `301` redirects for cleaner URLs, and the `rel="canonical"` tag can streamline crawling and indexing processes.

# Optimize your crawling and indexing

Monday, August 10, 2009

| It's been a while since we published this blog post. Some of the information may be outdated (for example, some images may be missing, and some links may not work anymore). For current information, check out our [Advanced guide to how Search works](/search/docs/fundamentals/how-search-works).

Many questions about website architecture, crawling and indexing, and even ranking issues can be boiled down to one central issue: **How easy is it for search engines to crawl your site?** We've spoken on this topic at a number of recent events, and below you'll find our presentation and some key takeaways on this topic.

[The Internet is a *big* place](https://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html); new content is being created all the time. Google has a finite number of resources, so when faced with the nearly-infinite quantity of content that's available online, Googlebot is only able to find and crawl a percentage of that content. Then, of the content we've crawled, we're only able to index a portion.

URLs are like the bridges between your website and a search engine's crawler: crawlers need to be able to find and cross those bridges (that is, find and crawl your URLs) in order to get to your site's content. If your URLs are complicated or redundant, crawlers are going to spend time tracing and retracing their steps; if your URLs are organized and lead directly to distinct content, crawlers can spend their time accessing your content rather than crawling through empty pages, or crawling the same content over and over via different URLs.

In the slides above you can see some examples of what *not* to do: real-life examples (though names have been changed to protect the innocent) of homegrown URL hacks and encodings, parameters masquerading as part of the URL path, infinite crawl spaces, and more. You'll also find some recommendations for straightening out that labyrinth of URLs and helping crawlers find more of your content faster, including:

- **Remove user-specific details from URLs.** URL parameters that don't change the content of the page (like session IDs or sort order) can be removed from the URL and put into a cookie. By putting this information in a cookie and [`301` redirecting](/search/docs/crawling-indexing/301-redirects) to a "clean" URL, you retain the information and reduce the number of URLs pointing to that same content.
- **Rein in infinite spaces.** Do you have a calendar that links to an infinite number of past or future dates (each with its own unique URL)? Do you have paginated data that returns a [status code of `200`](/search/docs/crawling-indexing/http-network-errors) when you add `&page=3563` to the URL, even if there aren't that many pages of data? If so, you have an [infinite crawl space](/search/blog/2008/08/to-infinity-and-beyond-no) on your website, and crawlers could be wasting their (and your!) bandwidth trying to crawl it all. Consider [these tips](https://www.google.com/support/webmasters/bin/answer.py?answer=76401) for reining in infinite spaces.
- **Disallow actions Googlebot can't perform.** Using your [robots.txt file](/search/docs/crawling-indexing/robots/intro), you can disallow crawling of login pages, contact forms, shopping carts, and other pages whose sole functionality is something that a crawler can't perform. (Crawlers are notoriously cheap and shy, so they don't usually "Add to cart" or "Contact us.") This lets crawlers spend more of their time crawling content that they can actually do something with.
- **One man, one vote. One URL, one set of content.** In an ideal world, there's a one-to-one pairing between URL and content: each URL leads to a unique piece of content, and each piece of content can only be accessed via one URL. The closer you can get to this ideal, the more streamlined your site will be for crawling and indexing. If your CMS or current site setup makes this difficult, you can [use the `rel="canonical"` element](/search/docs/crawling-indexing/consolidate-duplicate-urls) to indicate the preferred URL for a particular piece of content.

If you have further questions about optimizing your site for crawling and indexing, check out some of our [previous writing](/search/help/crawling-index-faq) on the subject, or stop by our [Help Forum](https://support.google.com/webmasters/community).

Posted by [Susan Moskwa](/search/blog/authors/susan-moskwa), Webmaster Trends Analyst
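
The four recommendations in the list above can be illustrated with a minimal Python sketch. It is not part of the original presentation, and everything specific in it is an assumption made for illustration: the parameter names `sessionid` and `sort`, the `example.com` URLs, the disallowed paths `/login`, `/cart`, and `/contact`, and all function names. It uses only the Python standard library. Storing the stripped values in a cookie is left to whatever web framework serves the site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Assumed names of parameters that do not change page content.
NON_CONTENT_PARAMS = {"sessionid", "sort"}

# Assumed robots.txt rules for pages a crawler cannot use.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /cart
Disallow: /contact
"""


def clean_url(url):
    """Return the URL with non-content query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def redirect_for(url):
    """Return (301, clean URL) if the request needs a redirect, else None."""
    target = clean_url(url)
    return (301, target) if target != url else None


def page_status(page, total_pages):
    """Return 200 only for pages that exist, so pagination is finite."""
    return 200 if 1 <= page <= total_pages else 404


def crawl_allowed(path, agent="Googlebot"):
    """Check a path against the assumed robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


def canonical_link(url):
    """Build a rel="canonical" element pointing at the clean URL."""
    return f'<link rel="canonical" href="{escape(clean_url(url), quote=True)}">'


if __name__ == "__main__":
    print(redirect_for("https://example.com/shoes?color=red&sessionid=abc123"))
    print(page_status(3563, total_pages=40))
    print(crawl_allowed("/cart"))
    print(canonical_link("https://example.com/shoes?color=red&sort=price"))
```

Because `clean_url` returns an already-clean URL unchanged, `redirect_for` issues at most one redirect per request and cannot loop. Whether a parameter such as sort order truly leaves the content unchanged depends on the site, so the set of stripped parameters should be chosen per site rather than copied from this sketch.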