Monday, October 05, 2009
Duplicate content has been a hot topic among webmasters and on our blog for over three years. One of our first posts on the subject came out in December of '06, and our most recent post was last week. Over the past three years, we've been providing tools and tips to help webmasters control which URLs we crawl and index, including a) use of 301 redirects, b) www vs. non-www preferred domain setting, c) change of address option, and d) rel="canonical".
We're happy to announce another feature to assist with managing duplicate content: parameter handling. Parameter handling allows you to view which parameters Google believes should be ignored or not ignored at crawl time, and to override our suggestions if necessary.
Let's take our old example of a site selling Swedish fish. Imagine that your preferred version of the URL and its content looks like this:
https://www.example.com/product.php?item=swedish-fish
However, you may also serve the same content on different URLs depending on how the user navigates around your site, or your content management system may embed parameters such as sessionid:
https://www.example.com/product.php?item=swedish-fish&category=gummy-candy
https://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678
With the "Parameter Handling" setting, you can now provide suggestions to our crawler to ignore the parameters category, trackingid, and sessionid. If we take your suggestion into account, the net result will be a more efficient crawl of your site, and fewer duplicate URLs.
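To illustrate the effect of ignoring these parameters, here is a minimal Python sketch. It is not a description of how the crawler works internally; the function name `strip_ignored_params` and the `IGNORED_PARAMS` set are assumptions made for this example, using only the parameter names from the URLs above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set mirroring the parameters named in the example above.
IGNORED_PARAMS = {"category", "trackingid", "sessionid"}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Return the URL with every ignored query parameter removed."""
    parts = urlsplit(url)
    # Keep only the parameters that are not in the ignore set, in order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


urls = [
    "https://www.example.com/product.php?item=swedish-fish",
    "https://www.example.com/product.php?item=swedish-fish&category=gummy-candy",
    "https://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678",
]

# All three variants collapse to the single preferred URL.
unique_urls = {strip_ignored_params(u) for u in urls}
print(unique_urls)
```

In this sketch the three example URLs reduce to one, which is the kind of de-duplication the setting is meant to encourage.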
Since we launched the feature, here are some popular questions that have come up:
Are the suggestions provided a hint or a command?
Your suggestions are considered hints. We'll do our best to take them into account; however, there may be cases where the provided suggestions would do more harm than good for a site.
When do I use parameter handling vs rel="canonical"?
rel="canonical" is a great tool to manage duplicate content issues, and has had huge adoption. The differences between the two options are:
- rel="canonical" has to be put on each page, whereas parameter handling is set at the host level
- rel="canonical" is respected by many search engines, whereas parameter handling suggestions are only provided to Google
Use whichever option works best for you; it's fine to use both if you want to be very thorough.
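Because rel="canonical" has to appear on each page, sites often generate it from a template. A minimal Python sketch of that per-page step follows; the helper name `canonical_link_tag` is hypothetical, and the URL is the preferred Swedish fish URL from the example above.

```python
from html import escape


def canonical_link_tag(preferred_url):
    """Build the <link> element to place in the <head> of a duplicate page."""
    # Escape the URL so characters such as & are valid inside the attribute.
    return '<link rel="canonical" href="{}">'.format(escape(preferred_url, quote=True))


tag = canonical_link_tag("https://www.example.com/product.php?item=swedish-fish")
print(tag)
```

Each duplicate URL would emit the same tag pointing at the preferred version, whereas a parameter handling suggestion is set once for the whole host.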
As always, your feedback on our new feature is appreciated.