Faceted navigation best (and 5 of the worst) practices

Wednesday, February 12, 2014

Selecting filters with faceted navigation can cause many URL combinations, such as https://www.example.com/category.php?category=gummy-candies&price=5-10&price=over-10

Faceted navigation, such as filtering by color or price range, can be helpful for your visitors, but it's often not search-friendly since it creates many combinations of URLs with duplicative content. With duplicative URLs, search engines may not crawl new or updated unique content as quickly, and/or they may not index a page accurately because indexing signals are diluted between the duplicate versions. To reduce these issues and help faceted navigation sites become as search-friendly as possible, we'd like to:

Background

In an ideal state, unique content (whether an individual product/article or a category of products/articles) would have only one accessible URL. This URL would have a clear click path, or route to the content from within the site, accessible by clicking from the home page or a category page.

  • Clear path that reaches all individual product/article pages

    On the left is potential user navigation on the site (or, the click path), on the right are the pages accessed.
  • One representative URL for category page https://www.example.com/category.php?category=gummy-candies

    Category page for gummy candies
  • One representative URL for individual product page https://www.example.com/product.php?item=swedish-fish

    One representative URL for individual product page

Undesirable duplication caused with faceted navigation

  • Numerous URLs for the same article/product

    • Canonical: example.com/product.php?item=swedish-fish

      Canonical URL for the same product
    • Duplicate: example.com/product.php?item=swedish-fish&category=gummy-candies&price=5-10

      Duplicate URL for the same product

    The same product page for swedish fish can be available on multiple URLs.

  • Numerous category pages that provide little or no value to searchers and search engines, as demonstrated in the following table:

    URL 1: example.com/category.php?category=gummy-candies&taste=sour&price=5-10
    Category page for gummy candies in the price range $5-10
    URL 2: example.com/category.php?category=gummy-candies&taste=sour&price=over-10
    Category page for gummy candies that are over $10
    Issues
    • No added value to Google searchers given users rarely search for "sour gummy candy price five to ten dollars".
    • No added value for search engine crawlers that discover the same item ("fruit salad") from parent category pages (either "gummy candies" or "sour gummy candies").
    • Negative value to site owner who may have indexing signals diluted between numerous versions of the same category.
    • Negative value to site owner with respect to serving bandwidth and losing crawler capacity to duplicative content rather than new or updated pages.
    • No value for search engines (should have 404 response code).
    • Negative value to searchers.
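To get a feel for how quickly filtered category URLs can multiply, here is a rough counting sketch. The facet names and sizes are hypothetical, and the function `count_filter_urls` is an illustration only. It assumes every combination of filter values produces its own URL and ignores parameter order, which would inflate the numbers further.

```python
from math import prod

def count_filter_urls(facet_sizes, multi_select=True):
    """Estimate distinct filtered URLs for a single category page.

    facet_sizes maps a facet name to its number of selectable values.
    With multi-select, each value is either on or off (2**n subsets per facet).
    With single-select, each facet is unset or set to one value (n + 1 states).
    """
    if multi_select:
        return prod(2 ** n for n in facet_sizes.values())
    return prod(n + 1 for n in facet_sizes.values())

# Hypothetical facet sizes, not taken from any real site.
facets = {"taste": 3, "price": 4, "color": 5}
print(count_filter_urls(facets))                      # multi-select
print(count_filter_urls(facets, multi_select=False))  # single-select
```

Even under these modest assumptions, three small facets turn one category page into thousands of crawlable variations.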

Worst (search un-friendly) practices for faceted navigation

Worst practice #1: Non-standard URL encoding for parameters, like commas or brackets, instead of key=value& pairs.

Worst practices:

example.com/category?[category:gummy-candy][sort:price-low-to-high][sid:789]
  • Key-value pairs marked with : rather than =.
  • Multiple parameters appended with [ ] rather than &.
example.com/category?category,gummy-candy,,sort,lowtohigh,,sid,789
  • Key-value pairs marked with a , rather than =.
  • Multiple parameters appended with ,, rather than &.

Best practice:

example.com/category?category=gummy-candy&sort=low-to-high&sid=789

While humans may be able to decode odd URL parameter separators, such as ",,", crawlers have difficulty interpreting URL parameters when they're implemented in a non-standard fashion. Mehmet Aktuna, a software engineer on Google's Crawling Team, says "Using non-standard encoding is just asking for trouble." Instead, connect key-value pairs with an equal sign (=) and append multiple parameters with an ampersand (&).
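As a small illustration of the difference, the sketch below feeds both URL styles to Python's standard `urllib.parse` module. This is not how any search engine crawler works. It only shows that an off-the-shelf parser recovers the parameters from the standard form and finds nothing usable in the bracketed form. The helper name `query_params` is made up for this example.

```python
from urllib.parse import urlsplit, parse_qs

standard = "https://example.com/category?category=gummy-candy&sort=low-to-high&sid=789"
bracketed = "https://example.com/category?[category:gummy-candy][sort:price-low-to-high][sid:789]"

def query_params(url):
    """Return the query parameters of a URL as a dict of value lists."""
    return parse_qs(urlsplit(url).query)

# Standard key=value& pairs parse into named parameters.
print(query_params(standard))

# The bracketed form contains no "=" separator, so no parameters are found.
print(query_params(bracketed))
```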

Worst practice #2: Using directories or file paths rather than parameters to list values that don't change page content.

Worst practice:

Where /c123/ is a category and /s789/ is a session ID that doesn't change page content:

example.com/c123/s789/product?swedish-fish

Good practice:

The directory, /gummy-candy/, changes the page content in a meaningful way:

example.com/gummy-candy/product?item=swedish-fish&sid=789

Best practice:

URL parameters allow more flexibility for search engines to determine how to crawl efficiently.

example.com/product?item=swedish-fish&category=gummy-candy&sid=789

It's difficult for automated programs, like search engine crawlers, to differentiate useful values (for example, gummy-candy) from useless ones (for example, a session ID) when values are placed directly in the path. On the other hand, URL parameters provide flexibility for search engines to quickly test and determine when a given value doesn't require the crawler to access all variations.

Common values that don't change page content and should be listed as URL parameters include:

  • Session IDs
  • Tracking IDs
  • Referrer IDs
  • Timestamps
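One way to see why parameters are easier for automated programs to handle: when these values are named parameters, a few lines of code can drop them and recover the URL of the underlying content. The sketch below assumes hypothetical parameter names (`tracking-id`, `ref`, `ts`) alongside the `sid` parameter used in the examples above. The same cleanup is not possible for a path like /c123/s789/ without site-specific knowledge.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed names for parameters that never change page content.
NON_CONTENT_PARAMS = {"sid", "tracking-id", "ref", "ts"}

def strip_non_content_params(url):
    """Remove session, tracking, referrer, and timestamp parameters from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CONTENT_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_non_content_params(
    "https://example.com/product?item=swedish-fish&category=gummy-candy&sid=789"
))
```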

Worst practice #3: Converting user-generated values into (possibly infinite) URL parameters that are crawlable and indexable, but not useful in search results.

Worst practices:

For example, exposing user-generated values like longitude/latitude or "days ago" as crawlable and indexable URLs:

example.com/find-a-doctor?radius=15&latitude=40.7565068&longitude=-73.9668408
example.com/articles?category=health&days-ago=7

Best practices:

example.com/find-a-doctor?city=san-francisco&neighborhood=soma
example.com/articles?category=health&date=january-10-2014

Rather than allowing user-generated values to create crawlable URLs (which leads to infinite possibilities with very little value to searchers), consider publishing category pages for the most popular values, then include additional information so the page provides more value than an ordinary search results page. Alternatively, consider placing user-generated values in a separate directory and then using robots.txt to disallow crawling of that directory.

example.com/filtering/find-a-doctor?radius=15&latitude=40.7565068&longitude=-73.9668408
example.com/filtering/articles?category=health&days-ago=7

with robots.txt:

User-agent: *
Disallow: /filtering/
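If you want to sanity-check a rule like this before deploying it, one option is Python's standard `urllib.robotparser` module. The sketch below parses the two lines above and tests example URLs against them. The helper `crawl_allowed` is made up for this example, and the standard-library parser is only an approximation of how any particular crawler interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules shown above.
rules = [
    "User-agent: *",
    "Disallow: /filtering/",
]

parser = RobotFileParser()
parser.parse(rules)

def crawl_allowed(url, user_agent="Googlebot"):
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)

# User-generated values under /filtering/ are blocked from crawling.
print(crawl_allowed(
    "https://example.com/filtering/find-a-doctor?radius=15&latitude=40.7565068&longitude=-73.9668408"
))

# The curated category page stays crawlable.
print(crawl_allowed("https://example.com/find-a-doctor?city=san-francisco&neighborhood=soma"))
```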

Worst practice #4: Appending URL parameters without logic.

Worst practices:

example.com/gummy-candy/lollipops/gummy-candy/gummy-candy/product?swedish-fish
example.com/product?cat=gummy-candy&cat=lollipops&cat=gummy-candy&cat=gummy-candy&item=swedish-fish

Better practice:

example.com/gummy-candy/product?item=swedish-fish

Best practice:

example.com/product?item=swedish-fish&category=gummy-candy

Extraneous URL parameters only increase duplication, causing less efficient crawling and indexing. Therefore, consider stripping unnecessary URL parameters and performing your site's "internal maintenance" before generating the URL. If many parameters are required for the user session, perhaps hide the information in a cookie rather than continually append values like:

cat=gummy-candy&cat=lollipops&cat=gummy-candy&...
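A minimal sketch of that kind of "internal maintenance" might look like the following. It assumes a policy where the most recent selection for a key wins and parameters are always emitted in one fixed order. Both choices, along with the names `PARAM_ORDER` and `build_product_url`, are assumptions for illustration rather than a prescribed method.

```python
from urllib.parse import urlencode

# Assumed fixed output order, so the same selections always yield the same URL.
PARAM_ORDER = ["item", "category"]

def build_product_url(base, selections):
    """Build one URL from a click history of (key, value) selections.

    Repeated keys are collapsed (the latest value wins) and unknown keys
    are dropped, so the URL does not grow with every click.
    """
    latest = {}
    for key, value in selections:
        latest[key] = value
    ordered = [(key, latest[key]) for key in PARAM_ORDER if key in latest]
    return f"{base}?{urlencode(ordered)}"

clicks = [
    ("category", "gummy-candy"),
    ("category", "lollipops"),
    ("category", "gummy-candy"),
    ("category", "gummy-candy"),
    ("item", "swedish-fish"),
]
print(build_product_url("https://example.com/product", clicks))
```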

Worst practice #5: Offering further refinement (filtering) when there are zero results.

Worst practice:

Allowing users to select filters when zero items exist for the refinement.

category page that lets users select filters when zero items exist

Best practice:

Only create links/URLs when it's a valid user-selection (items exist). With zero items, grey out filtering options. To further improve usability, consider adding item counts next to each filter.

category page that greys out filters if there aren't any items

Prevent useless URLs and minimize the crawl space by only creating URLs when products exist. This helps users to stay engaged on your site (fewer clicks on the back button when no products exist), and helps minimize potential URLs known to crawlers. Furthermore, if a page isn't just temporarily out-of-stock, but is unlikely to ever contain useful content, consider returning a 404 status code. With the 404 response, you can include a helpful message to users with more navigation options or a search box to find related products.
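The idea can be sketched as follows, assuming the site already knows how many items match each filter value. The function names, the `taste` facet, and the `permanently_empty` flag are hypothetical. A filter with zero items gets a label and count but no link, and a page that is unlikely to ever have content maps to a 404 status.

```python
from urllib.parse import urlencode

def filter_options(base_url, category, counts):
    """Return filter entries; only values with items get a link (href)."""
    options = []
    for value, count in counts.items():
        href = None  # None means: render greyed out, with no URL to crawl
        if count > 0:
            query = urlencode({"category": category, "taste": value})
            href = f"{base_url}?{query}"
        options.append({"label": f"{value} ({count})", "href": href})
    return options

def status_for_category(item_count, permanently_empty):
    """Return 404 only for pages unlikely to ever contain useful content."""
    return 404 if item_count == 0 and permanently_empty else 200

# Hypothetical item counts per filter value.
options = filter_options(
    "https://example.com/category", "gummy-candy", {"sour": 12, "spicy": 0}
)
for option in options:
    print(option)
```

A temporarily out-of-stock page keeps its 200 status in this sketch, matching the distinction drawn above.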

Best practices for new faceted navigation implementations or redesigns

New sites that are considering implementing faceted navigation have several options to optimize the "crawl space" (the totality of URLs on your site known to Googlebot) for unique content pages, reduce crawling of duplicative pages, and consolidate indexing signals.

  • Determine which URL parameters are required for search engines to crawl every individual content page (for example, determine what parameters are required to create at least one click-path to each item). Required parameters may include item-id,