In a recent episode of Google’s Search Off the Record podcast, Allan Scott from the “Dups” team explained how Google decides which URL to consider as the main one when there are duplicate pages.
He revealed that Google looks at about 40 different signals to pick the main URL from a group of similar pages.
Around 40 Signals For Canonical URL Selection
Duplicate content is a common problem for search engines because many websites have multiple pages with the same or similar content.
To solve this, Google uses a process called canonicalization. This process allows Google to pick one URL as the main version to index and show in search results.
Google has discussed the importance of using signals like rel="canonical" tags, sitemaps, and 301 redirects for canonicalization. However, the number of signals involved in this process is more than you may expect.
Scott revealed during the podcast:
“I’m not sure what the exact number is right now because it goes up and down, but I suspect it’s somewhere in the neighborhood of 40.”
Some of the known signals mentioned include:
rel="canonical" tags
301 redirects
HTTPS vs. HTTP
Sitemaps
Internal linking
URL length
The weight of each signal may vary, and some signals, like rel="canonical" tags, can influence both clustering (grouping duplicate pages together) and canonicalization (choosing the main URL within a group).
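To make the idea of weighted signals concrete, here is a minimal Python sketch that scores each URL in a duplicate cluster and picks the highest scorer. It is an illustration only: Google has not published its weights or scoring method, so the weight values, the signal names, and the `Candidate`, `score`, and `pick_canonical` helpers are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# Hypothetical weights chosen for illustration only.
# Google has not disclosed real values or how signals are combined.
ILLUSTRATIVE_WEIGHTS = {
    "rel_canonical": 5.0,
    "redirect_301": 5.0,
    "https": 2.0,
    "in_sitemap": 1.5,
    "internal_links": 1.0,
    "short_url": 0.5,
}


@dataclass
class Candidate:
    """One URL in a cluster of duplicate pages."""
    url: str
    # Maps a signal name to a strength between 0 and 1.
    signals: dict = field(default_factory=dict)


def score(candidate, weights=ILLUSTRATIVE_WEIGHTS):
    """Weighted sum of the signals that favor this URL."""
    return sum(
        weights.get(name, 0.0) * strength
        for name, strength in candidate.signals.items()
    )


def pick_canonical(cluster):
    """Return the candidate with the highest weighted score."""
    return max(cluster, key=score)


cluster = [
    Candidate("http://example.com/page?ref=nav", {"internal_links": 0.4}),
    Candidate(
        "https://example.com/page",
        {
            "rel_canonical": 1.0,
            "https": 1.0,
            "in_sitemap": 1.0,
            "internal_links": 0.6,
            "short_url": 1.0,
        },
    ),
]

print(pick_canonical(cluster).url)  # prints https://example.com/page
```

In this toy model, the HTTPS URL wins because several signals point to it at once, which mirrors the article's point that no single signal decides the outcome alone.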
Balancing Signals
With so many signals at play, Scott acknowledged the challenges in determining the canonical URL when signals conflict.
He stated:
“If your signals conflict with each other, what’s going to happen is the system will start falling back on lesser signals.”
This means that while strong signals like rel="canonical" tags and 301 redirects are crucial, other factors can come into play when these signals are unclear or contradictory.
As a result, Google’s canonicalization process involves a delicate balancing act to determine the most appropriate canonical URL.
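One way to picture "falling back on lesser signals" is a tiered check: if the strongest signals disagree about which URL is preferred, the decision passes to the next tier down. The Python sketch below is a simplified mental model, not Google's implementation. The tier grouping, the signal names, and the `resolve` function are hypothetical.

```python
# Hypothetical tiers, strongest first. The grouping is an assumption made
# for illustration; Google has not published how its signals are ranked.
SIGNAL_TIERS = [
    ("rel_canonical", "redirect_301"),
    ("sitemap", "https"),
    ("internal_links", "url_length"),
]


def resolve(votes):
    """Pick a URL from the strongest tier whose signals all agree.

    `votes` maps a signal name to the URL that signal favors.
    Returns (url, tier_index), or (None, None) if every tier is
    empty or internally conflicting.
    """
    for tier_index, tier in enumerate(SIGNAL_TIERS):
        favored = {votes[name] for name in tier if name in votes}
        if len(favored) == 1:
            return favored.pop(), tier_index
    return None, None


a = "https://example.com/page"
b = "https://example.com/page?ref=nav"

# The two strongest signals disagree, so the decision falls to tier 1.
conflicting = {
    "rel_canonical": a,
    "redirect_301": b,
    "sitemap": a,
    "https": a,
}
print(resolve(conflicting))  # prints ('https://example.com/page', 1)

# When the strong signals agree, weaker tiers are never consulted.
consistent = {"rel_canonical": a, "redirect_301": a, "internal_links": b}
print(resolve(consistent))  # prints ('https://example.com/page', 0)
```

The practical takeaway matches Scott's comment: when a site's strong signals contradict each other, the outcome is left to signals the site owner controls less directly.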
Best Practices For Canonicalization
Clear signals help Google identify the preferred canonical URL.
Best practices include:
Use rel="canonical" tags correctly.
Implement 301 redirects for permanently moved content.
Ensure HTTPS versions of pages are accessible and linked.
Submit sitemaps with preferred canonical URLs.
Keep internal linking consistent.
These signals help Google find the correct canonical URLs, improving your site’s crawling, indexing, and search visibility.
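A quick way to check the first of these practices is to confirm that each page declares exactly one canonical URL and that it matches the URL you expect. The sketch below uses Python's standard `html.parser` module. The `CanonicalFinder` class, the `find_canonicals` and `audit_canonical` helpers, the status labels, and the sample markup are assumptions made for this example, and the check only looks at the HTML itself, not at HTTP headers or redirects.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, in any letter case.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return all canonical URLs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


def audit_canonical(html, expected_url):
    """Classify a page as 'missing', 'multiple', 'mismatch', or 'ok'."""
    found = find_canonicals(html)
    if not found:
        return "missing"
    if len(found) > 1:
        return "multiple"
    return "ok" if found[0] == expected_url else "mismatch"


sample = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="canonical" href="https://example.com/page">
</head><body></body></html>"""

print(find_canonicals(sample))  # prints ['https://example.com/page']
print(audit_canonical(sample, "https://example.com/page"))  # prints ok
```

Running the same expected URL against your sitemap entries and internal links is a natural next step, since the goal is for every signal you control to point at the same address.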
Mistakes To Avoid
Here are a few common mistakes to watch out for.