Google's John Mueller explains why Google indexes blocked pages and why related Search Console reports can be safely ignored.

Why Google Indexes Blocked Web Pages

Google’s John Mueller answered a question about why Google indexes pages that are disallowed from crawling by robots.txt and why the it’s safe to ignore the related Search Console reports about those crawls.

Bot Traffic To Query Parameter URLs

The person asking the question documented that bots were creating links to non-existent query parameter URLs (?q=xyz) to pages with noindex meta tags that are also blocked in robots.txt. What prompted the question is that Google is crawling the links to those pages, getting blocked by robots.txt (without seeing a noindex robots meta tag) then getting reported in Google Search Console as “Indexed, though blocked by robots.txt.”

The person asked the following question:

“But here’s the big question: why would Google index pages when they can’t even see the content? What’s the advantage in that?”

Google’s John Mueller confirmed that if they can’t crawl the page they can’t see the noindex meta tag. He also makes an interesting mention of the site:search operator, advising to ignore the results because the “average” users won’t see those results.

He wrote:

“Yes, you’re correct: if we can’t crawl the page, we can’t see the noindex. That said, if we can’t crawl the pages, then there’s not a lot for us to index. So while you might see some of those pages with a targeted site:-query, the average user won’t see them, so I wouldn’t fuss over it. Noindex is also fine (without robots.txt disallow), it just means the URLs will end up being crawled (and end up in the Search Console report for crawled/not indexed — neither of these statuses cause issues to the rest of the site). The important part is that you don’t make them crawlable + indexable.”

Takeaways:

1. Mueller’s answer confirms the limitations in using the Site:search advanced search operator for diagnostic reasons. One of those reasons is because it’s not connected to the regular search index, it’s a separate thing altogether.

Google’s John Mueller commented on the site search operator in 2021:

Tinggalkan Balasan

Alamat email Anda tidak akan dipublikasikan. Ruas yang wajib ditandai *