Another hard block to watch out for is what Google calls “random errors,” where a server returns an error page with a 200 response code, which signals that the request succeeded. Google will interpret those error pages as duplicates and drop them from the search index. This is a big problem because recovering from this kind of error can take time.
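As a rough illustration of that failure mode, the sketch below flags a response that claims success with a 200 status while its body reads like an error page. The function name and the marker phrases are hypothetical, chosen only for this example; this is a simple heuristic under those assumptions, not how Google detects such pages.

```python
from typing import Sequence

# Hypothetical phrases that often appear on error pages; adjust for your own site.
DEFAULT_ERROR_MARKERS = ("page not found", "something went wrong", "service unavailable")


def is_soft_error(status_code: int, body: str,
                  error_markers: Sequence[str] = DEFAULT_ERROR_MARKERS) -> bool:
    """Return True when a 200 response appears to carry an error page."""
    if status_code != 200:
        # A non-200 status already tells crawlers that something went wrong.
        return False
    text = body.lower()
    return any(marker in text for marker in error_markers)
```

A check like this could be run against a sample of URLs fetched through the CDN to spot error pages that are being served as if they were normal content.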
A soft block can happen if the CDN shows one of those “Are you human?” pop-ups (bot interstitials) to Googlebot. Bot interstitials should send a 503 server response so that Google knows that this is a temporary issue.
Google’s new documentation explains:
“…when the interstitial shows up, that’s all they see, not your awesome site. In case of these bot-verification interstitials, we strongly recommend sending a clear signal in the form of a 503 HTTP status code to automated clients like crawlers that the content is temporarily unavailable. This will ensure that the content is not removed from Google’s index automatically.”
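To make the recommendation concrete, here is a minimal sketch of a WSGI application that serves a bot-verification interstitial with a 503 status. The function name, the page body, and the one-hour Retry-After value are assumptions made for illustration; a real CDN or WAF would normally expose this through its own configuration rather than application code.

```python
from http import HTTPStatus


def interstitial_app(environ, start_response):
    """Hypothetical WSGI app that serves an "Are you human?" interstitial."""
    body = b"<html><body>Checking that you are human...</body></html>"
    # 503 signals that the content is temporarily unavailable.
    unavailable = HTTPStatus.SERVICE_UNAVAILABLE
    status = f"{unavailable.value} {unavailable.phrase}"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Assumed value: hints that clients may retry in an hour.
        ("Retry-After", "3600"),
    ]
    start_response(status, headers)
    return [body]
```

The point of the sketch is only that the status line says 503 rather than 200, so a crawler that receives the interstitial treats it as a temporary condition instead of as the page's content.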
Debug Issues With URL Inspection Tool And WAF Controls
Google recommends using the URL Inspection Tool in Search Console to see how the CDN is serving your web pages. If the CDN’s firewall, called a Web Application Firewall (WAF), is blocking Googlebot by IP address, you should be able to review the blocked IP addresses and compare them to Google’s official list of IPs to see if any of them are on the list.
Google offers the following CDN-level debugging advice:
“If you need your site to show up in search engines, we strongly recommend checking whether the crawlers you care about can access your site. Remember that the IPs may end up on a blocklist automatically, without you knowing, so checking in on the blocklists every now and then is a good idea for your site’s success in search and beyond. If the blocklist is very long (not unlike this blog post), try to look for just the first few segments of the IP ranges, for example, instead of looking for 192.168.0.101 you can just look for 192.168.”
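The comparison Google describes can also be done programmatically instead of by scanning for prefixes by eye. The sketch below uses Python’s standard ipaddress module to report which blocklist entries overlap a set of crawler IP ranges. The function name is hypothetical, and the addresses in the usage example are reserved documentation ranges standing in for real data; in practice the crawler ranges would come from Google’s published list and the blocklist from your WAF.

```python
import ipaddress
from typing import Iterable, List, Tuple


def blocked_crawler_ranges(blocklist: Iterable[str],
                           crawler_prefixes: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (blocklist entry, crawler prefix) pairs whose address ranges overlap."""
    crawler_nets = [ipaddress.ip_network(p) for p in crawler_prefixes]
    hits = []
    for entry in blocklist:
        # strict=False lets a single IP such as "192.0.2.17" parse as a /32 network.
        blocked = ipaddress.ip_network(entry, strict=False)
        for net in crawler_nets:
            # Only compare IPv4 with IPv4 and IPv6 with IPv6.
            if blocked.version == net.version and blocked.overlaps(net):
                hits.append((entry, str(net)))
    return hits


# Placeholder data only: these are documentation ranges, not real Googlebot IPs.
example_blocklist = ["192.0.2.17", "203.0.113.0/24"]
example_crawler_prefixes = ["192.0.2.0/27", "2001:db8::/32"]
print(blocked_crawler_ranges(example_blocklist, example_crawler_prefixes))
```

Any pair the function returns is a blocklist entry worth reviewing, since it covers at least part of a range the crawler uses.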
Read Google’s documentation for more information:
Crawling December: CDNs and crawling