“The short answer is that a site: query is not meant to be complete, nor used for diagnostics purposes.
A site query is a specific kind of search that limits the results to a certain website. It’s basically just the word site, a colon, and then the website’s domain.
This query limits the results to a specific website. It’s not meant to be a comprehensive collection of all the pages from that website.”
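For example, a query like site:example.com restricts results to pages from example.com, but what comes back is a sample of what Google has indexed from that domain, not an exhaustive list.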
2. Using a noindex tag without a robots.txt block is fine for situations where a bot is linking to non-existent pages that Googlebot then discovers. Googlebot has to crawl a page in order to see the noindex tag, which is why the pages shouldn’t also be blocked in robots.txt (a minimal way to check which directive a page serves is sketched after this list).
3. URLs with the noindex tag will generate a “Crawled - currently not indexed” entry in Search Console, and those entries won’t have a negative effect on the rest of the website.
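A noindex directive can be delivered either as an X-Robots-Tag HTTP header or as a robots meta tag in the HTML. Below is a minimal Python sketch for checking which, if either, a URL serves; it assumes the third-party requests library is installed, and the URL and helper names are hypothetical, used only for illustration.

```python
# Minimal sketch: check whether a URL serves a noindex directive,
# either via the X-Robots-Tag HTTP header or a robots meta tag.
# Assumes the third-party "requests" library; the URL below is
# hypothetical and used only for illustration.
from html.parser import HTMLParser

import requests


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append((attrs.get("content") or "").lower())


def has_noindex(url: str) -> bool:
    response = requests.get(url, timeout=10)

    # Case 1: noindex sent as an HTTP header (works for non-HTML files too).
    header = response.headers.get("X-Robots-Tag", "").lower()
    if "noindex" in header:
        return True

    # Case 2: noindex declared in a robots meta tag in the HTML.
    parser = RobotsMetaParser()
    parser.feed(response.text)
    return any("noindex" in d for d in parser.directives)


if __name__ == "__main__":
    print(has_noindex("https://example.com/ghost-page"))
```

Checking the header as well as the meta tag matters because non-HTML resources, such as PDFs, can only carry the directive in the HTTP header.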
Read the question and answer on LinkedIn:
Why would Google index pages when they can’t even see the content?
Featured Image by Shutterstock/Krakenimages.com