Google Revamps Entire Crawler Documentation

Google has launched a major revamp of its crawler documentation, shrinking the main overview page and splitting the content into three new, more focused pages. Although the changelog downplays the changes, there is an entirely new section and essentially a rewrite of the entire crawler overview page. The additional pages allow Google to increase the information density of all the crawler pages and improve topical coverage.

What Changed?

Google’s documentation changelog notes two changes, but there is actually a lot more.

Here are some of the changes:

Added an updated user agent string for the GoogleProducer crawler
Added content encoding information
Added a new section about technical properties

The technical properties section contains information that didn’t previously exist in the documentation. There are no changes to crawler behavior, but by creating three topically specific pages, Google is able to add more information to the crawler overview page while simultaneously making it smaller.

This is the new information about content encoding (compression):

“Google’s crawlers and fetchers support the following content encodings (compressions): gzip, deflate, and Brotli (br). The content encodings supported by each Google user agent is advertised in the Accept-Encoding header of each request they make. For example, Accept-Encoding: gzip, deflate, br.”
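Site owners who want to verify that their server actually serves one of these encodings can mirror the Accept-Encoding header described above and check which Content-Encoding the server responds with. Here is a minimal sketch using Python’s requests library; the URL is a placeholder, and decoding Brotli (br) responses requires the optional brotli package.

```python
import requests

# Mirror the Accept-Encoding header that Google's documentation says
# its crawlers and fetchers send with each request.
headers = {"Accept-Encoding": "gzip, deflate, br"}

# https://example.com/ is a placeholder; substitute your own URL.
response = requests.get("https://example.com/", headers=headers)

# The Content-Encoding response header shows which compression the
# server chose; "identity" means the response was not compressed.
print(response.headers.get("Content-Encoding", "identity"))
```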

There is additional information about crawling over HTTP/1.1 and HTTP/2, plus a statement that Google’s goal is to crawl as many pages as possible without impacting the website server.
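For site owners curious whether their server can negotiate HTTP/2 at all, one way to check is with the third-party httpx library, installed with its optional HTTP/2 extra (pip install 'httpx[http2]'). This is an illustrative sketch, not anything from Google’s documentation, and the URL is a placeholder.

```python
import httpx

# Enable HTTP/2; httpx falls back to HTTP/1.1 if the server
# does not negotiate HTTP/2 during the TLS handshake.
with httpx.Client(http2=True) as client:
    response = client.get("https://example.com/")
    # Prints "HTTP/2" when negotiated, otherwise "HTTP/1.1".
    print(response.http_version)
```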

What Is The Goal Of The Revamp?

The documentation was revamped because the overview page had grown large, and additional crawler information would have made it even larger. A decision was made to break the page into three subtopics so that the specific crawler content could continue to grow while making room for more general information on the overview page. Spinning off subtopics into their own pages is a brilliant solution to the problem of how best to serve users.

This is how the documentation changelog explains the change:

“The documentation grew very long which limited our ability to extend the content about our crawlers and user-triggered fetchers.

…Reorganized the documentation for Google’s crawlers and user-triggered fetchers. We also added explicit notes about what product each crawler affects, and added a robots.txt snippet for each crawler to demonstrate how to use the user agent tokens. There were no meaningful changes to the content otherwise.”

The changelog downplays the changes by describing them as a reorganization, but the crawler overview is substantially rewritten, in addition to the creation of three brand-new pages.

While the content remains substantially the same, dividing it into subtopics makes it easier for Google to add more content to the new pages without continuing to grow the original page. The original page, called Overview of Google crawlers and fetchers (user agents), is now truly an overview, with more granular content moved to standalone pages.

Google published three new pages:

Common crawlers
Special-case crawlers
User-triggered fetchers

1. Common Crawlers

As the title indicates, these are common crawlers, some of which are associated with GoogleBot, including the Google-InspectionTool, which uses the GoogleBot user agent. All of the bots listed on this page obey the robots.txt rules.
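To see how user agent tokens interact with robots.txt rules, here is a minimal sketch using Python’s standard urllib.robotparser module. The tokens are real, but the robots.txt rules and URLs below are hypothetical examples for illustration, not taken from Google’s documentation.

```python
from urllib import robotparser

# Hypothetical robots.txt: block Googlebot from /private/ while
# leaving Google-InspectionTool unrestricted. The tokens are real;
# the rules are made up for this example.
robots_txt = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Google-InspectionTool
Disallow:
"""

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

for token in ("Googlebot", "Google-InspectionTool"):
    for url in ("https://example.com/", "https://example.com/private/page"):
        print(f"{token} -> {url}: {parser.can_fetch(token, url)}")
```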
