What you need to know in 2025

Crawl budget is a common source of concern and confusion in SEO. 

This guide will explain everything you need to know about crawl budget and how it may impact your technical SEO efforts in 2025.

Why would search bots limit crawling?

Google’s Gary Illyes provided an excellent explanation about crawl budget, describing how Googlebot strives to be a “good citizen of the web.” This principle is key to understanding the concept and why it exists.

Think of when you last saw tickets to your favorite band go on sale. 

Too many users flood the website, overwhelming the server and causing it not to respond as intended. This is frustrating and often prevents users from buying tickets.

This can also happen with bots. Remember when you forgot to adjust the crawl speed or the number of simultaneous connections allowed on your favorite site crawler and brought down the website you were crawling?

Googlebot could also do this. It could hit a website too frequently or through too many “parallel connections” and cause the same effect, essentially overwhelming the server. 

As a “good citizen,” it is designed to avoid that happening.

Google sets its “crawl capacity limit” for a site based on what the site can handle. 

If the site responds well to the crawl, Googlebot will continue at that pace and may increase the number of connections it uses.

If it responds poorly, then the speed of fetching and connections used will be lowered.
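The feedback loop described above can be sketched as a small controller that adds a connection after a healthy response and halves the count after a poor one. This is only an illustration: Google has not published how it sets the crawl capacity limit, and the class name, thresholds and step sizes below are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CrawlRateController:
    """Hypothetical controller; NOT Googlebot's actual logic."""

    connections: int = 2            # parallel connections currently in use
    max_connections: int = 10       # assumed upper bound
    slow_threshold_s: float = 1.0   # assumed cutoff for a "slow" response

    def record(self, status: int, response_time_s: float) -> int:
        """Adjust the connection count after one fetch and return it."""
        # Assumption: 5xx, 429 and slow responses all signal a struggling server.
        healthy = (
            status < 500
            and status != 429
            and response_time_s < self.slow_threshold_s
        )
        if healthy:
            # Site is coping: add one connection, up to the cap.
            self.connections = min(self.connections + 1, self.max_connections)
        else:
            # Site is struggling: halve the connections, never below one.
            self.connections = max(self.connections // 2, 1)
        return self.connections


controller = CrawlRateController()
controller.record(200, 0.2)   # healthy response
controller.record(200, 0.3)   # healthy response
controller.record(503, 2.5)   # server error and slow
print(controller.connections)
```

Backing off faster than it ramps up is a common pattern for polite crawlers, because a single overloaded response is a stronger signal than a single fast one.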

The cost of crawling

Crawling, parsing and rendering use up resources, and there are financial considerations involved in the process.

Protecting the site is one reason Google and other search engines may adjust how they crawl it.

However, I imagine some financial cost calculation goes into determining how frequently a URL should be crawled.

What is crawl budget?

Crawl budget refers to the amount of time and resources Googlebot allocates to crawling a website. It is determined by two key factors: the crawl capacity limit and crawl demand. 

The crawl capacity limit reflects how much crawling a site can handle without performance issues.

Crawl demand is based on Googlebot’s assessment of the website’s content, including individual URLs, and the need to update its understanding of those pages.

More popular pages are crawled more frequently to ensure the index remains up-to-date.

Google calculates this budget to balance the resources it can afford to spend on crawling with the need to protect both the website and its own infrastructure.

What causes issues with crawl budget

Not every site will notice any impact of having a crawl budget.

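Google clearly says only three types of websites need to manage their crawl budget actively. These are:

Large sites (1 million+ unique pages) with content that changes moderately often (about once a week).

Medium or larger sites (10,000+ unique pages) with content that changes very rapidly (daily).

Sites with a large portion of their total URLs classified by Search Console as “Discovered - currently not indexed.”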

Now, I would advise caution before dismissing your website as none of the above: crawl your site.

You may feel that your small ecommerce store only has a couple of thousand SKUs and a handful of informational pages. 

In reality, though, with faceted navigation and pagination, you may have ten times the volume of URLs you thought you would have.

Don’t forget that having more than one language or location targeted at your domain may yield multiples of each page.

Set your crawling tool to crawl as Googlebot or Bingbot and let it loose on all pages that these search bots would be able to access. This will give you a more accurate picture of the size of your website as they know it.

Why crawl budget is important

Why is Google recommending that the above three types of sites consider their crawl budget? Why is it important to monitor and manage it?

If your crawl budget is too low to allow the search bots to discover all the new URLs you’ve added to your site or to revisit URLs that have changed, then they won’t know about the content on them.

That means the pages may not be indexed or, if they are, may not rank as well as they could if the bots were able to crawl them.

How crawl budget issues happen

Three main factors can cause crawl budget issues:

The quality of URLs.

The volume of URLs.

Their accessibility.

Quality

We know that Google considers other pages on a website when deciding whether to crawl new pages it has discovered. 

Googlebot may decide a page isn’t worth the resources to crawl if it anticipates its content will not be of high enough value to index. This can be due to:

High volumes of on-site duplicate content.

Hacked pages with poor-quality content.

Internally created low-quality and spam content. 

Poor-quality pages may have been intentionally created, either internally or by external bad actors. They may also be an unintended side effect of poor design and copy.

Volume

Your site may have more URLs than you realize, often due to common technical issues like faceted navigation and infinite URL creation.

Faceted navigation

Faceted navigation is usually found on ecommerce websites. 

If you have a category page like www.example-pet-store.com/cats/toys, you may have a filtering system to help users narrow down the products on that page. 

If you want to narrow down the cat toy products in this fictitious pet store, you may select the “contains catnip” filter.

That may then yield a URL that looks something like this: 

www.example-pet-store.com/cats/toys?contains=catnip

This is faceted navigation.

Now, consider if the users want to narrow the search down even further to toys that have feathers. 

They might end up on a URL like this one: 

www.example-pet-store.com/cats/toys?contains=catnip&design=feathers 

What about if they want to sort the list by price? 

Clicking the sort button may take them to a new URL: 

www.example-pet-store.com/cats/toys?contains=catnip&design=feathers&sort=low

You can see how quickly additional URLs are created stemming from one category page. 
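To put a rough number on it, the sketch below enumerates every filter and sort combination for a single category page. The `contains=catnip`, `design=feathers` and `sort=low` values come from the example above; the other values, the `https://` scheme and the `facet_urls` helper are assumptions added purely for illustration.

```python
from itertools import product
from urllib.parse import urlencode

BASE = "https://www.example-pet-store.com/cats/toys"

# Hypothetical facets. None means "this filter is not applied".
FACETS = {
    "contains": [None, "catnip", "silvervine"],
    "design": [None, "feathers", "mice", "balls"],
    "sort": [None, "low", "high"],
}


def facet_urls(base, facets):
    """Return every URL reachable by combining the given facet values."""
    urls = []
    for combo in product(*facets.values()):
        # Keep only the facets that are actually applied in this combination.
        params = [(key, value) for key, value in zip(facets, combo) if value is not None]
        urls.append(base + ("?" + urlencode(params) if params else ""))
    return urls


urls = facet_urls(BASE, FACETS)
print(len(urls))  # 3 * 4 * 3 = 36 URLs from one category page
```

Three small facets already turn one page into 36 crawlable URLs. The count assumes parameters always appear in the same order; if a site also allows them in any order, the number grows further.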

If Googlebot can find these pages, either through internal or external links, or perhaps they have been included in the XML sitemap, it may crawl them.

Pretty soon, instead of crawling your site’s 200 category pages and individual product pages, Googlebot might be aware of thousands of variants of the category pages. 

Because these filtering systems create new URLs, every one of them can be crawled unless you stop the bots from doing so or the bots deem the pages too low-value to crawl.
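One way to check whether this is happening on your own site is to group the URLs from a crawl export by path and count how many query-string variants each path has. This is a minimal sketch: the `variants_per_path` helper and the sample list are made up for the example, and a real export from your crawling tool would take the place of `crawled`.

```python
from collections import Counter
from urllib.parse import urlsplit


def variants_per_path(urls):
    """Count how many distinct crawled URLs share each host and path."""
    counts = Counter()
    for url in set(urls):  # ignore exact duplicates in the export
        parts = urlsplit(url)
        counts[(parts.netloc, parts.path)] += 1
    return counts


# Made-up sample standing in for a crawler export.
crawled = [
    "https://www.example-pet-store.com/cats/toys",
    "https://www.example-pet-store.com/cats/toys?contains=catnip",
    "https://www.example-pet-store.com/cats/toys?contains=catnip&design=feathers",
    "https://www.example-pet-store.com/cats/toys?contains=catnip&design=feathers&sort=low",
    "https://www.example-pet-store.com/cats/beds",
]

counts = variants_per_path(crawled)
for (host, path), total in counts.most_common():
    print(f"{total:>4}  {host}{path}")
```

Paths with far more variants than the rest are the first places to look for filters, sort options or pagination that bots can reach.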

Infinite URL creation

Events calendar. Book a table. Reserve a space.

These types of date-based systems on websites that allow users to click through to future days or months can cause “bot traps.”

Picture an events calendar. It shows the whole month with a highlight on the days with events. 

It sits at /events-calendar, and if you are looking at January 2025, the URL will contain /events-calendar/january-2025. This is pretty common practice.

If that calendar also has a button at the top that allows users to click through to the next month’s events, that wouldn’t be abnormal either. 

Clicking once to view the next month’s events might take you to a URL containing /events-calendar/february-2025.

Click again, and you might end up on /events-calendar/march-2025.

However, the real fun comes when there is no limit to how far into the future you can click. 

Click on “view next month’s events” enough times, and you could end up on /events-calendar/december-2086.

If the calendar is set up in such a way that the “view next month’s events” link changes on each page to be the next URL in the sequence of months, then the search bots could also end up following the links all the way through to /events-calendar/december-2086 – and beyond.
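The trap can be simulated in a few lines. The sketch below assumes the /events-calendar/month-year URL pattern used in the example; the function names are hypothetical, and the `max_pages` cap is our own stopping rule, since the calendar itself never provides one.

```python
MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]


def next_month_url(url):
    """Return the URL that the 'view next month's events' link would point to."""
    prefix, slug = url.rsplit("/", 1)
    month, year = slug.split("-")
    index = MONTHS.index(month)
    if index == 11:
        # December rolls over to January of the following year.
        return f"{prefix}/january-{int(year) + 1}"
    return f"{prefix}/{MONTHS[index + 1]}-{year}"


def follow_next_links(start, max_pages):
    """Follow the 'next month' link until max_pages URLs have been visited."""
    visited = [start]
    while len(visited) < max_pages:
        visited.append(next_month_url(visited[-1]))
    return visited


# 62 years * 12 months = 744 pages from January 2025 to December 2086.
pages = follow_next_links("/events-calendar/january-2025", max_pages=744)
print(len(pages), pages[-1])
```

Without the cap, the loop would never run out of new URLs to request, which is exactly the position a search bot is in when it follows these links.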
