
What you need to know in 2025


Crawl budget is a common source of concern and confusion in SEO. 

This guide will explain everything you need to know about crawl budget and how it may impact your technical SEO efforts in 2025.

Why would search bots limit crawling?

Google’s Gary Illyes provided an excellent explanation about crawl budget, describing how Googlebot strives to be a “good citizen of the web.” This principle is key to understanding the concept and why it exists.

Think of when you last saw tickets to your favorite band go on sale. 

Too many users flood the website, overwhelming the server and causing it not to respond as intended. This is frustrating and often prevents users from buying tickets.

This can also happen with bots. Remember when you forgot to adjust the crawl speed or the number of simultaneous connections allowed on your favorite site crawler and brought down the website you were crawling? 

Googlebot could also do this. It could hit a website too frequently or through too many “parallel connections” and cause the same effect, essentially overwhelming the server. 

As a “good citizen,” it is designed to avoid that happening.

Google sets its “crawl capacity limit” for a site based on what the site can handle. 

If the site responds well to the crawl, Googlebot will continue at that pace and increase the volume of connections. 

If it responds poorly, then the speed of fetching and connections used will be lowered.
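
The feedback loop described above can be sketched in a few lines. This is a toy model for a polite crawler of your own, not Googlebot's actual logic: the function name, the thresholds and the step sizes are all assumptions made for illustration.

```python
def adjust_crawl_rate(current_rate, response_ms, status_code,
                      slow_threshold_ms=1000, min_rate=1, max_rate=50):
    """Return a new requests-per-minute rate for a polite crawler.

    Illustrative only: the thresholds and step sizes are invented, and
    this is not how Google computes its crawl capacity limit.
    """
    server_struggling = (
        status_code == 429              # server asked us to slow down
        or status_code >= 500           # server-side errors
        or response_ms > slow_threshold_ms
    )
    if server_struggling:
        # Back off sharply when the site responds poorly.
        return max(min_rate, current_rate // 2)
    # Speed up gently while the site responds well.
    return min(max_rate, current_rate + 1)


print(adjust_crawl_rate(10, response_ms=200, status_code=200))
print(adjust_crawl_rate(10, response_ms=200, status_code=503))
```

Backing off faster than you speed up is a common design choice for crawlers that want to stay on the right side of a struggling server.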

The cost of crawling

Crawling, parsing and rendering use up resources, and there are financial considerations involved in the process.

Yes, protecting the site is one reason Google and other search engines may adjust how they crawl it. 

However, I imagine some financial cost calculation goes into determining how frequently a URL should be crawled.

What is crawl budget?

Crawl budget refers to the amount of time and resources Googlebot allocates to crawling a website. It is determined by two key factors: the crawl capacity limit and crawl demand. 

The crawl capacity limit reflects how much crawling a site can handle without performance issues.

Crawl demand is based on Googlebot’s assessment of the website’s content, including individual URLs, and the need to update its understanding of those pages.

More popular pages are crawled more frequently to ensure the index remains up-to-date.

Google calculates this budget to balance the resources it can afford to spend on crawling with the need to protect both the website and its own infrastructure.

What causes issues with crawl budget

Not every site will notice any impact from having a crawl budget. 

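Google clearly says only three types of websites need to manage their crawl budget actively. According to Google's crawl budget documentation, these are:

Large sites (1 million+ unique pages) with content that changes moderately often (once a week).

Medium or larger sites (10,000+ unique pages) with very rapidly changing content (daily).

Sites with a large portion of their total URLs classified by Search Console as “Discovered – currently not indexed.”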

Now, I would advise caution before dismissing your website as none of the above: crawl your site.

You may feel that your small ecommerce store only has a couple of thousand SKUs and a handful of informational pages. 

In reality, though, with faceted navigation and pagination, you may have ten times the volume of URLs you thought you would have.

Don’t forget that having more than one language or location targeted at your domain may yield multiples of each page.

Set your crawling tool to crawl as Googlebot or Bingbot and let it loose on all pages that these search bots would be able to access. This will give you a more accurate picture of the size of your website as they know it.

Why crawl budget is important

Why is Google recommending that the above three types of sites consider their crawl budget? Why is it important to monitor and manage it?

If your crawl budget is too low to allow the search bots to discover all the new URLs you’ve added to your site or to revisit URLs that have changed, then they won’t know about the content on them.

That means the pages may not be indexed or, if they are, they may not rank as well as they could if the bots could crawl them.

How crawl budget issues happen

Three main factors can cause crawl budget issues: 

The quality of URLs.

The volume of URLs.

Their accessibility.

Quality

We know that Google considers other pages on a website when deciding whether to crawl new pages it has discovered. 

Googlebot may decide a page isn’t worth the resources to crawl if it anticipates its content will not be of high enough value to index. This can be due to:

High volumes of on-site duplicate content.

Hacked pages with poor-quality content.

Internally created low-quality and spam content. 

Poor-quality pages may have been intentionally created, either internally or by external bad actors. They may also be an unintended side effect of poor design and copy.

Volume

Your site may have more URLs than you realize, often due to common technical issues like faceted navigation and infinite URL creation.

Faceted navigation

Faceted navigation is usually found on ecommerce websites. 

If you have a category page like www.example-pet-store.com/cats/toys, you may have a filtering system to help users narrow down the products on that page. 

If you want to narrow down the cat toy products in this fictitious pet store, you may select the “contains catnip” filter. 

That may then yield a URL that looks something like this: 

www.example-pet-store.com/cats/toys?contains=catnip

This is faceted navigation.

Now, consider if the users want to narrow the search down even further to toys that have feathers. 

They might end up on a URL like this one: 

www.example-pet-store.com/cats/toys?contains=catnip&design=feathers 

What about if they want to sort the list by price? 

Clicking the sort button may take them to a new URL: 

www.example-pet-store.com/cats/toys?contains=catnip&design=feathers&sort=low

You can see how quickly additional URLs are created stemming from one category page. 

If Googlebot can find these pages, either through internal or external links, or perhaps they have been included in the XML sitemap, it may crawl them.

Pretty soon, instead of crawling your site’s 200 category pages and individual product pages, Googlebot might be aware of thousands of variants of the category pages. 

Because these filtering systems create new URLs, every one of them can be crawled unless you stop the bots from doing so or the bots deem the pages too low-value to crawl.
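
To get a feel for the numbers, here is a rough sketch that enumerates every filter combination for a single category page. The "contains", "design" and "sort" parameters come from the example above. The extra values ("bells", "mice", "balls", "high") and the `facet_urls` helper are made up for illustration.

```python
from itertools import product
from urllib.parse import urlencode


def facet_urls(base_url, facets):
    """Yield every URL reachable by combining optional facet values."""
    # Each facet is either unset (None) or set to one of its values.
    options = [[None] + list(values) for values in facets.values()]
    for combo in product(*options):
        params = [(name, value)
                  for name, value in zip(facets, combo)
                  if value is not None]
        yield base_url + ("?" + urlencode(params) if params else "")


# Hypothetical filter values for one category page.
facets = {
    "contains": ["catnip", "bells"],
    "design": ["feathers", "mice", "balls"],
    "sort": ["low", "high"],
}
urls = list(facet_urls("https://www.example-pet-store.com/cats/toys", facets))
print(len(urls))  # (2 + 1) * (3 + 1) * (2 + 1) = 36
```

Three small filters turn one category page into 36 crawlable URLs. Across 200 category pages with similar filters, that is 7,200 URLs, and the count multiplies again if the same parameters can appear in a different order.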

Infinite URL creation

Events calendar. Book a table. Reserve a space.

These types of date-based systems on websites that allow users to click through to future days or months can cause “bot traps.”

Picture an events calendar. It shows the whole month with a highlight on the days with events. 

It sits on the URL /events-calendar and if you are looking at the month of January 2025, the URL will contain /events-calendar/january-2025. This is pretty common practice.

If that calendar also has a button at the top that allows users to click through to the next month’s events, that wouldn’t be abnormal either. 

Clicking once to view the next month’s events might take you to a URL containing /events-calendar/february-2025. 

Click again, and you might end up on /events-calendar/march-2025.

However, the real fun comes when there is no limit to how far into the future you can click. 

Click on “view next month’s events” enough times, and you could end up on /events-calendar/december-2086.

If the calendar is set up in such a way that the “view next month’s events” link changes on each page to be the next URL in the sequence of months, then the search bots could also end up following the links all the way through to /events-calendar/december-2086 – and beyond.

The page /events-calendar/december-2086 holds no useful content. There probably haven’t been any events organized yet.

All of the resources wasted on those empty calendar pages could have been utilized by the bots on new products just uploaded to the site.
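
The trap is easy to reproduce. The sketch below simulates a bot that does nothing but follow the “view next month’s events” link, assuming the /events-calendar/month-year URL pattern from the example. The helper names are hypothetical.

```python
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def next_month(year, month):
    """Return the (year, month) pair that follows the given one."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def calendar_path(year, month):
    """Build the calendar URL path for a month (1-12) and year."""
    return f"/events-calendar/{MONTHS[month - 1]}-{year}"


def follow_next_links(year, month, clicks):
    """Simulate a bot following the 'next month' link `clicks` times."""
    paths = []
    for _ in range(clicks):
        year, month = next_month(year, month)
        paths.append(calendar_path(year, month))
    return paths


paths = follow_next_links(2025, 1, clicks=743)
print(paths[0])   # /events-calendar/february-2025
print(paths[-1])  # /events-calendar/december-2086
```

Starting from January 2025, it takes 743 requests to reach December 2086, and nothing in the link structure tells the bot to stop there.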

Accessibility

Search bots may reduce the frequency of crawling a URL if it returns a server response code other than 200. 

For example, a 4XX code indicates that the page cannot or should not be found, leading to less frequent crawling of that page. 

Similarly, if multiple URLs return codes like 429 or 500, bots may reduce the crawling of those pages and eventually drop them from the index.

Redirects can also impact crawling, albeit to a smaller extent. However, excessive use, such as long chains of redirects, can have a cumulative effect over time.
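
If you already have a crawl export, a short script can surface redirect chains and the status code each one ends on. This is a minimal sketch: the `responses` dictionary stands in for real HTTP responses (for example, data exported from your crawling tool), and the function name and paths are hypothetical.

```python
REDIRECT_CODES = (301, 302, 307, 308)


def resolve_chain(url, responses, max_hops=10):
    """Follow redirects and return (hops, final_status).

    `responses` maps a URL to (status_code, redirect_target_or_None).
    A final status of None means a redirect loop or an overlong chain.
    """
    hops = []
    seen = set()
    while True:
        # Unknown URLs are treated as 404s in this sketch.
        status, target = responses.get(url, (404, None))
        if status not in REDIRECT_CODES or target is None:
            return hops, status
        if url in seen or len(hops) >= max_hops:
            return hops, None
        seen.add(url)
        hops.append((url, target))
        url = target


# Hypothetical crawl data.
responses = {
    "/old-toys": (301, "/cat-toys"),
    "/cat-toys": (302, "/cats/toys"),
    "/cats/toys": (200, None),
    "/discontinued": (404, None),
}
hops, status = resolve_chain("/old-toys", responses)
print(len(hops), status)  # 2 200
```

Any internal link that takes more than one hop to reach a 200 is a candidate for updating to point straight at the final URL.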


How to identify crawl budget problems

It’s impossible to determine if your site is suffering from crawl budget issues just by looking at the site itself.

See what the search engines are reporting

The first step to identifying if search bots are having issues crawling your site is to use their webmaster tools. 

For example, look at the “Crawl stats” report in Google Search Console. 

This will help you identify if a problem on your site may have caused Googlebot to increase or decrease its crawling.

Also, have a look at the “Page indexing” report. Here, you will see the ratio between your site’s indexed and unindexed pages. 

When looking through the reasons for not indexing pages, you may also see crawl issues reported, such as “Discovered – currently not indexed.” 

This can be your first indication that pages on your site do not meet Google’s crawling criteria.

Dig deeper: Decoding Googlebot crawl stats data in Google Search Console

Log files

Another way to tell if the search bots are struggling to crawl your pages as much as they would like to is to analyze your log files. 

Log files report any human users or bots that have “hit” your website.

By reviewing your site’s log files, you can understand which pages have not been crawled by the search bots for a while. 

If these are pages that are new or updated regularly, this can indicate that there may be a crawl budget problem.
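
As a rough sketch of that check, the script below reads access log lines and lists important pages that Googlebot has not requested recently. It assumes the common “combined” log format and matches on the user-agent string only, which can be spoofed, so treat the output as a starting point. The sample lines, paths and 30-day threshold are made up.

```python
import re
from datetime import datetime, timezone

# Assumes the "combined" access log format.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def last_googlebot_hits(lines):
    """Map each path to the most recent time Googlebot requested it."""
    last_seen = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match["agent"]:
            continue
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        path = match["path"]
        if path not in last_seen or when > last_seen[path]:
            last_seen[path] = when
    return last_seen


def stale_paths(important_paths, last_seen, now, max_age_days=30):
    """Return important paths never crawled or not crawled recently."""
    stale = []
    for path in important_paths:
        when = last_seen.get(path)
        if when is None or (now - when).days > max_age_days:
            stale.append(path)
    return stale


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [10/Jan/2025:06:25:14 +0000] "GET /cats/toys HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    '203.0.113.9 - - [12/Jan/2025:09:00:00 +0000] "GET /dogs/beds HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    f'66.249.66.1 - - [02/Dec/2024:11:00:00 +0000] "GET /dogs/beds HTTP/1.1" 200 4096 "-" "{GOOGLEBOT}"',
]
now = datetime(2025, 1, 15, tzinfo=timezone.utc)
hits = last_googlebot_hits(sample_log)
print(stale_paths(["/cats/toys", "/dogs/beds", "/new-arrivals"], hits, now))
```

In this sample, /dogs/beds was last crawled more than 30 days ago and /new-arrivals has never been crawled, so both are flagged.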

Dig deeper: Crawl efficacy: How to level up crawl optimization

How to fix crawl budget problems

Before trying to fix a crawl budget issue, ensure you have one. 

Some of the fixes I’m about to suggest are good practices for helping search bots focus on the pages you want them to crawl. 

Others are more serious and could have a negative impact on your crawling if not applied carefully.

Another word of warning

Carefully consider whether you’re addressing a crawling or indexing issue before making changes.

I’ve seen many cases where pages are already in the index, and someone wants them removed, so they block crawling of those pages.

This approach won’t remove the pages from the index – at least not quickly. 

Worse, they sometimes double down by adding a noindex meta tag to the pages they’ve already blocked in the robots.txt file.

The problem? 

If crawling is blocked, search bots can’t access the page to see the noindex tag, rendering the effort ineffective.

To avoid such issues, don’t mix crawling and indexing solutions. 

Determine whether your primary concern is with crawling or indexing, and address that issue directly.
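
One way to catch the mixed-up setup described above is to look for pages that carry a noindex tag but are also blocked from crawling. The sketch below uses only Python's standard library. The robots.txt content, URLs and HTML are invented, and the standard-library parser matches rules by simple path prefix, so it will not reflect wildcard rules.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Record whether a page has a robots meta tag containing noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True


def has_noindex(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def unseen_noindex(robots_txt, pages, user_agent="Googlebot"):
    """Return URLs whose noindex tag bots cannot see because crawling is blocked."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url, html in pages.items()
            if has_noindex(html) and not rules.can_fetch(user_agent, url)]


# Hypothetical robots.txt and page HTML.
robots_txt = "User-agent: *\nDisallow: /old-range/\n"
pages = {
    "https://www.example-pet-store.com/old-range/cat-tower":
        '<html><head><meta name="robots" content="noindex"></head></html>',
    "https://www.example-pet-store.com/sale/archive":
        '<html><head><meta name="robots" content="noindex, follow"></head></html>',
    "https://www.example-pet-store.com/cats/toys":
        "<html><head><title>Cat toys</title></head></html>",
}
print(unseen_noindex(robots_txt, pages))
```

Only the /old-range/ page is reported: it has a noindex tag that bots are never allowed to fetch.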

Fixing crawl budget issues through the robots.txt

The robots.txt file is a valid way of telling search bots which pages you do not want them to crawl. 

The “disallow” directive essentially prevents good bots from crawling any URLs that match its pattern.

Bad bots can and do ignore the disallow command, so if you find your site is getting overwhelmed by bots of another nature, such as competitors scraping it, they may need to be blocked in another way.

Check if your robots.txt file is blocking URLs that you want search bots to crawl. I’ve used the robots.txt tester from Dentsu to help with this.
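
If you would rather script the check, Python's standard library includes a robots.txt parser. The sketch below is a simplified stand-in for a dedicated tester: the rules and URLs are invented, and this parser matches by path prefix only, so rules that rely on wildcards need a different tool.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs the given user agent is not allowed to crawl."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in urls if not rules.can_fetch(user_agent, url)]


# Hypothetical robots.txt with an overly broad rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /cats/
"""

# Pages you want search bots to reach.
MUST_CRAWL = [
    "https://www.example-pet-store.com/cats/toys",
    "https://www.example-pet-store.com/dogs/beds",
]

print(blocked_urls(ROBOTS_TXT, MUST_CRAWL))
# ['https://www.example-pet-store.com/cats/toys']
```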

Improving the quality and load speed of pages

If search bots struggle to navigate your site, speeding up page loading can help. 

Load speed is important for crawling, both the time it takes for the server to respond to a search bot’s request and the time it takes to render a page. 

Test the templates used on URLs that aren’t being crawled regularly and see if they are slow-loading.

Another reason you may not see pages being crawled, even for the first time, is because of quality. 

Audit the pages not being crawled and those that perhaps share the same sub-folder but have been crawled. 

Make sure that the content on those pages isn’t too thin, duplicated elsewhere on the site or spammy.

Control crawling through robots.txt

You can stop search bots from crawling single pages and entire folders through the robots.txt. 

Using the “disallow” command can help you decide which parts of your website you want bots to visit.

For example, you may not want the search bots wasting crawl budget on your filtered category page results. 

You could disallow the bots from crawling any page with the sorting or filtering parameters in the URL, like “?sort=” or “?contains=.” Remember that a parameter can also appear after an “&” when it is not the first one in the URL.
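
Before publishing rules like these, it helps to test them against real URLs. The sketch below is a simplified matcher for disallow patterns that use the “*” wildcard and the “$” end anchor supported by the major search engines. It ignores “Allow” rules and rule precedence, and the rules themselves are only one possible way to write them for the pet store example.

```python
import re

# Example robots.txt rules for the fictitious pet store:
#   User-agent: *
#   Disallow: /*?sort=
#   Disallow: /*&sort=
#   Disallow: /*?contains=
#   Disallow: /*&contains=
RULES = ["/*?sort=", "/*&sort=", "/*?contains=", "/*&contains="]


def rule_to_regex(rule):
    """Translate a robots.txt path pattern with * and $ into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape literal text and let each * match any run of characters.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_disallowed(path_and_query, disallow_rules):
    """Return True if any disallow pattern matches from the start of the path."""
    return any(rule_to_regex(rule).match(path_and_query)
               for rule in disallow_rules)


for url in ["/cats/toys",
            "/cats/toys?contains=catnip",
            "/cats/toys?design=feathers&sort=low",
            "/cats/toys?design=feathers"]:
    print(url, is_disallowed(url, RULES))
```

Here the plain category page and the “design” filter stay crawlable, while anything sorted or filtered by “contains” is blocked.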

Another way to discourage bots from crawling certain pages is to add rel="nofollow" to the links that point to them. 

With the events calendar example earlier, each “View next month’s events” link could have the “nofollow” attribute. That way, human visitors could still click the link, but bots that honor the attribute would not follow it. Keep in mind that Google treats “nofollow” as a hint rather than a strict directive.

Remember to add the “nofollow” attribute to the links wherever they appear on your site. 

If you don’t do this or someone adds a link to a deeper page in the events calendar system from their own site, the bots could still crawl that page.  
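
Missing one link is easy to do, so it can be worth auditing your templates. This is a minimal sketch that scans a page's HTML for links into the calendar that lack the “nofollow” value. The /events-calendar/ prefix follows the earlier example, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class CalendarLinkAudit(HTMLParser):
    """Collect links into the calendar that lack rel="nofollow"."""

    def __init__(self, path_prefix="/events-calendar/"):
        super().__init__()
        self.path_prefix = path_prefix
        self.followed_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        # rel can hold several space-separated values.
        rel_values = (attrs.get("rel") or "").lower().split()
        if href.startswith(self.path_prefix) and "nofollow" not in rel_values:
            self.followed_links.append(href)


def links_missing_nofollow(html, path_prefix="/events-calendar/"):
    audit = CalendarLinkAudit(path_prefix)
    audit.feed(html)
    return audit.followed_links


page = """
<a href="/events-calendar/february-2025" rel="nofollow">View next month's events</a>
<a href="/events-calendar/march-2025">See what's on in March</a>
<a href="/cats/toys">Cat toys</a>
"""
print(links_missing_nofollow(page))  # ['/events-calendar/march-2025']
```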

Navigating crawl budget for SEO success in 2025

Most sites won’t need to worry about their crawl budget or whether bots can access all the pages within the allocated time and resources. 

However, that doesn’t mean they should ignore how bots are crawling the site. 

Even if you’re not running out of crawl budget, there may still be issues preventing search bots from crawling certain pages, or you might be allowing them to crawl pages you don’t want them to.

It’s important to monitor the crawling of your site as part of its overall technical health. 

This way, if any issues arise that could hinder bots from crawling your content, you’ll be aware and can address them promptly.

Dig deeper: Top 6 technical SEO action items for 2025

Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff and contributions are checked for quality and relevance to our readers. The opinions they express are their own.
