
The importance of Crawl Budget management for SEO purposes

Introduction

Effective crawl budget management is a crucial aspect of search engine optimization (SEO) for large websites. The term "crawl budget" refers to the number of pages that search engines, particularly Google, will crawl on your website within a specific timeframe. This concept is especially important for extensive sites with thousands of pages, as it directly affects how quickly and comprehensively search engines can index your content. Understanding and optimizing your crawl budget can lead to better indexing, greater visibility, and higher search rankings. This article explores the importance of crawl budget management and provides advanced strategies to optimize it for SEO purposes.

What is Crawl Budget?

Google has official documentation on crawl budget. In short:

Crawl budget refers to the number of pages that search engines crawl on a single website within a certain timeframe.

Because search engines have limited resources, they need to allocate them carefully across websites: not too much to any one site, and not too little. A crawl budget helps them prioritize their efforts.

Another term to know is crawl capacity: the maximum number of simultaneous parallel connections Googlebot (the web crawler Google uses) can use to crawl a single website.

A related term is crawl demand: which URLs on your site are worth crawling, and how often. Together, crawl capacity and crawl demand determine your crawl budget.

Factors that can influence crawl demand include:

  • Freshness: updated content raises a page’s importance in Google’s eyes.
  • Number and importance of links: the more links pointing to your page, the better. Links from important websites (websites that themselves have lots of links) matter even more.

Why does crawl budget matter?

If you have an extensive website (thousands of pages or more), Google will likely not crawl all of it, or not as often as you’d like. The same can happen when you add a lot of pages at once: Google might not consider all of them important enough to crawl. And if your website has technical issues, some pages might not be crawled even if your site is small and you don’t add many pages at once.

How do you check your crawl budget?

To see whether individual pages have been crawled, a good solution is Google Search Console.

Google Search Console with 'slicedbread.agency' in the URL inspection search bar, showing the Overview section.

Type a URL from your website into the URL Inspection search bar at the top of the page.

Google will give you some details about indexing status:

Google Search Console showing a URL is indexed on Google with page indexing and HTTPS status.

For additional details, click on this icon:

Google Search Console showing 'Page is indexed' status.

And you’ll get some extra information:

Google Search Console showing extra information.

For more information, go to Settings > Crawl stats in Google Search Console and open the report. You’ll see something like this (note: the images below are not from the Slicedbread Agency website):

Google Search Console showing total crawl requests.

You can also see the server response time:

Google Search Console showing average response time.

Generally, if a website starts responding to Googlebot more slowly (Googlebot visits the website, but the website takes a long time to deliver pages) or starts returning server errors, Googlebot might decide to reduce the number of pages it crawls.

The average response time in the image above is not that great; keeping it under 400 ms would have been better.
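If you have access to your server’s access logs, you can get a rough, server-side view of the same metric. The sketch below assumes a hypothetical log format: the standard combined log format with the request time in seconds appended as the last field (nginx, for example, can log this with `$request_time`). It only checks whether the user agent mentions Googlebot, and that string can be spoofed. Treat the result as an approximation, not as the figure Search Console reports.

```python
import re
from statistics import mean

# Assumed (hypothetical) format: combined log format + request time in seconds, e.g.
# ... "GET /page HTTP/1.1" 200 5120 "-" "<user agent>" 0.350
LOG_LINE = re.compile(
    r'"\S+ \S+ [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)" (?P<seconds>[\d.]+)$'
)


def average_googlebot_response_ms(lines):
    """Return the mean response time in milliseconds for Googlebot requests,
    or None if no Googlebot requests were found."""
    times_ms = []
    for line in lines:
        match = LOG_LINE.search(line.strip())
        # User-agent matching only; this is not a verified Googlebot check.
        if match and "Googlebot" in match.group("agent"):
            times_ms.append(float(match.group("seconds")) * 1000)
    return mean(times_ms) if times_ms else None


BOT = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
sample = [
    f'66.249.66.1 - - [10/Mar/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" {BOT} 0.350',
    f'66.249.66.1 - - [10/Mar/2024:10:00:05 +0000] "GET /dresses/ HTTP/1.1" 200 8400 "-" {BOT} 0.450',
    '203.0.113.7 - - [10/Mar/2024:10:01:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0" 2.000',
]
print(average_googlebot_response_ms(sample))
```

Only the two Googlebot requests are averaged here. The slow request from the other visitor is ignored.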

You can also see how the server responds to requests. If there are errors (404s and other 4xx or 5xx responses), you should look into them, for example by crawling the website with a tool like Screaming Frog.

Screaming Frog showing server response codes.
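You can also tally, from your own access logs, the status codes your server returned to Googlebot. The sketch below assumes lines in the standard combined log format. Like any user-agent check, it trusts the “Googlebot” string, which can be faked.

```python
import re
from collections import Counter

# Assumes access-log lines in the combined log format; only the status code
# and the user agent are extracted.
LOG_LINE = re.compile(r'" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"')


def googlebot_status_breakdown(lines):
    """Count Googlebot responses by status class: 2xx, 3xx, 4xx, 5xx."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        # User-agent matching only; the string can be spoofed.
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("status")[0] + "xx"] += 1
    return counts


BOT = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
sample = [
    f'66.249.66.1 - - [10/Mar/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" {BOT}',
    f'66.249.66.1 - - [10/Mar/2024:10:00:05 +0000] "GET /dresses/ HTTP/1.1" 200 8400 "-" {BOT}',
    f'66.249.66.1 - - [10/Mar/2024:10:00:09 +0000] "GET /old-dress HTTP/1.1" 404 0 "-" {BOT}',
    f'66.249.66.1 - - [10/Mar/2024:10:00:12 +0000] "GET /api/stock HTTP/1.1" 503 0 "-" {BOT}',
    '203.0.113.7 - - [10/Mar/2024:10:01:00 +0000] "GET / HTTP/1.1" 500 0 "-" "Mozilla/5.0"',
]
print(googlebot_status_breakdown(sample))
```

A growing share of 4xx or 5xx responses is a signal to run a full crawl and dig into the individual URLs.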

The Crawl stats report also breaks requests down by file type:

Screenshot: multiple file types.

As you can see in the image, there are quite a few JavaScript requests. This might not be ideal for indexing: in general, HTML and images are much simpler to crawl and index.
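To get a similar breakdown from your own logs, you can group the URLs Googlebot requested by file extension. The extension-to-type mapping below is a hypothetical simplification. It treats extension-less paths as HTML pages and only loosely mirrors the Crawl stats categories.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical, simplified mapping from file extension to file type.
FILE_TYPES = {
    "": "HTML", ".html": "HTML", ".htm": "HTML",
    ".js": "JavaScript",
    ".css": "CSS",
    ".jpg": "Image", ".jpeg": "Image", ".png": "Image",
    ".gif": "Image", ".webp": "Image", ".svg": "Image",
}


def file_type_breakdown(requested_urls):
    """Count requested URLs per file type, based only on the path's extension."""
    counts = Counter()
    for url in requested_urls:
        path = urlsplit(url).path  # ignores ?query strings like ?v=3
        extension = posixpath.splitext(path)[1].lower()
        counts[FILE_TYPES.get(extension, "Other")] += 1
    return counts


urls = [
    "https://www.website.com/",
    "https://www.website.com/static/app.js",
    "https://www.website.com/static/vendor.js?v=3",
    "https://www.website.com/img/logo.PNG",
    "https://www.website.com/feed.xml",
]
print(file_type_breakdown(urls))
```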

In the image below, “refresh” means Googlebot is recrawling a page it already knows about (for example, to pick up updates), while “discovery” means it is crawling a URL for the first time:

Table titled 'By Purpose' showing two categories: Refresh and Discovery.
Table titled 'By Googlebot type' showing categories: Smartphone, Page resource load, Image, Desktop, etc.

As you can see, Google uses several crawler types (smartphone, desktop, image, and so on). Most of the time, Googlebot crawls with a user agent that resembles a smartphone.
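If you want to check this split in your own logs, you can roughly classify requests by their user-agent string. The example strings below follow the format Google publishes for its crawlers (the Chrome version varies over time). The classification is a simplification. It cannot detect “Page resource load” requests, and user agents can be spoofed, so verify real Googlebot traffic separately.

```python
from collections import Counter

# Example user agents modeled on Google's published crawler strings.
SMARTPHONE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
DESKTOP_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
IMAGE_UA = "Googlebot-Image/1.0"


def googlebot_type(user_agent):
    """Roughly classify a request by Google crawler type from its user-agent string."""
    if "Googlebot-Image" in user_agent:
        return "Image"
    if "Googlebot-Video" in user_agent:
        return "Video"
    if "Googlebot" in user_agent:
        # Googlebot Smartphone presents a mobile (Android) browser user agent.
        return "Smartphone" if "Mobile" in user_agent else "Desktop"
    return "Not Googlebot"


agents = [SMARTPHONE_UA, SMARTPHONE_UA, SMARTPHONE_UA, DESKTOP_UA, IMAGE_UA]
print(Counter(googlebot_type(agent) for agent in agents))
```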

Googlebot generally crawls from data centers in California. It's not that far from Slicedbread agency, it seems.

How to have a better crawl budget

If you want Google to crawl your website more, consider the following tips.

Prioritize pages on your website

Your homepage contains many links and is likely to be considered the most important page by search engines. But if you have an online store, there might be categories that bring in a lot of sales, or products you want to shine. If you don’t have an online store, you might have some internal pages that you want Google to consider important.

What to do?

Link to your most important pages from other pages on your site.

For example, link to some products from the descriptive text on a category page, or link from one product or service page to a related one.
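One way to spot important pages that don’t get enough internal links is to count inbound links from a crawl of your own site. The sketch below assumes a hypothetical link graph (page → pages it links to), such as one built from a crawler export. The `minimum` threshold is an arbitrary choice.

```python
from collections import Counter


def internal_inlinks(link_graph):
    """Count how many distinct pages link to each page.

    link_graph maps each page URL to the internal URLs it links to.
    Self-links and repeated links from the same page are ignored.
    """
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def weakly_linked(priority_pages, link_graph, minimum=3):
    """Return the priority pages that receive fewer than `minimum` internal links."""
    counts = internal_inlinks(link_graph)
    return [page for page in priority_pages if counts[page] < minimum]


# Hypothetical link graph, e.g. built from a site crawl export.
link_graph = {
    "/": ["/dresses/", "/shoes/"],
    "/dresses/": ["/products/red-dress", "/"],
    "/shoes/": ["/"],
    "/products/red-dress": ["/dresses/"],
}
print(weakly_linked(["/dresses/", "/products/red-dress"], link_graph, minimum=2))
```

Here the product page has only one internal link pointing to it, so it gets flagged as a candidate for more internal linking.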

Noindex some pages

Some pages are only meant for individual users: login pages, internal search results (pages reached by searching for something in your site’s search box), account pages (my account, my favorites, my orders), and thank-you pages shown after a form is submitted. For pages like these, consider:

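  • Adding a noindex robots meta tag to them;
  • Sending a noindex directive in the X-Robots-Tag HTTP header;
  • Disallowing them in robots.txt.

Keep in mind that these options don’t combine: if a URL is blocked in robots.txt, Googlebot never fetches it and therefore never sees its noindex directive. Google still spends crawl requests on noindexed pages, so robots.txt is the option that actually saves crawl budget, while noindex is the one that keeps a page out of the index.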
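To double-check which URLs a set of robots.txt rules blocks, Python’s standard `urllib.robotparser` module can evaluate simple rules offline. The paths below are hypothetical. This parser uses plain prefix matching and doesn’t support Google’s `*` and `$` wildcards, so it is only suitable for simple rules like these.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for user-only pages (prefix matching only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /search
Disallow: /my-account/
Disallow: /thank-you
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in [
    "https://www.website.com/login",
    "https://www.website.com/search?q=red+dress",
    "https://www.website.com/my-account/orders",
    "https://www.website.com/category-red-dresses/",
]:
    status = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{status:8} {url}")
```

Because matching is by prefix, `Disallow: /login` also blocks URLs such as `/login-help`. Check that your rules don’t catch more than you intend.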

Have a quality website

If Google concludes that visitors don’t like your website, it might decide not to crawl it as much.

What makes your website a quality one? It starts with content (texts and images, perhaps other media like videos and audio), but it also refers to usability (easy-to-read text, well-thought-out interactions, good links between pages). Of course, even things like design, information structure, and site speed matter.

In short: make your visitors happy, and Google will be more willing to crawl your website.

Take a close look at pages with parameters

We are not 100% against pages with parameters. They can work just fine.

For example, if all of your product URLs look like this, and you generally don’t use other parameters, this isn’t a big issue:

www.website.com/product?id=57738

www.website.com/product?id=39694

www.website.com/product?id=27934

More often than not, though, parameters get abused, and if this is ignored, it can lead to quite a few problems.

If you have a category page and all the sorting/filtering is done via parameters, you can have a URL like this:

www.website.com/category-red-dresses/?sort_by=sales&filter1=brand_x&filter2=color_white&filter_3=price_min&filter_4=price_max

You should set aside some resources to work out how URLs like these should be handled: which ones deserve indexing, and which should point to a preferred version with rel=canonical. Slicedbread agency can help.
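One way to reason about this is to decide which parameters actually change the content and treat everything else as a variation of the same page. The sketch below makes a hypothetical assumption: only `id` identifies content. It strips every other parameter and sorts what remains, giving a URL you could use as the rel=canonical target. Which parameters to keep is a site-specific decision.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical assumption: only "id" changes which content a URL shows.
# Sorting and filtering parameters are treated as variations of the same page.
CONTENT_PARAMS = {"id"}


def canonical_url(url):
    """Drop non-content parameters, sort the rest, and drop any #fragment."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value) for key, value in parse_qsl(parts.query) if key in CONTENT_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url(
    "https://www.website.com/category-red-dresses/?sort_by=sales&filter1=brand_x&filter2=color_white"
))
print(canonical_url("https://www.website.com/product?id=57738&sort_by=price"))
```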

Avoid duplicate content

If you have a lot of pages that are similar (or, even worse, very similar) to one another, Google might decide not to crawl them.

Consider adding unique content to your website and using noindex/rel=canonical on pages that are not important to you.
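To find candidates for unique content or rel=canonical, you can compare page texts for near-duplicates. The sketch below uses a simple heuristic: overlapping three-word sequences (“shingles”) compared with Jaccard similarity. This is not how Google detects duplicates, and the 0.8 threshold is an arbitrary assumption. Comparing every pair is quadratic, so larger sites would need something like MinHash instead.

```python
import re


def shingles(text, size=3):
    """Set of overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}


def jaccard(a, b):
    """Share of shingles two pages have in common (1.0 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages, threshold=0.8):
    """Yield (url, url, score) for page pairs whose texts are at least `threshold` similar."""
    sets = {url: shingles(text) for url, text in pages.items()}
    urls = sorted(sets)
    for i, first in enumerate(urls):
        for second in urls[i + 1:]:
            score = jaccard(sets[first], sets[second])
            if score >= threshold:
                yield first, second, round(score, 2)


# Hypothetical page texts.
pages = {
    "/dresses/red-summer": "Red summer dress in cotton, available in all sizes",
    "/dresses/red-summer-2": "Red summer dress in cotton, available in all sizes",
    "/boots/winter": "Leather boots for winter hiking",
}
print(list(near_duplicates(pages)))
```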

Have a fast server and a good site speed

Make sure your pages load fast (for insights on what to improve, see https://pagespeed.web.dev/) and that your website is hosted on a fast server.

This has benefits that go well beyond crawling.

Fix pages that don’t load

This one is obvious: if Googlebot crawls your website and starts running into errors (pages that don’t load or that return a 404 or another 4xx or 5xx status code), that’s a real problem.

The solution? Crawl your website using a tool like Screaming Frog and look for 4xx and 5xx errors.

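Then 301-redirect those URLs to relevant new destinations, or remove the links pointing to them entirely. Persistent 5xx errors usually point to a server problem, which needs fixing at the source rather than a redirect.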
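Once you have crawl results, it helps to group broken links by the page that contains them, so each page can be fixed in one pass. The input shapes below are hypothetical stand-ins for a crawler export: a URL → status code mapping and a list of (source page, target URL) links.

```python
def broken_link_report(status_by_url, links):
    """Group links that point to 4xx/5xx URLs by the page that contains them.

    status_by_url: URL -> HTTP status code (e.g. from a crawl export)
    links: iterable of (source_page, target_url) pairs
    """
    report = {}
    for source, target in links:
        status = status_by_url.get(target)
        if status is not None and status >= 400:
            report.setdefault(source, []).append((target, status))
    return report


# Hypothetical crawl data.
status_by_url = {"/dresses/": 200, "/old-dress": 404, "/api/stock": 503}
links = [
    ("/", "/dresses/"),
    ("/", "/old-dress"),
    ("/dresses/", "/old-dress"),
    ("/dresses/", "/api/stock"),
]
for page, broken in broken_link_report(status_by_url, links).items():
    print(page, broken)
```

In this example, `/old-dress` (404) is a candidate for a 301 redirect or for removing the links. `/api/stock` (503) points to a server problem instead.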

Have a good ratio between indexable and non-indexable (but crawlable) pages

Let’s say you have a category page that looks like this:

www.website.com/category-red-dresses/?sort_by=sales&filter1=brand_x&filter2=color_white&filter_3=price_min&filter_4=price_max

From it, sorting and filtering options can generate all sorts of URLs.
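To see how quickly this grows, here is a back-of-the-envelope count for a single category page. The option counts are invented for illustration. The count assumes each parameter is either absent or set to exactly one value, in one fixed order. In practice, multi-select filters and varying parameter order produce even more URLs.

```python
from math import prod

# Hypothetical number of values each optional parameter can take.
PARAMETER_OPTIONS = {
    "sort_by": 4,    # e.g. sales, price, newest, rating
    "filter1": 20,   # brands
    "filter2": 12,   # colors
    "filter_3": 10,  # minimum price steps
    "filter_4": 10,  # maximum price steps
}


def url_variants(options):
    # +1 per parameter accounts for leaving it out of the URL entirely.
    return prod(n + 1 for n in options.values())


print(url_variants(PARAMETER_OPTIONS))  # 165165
```

That is 5 × 21 × 13 × 11 × 11 = 165,165 crawlable URLs for one category, before multiplying by the number of categories.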

Even if you mark those URLs as non-indexable or add rel=canonical to them, Googlebot might still need to crawl them first to see if they deserve indexing.

Consider using robots.txt to block URLs you know you don’t want Google to crawl.

This way, Googlebot will crawl only a few non-indexable pages.
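Blocking parameter URLs usually relies on the `*` and `$` wildcards that Google supports in robots.txt. Python’s built-in parser can’t evaluate these. Below is a small sketch of the matching rules as Google and RFC 9309 describe them: `*` matches any sequence of characters, a trailing `$` anchors the end of the URL, and when several rules match, the longest one wins (allow wins a tie). The rules are hypothetical examples, and this is not Google’s own implementation. Test real rules in Search Console’s robots.txt report before deploying them.

```python
import re

# Hypothetical rules for the category page example above.
RULES = [
    ("disallow", "/*?sort_by="),
    ("disallow", "/*?*filter"),
    ("allow", "/*?id="),
]


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern with * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path_and_query, rules):
    """Longest matching rule wins; on equal length, allow beats disallow.
    URLs that match no rule are allowed."""
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if pattern and rule_to_regex(pattern).match(path_and_query):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


for path in [
    "/category-red-dresses/",
    "/category-red-dresses/?sort_by=sales&filter1=brand_x",
    "/category-red-dresses/?filter1=brand_x",
    "/product?id=57738",
]:
    print("allowed" if is_allowed(path, RULES) else "blocked", path)
```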

All in all, you should aim to have a good ratio between indexable and non-indexable (but crawlable) pages.

Avoid Soft 404s

Let’s say you have a category page with zero products. 

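If you don’t remove it, both visitors and Google will keep landing on that category page and find no products. A page like this returns a 200 (OK) status code while offering essentially no content, which Google may treat as a “soft 404”.

Consider removing pages that have little or no value, so that they return a proper 404 or 410 status code instead.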
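To find pages like this before Google does, you can flag category pages that return 200 OK but list no products or carry very little text. The `CategoryPage` fields and the 50-word threshold below are hypothetical. In practice they would come from a crawler that understands your templates.

```python
from dataclasses import dataclass


@dataclass
class CategoryPage:
    # Hypothetical fields, e.g. gathered by a crawler that understands your templates.
    url: str
    status_code: int
    product_count: int
    word_count: int


def likely_soft_404s(pages, min_words=50):
    """Return URLs that answer 200 OK but list no products or have very little text."""
    return [
        page.url
        for page in pages
        if page.status_code == 200 and (page.product_count == 0 or page.word_count < min_words)
    ]


pages = [
    CategoryPage("https://www.website.com/category-red-dresses/", 200, 24, 310),
    CategoryPage("https://www.website.com/category-green-hats/", 200, 0, 120),
    CategoryPage("https://www.website.com/category-old-season/", 404, 0, 0),
]
print(likely_soft_404s(pages))
```

The empty category is flagged. The already-removed page (404) is not, because it already returns the right status code.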

Make sure your sitemap.xml is working properly

In your sitemap file(s), you should include all the URLs you want Google to crawl and none of the URLs you don’t want Google to crawl.
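If your sitemap is generated from a crawl or a product database, filter it so that only indexable URLs get in. The sketch below builds a sitemap in the sitemaps.org format from hypothetical (URL, indexable) pairs. Deciding what counts as “indexable” (for example, 200 OK, not noindexed, self-canonical) is up to you.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (as UTF-8 bytes) from (url, indexable) pairs.

    Only URLs flagged as indexable are included; everything else is left out.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, indexable in pages:
        if indexable:
            url_element = ET.SubElement(urlset, "url")
            ET.SubElement(url_element, "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical crawl results.
pages = [
    ("https://www.website.com/", True),
    ("https://www.website.com/category-red-dresses/", True),
    ("https://www.website.com/category-red-dresses/?sort_by=sales", False),
    ("https://www.website.com/login", False),
]
print(build_sitemap(pages).decode("utf-8"))
```

A single sitemap file can hold up to 50,000 URLs (and 50 MB uncompressed). Larger sites need several files, listed in a sitemap index.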

Get some links

Building backlinks (links from other websites to yours) can improve the importance Google places on your website.

Conclusion

Managing your crawl budget is vital for ensuring that search engines efficiently index the most important pages of your website. You can optimize your crawl budget by prioritizing key pages, eliminating unnecessary or redundant content, and maintaining a high-quality site with fast loading times. 

Regular monitoring using tools like Google Search Console and periodic audits will help you stay on top of potential issues and ensure your website remains search-engine friendly. 

Adopting these advanced practices will enhance your site's visibility and contribute to a more robust and successful SEO strategy.
