Crawl budget can be described as the level of attention search engines give your site. This level of attention is based on how frequently they’d like to crawl and how frequently a website can be crawled.

If you’re wasting crawl budget, search engines won’t be able to crawl your website efficiently, which would end up hurting your SEO performance.

What’s the definition of crawl budget in SEO?

Crawl budget is a term invented by the SEO industry to indicate a number of related concepts and systems that search engines use when deciding how many pages, and which pages, to crawl. It’s basically the attention that search engines will give your website.

Why do search engines assign crawl budget to websites?

Because they don’t have unlimited resources, and they divide their attention across millions of websites. So they need a way to prioritize their crawling effort. Assigning crawl budget to each website helps them do this.

How do they assign crawl budget to websites?

That’s based on two factors, crawl limit and crawl demand:

  1. Crawl limit / host load: how much crawling can a website handle, and what are its owner’s preferences?
  2. Crawl demand / crawl scheduling: which URLs are worth (re)crawling the most, based on their popularity and how often they’re updated.

Crawl budget is a common term within SEO. It’s sometimes also referred to as crawl space or crawl time.

Is crawl budget just about pages?

Not quite. For the sake of ease we’re talking about pages, but in reality crawl budget covers any document that search engines crawl. Examples of other documents include JavaScript and CSS files, mobile page variants, hreflang variants, and PDF files.

How does crawl limit / host load work in practice?

Crawl limit, or host load if you will, is an important part of crawl budget. Search engine crawlers are designed to avoid overloading a web server with requests, so they’re careful about this.

How do search engines determine the crawl limit of a website?

There are a variety of factors influencing the crawl limit. To name a few:

  • Signs of a platform in bad shape: how often requested URLs time out or return server errors.
  • The number of websites running on the host: if your website is running on a shared hosting platform with hundreds of other websites, and you’ve got a fairly large website, the crawl limit for your website is very limited, because crawl limit is determined on a host level. You have to share the host’s crawl limit with all of the other sites running on it. In this case you’d be much better off on a dedicated server, which will most likely also massively decrease load times for your visitors.

Another thing to consider is having separate mobile and desktop sites running on the same host. They have a shared crawl limit too. So keep this in mind.

How does crawl demand / crawl scheduling work in practice?

Crawl demand, or crawl scheduling, is about determining the worth of re-crawling URLs. Again, many factors influence crawl demand, including:

  • Popularity: how many inbound internal and inbound external links a URL has, but also the number of queries it’s ranking for.
  • Freshness: how often the URL is being updated.
  • Type of page: is the type of page likely to change? Take, for example, a product category page and a terms and conditions page: which one do you think changes most often and deserves to be crawled more frequently?
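
To make the interplay of these factors concrete, here is a toy sketch. It is not how any search engine actually schedules crawls: the `UrlSignals` fields, the `toy_demand_score` formula, the example paths, and every number are assumptions invented for illustration. It only shows that a popular, frequently updated URL would sort ahead of a popular but static one.

```python
from dataclasses import dataclass


@dataclass
class UrlSignals:
    """Hypothetical per-URL signals; the field names are illustrative only."""
    url: str
    inbound_links: int        # internal + external links pointing at the URL
    ranking_queries: int      # number of queries the URL ranks for
    updates_per_month: float  # how often the content changes


def toy_demand_score(signals: UrlSignals) -> float:
    """Illustrative score: popularity scaled by freshness (arbitrary formula)."""
    popularity = signals.inbound_links + signals.ranking_queries
    return popularity * (1 + signals.updates_per_month)


# Made-up example data: a category page that changes often versus a
# terms and conditions page that rarely changes.
pages = [
    UrlSignals("/terms-and-conditions", inbound_links=60, ranking_queries=1, updates_per_month=0),
    UrlSignals("/toys/cars", inbound_links=40, ranking_queries=25, updates_per_month=8),
]

ranked = sorted(pages, key=toy_demand_score, reverse=True)
print([page.url for page in ranked])  # ['/toys/cars', '/terms-and-conditions']
```

In this made-up data the terms and conditions page has more inbound links, yet the category page ranks first because it changes far more often.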

Why should you care about crawl budget?

You want search engines to find and understand as many of your indexable pages as possible, and you want them to do that as quickly as possible. When you add new pages and update existing ones, you want search engines to pick these up as soon as possible. The sooner they’ve indexed the pages, the sooner you can benefit from them.

If you’re wasting crawl budget, search engines won’t be able to crawl your website efficiently. They’ll spend time on parts of your site that don’t matter, which can result in important parts of your website being left undiscovered. If they don’t know about pages, they won’t crawl and index them, and you won’t be able to bring visitors in through search engines to them.

You can see where this is leading: wasting crawl budget hurts your SEO performance.

Please note that crawl budget is generally only something to worry about if you’ve got a large website, let’s say 10,000 pages and up.

What is the crawl budget for my website?

Out of all the search engines, Google is the most transparent about the crawl budget it assigns to your website.

Crawl budget in Google Search Console

If you have your website verified in Google Search Console, you can get some insight into your website’s crawl budget for Google.

Follow these steps:

  1. Log in to Google Search Console and choose a website.
  2. Go to Crawl > Crawl Stats. There you can see the number of pages that Google crawls per day.

During the summer of 2016, our crawl budget looked like this:

Google Search Console Crawl Stats – Summer 2016

We see here that the average crawl budget is 27 pages / day. So in theory, if this average crawl budget stays the same, you would have a monthly crawl budget of 27 pages x 30 days = 810 pages.

Fast forward 2 years, and look at what our crawl budget is right now:

Google Search Console Crawl Stats – Summer 2018

Our average crawl budget is 253 pages / day, so you could say that our crawl budget went up almost 10X (253 / 27 ≈ 9.4) in 2 years’ time.
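
The arithmetic behind these figures is easy to reproduce for your own numbers. The sketch below uses the two daily averages quoted above; the function names are hypothetical, and the 30-day month is the same simplification used in the text.

```python
def monthly_crawl_budget(pages_per_day, days=30):
    """Project a monthly figure from an average daily crawl count."""
    return pages_per_day * days


def growth_factor(old_pages_per_day, new_pages_per_day):
    """How many times larger the new daily average is than the old one."""
    return new_pages_per_day / old_pages_per_day


summer_2016 = 27   # average pages crawled per day, summer 2016
summer_2018 = 253  # average pages crawled per day, summer 2018

print(monthly_crawl_budget(summer_2016))                   # 810
print(monthly_crawl_budget(summer_2018))                   # 7590
print(round(growth_factor(summer_2016, summer_2018), 1))   # 9.4
```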

Go to the source: server logs

Check your server logs to see how often Google’s crawlers are hitting your website, and compare those statistics to the ones reported in Google Search Console. It’s always better to rely on multiple sources.
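
As a starting point, here is a minimal sketch that counts requests per day from user agents mentioning Googlebot. It assumes your server writes the Apache/Nginx “combined” log format; the sample lines and IP addresses are made up, and the function name is hypothetical. Keep in mind that a user agent string can be spoofed, so this gives a rough count rather than a verified one.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes the Apache/Nginx "combined" log format; adjust the pattern if your
# server logs in a different layout.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits_per_day(lines):
    """Count requests per calendar day whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # skip unparseable lines and non-Googlebot traffic
        # Timestamp example: 10/Jul/2018:06:25:13 +0000
        stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        hits[stamp.date().isoformat()] += 1
    return dict(hits)


# Made-up sample lines (documentation-range IP addresses).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
sample = [
    f'192.0.2.10 - - [10/Jul/2018:06:25:13 +0000] "GET /toys/cars HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.10 - - [10/Jul/2018:06:25:40 +0000] "GET /toys/cars?color=black HTTP/1.1" 200 4980 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.55 - - [10/Jul/2018:07:01:02 +0000] "GET / HTTP/1.1" 200 7300 "-" "{BROWSER_UA}"',
    f'192.0.2.10 - - [11/Jul/2018:03:12:09 +0000] "GET /toys HTTP/1.1" 200 6100 "-" "{GOOGLEBOT_UA}"',
]

print(googlebot_hits_per_day(sample))  # {'2018-07-10': 2, '2018-07-11': 1}
```

To run it on a real file, pass the open file object instead of `sample`, for example `googlebot_hits_per_day(open("access.log"))`, where `access.log` stands in for whatever path your server uses.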

How do you optimize your crawl budget?

Optimizing your crawl budget comes down to making sure no crawl budget is wasted; in other words, fixing the reasons it gets wasted. We monitor thousands of websites, and if you were to check each one of them for crawl budget issues, you’d quickly see a pattern: most websites suffer from the same kinds of issues.

Common reasons for wasted crawl budget that we encounter:

  1. Accessible URLs with parameters: an example of a URL with a parameter is https://www.example.com/toys/cars?color=black. In this case, the parameter is used to store a visitor’s selection in a product filter.
  2. Duplicate content: we call pages that are highly similar, or exactly the same, “duplicate content.” Examples are: copied pages, internal search result pages, and tag pages.
  3. Low-quality content: pages with very little content, or pages that don’t add any value.
  4. Broken and redirecting links: broken links are links referencing pages that don’t exist anymore, and redirected links are links to URLs that are redirecting to other URLs.
  5. Including incorrect URLs in XML sitemaps: non-indexable pages and non-pages such as 3xx, 4xx and 5xx URLs shouldn’t be included in your XML sitemap.
  6. Pages with high load time / time-outs: pages that take a long time to load, or don’t load at all, have a negative impact on your crawl budget, because it’s a sign to search engines that your website can’t handle the request, and so they may adjust your crawl limit.
  7. High numbers of non-indexable pages: the website contains a lot of pages that aren’t indexable.
  8. Bad internal link structure: if your internal link structure isn’t set up correctly, search engines may not pay enough attention to some of your pages.
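
Two of these checks are simple to sketch in code: spotting URLs that carry parameters (reason 1) and spotting XML sitemap entries that don’t return a 200 status (reason 5). The example below is a minimal illustration, not a full audit. The function names are hypothetical, and the status codes are made-up inputs that you would normally collect from your own crawl.

```python
from urllib.parse import parse_qs, urlsplit


def parameter_names(url):
    """Return the sorted query parameter names found in a URL."""
    query = urlsplit(url).query
    return sorted(parse_qs(query, keep_blank_values=True))


def sitemap_problems(sitemap_statuses):
    """Return sitemap URLs whose recorded HTTP status is not 200."""
    return {url: status for url, status in sitemap_statuses.items() if status != 200}


crawled = [
    "https://www.example.com/toys/cars",
    "https://www.example.com/toys/cars?color=black",
]
for url in crawled:
    print(url, parameter_names(url))

# Made-up statuses for URLs listed in a hypothetical XML sitemap.
statuses = {
    "https://www.example.com/toys/cars": 200,
    "https://www.example.com/old-page": 301,
    "https://www.example.com/missing": 404,
}
print(sitemap_problems(statuses))
```

The first loop reports `['color']` for the filtered URL and an empty list for the clean one; the second call returns only the 301 and 404 entries, which are the ones that shouldn’t be in the sitemap.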

 


Mark Crutch

At the age of 12, Mark purchased a TRS-80, already old at the time and lovingly known as the “Trash-80”. They would spend many nights programming stick figures to move on the screen.