Crawl Budget
Crawl Budget is the number of URLs search engine crawlers - primarily Googlebot - will crawl on a site over a given period. Every site has a crawl budget, determined by the site’s authority, server capacity, freshness signals, and importance. On small sites, crawl budget is effectively unlimited and not a concern. On large sites (100K+ URLs), crawl budget becomes a real constraint that can prevent important content from being crawled and indexed.
What determines crawl budget
Four factors shape it. Google's own documentation groups them under two headings, crawl capacity and crawl demand:
Crawl capacity. How much the site can handle without slowing. Googlebot backs off from servers showing stress.
Crawl demand. How much Google wants to crawl the site based on its perceived importance, freshness, and popularity.
Server response. Fast, healthy responses increase crawl rate; slow or error-prone responses decrease it.
Content freshness. Sites that update frequently get crawled more often than static sites.
When crawl budget becomes a problem
Four scenarios where it matters:
Large sites with many URLs. E-commerce with 100K+ product pages, media sites with years of archived content, SaaS with user-generated content. Crawler can’t reach everything.
Sites with many low-value URLs. Faceted navigation generating thousands of parameter combinations. Internal search pages. Tag archives. Crawler spends budget on these instead of content that matters.
Sites with crawler traps. Pagination that generates infinite URLs, session-parameter URLs, and other patterns that produce an endless supply of new addresses.
Sites with slow servers. Server response times over 1 to 2 seconds reduce crawl rate significantly.
How to audit crawl budget
Five signals to check:
Google Search Console Crawl Stats report. Shows how many URLs Googlebot is crawling daily, response times, file types. The authoritative source.
Server log analysis. Filter for Googlebot user-agent; see exactly what’s being crawled and how often.
Indexed-vs-submitted URL gap. URLs in your sitemap that never appear in the index. Often a crawl-budget problem.
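Page indexing (formerly Coverage) report exclusions. 'Discovered - currently not indexed' is the clearer sign of crawl-budget pressure: Google knows the URL but hasn't crawled it yet. 'Crawled - currently not indexed' more often points to a quality issue than a crawl-budget one.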
Crawler traps. Check for infinite URL generation - faceted navigation combinations, broken calendar widgets, paginated loops.
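A minimal sketch of the server-log check in Python, assuming the logs use the Apache/Nginx "combined" format (other formats need a different regex). The function name `googlebot_crawl_summary` is hypothetical. It counts Googlebot hits per path, tallies status codes, and reports what share of those hits went to URLs with query strings. That share is a rough proxy for faceted-navigation and session-parameter waste. It matches on the user-agent string only. Google recommends confirming genuine Googlebot traffic with a reverse DNS lookup, because the user-agent is easy to spoof.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed "combined" log format, e.g.:
# 66.249.66.1 - - [10/Oct/2024:13:55:36 +0000] "GET /shoes?color=red HTTP/1.1" 200 5120 "-" "...Googlebot/2.1..."
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_crawl_summary(lines):
    """Summarise Googlebot requests from an iterable of access-log lines.

    Filters on the user-agent string only; verify IPs separately
    (reverse DNS) before trusting the numbers.
    """
    hits_per_path = Counter()
    status_counts = Counter()
    parameterised = 0
    total = 0
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue  # unparseable line or not a Googlebot request
        total += 1
        parts = urlsplit(m.group("target"))
        hits_per_path[parts.path] += 1  # group by path, ignoring parameters
        status_counts[m.group("status")] += 1
        if parts.query:
            parameterised += 1  # e.g. facet, sort, or session parameters
    return {
        "total": total,
        "parameterised_share": parameterised / total if total else 0.0,
        "top_paths": hits_per_path.most_common(10),
        "status_counts": dict(status_counts),
    }
```

Any iterable of lines works, so an open log file can be passed in directly, e.g. `googlebot_crawl_summary(open("access.log"))` (the path is illustrative). A high `parameterised_share`, or a top-paths list dominated by filter and internal-search URLs, is the pattern worth investigating.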
How to optimise crawl budget
Seven practical moves:
Block low-value URLs. robots.txt disallow for faceted-navigation variants, internal search, session parameters. Crawler doesn’t waste time on them.
Canonicalise duplicate URLs. Proper canonical tags consolidate signals onto the canonical version and help Google spend less crawl time on the duplicates.
Fix crawl-trap URLs. Remove links to infinite paginations, dead calendar loops, parameter combinations that don’t need indexing.
Improve server response time. Faster server = more crawl per unit time. TTFB optimisation directly affects crawl budget.
Prune low-value content. Sites with 500,000 thin pages benefit from aggressive pruning. Fewer pages, higher average quality, better crawl allocation.
Submit and maintain XML sitemaps. Clear signal to Google about what’s worth crawling. Well-maintained sitemaps improve crawl prioritisation.
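Use 304 Not Modified responses. When a crawler sends a conditional request (If-Modified-Since or If-None-Match) for a page that hasn't changed, return 304 with no body instead of the full content. Conserves crawl bandwidth and server load.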
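A minimal sketch of conditional-request handling with Python's standard-library `http.server`, for illustration only and not a production server. It assumes each page has a known last-modified time; the `PAGES` dictionary, the `/guide` path, and the `is_not_modified` helper are hypothetical. It honours only `If-Modified-Since`/`Last-Modified`. ETag-based `If-None-Match` validation follows the same idea but is omitted here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical content store: path -> (body, last-modified time in UTC).
# Timestamps are whole seconds because HTTP dates have one-second resolution.
PAGES = {
    "/guide": (b"<html><body>Guide</body></html>",
               datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)),
}


def is_not_modified(if_modified_since, last_modified):
    """True if the client's cached copy (per If-Modified-Since) is still current."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # malformed header: ignore it and send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat "-0000" dates as UTC
    return last_modified <= since


class ConditionalGetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        page = PAGES.get(self.path)
        if page is None:
            self.send_error(404)
            return
        body, last_modified = page
        stamp = format_datetime(last_modified, usegmt=True)
        if is_not_modified(self.headers.get("If-Modified-Since"), last_modified):
            # Unchanged since the client's last fetch: headers only, no body.
            self.send_response(304)
            self.send_header("Last-Modified", stamp)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Last-Modified", stamp)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, run `HTTPServer(("localhost", 8000), ConditionalGetHandler).serve_forever()` and request `/guide` twice, sending the first response's `Last-Modified` value back as `If-Modified-Since`; the second request should receive a 304 with no body.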
Crawl budget for small sites
Three practical points:
Sites under 10K URLs rarely hit crawl budget limits. Googlebot crawls small sites comprehensively.
If indexing is slow, it’s usually not crawl budget. Small-site indexing delays are more often quality or signal issues than crawl-budget constraints.
Focus on signals that drive crawl demand. Fresh content, authoritative backlinks, user engagement. These increase demand more than optimisation reduces spend.
Crawl budget myths
Three common misconceptions:
‘Every site needs crawl-budget optimisation.’ False. Most sites don’t.
‘High crawl rate = good SEO.’ False. High crawl rate can indicate a site with lots of low-value URLs Google is chewing through. Crawl efficiency matters more than volume.
‘Crawl budget affects ranking.’ Indirectly - pages not crawled can’t be indexed or ranked. But the crawl itself isn’t a ranking signal.
Penfriend and crawl budget
Penfriend-produced content typically consolidates content programmes (rather than inflating them with thin content). For sites that previously had many thin pages, switching to Penfriend often improves crawl efficiency naturally: fewer, higher-quality pages mean better crawl allocation. The glossary pattern specifically (distinct pages for each term with clean cross-linking) produces a discoverable, efficiently crawlable structure.
Related terms
- robots.txt - the primary tool for blocking low-value URLs from crawl
- XML Sitemap - the primary tool for signalling crawl priority
- Googlebot - the crawler crawl budget describes
- SEO Audit - the discipline that diagnoses crawl-budget issues
- Index - the downstream outcome crawl budget enables
