Last week I promised to share the weird technical gremlins I’ve found after doing 100+ audits.
Not the generic “check your H1 tags” nonsense.
I’m talking about the nasty little critters that silently eat your traffic while you sleep.
So here we go. This is one of the biggest pieces of work I’ve ever done. You better be grateful for it.
The ultimate guide to hunting down technical SEO problems that nobody talks about.
This blog is currently no-indexed. It’s newsletter subs only. Check for yourself.
Don’t Feed Your Site After Midnight: Hunting Down The Technical SEO Nasties
Remember that client whose traffic doubled after we fixed their hreflang tags?
They’d been pointing to a staging site.
For TWO YEARS.
Their dev team never noticed.
Their previous SEO agency never noticed.
Google noticed. And silently punished them for it.
HORROR STORY #1: One client had a WordPress plugin silently creating 4,000+ duplicate pages from only 50 actual articles. Google was indexing ALL of them. Their rankings tanked harder than the Titanic.
Let’s get to the good stuff.
The Complete Technical SEO Audit Process
This is the exact process I’ve used on 100+ sites.
Print it. Bookmark it. Tattoo it on your arm.
Just promise me you’ll actually use it.
Before we get into this, there are two parts to this guide.
There’s the “what you should be looking for” (this post), and then there’s the companion guide, which has all the spreadsheets, the how-tos, the timelines, who should be doing each job, etc. It’s a chonky doc.
Lemme know where to email it. You will be added to the newsletter. That’s how this works.
At least I’m honest…
PHASE 1: Reconnaissance
(Know Your Enemy)
First, we need to understand what we’re dealing with.
This is where most SEOs rush. Big mistake.
Spend time here and you’ll find 80% of the issues before you even start the deep dive.
The “What The Hell Are We Dealing With” Checklist:
- Crawl the site with Screaming Frog (set it to respect robots.txt for now)
Look at the crawl stats first. Total pages found vs internal links. Huge difference? Red flag.
- Check indexation with site:yourdomain.com in Google
Compare the number with your sitemap. Way more pages indexed than you have? Something’s creating duplicates.
- Review Google Search Console for coverage issues, mobile usability, and core web vitals
Don’t just glance. Export the data. Look for patterns in the URLs with issues.
- Review Google Analytics for sudden traffic drops and anomalies
Filter by channel. If only organic dropped, it’s likely an SEO issue, not a tracking problem.
- Check server headers and status codes
Pay special attention to 302s that should be 301s, and sneaky soft 404s returning 200 status codes.
- Review the robots.txt file to see what’s being blocked
Also check if GSC has crawl anomalies related to robots.txt – often the first sign of problems.
- Look for multiple versions of the site (www vs non-www, http vs https)
Try all combinations. Don’t forget subdomains that might be duplicating content.
- Check for recent site changes that coincide with traffic drops
CMS updates, plugin installations, new features – 90% of SEO disasters happen right after “minor updates”.
Don’t skip this. This is how you find the first layer of gremlins.
The secret? Look for what’s unexpected. Too many pages. Too few pages. Weird status codes. Anything that makes you go “huh, that’s odd.”
Those little “huh” moments are where the biggest issues hide.
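If you want to script the sitemap-versus-index sanity check, here’s a minimal sketch in Python (stdlib only – the `indexation_gap` helper and its 25% tolerance are my own illustrative choices, not an official formula):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> URL from a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

def indexation_gap(sitemap_count: int, indexed_count: int,
                   tolerance: float = 0.25) -> str:
    """Flag a red-flag gap between sitemap size and pages reported indexed."""
    if sitemap_count == 0:
        return "empty sitemap – investigate"
    ratio = indexed_count / sitemap_count
    if ratio > 1 + tolerance:
        return "more indexed than submitted – something is creating duplicates"
    if ratio < 1 - tolerance:
        return "under-indexed – crawl or quality problems"
    return "roughly in line"
```

Feed `sitemap_urls` your live sitemap and compare its count against the indexed figure from GSC – the plugin horror story above would have shown up as an 80x ratio.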
HORROR STORY #2: Found a site with THREE different versions all getting indexed. The www version, the non-www version, and a subdomain called "new" that was supposed to be their staging site. Triple the content, one-third the rankings.
PHASE 2: The Robots.txt Inspection
(Are You Accidentally Blocking Traffic?)
You’d be shocked how many sites accidentally tell Google “nothing to see here!”
It’s 15 lines of text that can tank an entire website.
And I’ve seen rookie mistakes on enterprise sites bringing in millions.
Common mistakes I’ve seen:
- Blocking entire sections of the site that should be indexed
Classic example: “Disallow: /blog” when the blog is their main traffic driver.
- Blocking CSS and JS files (Google needs these to understand your site)
Looks like: “Disallow: *.js$” or “Disallow: /wp-includes/” – Google can’t render your site properly without these.
- Using the wrong syntax (every character matters!)
“Disallow /products” (missing colon) is an invalid line that gets ignored completely. One missing character = entire section not blocked.
- Forgetting to update after migrating from staging
Staging robots.txt often has “Disallow: /” to block everything. Copy that to production = SEO suicide.
- Wrong wildcard usage
“Disallow: *product*” doesn’t work. Correct is “Disallow: /*product*” – that missing slash causes chaos.
- Not using “Allow” directives for exceptions
If you block a directory but want specific files indexed, you need explicit “Allow:” rules.
- Mismatched User-agent directives
Different rules for different bots must be in separate sections. Mix them up and Google ignores everything.
The worst part? You won’t get a notification when your robots.txt is wrecking your SEO.
Google silently obeys your terrible instructions. Like a genie granting your wish to destroy your own traffic.
How to properly check:
- Test SPECIFIC URLs (not just the homepage) with GSC’s URL Inspection tool – the old standalone robots.txt tester is retired
- Check your log files to see if Googlebot is being blocked from important directories
- Compare organic landing pages with robots.txt directives (are your top performers blocked?)
- Look for “Crawled – currently not indexed” issues in GSC (classic symptom of CSS/JS blocking)
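You can also batch-test URLs against a robots.txt file with Python’s built-in parser. A rough sketch (note: `urllib.robotparser` follows the classic spec and doesn’t support every Google extension, so treat it as a first pass, not gospel):

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt: str, urls: list[str],
                 agent: str = "Googlebot") -> list[str]:
    """Return the URLs this robots.txt would block for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]
```

Run your top organic landing pages through it: any URL that comes back blocked is exactly the “are your top performers blocked?” problem from the checklist.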
Fix this first. I’ve seen traffic double overnight just by removing one bad robots.txt line.
HORROR STORY #3: An e-commerce site had "Disallow: /products/" in their robots.txt for SIX MONTHS. They couldn't figure out why their product pages weren't ranking. Removing one line doubled their traffic in 3 weeks.
PHASE 3: The Canonical Conspiracy
(Are Your Pages Fighting Each Other?)
Canonical tags tell Google which version of a page is the “real” one.
Get these wrong, and your pages start cannibalizing each other’s rankings.
It’s like entering the same horse in multiple races and wondering why it can’t win all of them.
Common canonical disasters:
- Self-referencing canonicals pointing to the wrong URL
Example: URL is example.com/product-blue but canonical points to example.com/product
- Relative URLs in canonicals (always use absolute URLs)
Relative: <link rel="canonical" href="/product" /> – Google might interpret this wrong
- Conflicting signals (canonical says one thing, hreflang says another)
Canonical points to US version but hreflang says this is the UK version. Google gets confused.
- Canonicalizing to non-indexed pages (face-palm moment)
Telling Google “this is the canonical” then also telling it “don’t index this” = SEO disaster
- Broken canonical chains
Page A canonicals to Page B, which canonicals to Page C… but Page C doesn’t exist anymore.
- Homepage canonicalization
Every product page pointing to the homepage as canonical. Yes, I’ve seen this multiple times.
- Parameter handling gone wrong
URLs with sorting parameters canonicalizing to filtered URLs instead of the clean base URL.
- Paginated content misfires
Page 2 canonicalizing to page 1, causing all paginated content to disappear from index.
Check EVERY canonical tag. Especially on:
- Paginated pages
Each page in pagination should self-canonicalize, NOT point to page 1.
- Product variations
Color/size variants should usually canonical to the main product, not themselves.
- Filtered category pages
Filter for “red shoes” should typically canonical to the main “shoes” category.
- Mobile versions
If you still have separate mobile URLs (m.example.com), these must canonical to desktop.
- Print pages
/print/ versions should canonical to the main article.
- AMP pages
AMP must canonical to the regular HTML version, never the other way around.
How to audit canonicals properly:
- Crawl the site and export all canonical tags
- Check for pages that canonical to something other than themselves
- Verify all canonical targets actually exist (not 404s)
- Check if canonical targets are blocked by robots.txt
- Look for circular canonical references
- Compare canonical tags with sitemap URLs (they should match!)
- If you have server access, check response headers for rel="canonical" HTTP headers that mismatch the HTML tags
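Steps 1–2 are easy to script once you have the HTML. A minimal sketch using Python’s stdlib (the `canonical_issues` helper and its messages are illustrative, and it ignores edge cases like multi-token rel attributes and HTTP-header canonicals):

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class CanonicalFinder(HTMLParser):
    """Grab the href of the first <link rel="canonical"> on a page."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel", "").lower() == "canonical":
            self.canonical = a.get("href")

def canonical_issues(page_url: str, html: str) -> list[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return ["no canonical tag"]
    issues = []
    if not urlparse(finder.canonical).scheme:
        issues.append("relative canonical – use an absolute URL")
    elif finder.canonical != page_url:
        issues.append(f"canonical points elsewhere: {finder.canonical}")
    return issues
```

Loop this over a crawl export and anything flagged “points elsewhere” is a candidate for the 404/robots.txt/circular checks above.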
One e-commerce site I worked with had 40% of their product pages canonicalizing to category pages. They fixed it and saw a 76% increase in product page traffic in 6 weeks.
HORROR STORY #4: Found a site where EVERY page had a canonical tag pointing to the homepage. Their dev thought this was how you "help SEO." Thousands of pages telling Google "ignore me, look at the homepage instead." Tragic.
PHASE 4: The Mobile Madness
(Is Your Site Actually Mobile-Friendly?)
Google uses mobile-first indexing.
This means if your mobile site sucks, your rankings suck too. Period.
And yet I still see sites in 2025 where the mobile experience feels like an afterthought.
Like they designed for desktop, then just squished everything down and called it a day.
Critical mobile issues to check:
- Different content on mobile vs desktop
Google only indexes the MOBILE version. If key content is hidden on mobile, it doesn’t exist to Google.
- Elements hidden on mobile
Common with “accordion” elements and tabs that only show on click. Google may not see this content.
- Tiny tap targets (links/buttons too close together)
Google’s standard: tap targets should be at least 48px × 48px with 8px between them.
- Text too small to read without zooming
Base font should be at least 16px. Anything smaller = user frustration and Google penalties.
- Viewport not configured properly
Missing or incorrect: <meta name="viewport" content="width=device-width, initial-scale=1">
- Interstitials (popups) blocking content
Google specifically penalizes sites where popups obscure the main content on mobile.
- Touch elements too close to screen edge
Elements within 8px of screen edge are hard to tap and frustrate users.
- Mobile-specific rendering issues
Content that breaks layout, horizontal scrolling, or scripts that fail on mobile browsers.
- Lazy-loaded primary content
If main content only loads on scroll, Google might not see it. Lazy-load below-the-fold only.
- Font size inconsistency
Font scaling issues where some text becomes tiny while other text stays readable.
Don’t trust how it looks on your fancy new iPhone. Test on old devices too.
How to properly test mobile-friendliness:
- Run Lighthouse’s mobile audit on key pages (not just homepage!) – Google retired the standalone Mobile-Friendly Test
- Check GSC for mobile-specific Core Web Vitals issues (the old Mobile Usability report is gone too)
- Compare rendered HTML between mobile and desktop versions
- Test on actual mid-range Android devices (not just your flagship phone)
- Use Chrome DevTools’ device emulation with throttled network speeds
- Check Core Web Vitals specifically for mobile (often worse than desktop)
- Verify all important links/buttons are easily tappable on small screens
- Test common user flows (checkout, form submission) on actual mobile devices
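One tiny piece of this is scriptable: checking the viewport meta tag across a crawl export. A rough Python sketch (the regex assumes `name` comes before `content` in the tag, so it’s a heuristic, not a full HTML parser):

```python
import re

# Heuristic pattern: matches <meta name="viewport" content="..."> variants.
VIEWPORT_RE = re.compile(
    r'<meta[^>]+name=["\']viewport["\'][^>]+content=["\']([^"\']+)["\']',
    re.I)

def viewport_problems(html: str) -> list[str]:
    """Flag missing or user-hostile viewport configurations."""
    m = VIEWPORT_RE.search(html)
    if not m:
        return ["no viewport meta tag – page renders at desktop width on phones"]
    content = m.group(1)
    issues = []
    if "width=device-width" not in content:
        issues.append("viewport does not use width=device-width")
    # Crude check: also catches maximum-scale=1.5 etc., refine as needed.
    if "user-scalable=no" in content or "maximum-scale=1" in content.replace(" ", ""):
        issues.append("zoom is disabled – an accessibility problem")
    return issues
```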
Real example: A news site hid their sidebar content on mobile to save space. Problem? Their internal linking structure was in that sidebar. Mobile Googlebot never saw those links, causing crawling issues across the site. Traffic increased 17% after fixing.
HORROR STORY #5: Client's mobile menu was broken on Android devices. Half their traffic couldn't navigate the site. Nobody had checked because the whole team used iPhones. Simple fix, 27% traffic increase.
And by "client" I mean me.
I did this. On this site. The one you're reading now. People in the newsletter actually pointed it out to me.
I straight up didn't know...
PHASE 5: The Schema Surgery
(Is Your Structured Data Actually Working?)
Schema markup helps Google understand your content.
But bad schema is worse than no schema at all.
It’s like handing Google a blueprint to your house where the bathroom is labeled as the kitchen.
Confusing at best, disaster at worst.
Common schema disasters:
- Multiple conflicting schema types on one page
Like having both Product and Article schema on a blog post that mentions products.
- Missing required properties
Product schema without price or availability. Recipe without ingredients. Google ignores incomplete schema.
- Incorrect property types
Using text for price (“$19.99”) instead of number (19.99) with a separate currency property.
- Outdated schema formats
Using data-vocabulary.org markup instead of schema.org (Google stopped supporting the former).
- JSON-LD that doesn’t match visible content
Schema claims 5-star rating but page shows 4 stars. Google may see this as deceptive.
- Incorrectly nested entities
Organization schema inside Product schema, creating structure that makes no logical sense.
- Mismatched URLs
Schema URL properties that don’t match the actual page URL, creating confusion.
- Review schema violations
Self-serving reviews (reviews of your own business) violate Google’s guidelines.
- Aggregate rating without sufficient reviews
Using aggregateRating with only 1-2 reviews. Google expects statistically significant numbers.
- Wrong schema for the page type
Using Article schema on a product page or using WebPage on everything because “it’s a web page.”
Test EVERY page type with Google’s Rich Results Test.
Better yet, follow this schema audit process:
- Identify all schema types used across the site (Screaming Frog can extract these)
- Create a matrix of page types vs. correct schema types
- Test sample URLs of each page type in Rich Results Test
- Compare visible content with schema properties (especially dates, prices, ratings, availability)
- Check for schema implementation method consistency (JSON-LD is preferred)
- Verify nested entities make logical sense
- Keep an eye on GSC’s Rich Results report for issues
- Test key pages in multiple schema validators (Google’s can miss things)
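The “compare visible content with schema properties” step starts with extracting the JSON-LD. A minimal Python sketch (the `REQUIRED` map is a tiny illustrative subset of Google’s actual required properties – check the rich results docs for the real lists):

```python
import json
import re

# Pull out every <script type="application/ld+json"> block.
JSONLD_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.S | re.I)

REQUIRED = {  # illustrative subset, NOT Google's full requirements
    "Product": {"name", "offers"},
    "Recipe": {"name", "recipeIngredient"},
}

def schema_gaps(html: str) -> list[str]:
    """List missing required properties in each JSON-LD block on a page."""
    gaps = []
    for block in JSONLD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            gaps.append("invalid JSON-LD block")
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            t = item.get("@type")
            missing = REQUIRED.get(t, set()) - item.keys()
            if missing:
                gaps.append(f"{t} missing: {sorted(missing)}")
    return gaps
```

Run it over one sample URL per page type from your matrix; anything it flags, confirm in the Rich Results Test before filing a ticket.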
Most common schema-related traffic boosts I’ve seen:
- Adding proper VideoObject schema to video content (huge for video featured snippets)
- Fixing FAQ schema on key landing pages (position zero opportunities)
- Repairing broken Product schema (better shopping results)
- Adding HowTo schema to instruction-based content
- Implementing proper LocalBusiness schema for multi-location businesses
True story: E-commerce client fixed their broken aggregateRating schema (it was using decimals where integers were required). Next day, star ratings appeared in search results. CTR increased 34% over the next month.
HORROR STORY #6: E-commerce site had product schema on EVERY page - including blog posts, about us, contact page. Plus, their product schema listed products as "in stock" when they were actually sold out. Google trust = zero.
PHASE 6: The Hreflang Hellscape
(Is Your International SEO Working?)
Hreflang tells Google which language/country each version of your page targets.
And almost EVERYONE gets it wrong.
Even major brands with nine-figure revenues.
It’s the single most technically complex element in SEO, and it shows.
It’s one of the only technical SEO tasks I don’t do myself. I hate hreflang tags. They confuse me. I had a bunch of help with this section to ensure it was correct.
Common hreflang horrors:
- Missing return links (every language version must link to all others)
If EN links to DE and FR, both DE and FR must link back to EN and each other. Miss one = all ignored.
- Incorrect language codes
Using “en-UK” instead of “en-GB” or “es” for all Spanish rather than specific variants like “es-MX”.
- Mixing up language and country codes
Using “english” instead of “en” or “England” instead of “GB”. Only ISO codes work.
- Self-referencing hreflang missing
Each page MUST include an hreflang tag pointing to itself. Many forget this critical element.
- Using hreflang in both HTML and sitemap (pick one)
Sending mixed signals if they’re not identical. Google gets confused about which to trust.
- Canonicalizing across languages
FR page canonicalizing to EN page while also having hreflang. Contradictory signals.
- Bad syntax in implementation
Forgetting quotes, using wrong attribute names, or incorrect tag structure.
- Pointing to redirects or error pages
Href URLs must be the final destination, not redirecting URLs.
- Inconsistent implementation across the site
Having hreflang on some pages but not others creates partial implementation confusion.
- Over-complicated targeting
Creating separate versions for every country-language pair when language-only would work fine.
If you have multiple language versions, audit ALL of them.
How to audit hreflang properly:
- Crawl each language version separately (all of them!)
- Export all hreflang annotations from each version
- Create a matrix to verify the “return links” principle (every page links to all alternates and itself)
- Check all hreflang URLs respond with 200 status (not redirects or errors)
- Verify language/country codes against the ISO standards
- Check for conflicts with canonical tags
- If using XML sitemaps, ensure the hreflang there matches the HTML implementation
- Compare the content of each language version (are they actually translations or just the same content?)
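The return-links matrix check is pure logic once you’ve exported the annotations. A sketch of the idea in Python (the input format – a dict mapping each page URL to its declared alternates – is just how I’d model a crawl export, not a standard):

```python
def hreflang_errors(pages: dict[str, dict[str, str]]) -> list[str]:
    """pages maps URL -> {lang_code: alternate_url} as declared on that page.
    Checks self-reference and the return-link principle."""
    errors = []
    for url, alts in pages.items():
        if url not in alts.values():
            errors.append(f"{url}: no self-referencing hreflang")
        for lang, alt in alts.items():
            target = pages.get(alt)
            if target is None:
                # Alternate was never crawled: could be a redirect or a 404.
                errors.append(f"{url}: hreflang {lang} points to uncrawled URL {alt}")
            elif url not in target.values():
                errors.append(f"{alt}: missing return link to {url}")
    return errors
```

An empty list means the matrix holds; any “missing return link” error means Google will likely ignore the whole cluster.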
Biggest hreflang wins I’ve seen:
- E-commerce site fixing self-referencing hreflang issue: 43% increase in international traffic
- Travel site fixing canonical/hreflang conflicts: 27% increase in non-US bookings
- SaaS company implementing proper regional targeting instead of just language targeting: 92% increase in LATAM signups
Remember that client whose hreflang pointed to staging for two years? They’d been wondering why their German site got almost no traffic despite having better content than competitors. After fixing, German traffic increased 215% in three months.
The best part? Their dev team had insisted “the hreflang is fine” for months. The evidence proved otherwise.
HORROR STORY #7: Client's hreflang tags pointed to their staging site for TWO YEARS. Every time Google tried to understand their international site structure, it got redirected to a 404 page. Traffic doubled after fixing this one issue.
PHASE 7: The Page Speed Pandemic
(Is Your Site Slower Than Molasses?)
Slow sites kill rankings.
And 90% of sites are MUCH slower than their owners realize.
“But it loads fast on my computer!”
Yeah, on your fiber connection with a cleared cache and $3000 MacBook. Try it on a 4G connection with a mid-range phone in rural Kansas.
That’s what Google sees.
Go beyond Google PageSpeed Insights:
- Test real user metrics in Google Search Console
Core Web Vitals report shows ACTUAL user experience, not lab data. This is what matters for rankings.
- Check server response time (TTFB)
Should be under 200ms. If not, server issues or database queries are killing you before content even starts loading.
- Look for render-blocking resources
CSS/JS that prevents the page from rendering until it loads. Defer JavaScript, inline critical CSS.
- Check for huge images
Still see 2MB+ hero images? Properly size and compress all images, use WebP/AVIF formats.
- Review JavaScript execution time
Heavy JS frameworks can paralyze mobile devices. 30% of sites have JS that takes 3+ seconds to execute on mobile.
- Evaluate third-party scripts
Analytics, chat widgets, heatmaps, ad pixels – they add up fast. Each one is a performance tax.
- Check font loading strategy
Web fonts can cause layout shift and blank text. Use font-display: swap and preload critical fonts.
- Look for render-path optimizations
The sequence matters: HTML → CSS → Initial JS → Content → Non-critical JS
- Investigate Cumulative Layout Shift (CLS)
Elements jumping around as the page loads frustrate users and hurt rankings. Set image/video dimensions!
- Check mobile-specific performance issues
Mobile CPU processing power is 3-10x weaker than desktop. JavaScript that runs fine on your laptop can cripple a phone.
Don’t just test the homepage. Check product pages, category pages, and blog posts.
How to properly audit page speed:
- Start with GSC’s Core Web Vitals report to identify problem page groups
- Test top landing pages from organic search (not just your homepage!)
- Use WebPageTest.org for detailed waterfall analysis
- Run Lighthouse tests in an incognito window with extensions disabled
- Test on actual mid-range Android devices
- Set up Real User Monitoring (RUM) to track actual visitor experience
- Analyze your server response time with uptime monitoring tools
- Create a performance budget for each page type
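A performance budget can be as simple as a dict of thresholds you diff measurements against. An illustrative sketch (the budget numbers loosely track the Core Web Vitals “good” bands plus the TTFB target above – tune them per page type):

```python
BUDGET = {  # illustrative thresholds, not official limits
    "ttfb_ms": 200,
    "lcp_ms": 2500,
    "cls": 0.1,
    "inp_ms": 200,
    "page_weight_kb": 1500,
}

def over_budget(measured: dict[str, float]) -> dict[str, float]:
    """Return each measured metric that exceeds its budget, with the overage."""
    return {
        metric: measured[metric] - limit
        for metric, limit in BUDGET.items()
        if metric in measured and measured[metric] > limit
    }
```

Wire your RUM or WebPageTest numbers into `measured` and fail the build (or at least the Slack channel) whenever the returned dict isn’t empty.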
Biggest speed wins I’ve seen:
- Moving to a proper CDN: 40-60% improvement in TTFB globally
- Optimizing images: 30-50% reduction in page weight
- Implementing proper lazy loading: 20-40% improvement in initial load time
- Server-side rendering critical content: 70% improvement in First Contentful Paint
- Cleaning up third-party scripts: 15-25% reduction in Total Blocking Time
HORROR STORY #8: Client's site loaded in 2 seconds on desktop but 19 seconds on 3G connections. They were loading a 12MB background video on mobile. 12MB! Their developer "forgot" to disable it for mobile. No wonder their bounce rate was 89%.
PHASE 8: The Redirect Wormhole
(Is Your Site Stuck in an Endless Loop?)
Redirects should be simple.
Old URL → New URL. Done.
But I’ve seen some truly nightmarish situations:
- Redirect chains (A → B → C → D)
Each hop loses link equity. Google may stop following after 4-5 redirects. Track all hops!
- Redirect loops (A → B → C → A)
These eventually time out and show as server errors to users and bots. Death spiral for rankings.
- Temporary redirects (302s) used for permanent changes
302s don’t pass full link equity. Been “temporarily” redirecting for 3 years? That should be a 301.
- Redirecting to pages that then 404
The SEO equivalent of sending someone on a wild goose chase that ends in a brick wall.
- Different redirects for Googlebot vs users
Cloaking alert! This can get your site penalized. Redirects should be consistent for all visitors.
- Inconsistent protocol redirects
HTTP → HTTPS for some URLs but not others. Pick a protocol and stick with it site-wide.
- Mobile redirect disasters
Desktop URL → Mobile URL → Desktop URL in an endless loop when user agent detection goes wrong.
- Cross-domain redirect chaos
Old site → New site but preserving URL structure inconsistently. Map everything 1:1 or use catch-alls.
- Parameter-based redirect failures
URLs with UTM or other tracking parameters failing to redirect properly.
- Case sensitivity issues
example.com/Page and example.com/page both returning 200s creates duplicate content; one case should 301 to the other.
Map out EVERY redirect on the site. Look for patterns of madness.
How to properly audit redirects:
- Crawl the site with a tool that follows and logs redirects (Screaming Frog, DeepCrawl)
- Extract all redirect chains and categorize by type (301, 302, etc.)
- Test all inbound links from other sites to ensure they redirect properly
- Compare the final destination URLs with your current site architecture
- Look for redirect chains longer than 2 hops
- Check for URLs that redirect to non-200 status codes
- Review server logs for frequent redirect paths
- Test redirects with different user agents (mobile vs desktop)
- Audit historical redirects after migrations or redesigns
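Once you’ve exported source → destination pairs from a crawl, chain and loop detection is a few lines. A sketch (the `max_hops` default and the status labels are my own; adjust to taste):

```python
def redirect_chain(start: str, redirects: dict[str, str],
                   max_hops: int = 10) -> tuple[list[str], str]:
    """Follow a URL through a redirect map; detect loops and long chains.
    `redirects` maps source URL -> destination URL (e.g. from a crawl export)."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects:
        nxt = redirects[chain[-1]]
        if nxt in seen:
            return chain + [nxt], "loop"
        chain.append(nxt)
        seen.add(nxt)
        if len(chain) - 1 > max_hops:
            return chain, "too long"
    # One hop maximum is the goal; anything longer is a chain to flatten.
    status = "ok" if len(chain) <= 2 else "chain"
    return chain, status
```

Feed it the redirect map from Screaming Frog and fix everything that doesn’t come back “ok” – chains get flattened to one hop, loops get broken.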
Biggest redirect wins I’ve seen:
- Fixing redirect chains from multiple site migrations: 27% increase in organic traffic within 2 weeks
- Converting old 302 redirects to 301s: 14% increase in pages appearing in search
- Repairing redirects pointing to 404s: 19% decrease in crawl errors
- Implementing proper URL case handling: Reduced duplicate content issues by 22%
True story: One major e-commerce site had redirected their entire product catalog through a tracking subdomain for analytics purposes. Every link went:
example.com/product → track.example.com/product → example.com/product
This self-redirecting nightmare was killing their crawl budget and rankings. Fixing it improved crawl efficiency by 70% and organic traffic by 43% in two months.
HORROR STORY #9: Client migrated their site 3 times in 5 years. Each time, they just stacked new redirects on top of old ones. Some URLs were going through SEVEN redirects before reaching the final destination. Fixing this mess improved crawl budget by 43%.
PHASE 9: The JavaScript Jungle
(Is Your Content Actually Visible?)
JavaScript is not Google’s friend.
If your content only appears after JS loads, you’re gambling with your SEO.
Sure, Google’s better at JavaScript than years ago. But “better” isn’t “perfect.”
It’s like telling someone you’re a “better” driver after three DUIs. The bar was low.
Test for:
- Content only visible after JavaScript execution
If it doesn’t appear in View Source (Ctrl+U), Google might not see it. Check rendered HTML in DevTools.
- Navigation that requires JavaScript
If JavaScript fails, can users still navigate? If not, neither can Googlebot sometimes.
- Critical elements hidden until JS interactions
Content in tabs, accordions, or “load more” buttons may not get indexed or weighted properly.
- Infinite scroll without proper pagination
Google won’t scroll forever. Provide real, crawlable paginated URLs – Google no longer uses rel="next"/"prev".
- Client-side rendering without server-side backup
If all rendering happens in-browser, Google’s first pass sees almost nothing.
- Lazy-loaded primary content
Fine for images below the fold, terrible for your main content and critical links.
- JavaScript redirects
Using window.location instead of proper server-side redirects. Much less reliable for SEO.
- AJAX content loading
Content loaded via AJAX calls after page load may be missed during crawling.
- Front-end routing issues
Single Page Apps with client-side routing can confuse Google without proper implementation.
- Blocked JavaScript files
If robots.txt blocks .js files, Google can’t execute your JavaScript. Double-edged sword.
View source vs. inspect element. If it’s not in the source, Google might not see it.
How to properly audit JavaScript SEO issues:
- Compare raw HTML (View Source) with rendered DOM (Inspect Element)
- Use GSC’s URL Inspection tool (which replaced “Fetch as Google”) and check the rendered HTML
- Test key pages with JavaScript disabled in your browser
- Check for critical content loaded via AJAX or dynamic JS
- Test with slower connections to see loading sequence
- Use tools like Puppeteer to programmatically test JS rendering
- Check coverage report in Chrome DevTools to identify unused JavaScript
- Look for excessive JavaScript execution time on mobile devices
- Review server logs to see if Googlebot is accessing your JS files
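The View Source vs rendered DOM comparison can be semi-automated: check whether the phrases you need to rank for exist in the server-sent HTML at all. A crude sketch (tag-stripping with a regex is deliberately rough – good enough for a yes/no signal, not for parsing):

```python
import re

def missing_from_source(raw_html: str, key_phrases: list[str]) -> list[str]:
    """Return the phrases that do NOT appear in the server-sent HTML.
    If a phrase only exists in the rendered DOM, it likely depends on JavaScript."""
    text = re.sub(r"<[^>]+>", " ", raw_html).lower()
    return [p for p in key_phrases if p.lower() not in text]
```

Pull `raw_html` with a plain HTTP fetch (no browser), feed in the product details or headings you care about, and anything returned is content Google may only see after rendering – or never.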
Biggest JavaScript SEO wins I’ve seen:
- Moving from client-side to server-side rendering: 120% increase in indexed pages
- Converting JavaScript-loaded content to static HTML: 54% improvement in rankings
- Creating proper HTML fallbacks for JavaScript navigation: 32% increase in internal page crawling
- Implementing dynamic rendering for search engines: 87% improvement in indexation
Real case: Fashion e-commerce site had all product details (sizing, materials, care instructions) hidden in JavaScript tabs. Google wasn’t seeing this valuable content. Moving it to visible HTML with CSS-only tabs increased organic product traffic by 67% in two months.
Another client’s React-based site looked beautiful but was essentially invisible to Google. Implementing server-side rendering took their organic traffic from 1,200 visits/month to over 14,000 in just three months.
PHASE 10: The Crawl Budget Black Hole
(Is Google Wasting Time on Garbage?)
Google doesn’t crawl every page, every day.
It allocates a “crawl budget” to your site.
If that budget is wasted on garbage pages, your important content suffers.
It’s like inviting Google to a buffet where 95% of the dishes are empty plates.
Eventually, Google gets tired and leaves before seeing your best content.
Check for:
- Faceted navigation creating millions of URLs
Color + Size + Price + Brand + Rating = exponential URL explosion. Use rel="nofollow" and/or robots.txt.
- Calendar systems with infinite past/future pages
Event sites creating pages for every day until the end of time. Limit calendar range and use robots.txt.
- Search results pages being indexed
Internal search creates a new URL for every possible query. Block with meta robots or robots.txt.
- Session IDs in URLs
Adding unique identifiers to personalize experiences creates infinite URL variants. Use cookies instead.
- Duplicate content with different URL parameters
Sort=asc vs sort=desc vs sort=price showing identical content. Use rel="canonical" properly.
- Paginated content without limits
Showing 10 products per page with 10,000 products = 1,000 paginated pages. Consolidate or limit.
- Print versions of pages
Creating separate /print/ versions of every article. Use CSS print styles instead.
- Development artifacts
Test directories, staging environments, or QA systems accidentally exposed to search engines.
- Outdated or deprecated content still linked
Old versions of pages that should be redirected but are still accessible and linked internally.
- Tag/category sprawl
CMS systems creating a new page for every possible topic tag, even ones used only once.
Look at your server logs to see what Google is actually crawling.
How to properly audit crawl budget issues:
- Analyze server logs to see Googlebot crawl patterns and frequency
- Review GSC’s crawl stats report for trends and anomalies
- Check coverage report for excessive “Discovered – currently not indexed” pages
- Map crawl frequency against page importance (are your key pages crawled often enough?)
- Identify URL patterns that consume disproportionate crawl resources
- Calculate crawl-to-index ratio (pages crawled vs pages indexed)
- Review internal linking to ensure important pages are well-connected
- Check for crawl traps (calendar systems, faceted navigation, infinite parameters)
- Verify robots.txt is correctly blocking low-value content without blocking assets
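URL-parameter crawl traps show up as many query-string variants of the same path. A quick Python sketch for spotting them in a URL export from your logs or crawler (the threshold of 5 is arbitrary – tune it):

```python
from collections import Counter
from urllib.parse import urlsplit

def parameter_explosions(urls: list[str], threshold: int = 5) -> dict[str, int]:
    """Count query-string variants per path; paths over the threshold are
    likely faceted-navigation or tracking-parameter crawl traps."""
    variants = Counter(
        urlsplit(u).path for u in urls if urlsplit(u).query
    )
    return {path: n for path, n in variants.items() if n >= threshold}
```

Run it against the URLs Googlebot actually hit in your server logs: a category path with thousands of parameter variants is exactly the buffet of empty plates described above.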
Biggest crawl budget wins I’ve seen:
- E-commerce site fixing faceted navigation: Reduced crawlable URLs by 97%, new products indexed in 2 days vs 3 weeks
- News site implementing proper pagination: Reduced crawlable URLs by 83%, increased crawl frequency of important pages by 4x
- Directory site fixing duplicate geographical pages: Reduced URL count by 91%, new listings indexed within 24 hours
- Blog implementing proper tag management: Reduced tag pages by 72%, core content crawled 3x more frequently
True story: That fashion site with 42 MILLION URLs? After implementing proper faceted navigation controls, their indexable URL count dropped to about 300,000. Within three weeks, their category pages started ranking for competitive terms they’d never ranked for before.
Their dev team initially pushed back on the fixes, saying “Google is smart enough to figure it out.”
Narrator: Google was not smart enough to figure it out.
Another client’s WordPress site had a plugin creating /amp versions of EVERY page, even though AMP was only configured for posts. Result: 2x the URLs, half the crawl efficiency. Fixing it improved crawl rates dramatically.
The “Fix It Now” Priority List
Can’t fix everything at once? Start here:
- Robots.txt errors (Stop blocking important content)
First, make sure your money pages are actually accessible to Google. Nothing else matters if they can’t be crawled.
- Broken canonicals (Stop confusing Google about your important pages)
Especially ones pointing to 404s or incorrect URLs. Google wastes crawl budget following these.
- Server errors & 404s on important pages
Fix these early, especially if they’re linked from other sites or your navigation.
- Mobile usability issues (Remember: mobile-first indexing)
Google only sees your mobile version now. Content invisible on mobile is invisible to Google.
- Duplicate content issues
Multiple versions of the same page competing for rankings dilute your authority.
- Redirect chains and loops
These waste crawl budget and link equity. Shorten all redirects to one hop maximum.
- Crawl budget waste (block those parameter URLs!)
The faster Google can find your good content, the better it ranks. Stop the crawl waste.
- Schema markup errors
Bad schema is worse than no schema. Fix the errors before adding more types.
- Page speed on top landing pages
Start with your highest-traffic pages first – optimize the pages already getting visitors.
- Hreflang errors (if you have international versions)
If you target multiple countries/languages, these errors can tank international performance.
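The first item on that list is also the easiest to sanity-check yourself. Here's a quick sketch using Python's stdlib robots.txt parser; the rules and URLs below are made-up examples, not from any real site:

```python
from urllib.robotparser import RobotFileParser

# Quick sanity check: are your money pages crawlable under robots.txt?
# Rules and URLs below are illustrative placeholders.
rules = """
User-agent: *
Disallow: /search
Disallow: /cart
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

money_pages = [
    "https://example.com/products/widget",  # should be crawlable
    "https://example.com/search?q=widget",  # internal search, blocked
]
for url in money_pages:
    verdict = "OK" if rp.can_fetch("Googlebot", url) else "BLOCKED"
    print(verdict, url)
```

One caveat: the stdlib parser does plain prefix matching and doesn't understand Google's `*` wildcards, so keep your test rules prefix-based, or use a Google-spec-compliant parser if your robots.txt relies on wildcards.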
Start at the top, work your way down.
This isn’t random. It’s based on impact vs. effort and dependency order.
No point fixing schema if Google can’t even see your pages due to robots.txt blocks.
No benefit to speeding up pages that have canonical issues sending visitors elsewhere.
The foundation must be solid before you build the house.
The Tools I Actually Use
No fluff, just the tools that find real problems:
- Screaming Frog – Still the best crawler
Worth every penny of the £149 license. The Swiss Army knife of technical SEO.
- Google Search Console – The source of truth
If Google tells you something is wrong, believe it. No matter what other tools say.
- DeepCrawl – For larger sites (100k+ pages)
When Screaming Frog chokes on your massive site, DeepCrawl keeps going.
- ContentKing – For real-time monitoring
Alerts you when things break before they impact rankings. Like having a 24/7 SEO guard dog.
- Sitebulb – For beautiful visualizations and reports
The best tool for explaining technical SEO issues to non-technical stakeholders.
- Ahrefs/Semrush – For backlink and ranking data
Essential context for understanding which technical issues are hurting your most valuable pages.
- WebPageTest – For detailed page speed analysis
More detailed than Google PageSpeed Insights. Shows the full loading sequence.
- Chrome DevTools – For JavaScript debugging
Free and incredibly powerful. The Coverage tab alone is worth its weight in gold.
- httpstatus.io – For bulk status code checking
Quickly check hundreds of URLs for redirect chains, status codes, and response headers.
- Merkle’s Schema Markup Validator – Better than Google’s tool
Catches schema issues that Google’s Rich Results Test misses. More detailed error messages.
- URL Profiler – For data aggregation at scale
Pull data from multiple sources for thousands of URLs. Huge time-saver.
- Botify – For enterprise-level crawl analysis
When you need to analyze millions of pages with sophisticated segmentation.
- Screaming Frog Log File Analyzer – For server log analysis
See exactly how Googlebot crawls your site. Identify patterns and issues.
- OnCrawl – For connecting crawl, log, and ranking data
Shows correlations between technical issues and actual performance/rankings.
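If you want the httpstatus.io-style chain check as a script, the core logic is just counting 3xx hops. A small sketch with synthetic data; the helper function and the example redirect path are mine, not from any tool:

```python
# Flag redirect chains longer than one hop.
# Input: a recorded redirect path as (url, status_code, location) tuples,
# e.g. from a crawler's response history. Names here are illustrative.
def redirect_chain_report(hops):
    """Return (final_url, hop_count, is_chain) for a redirect path."""
    hop_count = sum(1 for _, status, _ in hops if 300 <= status < 400)
    final_url = hops[-1][0]
    return final_url, hop_count, hop_count > 1

# Example: a two-hop chain (http -> https -> trailing slash)
# that should be collapsed into a single 301.
path = [
    ("http://example.com/page",   301, "https://example.com/page"),
    ("https://example.com/page",  301, "https://example.com/page/"),
    ("https://example.com/page/", 200, None),
]
final, hops, needs_fix = redirect_chain_report(path)
print(final, hops, needs_fix)  # https://example.com/page/ 2 True
```

Feed it the redirect history from whatever crawler you use, and anything with `is_chain` set to True goes on the "collapse to one hop" list.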
You don’t need all of these. Screaming Frog + GSC will find 80% of issues.
Start with the free tools. Invest in paid ones only when you hit their limits.
I’ve tested dozens of SEO tools over the years. These are the ones that consistently find real problems that impact rankings.
No affiliate links here. No kickbacks. Just genuine recommendations from someone who’s been in the trenches.
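Before paying for a log-file tool, you can also get a taste of log analysis with a few lines of Python. This sketch assumes a combined-format access log and filters naively on the user-agent string (proper Googlebot verification needs a reverse DNS check); the sample lines are synthetic:

```python
import re
from collections import Counter

# Count Googlebot hits per URL from a combined-format access log.
# Matches the request line, e.g. "GET /path HTTP/1.1" 200
LOG_LINE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+" \d{3}')

def googlebot_hits(lines):
    hits = Counter()
    for line in lines:
        if "Googlebot" not in line:  # naive filter; verify IPs in production
            continue
        m = LOG_LINE.search(line)
        if m:
            hits[m.group(1)] += 1
    return hits

# Synthetic example lines, not real traffic.
sample = [
    '66.249.66.1 - - [01/Jan/2025] "GET /products/widget HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [01/Jan/2025] "GET /search?q=x HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [01/Jan/2025] "GET /products/widget HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(googlebot_hits(sample).most_common())
```

Run that over a week of logs and sort the counter: if Googlebot is spending most of its visits on parameter URLs instead of your money pages, you've found your crawl budget leak.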
The Hidden Cost of Technical SEO Problems
Here’s why this matters:
A site with technical problems is like a car with a leaky gas tank.
You can add all the fuel you want (content, links, etc.), but you’re still losing efficiency.
But it’s worse than that.
It’s like a car where the gas tank leaks, the brakes stick, the tires are flat, and the GPS sends you to the wrong address.
And the worst part? The dashboard looks fine. No warning lights.
This is what makes technical SEO so frustrating – and so valuable.
Technical SEO isn’t sexy. It doesn’t make for good tweets.
Nobody goes viral sharing “I fixed my canonical tags today!”
But the ROI is insane if you know what you’re doing.
I’ve seen more traffic gains from fixing technical issues than from any other SEO tactic.
Content creation gets all the glory, but technical fixes move the needle faster.
What happens when you fix technical SEO problems:
- Google crawls more of your important pages
- More pages get indexed (and stay indexed)
- Existing rankings improve as signals consolidate
- Pages load faster, reducing bounce rates
- Users have better experiences, increasing engagement signals
- Link equity flows properly to important pages
- Content updates get discovered and ranked faster
All of this happens without creating a single new piece of content.
It’s like finding money you already had but couldn’t access.
THE HAPPY ENDING: E-commerce client spent 6 months creating content. Zero traffic improvement. We spent 2 weeks fixing technical issues. Traffic increased 143%. Sometimes the boring stuff is what moves the needle.
Another happy ending: News site was convinced they needed more content. They were publishing 15 articles daily but traffic was flat. Their navigation had JavaScript issues preventing proper crawling. One fix = 86% more pages indexed in two weeks = 51% traffic increase.
Technical SEO isn’t a one-time project. It’s ongoing maintenance.
Sites break. CMSs update. Developers push code. Plugins get installed.
Each change can introduce new gremlins.
The sites that win are the ones that keep the gremlins in check.
What Next?
Don’t just read this and think “cool info.”
Actually DO something:
- Pick ONE issue from this list
- Check if your site has it
- Fix it this week
Start small. You don’t need to fix everything at once.
Just fix something. Anything.
I promise you’ll find at least one of these issues on your site. Probably several.
And fixing just one could make a significant difference.
Remember: Technical SEO is cumulative. Each small fix adds up.
It’s not about perfection. It’s about continuous improvement.
Your site doesn’t need to be technically perfect. It just needs to be better than your competitors.
And most of them aren’t doing any of this stuff.
That’s your advantage.
This was a hell of a thing to put together. I hope I helped.
And remember: Never feed your website after midnight. Bad things happen.
Tim "Technically fucking done with this now" Hanson