noindex
noindex is a directive, delivered as a robots meta tag or an HTTP response header, that instructs search engines not to include a page in their index, even though the page may still be crawled. It is the correct tool for keeping pages out of search results - unlike robots.txt, which blocks crawling but doesn’t stop a URL from being indexed when other pages link to it. Noindex is commonly used on admin pages, internal search results, tag archives, thin duplicates, and any page that should be accessible to users but invisible to search.
How to implement noindex
Three standard methods:
Meta tag in HTML head. <meta name="robots" content="noindex">. Placed in the <head> of the page. Most common method for HTML pages.
HTTP header. X-Robots-Tag: noindex. Useful for non-HTML responses (PDFs, images, JSON endpoints) or when HTML edits are difficult.
Bot-specific directives. <meta name="googlebot" content="noindex"> targets Googlebot specifically. Allows different rules for different crawlers.
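As an illustration of the HTTP-header method, here is a minimal Python sketch of a WSGI middleware that adds X-Robots-Tag: noindex to responses for selected file types. The extension list, the demo_app stand-in and the headers_for helper are hypothetical names chosen for this example; a real site would more likely set the header in its web server or framework configuration.

```python
from typing import Callable, Iterable

# Hypothetical rule: paths ending in these extensions should not be indexed.
NOINDEX_EXTENSIONS = (".pdf", ".json")


def add_noindex_header(app: Callable) -> Callable:
    """Wrap a WSGI app so matching paths get an X-Robots-Tag: noindex header."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if path.lower().endswith(NOINDEX_EXTENSIONS):
                # Drop any existing X-Robots-Tag so the directive is not duplicated.
                headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that returns a fixed body for every path."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"demo"]


def headers_for(path: str) -> dict:
    """Call the wrapped demo app once and return the response headers it sent."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    list(add_noindex_header(demo_app)({"PATH_INFO": path}, start_response))
    return captured
```

Calling headers_for("/report.pdf") returns headers that include X-Robots-Tag, while headers_for("/index.html") does not, so HTML pages remain governed by their own meta tags.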
When to use noindex
Seven legitimate scenarios:
Thin duplicate content. Pages substantively similar to other pages that shouldn’t both be indexed. Canonical tags can handle this; noindex is sometimes cleaner.
Internal search result pages. Dynamic pages generated from internal searches. They add little value to the index and consume crawl budget.
User account and admin pages. Login pages, account settings, checkout. Users reach them directly, so search traffic isn’t appropriate.
Thin archive and tag pages. Blog tag archives or category pages with minimal content beyond links. Often better noindexed.
Faceted-navigation combinations. Parameter-combination URLs that don’t warrant individual indexing.
Gated or paywalled content. Content that shouldn’t appear in search until it is fully available.
Temporary or under-construction pages. Pages that aren’t ready for public visibility.
When NOT to use noindex
Four common mistakes:
To hide sensitive content. Noindex prevents search indexing but not direct URL access. Use proper authentication and access controls.
As the default fix for duplicate content. Canonical tags are usually the better tool. Noindex removes a page from the index entirely; a canonical consolidates ranking signals onto the preferred URL.
On blocked-by-robots.txt URLs. If Googlebot can’t crawl a URL, it can’t see the noindex directive, so the page may still get indexed from external links (shown with a ‘no information available’ snippet). Robots.txt + noindex is the wrong combination: to de-index a page, allow crawling so the directive can be read.
During migrations. Applying noindex to old URLs during a migration instead of redirecting them is a common error. Redirects preserve link equity; noindex discards it.
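The robots.txt conflict described above can be checked mechanically. The following is a rough Python sketch using the standard library’s urllib.robotparser; the function name, the sample rules and the example.com URLs are assumptions made for illustration. Note that this parser does simple prefix matching and does not reproduce every detail of how Google interprets robots.txt (wildcards, for instance), so treat the result as a first-pass check only.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_visible(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the crawler may fetch the URL and so can read its noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks internal search results.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search/
"""

# A noindex tag on this page would never be seen, because crawling is blocked.
blocked = not noindex_is_visible(EXAMPLE_ROBOTS, "https://example.com/search/results")

# A noindex tag on this page can be read and acted on.
readable = noindex_is_visible(EXAMPLE_ROBOTS, "https://example.com/blog/post")
```

If a page carries noindex and this check reports it as blocked, the usual remedy is to remove the Disallow rule until the page has dropped out of the index.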
Noindex behaviour details
Four nuances worth knowing:
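Noindex takes effect on the next crawl. The page needs to be crawled with the noindex directive in place before Google removes it from the index. As a rough guide, expect 1–4 weeks for medium-authority sites; the actual delay depends on how often the page is crawled.
Long-term noindex marks the page as low-value. Pages noindexed for months or years get deprioritized in crawling, so Googlebot visits them less often. Google has also indicated that links on long-noindexed pages may eventually stop being followed.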
Noindex + nofollow is different from noindex alone. Plain noindex still follows links on the page; noindex, nofollow tells Google to neither index the page nor follow its links. Use nofollow sparingly - blocking link flow can hurt crawl and signal distribution.
Removing noindex takes time. After removing the tag, the page needs to be recrawled and re-evaluated. Indexing won’t be immediate.
Diagnosing noindex issues
Four common problems and their fixes:
Accidentally noindexing important pages. A blanket noindex intended for a staging environment often ships to production by mistake. Check the live site after every deploy.
Noindex conflicting with canonical. A page that’s noindexed shouldn’t be a canonical target. Fix: either remove noindex or canonicalise elsewhere.
Pages won’t de-index after noindex added. Usually means Googlebot hasn’t recrawled the page yet. Use the URL Inspection tool in Google Search Console to request a recrawl.
Noindex applied via robots.txt. Historical - Google unofficially honoured a noindex rule in robots.txt, then ended support on 1 September 2019. The meta tag and the HTTP header are the only supported methods now.
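A post-deploy check for stray noindex directives can be sketched in Python with the standard library’s html.parser. The class and function names here are hypothetical, and the sketch is deliberately simplified: it only looks for a literal noindex token, so it does not handle the none shorthand or bot-prefixed header values such as googlebot: noindex. Fetching the page is left out; the function takes HTML and headers that have already been retrieved.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # content is a comma-separated list such as "noindex, nofollow".
            self.directives.extend(
                part.strip().lower() for part in attr.get("content", "").split(",")
            )


def find_noindex(html: str, headers: dict) -> list:
    """Return where a noindex directive was found: "meta", "header", both, or neither."""
    sources = []

    parser = RobotsMetaParser()
    parser.feed(html)
    if "noindex" in parser.directives:
        sources.append("meta")

    # Header names are case-insensitive, so compare them in lower case.
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    if "noindex" in [p.strip().lower() for p in header_value.split(",")]:
        sources.append("header")

    return sources
```

Run against a list of URLs that must stay indexed, any non-empty result is a signal to investigate before the next crawl picks up the directive.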
Noindex versus alternatives
Four tools and when each is right:
noindex. Exclude from index; allow crawling, allow link flow.
Canonical tag. Consolidate index signal onto a specified URL. Use when pages are near-duplicates.
robots.txt. Block crawling entirely. Use when pages shouldn’t be fetched at all.
404/410. Remove the page entirely. Use when content is genuinely gone.
Penfriend and noindex
Penfriend-produced content is intended to be indexed - it’s published content for public audiences. The noindex tag isn’t applied to Penfriend-generated pages by default. Sites that want to temporarily block a page (during editorial review, for instance) can add noindex independently of the generation workflow; Penfriend respects whatever meta configuration the site chooses.
Related terms
- robots.txt - the crawl-blocking companion
- Canonical URL - the duplicate-management alternative
- Googlebot - the crawler noindex speaks to
- Redirect - the URL-replacement alternative
- Search Engine Optimization (SEO) - the discipline noindex sits inside
