Holdout Test
A holdout test is a type of incrementality experiment in which a randomly selected group of users or markets is deliberately excluded from marketing (advertisements, emails, retargeting) while a comparable group receives the full marketing treatment. Comparing outcomes between the two groups measures the incremental lift actually caused by the marketing. Holdout tests are the cleanest way to answer ‘did this marketing actually cause conversions?’, at the cost of forgoing some conversions from the held-out group.
How holdout tests work
A typical holdout test follows four steps:
1. Define the holdout group. Random sample of users or markets that won’t receive the marketing under test. Size balances statistical power against forgone revenue.
2. Suppress marketing to that group. Technical implementation depends on the channel. Meta and Google have built-in lift-test products; email and content require custom implementation.
3. Run the test for a defined period. Long enough to accumulate conversions; short enough to not forgo too much revenue.
4. Compare conversion rates. Conversion rate in the marketed group minus conversion rate in the holdout group is the absolute incremental lift; dividing that difference by the holdout group's conversion rate gives the relative lift.
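The arithmetic in step 4 can be sketched in Python as below. The `GroupResult` class, the `incremental_lift` function and the counts are hypothetical, chosen only to illustrate the calculation; the sketch assumes each group's conversion rate is simply converting users divided by assigned users.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GroupResult:
    users: int        # users assigned to the group
    conversions: int  # users who converted during the test window

    @property
    def rate(self) -> float:
        return self.conversions / self.users


def incremental_lift(marketed: GroupResult, holdout: GroupResult) -> tuple[float, float]:
    """Return (absolute lift, relative lift) of the marketed group over the holdout."""
    absolute = marketed.rate - holdout.rate
    relative = absolute / holdout.rate  # raises ZeroDivisionError if the holdout never converted
    return absolute, relative


# Hypothetical counts, for illustration only: a 90/10 split.
marketed = GroupResult(users=90_000, conversions=2_160)  # 2.4% conversion rate
holdout = GroupResult(users=10_000, conversions=200)     # 2.0% conversion rate
abs_lift, rel_lift = incremental_lift(marketed, holdout)
incremental_conversions = abs_lift * marketed.users  # conversions the marketing caused
print(f"absolute lift: {abs_lift:.4f}")
print(f"relative lift: {rel_lift:.1%}")
print(f"incremental conversions: {incremental_conversions:.0f}")
```

Because rates are normalised per user, the groups do not need to be the same size; a 90/10 split like this one keeps forgone revenue small at the price of a noisier holdout estimate.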
Common holdout test structures
Four variants:
User-level holdouts. Some users don’t see ads. Cleanest but requires identity tracking and privacy-compliant implementation.
Market-level holdouts. Some geographic markets don’t get campaigns. Easier to implement; less statistically efficient because markets are heterogeneous.
Channel-level suppression. One channel is paused for a period. Measures what happens without that channel.
Campaign-level suppression. Specific campaigns don’t run for a period. Narrower lens; more interpretable results.
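For user-level holdouts, one common implementation pattern (assumed here, not prescribed by any particular ad platform) is to hash a stable user ID into a bucket, so assignment is random across users but fixed for each user for the whole test. The function name `in_holdout`, the experiment label and the 10% holdout share are illustrative.

```python
import hashlib


def in_holdout(user_id: str, experiment: str, holdout_share: float = 0.10) -> bool:
    """Deterministically decide whether a user is held out of one experiment.

    Hashing "experiment:user_id" gives a stable pseudo-random bucket, so the same
    user is always suppressed (or always marketed to) for the whole test, and
    different experiments get independent splits.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < holdout_share


# Hypothetical user IDs and experiment name, for illustration only.
users = [f"user-{i}" for i in range(100_000)]
held_out = sum(in_holdout(u, "retargeting-q3") for u in users)
print(f"{held_out / len(users):.3f} of users held out")
```

Suppression systems can then call the same function at send or bid time, which avoids storing a separate assignment table.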
What holdout tests reveal
Four common findings:
Retargeting lift is often smaller than expected. Holdout tests on retargeting frequently show that only 40–70% of the lift reported by attribution is actually incremental.
Branded search has mixed incrementality. Some portion of branded-search conversions would have happened organically, so paid branded search pays to capture conversions it did not cause.
Awareness campaigns have longer tails. The effects of awareness spending take weeks or months to fully materialise, so short tests under-measure them.
Email lift varies by audience. Lift in engaged audiences is substantial; lift in inactive audiences is often minimal. Segmented holdouts reveal this.
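The retargeting and email findings above both come down to comparing holdout-measured incremental conversions with what attribution credits, segment by segment. A minimal sketch, assuming per-segment counts are available for both groups; `SegmentResult`, `incrementality_ratio` and every number are hypothetical.

```python
from typing import NamedTuple


class SegmentResult(NamedTuple):
    marketed_users: int
    marketed_conversions: int
    holdout_users: int
    holdout_conversions: int
    attributed_conversions: int  # conversions the attribution model credits to the marketing


def incrementality_ratio(s: SegmentResult) -> float:
    """Share of attribution-credited conversions that the holdout shows were incremental."""
    marketed_rate = s.marketed_conversions / s.marketed_users
    holdout_rate = s.holdout_conversions / s.holdout_users
    incremental = (marketed_rate - holdout_rate) * s.marketed_users
    return incremental / s.attributed_conversions


# Hypothetical email results split by engagement, for illustration only.
segments = {
    "engaged": SegmentResult(20_000, 1_200, 2_000, 60, 1_000),
    "inactive": SegmentResult(50_000, 520, 5_000, 50, 400),
}
for name, result in segments.items():
    print(f"{name}: {incrementality_ratio(result):.0%} of attributed conversions incremental")
```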
Holdout test challenges
Five practical issues:
Sufficient sample size. Detecting a 5% relative lift on a 2% base conversion rate (2.0% to 2.1%) requires hundreds of thousands of users per group at conventional significance and power. Small businesses often lack that volume.
Cross-channel contamination. Users in the holdout group for one channel may encounter the brand through other channels. Clean isolation is hard.
Opportunity cost. The holdout group loses conversions that would have happened with marketing. In growth-critical periods, this cost can be significant.
Executive pressure to stop tests that are working. If the holdout group is visibly underperforming, executives sometimes halt the test early to recover the forgone conversions, destroying its statistical validity.
Cohort-level effects. Marketing often affects repeat behaviour, not just immediate conversion. Short tests miss long-term effects.
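Two of these challenges, sample size and early stopping, can be quantified with short calculations. The sketch assumes a two-sided two-proportion z-test at 5% significance: `users_per_group` applies the standard normal-approximation sample-size formula with equal group sizes and 80% power, and `false_positive_rate` simulates A/A tests (no true lift) whose z-statistic is modelled as a standardised random walk checked at equally spaced looks. Both function names and all parameter values are illustrative.

```python
import math
import random
from statistics import NormalDist


def users_per_group(base_rate: float, relative_lift: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users per group for a two-sided two-proportion z-test (equal groups)."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def false_positive_rate(looks: int, simulations: int, peek: bool, seed: int = 7) -> float:
    """Share of simulated A/A tests declared significant at |z| > 1.96.

    After k equal batches, z_k = S_k / sqrt(k), where S_k sums k independent
    N(0, 1) increments: the large-sample behaviour of a two-proportion z-test
    when there is no real difference between the groups.
    """
    rng = random.Random(seed)
    critical = 1.96
    hits = 0
    for _ in range(simulations):
        # Draw every batch up front so both modes see identical data for a given seed.
        increments = [rng.gauss(0.0, 1.0) for _ in range(looks)]
        total = 0.0
        significant = False
        for k, step in enumerate(increments, start=1):
            total += step
            z = total / math.sqrt(k)
            if peek and abs(z) > critical:
                significant = True  # stop early and declare a "winner"
                break
        if not peek:
            significant = abs(z) > critical  # only the planned final analysis counts
        if significant:
            hits += 1
    return hits / simulations


n = users_per_group(0.02, 0.05)
fixed = false_positive_rate(looks=10, simulations=20_000, peek=False)
peeking = false_positive_rate(looks=10, simulations=20_000, peek=True)
print(f"users per group for a 5% relative lift on a 2% base: {n:,}")
print(f"false-positive rate, single planned analysis: {fixed:.3f}")
print(f"false-positive rate, stop at first significant look: {peeking:.3f}")
```

With a 2% base rate and a 5% relative lift, the first function returns a figure above 300,000 users per group, and an uneven split such as a small holdout needs more users in total. The simulation shows that stopping at the first look where |z| exceeds 1.96 declares false wins far more often than the roughly 5% a single planned analysis allows; group-sequential designs with adjusted thresholds exist for teams that genuinely need interim looks.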
How often to run holdout tests
Three practical considerations:
Major channels annually. Incrementality of major spend channels should be revalidated at least once a year, because channel dynamics shift.
New channels before scaling. Before committing significant spend to a new channel, a holdout test validates the expected returns.
Around strategy shifts. Run holdouts before and after major campaign or messaging changes so the new approach is baselined against a holdout.
Alternatives to pure holdouts
Three methods that approximate holdout-style analysis:
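Ghost ads. Meta’s lift-test product doesn’t serve ads to holdout users but records the ads they would have been shown. Comparing exposed users with their would-have-been-exposed counterparts in the holdout gives a more precise lift estimate without paying for placebo ads.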
Synthetic controls. Statistical construction of a ‘what would have happened’ counterfactual. Useful when clean randomisation isn’t possible.
Geo-experiments. Market-level holdouts without user-level randomisation. Simpler to implement at scale.
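Geo-experiments and other market-level tests are often read with a difference-in-differences comparison: the change in test markets minus the change in holdout markets over the same period. This is a swapped-in illustration rather than a method the text prescribes; it assumes the two sets of markets would have followed parallel trends without the campaign, and the function name and weekly figures are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def diff_in_diff(test_pre: list[float], test_post: list[float],
                 holdout_pre: list[float], holdout_post: list[float]) -> float:
    """Difference-in-differences lift estimate for a geo-experiment.

    Each list holds one outcome per market (e.g. average weekly conversions)
    for the pre-campaign or campaign period. Assumes parallel trends.
    """
    test_change = mean(test_post) - mean(test_pre)
    holdout_change = mean(holdout_post) - mean(holdout_pre)
    return test_change - holdout_change


# Hypothetical weekly conversions per market, for illustration only.
lift = diff_in_diff(
    test_pre=[100, 120, 110], test_post=[130, 150, 140],
    holdout_pre=[105, 115, 95], holdout_post=[115, 125, 105],
)
print(f"estimated incremental conversions per market per week: {lift:.1f}")
```

Subtracting the holdout markets' change removes shared movements such as seasonality, which a simple before/after comparison of the test markets would wrongly credit to the campaign.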
Holdout analysis rigour
Four disciplines:
Pre-register the hypothesis and analysis. Define success criteria before seeing data. Prevents motivated reasoning in analysis.
Use proper significance testing. Incremental lift is noisy; results need statistical rigour to interpret.
Report confidence intervals. ‘Lift is somewhere between 5% and 25%’ is more honest than ‘lift is 15%.’ Understanding uncertainty drives better decisions.
Look at multiple metrics. Conversion rate alone isn’t enough. LTV of converted users, downstream retention, order values - all worth checking.
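A sketch of the significance test and confidence interval for the absolute lift, using a pooled two-proportion z-test for the p-value and a Wald interval for the range. These normal approximations assume reasonably large groups; the function name `lift_test` and the counts are hypothetical.

```python
from __future__ import annotations

import math
from statistics import NormalDist


def lift_test(marketed_conv: int, marketed_users: int,
              holdout_conv: int, holdout_users: int,
              confidence: float = 0.95) -> tuple[float, tuple[float, float], float]:
    """Absolute lift, its Wald confidence interval, and a two-sided p-value."""
    p_m = marketed_conv / marketed_users
    p_h = holdout_conv / holdout_users
    lift = p_m - p_h

    # Unpooled standard error for the interval around the observed difference.
    se = math.sqrt(p_m * (1 - p_m) / marketed_users + p_h * (1 - p_h) / holdout_users)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    interval = (lift - z_crit * se, lift + z_crit * se)

    # Pooled standard error for the null hypothesis of no difference between groups.
    pooled = (marketed_conv + holdout_conv) / (marketed_users + holdout_users)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / marketed_users + 1 / holdout_users))
    z = lift / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return lift, interval, p_value


# Hypothetical counts, for illustration only.
lift, (low, high), p_value = lift_test(2_160, 90_000, 200, 10_000)
print(f"lift {lift:.4f}, 95% CI [{low:.4f}, {high:.4f}], p = {p_value:.3f}")
```

Reporting the interval alongside the point estimate is what turns a single lift figure into an honest range for decision-makers.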
Content and holdout testing
Content holdout tests are possible but complex:
Paid content distribution. Holdouts on paid content amplification (boosted posts, promoted articles) work well. Clean channel control.
Organic content. Harder to withhold. Teams can instead A/B test specific articles or content programmes against each other.
Email content. Holding back email content from a random sample is straightforward. Reveals true email-programme lift.
Mature content teams use holdout-adjacent techniques - A/B testing content variants, pausing specific content programmes as natural experiments - to build incrementality understanding even when pure holdouts aren’t feasible.
Related terms
- Incrementality - the discipline holdout tests implement
- Attribution - the adjacent measurement approach
- Multi-Touch Attribution - the model family holdout tests calibrate
- A/B Testing - the adjacent experimental technique
- Marketing Analytics - the broader discipline
