Definition
A holdout campaign is an experiment in which a predefined portion of an eligible audience is intentionally withheld from receiving a marketing treatment (e.g., an ad, email, offer, or on-site personalization). The performance of the treated group is compared to the holdout (control) group to estimate incrementality—what the campaign caused beyond what would have happened anyway.
In marketing, holdouts quantify true business impact across channels, helping leaders separate correlation from causation, validate vendor claims, and allocate budget to programs that create incremental revenue, leads, or retention—rather than activity that would have occurred without the spend.
How to calculate
- Define a measurement window long enough to capture conversions (account for attribution lag).
- Absolute lift (conversion outcomes): Lift = p_exposed − p_holdout, where p is the conversion rate.
- Relative lift: Relative Lift = (p_exposed − p_holdout) / p_holdout
- Incremental revenue: Inc_Rev = Revenue_exposed − Revenue_holdout
- Incremental ROAS (iROAS): iROAS = Inc_Rev / Ad_Spend
- Incremental profit (if margin known): Inc_Profit = (Inc_Rev × Gross_Margin) − Ad_Spend
- Statistical significance: Use a two-proportion z-test or logistic regression with a treatment indicator; report confidence intervals for lift or iROAS.
- Sample size (rule of thumb): Determine per-group n via power analysis using baseline conversion p0, expected lift Δ, desired power (e.g., 80%), and alpha (e.g., 5%). Reserve a 5–20% holdout depending on traffic and expected effect. (A worked calculation sketch follows this list.)
How to utilize
Common use cases:
- Email & lifecycle: Prove incremental opens, clicks, orders; validate triggered vs. batch sends.
- Paid media: Measure incremental conversions or revenue from prospecting and retargeting.
- On-site/app personalization: Quantify net impact of recommendations, banners, and offers.
- Loyalty & retention: Test whether points, perks, or save-offers change churn behavior.
- B2B: Evaluate ABM sequences on accounts or buying groups (account-level randomization).
Practical steps:
- Choose the unit of randomization (user, household, account, geography). Match it to how exposure happens to avoid contamination.
- Define eligibility and exclusions (e.g., recent purchasers, suppression lists).
- Randomly assign to treatment vs. holdout and enforce the holdout across all entry points.
- Run for a fixed window; set guardrail metrics (e.g., unsubscribes, customer service contacts).
- Analyze intention-to-treat (ITT) to keep groups comparable; optionally add covariate adjustment (e.g., CUPED) to reduce variance (see the sketch after this list).
- Decide and scale: If lift is significant and profitable, graduate from test to standard practice and schedule periodic re-validation.
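
A minimal sketch of two of these steps under assumed conditions: deterministic hash-based assignment, so that every entry point resolves a unit to the same group, and a CUPED-adjusted intention-to-treat comparison. The salt, the 10% holdout share, and the simulated pre/post spend data are illustrative assumptions, not part of any specific tool.

```python
import hashlib
import random
from statistics import mean, pvariance

def assign(unit_id: str, salt: str = "spring_promo_2024",
           holdout_share: float = 0.10) -> str:
    """Deterministic assignment: the same unit always hashes to the same
    group, so every channel/orchestration tool enforces the same holdout."""
    bucket = int(hashlib.sha256(f"{salt}:{unit_id}".encode()).hexdigest(), 16) % 10_000
    return "holdout" if bucket < holdout_share * 10_000 else "treatment"

# Simulated eligible audience: a pre-period covariate (spend before launch)
# and a post-period outcome (spend during the measurement window).
rng = random.Random(7)
records = []
for i in range(50_000):
    group = assign(f"user_{i}")
    pre = rng.gammavariate(2.0, 20.0)
    true_effect = 3.0 if group == "treatment" else 0.0   # unknown in practice
    post = 0.6 * pre + rng.gauss(10.0, 15.0) + true_effect
    records.append((group, pre, post))

# Intention-to-treat: compare units as assigned, whether or not they were reached.
pre_vals = [r[1] for r in records]
post_vals = [r[2] for r in records]
pre_mean, post_mean = mean(pre_vals), mean(post_vals)

# CUPED: estimate theta on the pooled data, then shift each outcome by
# theta * (pre - overall pre mean); the ITT estimate stays unbiased while
# its variance shrinks whenever pre-period behavior predicts the outcome.
theta = (mean(p * y for _, p, y in records) - pre_mean * post_mean) / pvariance(pre_vals)
adjusted = {"treatment": [], "holdout": []}
for group, pre, post in records:
    adjusted[group].append(post - theta * (pre - pre_mean))

itt_raw = (mean(y for g, _, y in records if g == "treatment")
           - mean(y for g, _, y in records if g == "holdout"))
itt_cuped = mean(adjusted["treatment"]) - mean(adjusted["holdout"])
print(f"ITT estimate (raw): {itt_raw:.2f}   ITT estimate (CUPED): {itt_cuped:.2f}")
```

In practice the assignment function would live wherever eligibility is decided (e.g., a CDP or messaging platform), so the same holdout is enforced across all entry points.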
Comparison to similar approaches
| Approach | What it measures | When to use | Strengths | Limitations |
|---|---|---|---|---|
| Holdout campaign (control group) | Incremental effect of a campaign vs. no exposure | Direct, person/account-level programs | Causal, simple to explain; channel-agnostic | Requires withholding treatment; needs enough volume/time |
| A/B/n creative tests | Relative performance between variants (all treated) | Optimize subject lines, creatives, bids | Fast optimization within a campaign | Does not reveal incrementality vs. doing nothing |
| Geo-experiments / matched markets | Market-level incrementality | Offline media, OOH, TV, or when user-level randomization isn’t possible | Lower spillover risk; realistic scale | Fewer units → lower power; geographic heterogeneity |
| Ghost ads / PSA controls | Incrementality in walled gardens via a simulated control | Large ad platforms | Reduces contamination; platform-native | Depends on platform implementation and limits; black-box elements |
| Pre-post (time series) | Change vs. historical baseline | Early directional reads | Quick, data-light | Confounded by seasonality and external factors |
| Propensity / synthetic control | Quasi-experimental incrementality | When randomization isn’t feasible | Uses observational data | Sensitive to unobserved confounding |
| Uplift modeling | Predicted individual-level causal effect | Targeting only the “persuadables” | Improves efficiency of who to treat | Requires prior experiments and data science maturity |
| Media mix modeling (MMM) | Channel contribution at aggregate level | Strategic budget planning | Works with privacy constraints; long-horizon view | Coarse granularity; complements, not replaces, holdouts |
Best practices
- Randomize at the right level (person, household, account, or geo) to match exposure and avoid cross-contamination.
- Pre-register the plan: KPIs, window, success thresholds, guardrails, and analysis method.
- Size for power using realistic baseline rates and minimum detectable effect.
- Stratify or block on key covariates (e.g., lifecycle stage, value tier) before randomization; a blocked-randomization sketch follows this list.
- Enforce the holdout in all orchestration tools; audit delivery logs for leaks.
- Respect conversion lags; include late conversions or run a follow-up sensitivity analysis.
- Use ITT as the primary estimate; add treatment-on-the-treated as a sensitivity check if applicable.
- Adjust for seasonality and major events; avoid overlapping experiments that create interference.
- Report uncertainty (CIs, p-values) and business impact (incremental revenue/profit, iROAS).
- Re-validate periodically; effects decay as markets and audiences change.
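
A minimal sketch of the stratified (blocked) randomization practice above. The strata keys ("stage", "tier"), the 10% holdout share, and the toy audience are illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_holdout(audience, strata_keys, holdout_share=0.10, seed=42):
    """Shuffle units within each stratum and reserve the first `holdout_share`
    of every stratum as the holdout, keeping key covariates balanced
    across treatment and control."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in audience:
        strata[tuple(unit[k] for k in strata_keys)].append(unit["id"])

    assignment = {}
    for ids in strata.values():
        rng.shuffle(ids)
        cutoff = max(1, round(len(ids) * holdout_share))  # at least one holdout unit per block
        for i, unit_id in enumerate(ids):
            assignment[unit_id] = "holdout" if i < cutoff else "treatment"
    return assignment

# Toy audience blocked on lifecycle stage and value tier (assumed attributes).
gen = random.Random(1)
audience = [
    {"id": f"u{i}",
     "stage": gen.choice(["new", "active", "lapsed"]),
     "tier": gen.choice(["high", "mid", "low"])}
    for i in range(1_000)
]
groups = stratified_holdout(audience, strata_keys=("stage", "tier"))
print(sum(g == "holdout" for g in groups.values()), "of", len(groups), "units held out")
```

Blocking before randomization keeps high-value and low-value segments represented in both groups, which matters most when the holdout is small.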
Future trends
- Experimentation platforms integrated with CDPs to automate eligibility, assignment, and enforcement across channels.
- Privacy-preserving measurement via clean rooms, aggregated reporting, and modeled conversions.
- Sequential and real-time testing with alpha-spending controls for faster reads without inflating false positives.
- Causal ML and uplift-driven targeting that uses past holdouts to prioritize customers most likely to be influenced.
- Hybrid MMM + experiment workflows where market-level models are calibrated with ongoing holdouts.
- Platform-native incrementality tools (e.g., ghost ads) expanding beyond walled gardens and into retail media and CTV.
Related Terms
- Incrementality
- Control Group
- A/B Testing
- Geo-Experiment
- Ghost Ads
- Causal Inference
- Uplift Modeling
- Media Mix Modeling (MMM)
- Propensity Score Matching
- Intention-to-Treat (ITT) Analysis