Intention-to-Treat (ITT) Analysis

Definition

Intention-to-Treat (ITT) analysis is an experimental analysis principle that evaluates outcomes for all subjects according to the group they were originally randomized or assigned to, regardless of whether they actually received, opened, viewed, or adhered to the treatment. ITT preserves the benefits of randomization and provides an unbiased estimate of the effect of offering or rolling out an intervention to the target population.

How it relates to marketing

In marketing experiments—email A/B tests, holdouts for paid media, feature-flag rollouts, or promo offers—some audiences won’t comply as intended (e.g., emails bounce, ads aren’t won in auctions, users ignore prompts, cross-device identity breaks). ITT answers: “What is the expected impact if we deploy this to the intended audience under real-world conditions?” It reflects operational realities (deliverability, adoption, platform limits) that leaders must account for in go-to-market decisions.

How to Calculate

Setup

Let Z_i ∈ {0,1} denote assignment (1 = assigned to treatment, 0 = assigned to control), and let Y_i be the outcome for subject i (e.g., conversion flag, revenue, ARPU, retention).

Point Estimate (Additive Lift)

Compute the difference in average outcomes by original assignment:

ITT = Ȳ|Z=1 − Ȳ|Z=0

Difference in means by original assignment.
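As a sketch, the ITT point estimate is just this difference in mean outcomes by original assignment. The `itt_estimate` helper and the revenue figures below are hypothetical, for illustration only:

```python
import numpy as np

def itt_estimate(y, z):
    """Difference in mean outcome by original assignment (ITT).

    y: outcomes for every assigned subject (non-compliers included)
    z: 0/1 assignment flags (1 = assigned to treatment, 0 = control)
    """
    y, z = np.asarray(y, dtype=float), np.asarray(z)
    return y[z == 1].mean() - y[z == 0].mean()

# Hypothetical revenue per user; non-openers stay in their assigned arm.
y = [0.0, 12.5, 0.0, 8.0, 0.0, 0.0, 3.0, 0.0]
z = [1,   1,    1,   1,   0,   0,   0,   0]
print(itt_estimate(y, z))  # → 4.375  (5.125 treatment mean − 0.75 control mean)
```

Note that the zeros for non-engagers are kept in both arms, which is exactly what distinguishes ITT from a per-protocol cut.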
For Binary Outcomes (Conversion Rate)

Treat missing conversions as 0 within the analysis window (per your pre-specification). Then compute:

ITT = x_T / n_T − x_C / n_C

where x is the number of conversions and n is the number of subjects originally assigned in each arm.
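A minimal computation of the binary-outcome ITT, with a normal-approximation 95% confidence interval (a common choice for large samples). The `itt_conversion` helper and the counts are illustrative assumptions:

```python
import math

def itt_conversion(conv_t, n_t, conv_c, n_c):
    """ITT lift in conversion rate with a normal-approximation 95% CI.

    conv_*: conversions counted within the window (missing = 0)
    n_*: everyone originally assigned to each arm
    """
    p_t, p_c = conv_t / n_t, conv_c / n_c
    lift = p_t - p_c
    # Unpooled standard error of the difference in proportions.
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return lift, (lift - 1.96 * se, lift + 1.96 * se)

# Hypothetical email test: 5.6% vs 5.0% conversion among all assigned.
lift, ci = itt_conversion(conv_t=560, n_t=10_000, conv_c=500, n_c=10_000)
print(f"ITT lift: {lift:.4f}, 95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The denominators are everyone assigned, not just recipients who opened, which keeps the estimate aligned with the ITT definition above.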

How to utilize

  • Decision framing: Use ITT as the primary estimate for the impact of launching a campaign/feature to your intended audience under realistic adoption and delivery.
  • Guardrail alignment: ITT aligns with top-line KPIs (revenue per user, conversion, retention) that matter to portfolio-level decisions.
  • Sensitivity analyses: Complement ITT with secondary views (Per-Protocol, As-Treated, or Complier effects) to understand why impact may be muted (e.g., low exposure, low adoption).
  • Common use cases:
    • Email or push tests with non-openers and bounces
    • Paid media with auction loss or frequency caps
    • Feature toggles where many users never engage
    • Offers requiring opt-in or multi-step completion
    • Geo holdouts where market leakage occurs

Compare to similar approaches

| Approach | What it analyzes | When it's useful | Bias risk | Typical metric reported |
|---|---|---|---|---|
| Intention-to-Treat (ITT) | Effect of assignment (offer/rollout) on all assigned | Primary decision metric for real-world deployment | Low (preserves randomization) | Δ in CR, ARPU, retention across assigned |
| Per-Protocol (PP) | Effect among those who fully complied with protocol (e.g., opened, viewed, completed) | Understand efficacy among compliers | Higher (excludes noncompliers; selection bias) | Δ among protocol-adherent users |
| As-Treated (AT) | Effect by actual exposure regardless of original assignment | Operational diagnostics of delivery/exposure | High (assignment no longer randomized) | Δ among exposed vs. unexposed |
| Treatment-on-the-Treated (TOT) | ITT scaled by compliance rate (estimate of effect on those actually treated) | Quantify impact if exposure were universal | Moderate; requires strong assumptions | TOT ≈ ITT / Compliance |
| CACE/LATE | Effect among compliers, identified via assignment as an instrument | When noncompliance is substantial and instrument validity is credible | Moderate; IV assumptions required | Local effect for compliers |
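The TOT scaling in the table (often called the Bloom estimator) can be sketched with a small helper; the lift and compliance figures are hypothetical:

```python
def tot_from_itt(itt, compliance):
    """Scale ITT by the compliance rate (Bloom estimator).

    Valid only under the usual IV-style assumptions: no defiers, and
    assignment affects outcomes solely through actual exposure.
    """
    if not 0 < compliance <= 1:
        raise ValueError("compliance must be in (0, 1]")
    return itt / compliance

# Hypothetical: +0.6pp ITT lift, but only 30% of assigned users were exposed,
# so the implied effect on the treated is roughly 0.006 / 0.30 = 2pp.
print(tot_from_itt(itt=0.006, compliance=0.30))
```

Low compliance inflates the scaled estimate quickly, which is why the table flags TOT as requiring strong assumptions.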

Best practices

  • Pre-specify ITT as primary. Declare ITT before launch, including outcome windows, cohorts, and handling of missing data.
  • Track assignment vs exposure. Keep explicit flags for randomized assignment, delivered, opened/viewed, clicked, engaged, converted.
  • Avoid post-randomization filtering. Do not drop non-openers, bounces, or unexposed users from the ITT population.
  • Use variance reduction that respects ITT. Techniques like CUPED or covariate adjustment can reduce standard errors while keeping the ITT estimand intact.
  • Guard against interference. Minimize contamination between arms (e.g., frequency caps, geo or device-level isolation, holdout integrity checks).
  • Scale-aware design. For platform-level changes with partial adoption, prefer cluster or geo randomization when individual isolation is impractical.
  • Run sensitivity analyses. Report PP/AT or CACE alongside ITT to diagnose uptake and operational friction.
  • Document compliance. Report delivery, exposure, and adoption rates to interpret ITT magnitude.
  • Power for real-world effects. Base sample size on expected ITT (often smaller than idealized efficacy).
  • Audit identity and attribution. Cross-device and cookie loss can depress measured exposure; maintain stable IDs and server-side logging.

Trends

  • Server-side and privacy-resilient experimentation: Wider use of server-side assignment, clean rooms, and aggregated event pipelines that still preserve ITT analyses.
  • Geo and cluster experiments: Increased adoption where individual-level exposure is hard to control, with improved cluster-robust inference.
  • Automated diagnostics: Experiment platforms that surface compliance, exposure, and interference metrics alongside ITT by default.
  • Causal scaling methods: More routine pairing of ITT with CUPED, synthetic controls for market tests, and IV-based estimates to separate delivery from efficacy.
  • Identity fragmentation resilience: Designs and logging that keep ITT credible despite third-party cookie deprecation and cross-device gaps.
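The CUPED adjustment noted under best practices can be sketched as a pre-experiment covariate correction that lowers variance without moving the ITT estimand. The data below are simulated and `cuped_adjust` is an illustrative helper, not a library function:

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED: adjust outcome y with a pre-experiment covariate x.

    theta = cov(x, y) / var(x). The adjusted outcome keeps the same
    mean (so the ITT estimand is unchanged) but has lower variance
    whenever x is predictive of y.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
x = rng.normal(10, 2, 5_000)            # pre-period revenue (covariate)
y = 0.8 * x + rng.normal(0, 1, 5_000)   # in-experiment revenue
y_adj = cuped_adjust(y, x)
print(np.var(y, ddof=1) > np.var(y_adj, ddof=1))  # → True (variance reduced)
```

Because the adjustment is mean-preserving, the same difference-in-means ITT comparison can be run on `y_adj` with tighter standard errors.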

Related terms

  • A/B Testing
  • Randomized Controlled Trial (RCT)
  • Per-Protocol Analysis
  • As-Treated Analysis
  • Treatment-on-the-Treated (TOT)
  • Complier Average Causal Effect (CACE)
  • Noncompliance
  • CUPED
  • Geo-Experimentation
  • Instrumental Variables
