1.3 Intercurrent Events and Estimand Strategy

Where the experimental logic breaks down

A randomized controlled trial is built on a simple premise: assign patients to treatment or control, follow them under identical conditions, and attribute any difference in outcomes to the treatment. The premise is powerful. It is also fragile, because in real clinical trials the conditions do not stay identical.

Patients discontinue the assigned treatment because of side effects, because they feel better, because they feel worse, because their insurance changes, or because they decide they would rather not continue. They receive rescue medication when their condition deteriorates beyond what the protocol permits to leave untreated. They switch to the comparator, or to a third treatment not specified in the protocol. They die before the primary endpoint can be measured. They become pregnant and are withdrawn. They experience an event that, by protocol definition, ends their participation.

Each of these occurrences is an intercurrent event: something that happens after randomization and before the primary outcome measurement that complicates the interpretation of whatever outcome is eventually observed. They are not edge cases. In most trials of meaningful duration, they are the norm. The question is not whether intercurrent events will occur. The question is what they mean—and what the trial’s answer to that question says about what the trial is trying to show.

The ICH E9(R1) framework names five strategies for handling intercurrent events in the primary estimand: treatment policy, hypothetical, composite, while on treatment, and principal stratum. Each strategy produces a different estimand. Each estimand answers a different question. And the choice among them is not a statistical decision. It is a scientific and clinical decision about the nature of the treatment effect that the trial is claiming to estimate.


Treatment policy: the question a prescriber asks

Under the treatment policy strategy, the outcome is measured and used regardless of whether the intercurrent event occurred. A patient who discontinues the assigned treatment at week four and takes nothing thereafter is followed to the primary endpoint time point. A patient who requires rescue medication at week eight is still analyzed at week twelve. The intercurrent event is acknowledged—it happened—but it does not change what is measured or how the measurement enters the analysis.

The resulting estimand answers the question: what happens to patients who are assigned to this treatment, in a clinical environment where some will discontinue, some will need rescue, and some will deviate from the regimen? This is the question a prescribing physician asks when deciding whether to use a treatment in practice. It is the question a payer asks when deciding whether a treatment produces value in a real patient population. It captures the treatment effect in context—messy, imperfect, realistic context.

The cost of this realism is dilution. If a substantial proportion of patients in the active arm discontinue and derive no further benefit, the treatment policy estimand will reflect that. If rescue medication is available and effective, its use by patients in the active arm who are not responding will compress the observed difference between arms. The treatment policy estimate is typically smaller than the biological effect of the treatment, because it includes the patients for whom the treatment did not work as intended in clinical practice.

This is not a defect. It is the point. The treatment policy estimand is honest about what the treatment produces in practice. A treatment that shows a meaningful effect under a treatment policy estimand is a treatment that produces real-world benefit, not just biological activity in perfect adherers.
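The dilution arithmetic can be made concrete with a toy simulation. Every number below is invented for illustration: a hypothetical 10-unit biological effect in perfect adherers and a 30% discontinuation rate with, starkly, zero retained benefit after stopping.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # patients per arm (simulated)

biological_effect = -10.0  # effect in perfect adherers (invented)

# 30% of the active arm discontinues and, under this stark assumption,
# retains none of the benefit; per the treatment policy strategy, they
# are still measured and analyzed at the primary time point.
discontinued = rng.random(n) < 0.30

control = rng.normal(0.0, 12.0, n)
active = rng.normal(0.0, 12.0, n) + np.where(discontinued, 0.0, biological_effect)

treatment_policy_estimate = active.mean() - control.mean()
print(f"biological effect:         {biological_effect:.1f}")
print(f"treatment policy estimate: {treatment_policy_estimate:.1f}")
```

Under these assumptions the treatment policy estimate lands near 0.7 times the biological effect. The attenuation is not an artifact to be corrected; it is exactly the discontinuation story the estimand is designed to tell.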

What the treatment policy strategy demands from the trial is rigorous outcome collection after intercurrent events. If patients who discontinue are lost to follow-up, or if missing outcomes are handled by imputation that assumes discontinuers behave like continuers, the analysis is not serving the treatment policy estimand—it is producing a number with a treatment policy label that reflects something else. The estimand requires the data. If the data cannot be collected, the estimand cannot be estimated, and the strategy should not be chosen.


Hypothetical: the question a mechanism scientist asks

Under the hypothetical strategy, the estimand is the outcome that would have been observed if the intercurrent event had not occurred. What would the patient’s blood pressure have been at week twelve if they had not required rescue medication at week eight? What would the six-minute walk distance have been if the patient had not discontinued? The intercurrent event is treated as a deviation from the experimental ideal, and the analysis attempts to estimate what would have happened in its absence.

This estimand answers the question: what is the biological effect of this treatment, under conditions where the protocol is followed? It is the question a pharmacologist asks when trying to understand the mechanism. It is sometimes the question a regulatory agency asks when the treatment policy estimate is diluted by high rates of rescue or discontinuation and the agency wants to understand the treatment’s underlying activity.

The hypothetical estimand is scientifically coherent, but it requires a causal assumption that is not directly verifiable: that the counterfactual outcome—what would have happened if the intercurrent event had not occurred—can be estimated from the observed data. This estimation requires statistical modeling, and the model encodes assumptions about how the outcome trajectory would have continued in the absence of the intercurrent event. These assumptions cannot be tested from the data; they must be argued from clinical knowledge about the disease and the treatment.

The practical implication is that a hypothetical estimand produces a primary result that depends on modeling assumptions in a way that a treatment policy result does not. Sensitivity analyses are not optional; they are essential, because the primary result is only as credible as the assumptions underlying the counterfactual model. A sponsor who chooses a hypothetical estimand must be prepared to defend not just the analysis but the assumptions that make the analysis interpretable.
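One common way to make those assumptions explicit is a delta-adjustment (tipping point) sensitivity analysis: the imputed counterfactual outcomes are shifted by a penalty delta, and the analyst reports how large delta must be before the conclusion changes. The sketch below is a deliberately simplified single-imputation version on simulated data; real analyses would typically use multiple imputation and reference-based methods, and every number here is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000  # patients per arm (simulated)

true_effect = -8.0  # invented effect on the continuous outcome
active = rng.normal(true_effect, 10.0, n)
control = rng.normal(0.0, 10.0, n)

# 25% of the active arm discontinues; their primary outcomes are unobserved.
dropped = rng.random(n) < 0.25
observed_active = active[~dropped]

def hypothetical_estimate(delta):
    """Single-imputation sketch: impute discontinuers at the completer
    mean plus delta, then compare arm means.

    delta = 0 encodes the untestable assumption that discontinuers would
    have continued on the completers' trajectory; delta > 0 assumes they
    would have done worse by delta units.
    """
    imputed = np.full(int(dropped.sum()), observed_active.mean() + delta)
    active_all = np.concatenate([observed_active, imputed])
    return active_all.mean() - control.mean()

for delta in [0.0, 2.0, 4.0, 6.0]:
    print(f"delta = {delta:3.1f}  ->  estimated effect = {hypothetical_estimate(delta):+.2f}")
```

The point of the exercise is the table, not any single row: the primary result is credible only to the extent that clinically plausible values of delta leave it standing.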

There is also a regulatory risk. An estimand that answers the hypothetical question—what would have happened if patients had followed the protocol?—may be difficult to translate into a label claim that describes what will happen when the treatment is used in practice. The gap between the hypothetical estimand and the treatment policy question is precisely the gap that regulators and payers are asking about when they ask whether a treatment works in the real world. Choosing the hypothetical estimand is a decision to defer that question, and deferral has consequences.


Composite: incorporating the event into the outcome

Under the composite strategy, the intercurrent event is incorporated into the outcome definition rather than handled separately. Death before the primary endpoint becomes an outcome—typically the worst possible outcome on the scale being measured—rather than a complication that interrupts measurement of the primary endpoint. Discontinuation due to treatment failure is coded as failure. Rescue medication use is itself a component of a composite endpoint.

The composite strategy is most natural when the intercurrent event is clinically meaningful in its own right. If a patient dies before the primary endpoint can be measured, their death is not a missing data problem. It is an outcome—arguably the most important outcome—and treating it as a complication that requires imputation produces a result that is scientifically distorted. Incorporating death into a composite, or treating it as the worst rank in a win ratio analysis, acknowledges its clinical significance rather than modeling it away.
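A minimal sketch of the worst-rank idea, on simulated data: deaths are assigned a value worse than any observed score, and the arms are compared with a pairwise win proportion, a simplified cousin of the win ratio. All rates and effect sizes below are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500  # patients per arm (simulated)

# Continuous outcome, lower = better; invented effect and mortality rates.
active_score = rng.normal(-5.0, 10.0, n)
control_score = rng.normal(0.0, 10.0, n)
active_died = rng.random(n) < 0.05
control_died = rng.random(n) < 0.10

# Deaths receive a value strictly worse (higher) than any observed score,
# so they occupy the worst end of the composite scale.
worst = max(active_score.max(), control_score.max()) + 1.0
a = np.where(active_died, worst, active_score)
c = np.where(control_died, worst, control_score)

# Pairwise win proportion: the chance a random active patient has a better
# (lower) composite value than a random control patient. Ties (both dead)
# count as non-wins under the strict inequality; 0.5 means no effect.
wins = (a[:, None] < c[None, :]).mean()
print(f"P(active better than control on composite) = {wins:.3f}")
```

Note what the composite has done: the mortality difference and the score difference both push the win proportion in the same direction, and neither death nor the missing post-death score needed to be imputed.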

The composite strategy is also transparent. Unlike the hypothetical strategy, it does not rely on modeling assumptions about counterfactual outcomes. The outcome is what it is—the composite of the primary measure and the intercurrent event—and the analysis reflects that composite directly.

The cost is interpretability. A composite that combines a continuous outcome with a binary intercurrent event is no longer a continuous outcome in the usual sense. The scale has been extended, the distribution has changed, and the clinical meaning of a unit change in the composite measure may be less intuitive than the clinical meaning of the original continuous outcome. The composite strategy requires that the extended or transformed outcome be interpretable on its own terms, not just on the terms of its components. When it is, it is often the most honest available strategy. When it is not, it produces a primary result that is statistically clean and clinically opaque.


While on treatment: limiting the question

Under the while-on-treatment strategy, the estimand is restricted to the period during which the patient is receiving the assigned treatment. Outcomes after discontinuation are not included in the primary estimand. The question is not what happens to assigned patients overall, but what happens to patients while they are on treatment.

This strategy is appropriate when the scientific question is genuinely about the on-treatment period—when a short-term biological effect is the target, when the treatment is intended to be used for a defined duration and the off-treatment period is not clinically relevant, or when the mechanism of action is such that effects are expected only during active treatment. In oncology, where post-progression treatment switches are nearly universal and post-progression outcomes are governed more by subsequent therapy than by the index treatment, a while-on-treatment or related strategy may be the most scientifically honest framing of the question.

The risk of this strategy is that it can be used to avoid the treatment policy question rather than to answer a genuinely different one. If patients who discontinue due to tolerability are excluded from the estimand, and tolerability is a meaningful dimension of the treatment’s clinical profile, the while-on-treatment estimand may produce an efficacy estimate that does not reflect the treatment’s net benefit. The question of what happens while patients are on treatment is legitimate. The answer is incomplete if the reason patients stop being on treatment is itself clinically relevant.


Principal stratum: asking about a defined subgroup

Under the principal stratum strategy, the estimand is defined for a subgroup of patients identified by their potential intercurrent event status—patients who would not have experienced the intercurrent event regardless of treatment assignment. If the intercurrent event is death before the primary endpoint, the principal stratum might be patients who would survive to the measurement time regardless of which treatment they received. The estimand is the treatment effect in that subgroup.

This strategy addresses a specific problem: when the intercurrent event itself may be affected by treatment, comparing outcomes across patients who experienced different intercurrent events may conflate the effect of the intercurrent event with the effect of the treatment. The principal stratum approach attempts to define a comparison that is not confounded by differential intercurrent event rates.

The difficulty is that the principal stratum is defined by potential outcomes that are not directly observable. Patients cannot be classified into the stratum from the data alone; the classification must be estimated, which requires strong and often untestable assumptions. Principal stratum estimands are scientifically important and methodologically demanding. They are most defensible in settings where the scientific question requires them and where the assumptions can be partially evaluated from clinical knowledge—and least defensible as a default choice made because the other strategies seemed complicated.
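A simulation makes the confounding concrete. In the sketch below the treatment has, by construction, no effect on the quality-of-life outcome, yet a naive comparison of observed survivors shows apparent harm, because the treatment keeps frailer patients alive long enough to be measured. All parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000  # simulated patients

# Latent frailty drives both survival and the outcome; identical across arms.
frailty = rng.normal(0.0, 1.0, n)
# Quality-of-life score if measured (higher = better); the treatment does
# NOT affect it, so the effect in any principal stratum is exactly zero.
outcome = -5.0 * frailty + rng.normal(0.0, 1.0, n)

# Survival to the measurement time: treatment rescues moderately frail patients.
survives_control = frailty < 0.5
survives_treated = frailty < 1.5

arm = rng.random(n) < 0.5  # True = assigned to treatment
alive = np.where(arm, survives_treated, survives_control)

# Naive "survivors only" comparison: biased, because treated survivors
# include frail patients who would have died under control.
naive = outcome[arm & alive].mean() - outcome[~arm & alive].mean()
print(f"naive survivors-only difference: {naive:+.2f}  (true stratum effect: 0)")
```

The principal stratum estimand targets the always-survivors, in whom the effect here is exactly zero; recovering that quantity from observed data, where the stratum cannot be identified patient by patient, requires the untestable assumptions described above.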


The strategy is the estimand

The practical consequence of the five-strategy framework is that the choice of intercurrent event strategy is not separable from the choice of estimand. They are the same decision. A trial that specifies a primary endpoint without specifying an intercurrent event strategy has not specified an estimand. It has specified a measurement. What that measurement means—what question it answers—remains open until the strategy is chosen.

This distinction has immediate design implications. Different intercurrent event strategies require different data collection procedures. A treatment policy strategy requires outcome data from patients after they discontinue—data that many trial designs do not collect because the assumption, often unstated, is that post-discontinuation data are irrelevant or unavailable. A hypothetical strategy requires the data needed to support the counterfactual model—which may include covariates, time-varying measurements, and auxiliary outcomes that were not originally planned for. A composite strategy requires the intercurrent event itself to be measured and adjudicated with the same rigor as the primary outcome—which it often is not.

If the intercurrent event strategy is chosen after the protocol is finalized, some of the data needed to implement it may not have been collected. The strategy then becomes constrained not by scientific reasoning but by what the database happens to contain. This is the design equivalent of letting the available data determine the question, rather than letting the question determine what data to collect.


Multiple estimands and the primary/sensitivity relationship

ICH E9(R1) explicitly endorses the use of multiple estimands—one primary and several secondary or sensitivity estimands—as a way of addressing the trial’s question from multiple angles. This is not multiplicity in the statistical sense. It is scientific completeness.

A primary estimand of treatment policy, for example, answers the prescriber’s question. A hypothetical sensitivity estimand, estimated under the assumption that discontinued patients would have continued on the same trajectory as completers with similar baseline characteristics, answers the mechanism scientist’s question. Together, they provide a more complete picture than either alone.

The relationship between the primary and sensitivity estimands must be pre-specified, and the logic of the relationship must be transparent. Sensitivity estimands exist to stress-test the primary result—to ask what the result would look like if the assumptions underlying the primary strategy were modified. If the treatment policy and hypothetical estimands produce similar results, confidence in the primary result is strengthened. If they diverge substantially, the divergence itself is informative: it means that the intercurrent events matter clinically, that the treatment’s effect depends heavily on whether patients stay on treatment, and that prescribers and payers will need to account for that dependence.

The sensitivity estimand is not a fallback. It is not the analysis to run if the primary fails. It is a pre-specified scientific question that the trial is simultaneously designed to answer, with the relationship to the primary result specified in advance and the divergence, if it occurs, treated as a finding rather than a problem.


What must be settled before section 1.4

By the time a trial design moves from estimand specification to the questions of closing—what has been committed to, what remains uncertain, what can go wrong—the intercurrent event strategy must be resolved. Not tentatively, not with a note that it will be finalized in the SAP, but resolved: the strategy named, the data collection implications identified, the sensitivity approach pre-specified, and the owner identified.

The owner is the person who can stand up in a regulatory meeting and answer the question: why this strategy, for this trial, for these patients? The answer requires clinical reasoning about the disease, about the treatment mechanism, about the conditions under which patients will be prescribed this treatment in practice. It requires regulatory judgment about what the agency will accept and what they will challenge. And it requires statistical judgment about what the chosen strategy can actually produce given the trial’s data collection plan.

If these three forms of judgment have not been brought together before the design is finalized, the intercurrent event strategy will be settled by default—by the software, by the previous protocol, by whoever drafted the SAP without instruction. Default strategies produce defensible-looking results. They do not produce defensible science, because the defense requires knowing why the strategy was chosen, not just what it was.

The estimand is the question. The intercurrent event strategy is the part of the question that most people would rather not answer. Answering it is where trial design becomes honest.


References: ICH E9(R1) Addendum on Estimands and Sensitivity Analysis in Clinical Trials (2019); Lipkovich et al., “Causal Inference and Estimands in Clinical Trials,” Stat Biopharm Res 2020; Qu et al., “Defining the Estimand in Clinical Trials with Treatment Switching,” Pharm Stat 2021; Rubin, “Causal Inference Using Potential Outcomes,” J Am Stat Assoc 2005.