Chapter 4: When Might We Stop Early?

The question Chapter 3 leaves open

Chapter 3 committed the trial to a sample size. It specified the assumed effect size, documented the nuisance parameters, justified the power level, and produced a target—a number of patients, or a number of events, at which the trial would complete its primary analysis.

What Chapter 3 did not address is what happens between enrollment and completion. Trials are not sealed experiments that run to their prespecified endpoint and then open. They run in real time, with real patients accumulating outcomes, and with the increasing discomfort that comes from suspecting the answer before it is officially known.

That suspicion creates pressure. If the treatment appears to be working dramatically, continuing to randomize patients to a control arm feels wrong. If the treatment appears to be harmful, continuing enrollment feels unconscionable. If the treatment appears to be doing nothing, continuing a long and expensive trial feels wasteful. These are real pressures, not hypothetical ones, and they have produced real consequences: trials stopped early for striking efficacy, trials stopped early for harm signals, trials continued past the point where futility was evident, and trials with interim results that contaminated the final analysis in ways that could not be corrected afterward.

The interim analysis plan—the pre-specified rules governing when the trial’s accumulating data may be examined, by whom, under what conditions, and with what authority to stop—is the design’s answer to these pressures. It is not a technical add-on to the sample size calculation. It is a governance document: a specification of who has authority over the trial’s continuation, what information they may see, what decision rules constrain their authority, and how that authority is exercised when the interim data are surprising.

Why this chapter exists separately from the sample size

Interim analyses are sometimes presented as consequences of the sample size calculation—the trial needs 500 events, so we will look at the data at 250 events. This framing is backward. The interim analysis plan is a design choice that must be made alongside the sample size calculation, not after it, because interim analyses affect the operating characteristics of the primary test in ways that the sample size calculation must account for.

Specifically: every interim look at the accumulating data that uses a test of the primary hypothesis consumes alpha. Alpha is the overall type I error rate: the probability, capped in advance by the design, of declaring efficacy when the treatment has no effect. If part of that alpha is spent at interim analyses, the final analysis must clear a stricter threshold than it would in a single-look design (a nominal significance level below the unadjusted alpha, or equivalently a larger critical value) to keep the overall type I error rate at its target. The total alpha budget is fixed; the interim analyses and the final analysis share it. If the allocation is not pre-specified, the type I error rate is not controlled.
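The idea of a shared alpha budget can be made concrete with a small sketch. It assumes a two-sided overall alpha of 0.05 and uses the two Lan-DeMets spending functions most often associated with the O’Brien-Fleming and Pocock boundaries; the function names and the choice of a single interim look at half the information are illustrative, not taken from any particular trial.

```python
from math import e, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def obf_type_spend(t, alpha=0.05):
    """Cumulative two-sided alpha spent by information fraction t (0 < t <= 1)
    under the Lan-DeMets O'Brien-Fleming-type spending function."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - STD_NORMAL.cdf(z / sqrt(t)))


def pocock_type_spend(t, alpha=0.05):
    """Cumulative alpha spent by information fraction t (0 <= t <= 1)
    under the Lan-DeMets Pocock-type spending function."""
    return alpha * log(1 + (e - 1) * t)


def increments(spend, fractions, alpha=0.05):
    """Alpha newly spent at each look: differences of the cumulative spend."""
    cumulative = [spend(t, alpha) for t in fractions]
    return [b - a for a, b in zip([0.0] + cumulative[:-1], cumulative)]


if __name__ == "__main__":
    looks = [0.5, 1.0]  # assumed schedule: one interim look, then the final analysis
    for name, spend in [("OBF-type", obf_type_spend), ("Pocock-type", pocock_type_spend)]:
        spent = increments(spend, looks)
        print(name, [round(a, 4) for a in spent], "total:", round(sum(spent), 4))
```

Both functions spend exactly the full budget by the final look; they differ in how much is spent early. Note that the increments are amounts of type I error, not the nominal p-value thresholds at each look. Converting them into boundaries requires the joint distribution of the sequential test statistics and is normally done with validated group sequential software.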

This is the statistical argument for pre-specifying interim analyses. But it understates the full purpose of the interim analysis plan, which is not just to control type I error but to govern a series of high-stakes decisions that will be made under conditions of uncertainty, time pressure, and competing interests. The plan must specify not just the statistical boundaries but the decision rules, the governance structure, the information access rights, and the documentation requirements that ensure the decisions are made correctly—which means made honestly, made by the right people, and made in a way that can be defended after the fact.

What this chapter covers

Section 4.1 — Why Sponsors Want Interim Analyses begins where the design conversation usually does not: with the interests that drive the request for interim analyses. Interim analyses are not value-neutral design features. They serve the sponsor’s interest in early information, the regulatory interest in ongoing safety monitoring, the DSMB’s interest in patient protection, and the clinical community’s interest in not running trials past the point where the answer is evident. These interests are not always aligned, and the design of the interim analysis plan should acknowledge their divergence explicitly.

Section 4.2 — Alpha Spending examines the mechanism by which the type I error rate is controlled across multiple looks at the accumulating data. The alpha-spending framework, including the O’Brien-Fleming-type and Pocock-type spending functions and related families, is presented not as a statistical technique to be selected from a menu but as a design commitment: a pre-specified rule for how the total alpha budget is allocated across information fractions, with consequences for the shape of the stopping boundaries and the probability of stopping early under different scenarios.

Section 4.3 — Operating Characteristics addresses what the interim analysis plan actually commits the trial to in terms of probability statements. Under the null hypothesis, what is the probability of stopping early for efficacy when there is no true effect? Under the alternative, what is the probability of stopping early for efficacy when the effect is real? What is the expected sample size under each scenario? These operating characteristics are the trial’s behavioral profile, the statistical analog of the power curve, and they should be computed and examined before the plan is finalized.
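As a rough illustration of what computing operating characteristics involves, the following sketch estimates them by simulation for a hypothetical two-look design. Everything specific in it is an assumption made for the example: one interim look at half the information, two-sided z boundaries of 2.797 and 1.977 (the commonly tabulated classical O’Brien-Fleming values for two equally spaced looks at an overall alpha of 0.05, which should be checked against validated software before use), and an alternative whose drift of 3.2415 would give about 90% power in a single-look design.

```python
import random
from math import sqrt

FRACTIONS = [0.5, 1.0]    # assumed information fractions of the two looks
Z_BOUNDS = [2.797, 1.977]  # assumed two-sided z boundaries (illustrative values)


def simulate_design(drift, fractions, z_bounds, n_sims=100_000, seed=2024):
    """Monte Carlo operating characteristics of a group sequential z-test.

    The score process is simulated as Brownian motion with the given drift:
    over an information increment dt it gains a Normal(drift * dt, dt) step,
    and the z-statistic at information fraction t is score / sqrt(t).
    """
    rng = random.Random(seed)
    crossings = [0] * len(fractions)
    info_used = 0.0
    for _ in range(n_sims):
        score, prev_t = 0.0, 0.0
        stop_t = fractions[-1]  # the trial runs to the end unless a boundary is crossed
        for k, (t, bound) in enumerate(zip(fractions, z_bounds)):
            dt = t - prev_t
            score += rng.gauss(drift * dt, sqrt(dt))
            prev_t = t
            if abs(score) / sqrt(t) >= bound:  # two-sided boundary crossing
                crossings[k] += 1
                stop_t = t
                break
        info_used += stop_t
    return {
        "cross_by_look": [c / n_sims for c in crossings],
        "reject_overall": sum(crossings) / n_sims,
        "expected_info_fraction": info_used / n_sims,
    }


if __name__ == "__main__":
    for label, drift in [("null", 0.0), ("alternative", 3.2415)]:
        print(label, simulate_design(drift, FRACTIONS, Z_BOUNDS))
```

The first entry of `cross_by_look` is the probability of stopping at the interim look, `reject_overall` is the type I error rate under the null and the power under the alternative, and `expected_info_fraction` scales the planned sample size or event count to its expected value. Simulation is used here only because it is easy to read; exact calculations by numerical integration are the usual choice in practice.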

Section 4.4 — Futility, Efficacy, and Safety examines the three categories of interim stopping: stopping for overwhelming efficacy, stopping for probable futility, and stopping for safety. Each has a different decision logic, a different evidentiary standard, and a different governance implication. They are frequently conflated in design discussions and should not be.

Section 4.5 — Governance addresses the organizational structure within which interim decisions are made: the DSMB, its charter, its relationship to the sponsor, the information flow at interim analyses, and the documentation requirements that ensure the interim analysis does not compromise the integrity of the final analysis.

Section 4.6 — Closing returns to the chapter’s central theme: an interim analysis plan is a governance document as much as a statistical document. Its purpose is not only to control type I error but to structure the decisions that will be made when the trial’s accumulating data are surprising—and to make those decisions defensible regardless of what they turn out to be.

The decision this chapter must make

Every trial must answer four questions before enrollment begins.

Will the trial have any interim analyses? If not, the answer is simple and honest: the trial will be analyzed once, at the end, and no interim decisions will be made based on the primary outcome data. If yes, the subsequent questions must be answered.

What will the interim analyses examine, and at what information fractions? The timing of interim analyses—expressed as fractions of the total planned information, not as calendar dates—must be pre-specified. Analyses at unspecified times, or analyses whose timing can be adjusted based on accumulating data, are not interim analyses in the controlled sense. They are unplanned looks.
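For an event-driven trial, where statistical information is conventionally taken to be proportional to the number of primary events, a pre-specified schedule of information fractions translates directly into event counts. The sketch below shows one way that translation might be written down; the function names are hypothetical, and the 500-event total is used only as an example.

```python
from fractions import Fraction
from math import ceil


def look_schedule(planned_events, fractions):
    """Event counts that trigger each pre-specified look.

    Fractions are converted through their string form so that values such
    as 0.1 are treated as exact decimals rather than binary floats.
    """
    exact = [Fraction(str(f)) for f in fractions]
    if not exact or exact[-1] != 1:
        raise ValueError("the final look must be at information fraction 1")
    if any(f <= 0 for f in exact) or any(b <= a for a, b in zip(exact, exact[1:])):
        raise ValueError("fractions must be positive and strictly increasing")
    return [ceil(f * planned_events) for f in exact]


def next_look_due(observed_events, schedule, looks_done):
    """Index of the look that is now due, or None if none is due.

    The trigger depends only on the event count and on how many looks have
    already been performed, never on the calendar or on the results so far.
    """
    if looks_done >= len(schedule):
        return None
    return looks_done if observed_events >= schedule[looks_done] else None


if __name__ == "__main__":
    schedule = look_schedule(500, [0.5, 1.0])
    print(schedule)
    print(next_look_due(249, schedule, looks_done=0))
    print(next_look_due(250, schedule, looks_done=0))
```

The point of the design is that the trigger is a function of accrued information alone. Any rule that lets the timing of a look depend on the interim results themselves falls outside this sketch and outside the controlled framework described above.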

Who will have access to the interim results, and in what form? Access to treatment-arm-specific interim data is the defining sensitivity of the interim analysis. Unrestricted access contaminates the final analysis. Restricted access—through an independent DSMB that acts on the interim data without revealing arm-specific results to the sponsor—preserves the integrity of the final analysis while enabling the decisions the interim plan requires.

What authority does the interim result carry? For most DSMBs the appropriate framing is recommendations, not decisions: the sponsor retains formal authority over trial continuation, but the DSMB’s recommendation is the operative decision in practice. The charter must specify the conditions under which the sponsor may override a DSMB recommendation, and those conditions should be narrow.

These four questions have answers that must be written down before enrollment begins. If they are not written down—if the interim analysis plan is a verbal understanding rather than a document—the trial does not have an interim analysis plan. It has an informal arrangement that will be renegotiated under pressure, and pressure is precisely when the arrangement matters most.

What this chapter is not about

This chapter does not provide a guide to choosing between O’Brien-Fleming and Pocock stopping boundaries, or to selecting an alpha-spending function from the available literature. The statistical properties of these boundaries are well-documented, and the choice among them is a technical decision that follows from the design considerations this chapter discusses.

It also does not address Bayesian adaptive designs or response-adaptive randomization, which are covered in Chapter 7. The interim analysis plan discussed here is the classical frequentist plan for a trial with pre-specified information fractions and pre-specified alpha spending—the design that most regulatory agencies expect as the default and against which adaptive departures are evaluated.

What this chapter provides is the conceptual foundation: why interim analyses create risks as well as opportunities, what the interim analysis plan is actually governing, how the governance structure should be designed, and what the consequences are when the governance fails.