4.6 Closing: The Governed Decision

What this chapter asked

Chapter 4 asked one question: when might we stop early?

It asked that question in five registers, each addressing a different dimension of the same design challenge.

Section 4.1 asked why interim analyses are requested—and answered honestly: the interests driving that request are not identical across parties. Sponsors want early information about program viability. DSMBs want protection for enrolled patients. Regulators want evidence integrity. These interests are not opposed, but they diverge at the margins, and the divergence should be designed for explicitly rather than assumed away by the fiction that everyone wants the same thing.

Section 4.2 asked what the alpha-spending function actually commits to—not just the type I error calculation, but the shape of the boundaries across information fractions, and what that shape says about the relative weighting of early stopping sensitivity versus final analysis integrity. The spending function is a design commitment, not a technical parameter.
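The difference in shape can be made concrete. The sketch below compares the Lan-DeMets O'Brien-Fleming-type and Pocock-type spending functions at a few information fractions, at the one-sided 0.025 level used throughout this chapter; the functions shown are the standard published forms, and the printed table is illustrative, not tied to any particular trial.

```python
from statistics import NormalDist
import math

Z = NormalDist()          # standard normal
ALPHA = 0.025             # one-sided type I error, as in the chapter

def obf_spend(t: float, alpha: float = ALPHA) -> float:
    """Lan-DeMets O'Brien-Fleming-type spending function:
    spends almost no alpha at early information fractions t."""
    return 2.0 * (1.0 - Z.cdf(Z.inv_cdf(1.0 - alpha / 2.0) / math.sqrt(t)))

def pocock_spend(t: float, alpha: float = ALPHA) -> float:
    """Lan-DeMets Pocock-type spending function:
    near-linear in t, so it spends alpha much earlier."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t)

for t in (0.25, 0.50, 0.75, 1.00):
    print(f"t={t:.2f}  OBF spent={obf_spend(t):.6f}  Pocock spent={pocock_spend(t):.6f}")
```

At t = 0.25 the O'Brien-Fleming-type function has spent only a few millionths of the alpha, while the Pocock-type function has already spent roughly a third of it: the same total commitment, two very different weightings of early stopping sensitivity against final analysis integrity.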

Section 4.3 asked what the interim analysis plan actually does across the full range of possible true effects—not just under the assumed alternative and the null, but under the intermediate scenarios that are most likely to be realized. The operating characteristics are the behavioral profile of the plan, and they must be examined before the plan is finalized.

Section 4.4 asked what it means to stop for efficacy, for futility, or for safety—and answered that these are three categorically different decisions with different evidentiary standards, different decision authorities, and different governance implications that should not be collapsed into a single stopping rule.

Section 4.5 asked who has authority over the interim decisions and how that authority is exercised—answering that the statistical boundaries are the scaffolding, and governance is the building. Without governance, the plan is a document. With governance, it is a decision system.


What this chapter decided

By the end of this chapter, five things must be documented and owned.

The interim analysis plan is pre-specified, in writing, before enrollment begins. The number of interim analyses, their information fractions, the spending function, the resulting boundaries, and the futility rules are all stated. If the plan is “no planned interim analyses,” that too must be stated, as an explicit choice—not an omission.

The operating characteristics are computed and reviewed. The probability of early stopping under the null, under the assumed alternative, and under at least two intermediate scenarios is documented. The expected sample size under each scenario is reported. The expected overestimation conditional on early stopping is quantified. These are not optional sensitivity analyses; they are the design team’s due diligence on what the plan actually commits the trial to.
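These quantities are straightforward to obtain by simulation. The sketch below assumes a hypothetical two-arm trial with unit-variance outcomes, 200 patients per arm, and a single interim look at 50% information using the classic two-look Lan-DeMets O'Brien-Fleming-type boundaries (2.963 at the interim, 1.969 at the final analysis, overall one-sided alpha 0.025); all sample sizes and effect sizes are illustrative.

```python
import random
import statistics

N = 200                 # hypothetical patients per arm
INTERIM = N // 2        # one interim look at 50% information
Z_INTERIM = 2.963       # OBF-type one-sided efficacy boundary at t = 0.5
Z_FINAL = 1.969         # final-analysis boundary (overall alpha 0.025)

def run_trial(true_effect: float, rng: random.Random):
    """Simulate one trial; return (stopped_early, n_per_arm_used, estimate)."""
    trt = [rng.gauss(true_effect, 1.0) for _ in range(N)]
    ctl = [rng.gauss(0.0, 1.0) for _ in range(N)]
    for n, bound in ((INTERIM, Z_INTERIM), (N, Z_FINAL)):
        diff = statistics.fmean(trt[:n]) - statistics.fmean(ctl[:n])
        z = diff / (2.0 / n) ** 0.5        # SE of the difference is sqrt(2/n)
        if z >= bound:
            return (n == INTERIM), n, diff
    return False, N, diff

rng = random.Random(1)
for effect in (0.0, 0.15, 0.30):   # null, intermediate, assumed alternative
    sims = [run_trial(effect, rng) for _ in range(5000)]
    early = [s for s in sims if s[0]]
    p_early = len(early) / len(sims)
    e_n = statistics.fmean(s[1] for s in sims)
    msg = f"effect={effect:.2f}  P(stop early)={p_early:.3f}  E[n/arm]={e_n:.0f}"
    if early:
        msg += f"  mean estimate | early stop={statistics.fmean(s[2] for s in early):.2f}"
    print(msg)
```

The last column is the point of the exercise: conditional on stopping at the interim, the estimated effect must have cleared a boundary of roughly 0.42 on this scale, so the conditional mean estimate exceeds the true effect under every scenario. That is the overestimation the design team is expected to quantify before finalizing the plan.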

The three stopping categories are explicitly separated. The efficacy boundaries are distinct from the futility boundaries, which are distinct from the safety monitoring rules. Each category has its own evidentiary standard, its own authority structure, and its own documentation requirement. The DSMB charter specifies the decision hierarchy when multiple stopping signals are present simultaneously.

The DSMB is constituted, and its charter is complete. The composition, the independence criteria, the information flow structure, the session format, the communication protocol, and the documentation requirements are all specified. The charter is a document that a DSMB member with no prior knowledge of the trial could read and know exactly what they are expected to do.

The governance system is in place before the first interim analysis. This means the independent statistician is identified, the data management team understands the firewall requirements, the documentation templates are prepared, and the communication chain from the DSMB to the sponsor is tested. The governance system does not begin at the first interim meeting. It begins at enrollment, because the information security that protects the interim analysis starts when the first patient is enrolled.


The characteristic mistakes of this chapter

Three failures recur in the territory Chapter 4 covers.

The plan designed for the boundary, not for the decision. The alpha-spending function is selected—O’Brien-Fleming because it is conservative, or because it is the software default—without examining the operating characteristics it produces or comparing them to alternative designs. The boundaries are computed and reported. The probability of early stopping under plausible scenarios is not computed. The DSMB convenes at the first interim and finds a test statistic below the efficacy boundary but also below the conditional power threshold—and the charter does not specify what to do when both signals are present. The decision is improvised. The documentation of the improvisation is incomplete.
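The ambiguity in that scenario is easy to reproduce. The sketch below computes conditional power from an interim z-statistic using the canonical B-value decomposition; the interim value (z = 1.0 at 50% information), the final boundary (1.969, from a two-look O'Brien-Fleming-type design), and the drift values are all hypothetical.

```python
from statistics import NormalDist
import math

Z = NormalDist()

def conditional_power(z_t: float, t: float, z_final: float, theta: float) -> float:
    """P(final Z >= z_final | interim Z = z_t at information fraction t),
    when the drift parameter (E[Z] at full information) equals theta.
    Uses B(1) = B(t) + increment, with B(t) = sqrt(t) * Z(t)."""
    b_t = math.sqrt(t) * z_t              # B-value at the interim
    mean_inc = theta * (1.0 - t)          # mean of the remaining increment
    sd_inc = math.sqrt(1.0 - t)           # SD of the remaining increment
    return 1.0 - Z.cdf((z_final - b_t - mean_inc) / sd_inc)

z_obs, t = 1.0, 0.5                       # hypothetical interim result
cp_trend = conditional_power(z_obs, t, 1.969, theta=z_obs / math.sqrt(t))
cp_design = conditional_power(z_obs, t, 1.969, theta=3.0)  # design alternative
print(f"CP under current trend:      {cp_trend:.2f}")
print(f"CP under design alternative: {cp_design:.2f}")
```

Under the current trend the conditional power here is near 0.22, below a typical futility threshold of 0.30, while under the design alternative it is comfortably above it. A charter that does not say which assumption governs, and which signal takes precedence, leaves the DSMB to improvise exactly as described above.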

The governance that existed on paper but not in practice. The charter specified a full firewall. The independent statistician prepared the interim report for the DSMB. But the interim meeting included a sponsor representative in the “observer” role who was present during the presentation of arm-specific data. No one objected. The meeting minutes recorded “sponsor observer present during closed session” without noting the governance implication. Two years later, during the regulatory review, this entry in the meeting minutes raises questions about information access that the charter cannot answer because the charter’s implementation was not monitored.

The DSMB recommendation that was not followed. The DSMB observed an interim result approaching the futility boundary and recommended continuation with heightened monitoring. The charter specified the futility boundary as non-binding, so the sponsor had the authority to continue. The sponsor continued. The basis for continuation—a pipeline argument that the program had companion treatments in development that needed the phase III result to proceed—was not documented in the continuation decision record. Eighteen months later, the trial failed at the final analysis. The regulatory reviewer noted the non-binding futility near-crossing and the undocumented continuation decision. The final result itself is credible, but the context that would explain the continuation was never recorded, and it must now be reconstructed from memory.

Each of these failures shares the same origin: the design team distinguished between the statistical plan and the governance structure, treating the first as the real design work and the second as administrative detail. The administrative detail is where the design lives during the trial. The statistical plan is what was committed to before enrollment; the governance is what ensures the commitment is honored after enrollment.


What cannot be recovered

Some governance failures can be managed after the fact. A documentation gap can be supplemented with declarations from the participants. A situation the charter did not anticipate can be adjudicated through the charter's own escalation and amendment process. An interim analysis that deviated from the pre-specified boundaries by a small margin can be explained with statistical precision about the actual versus planned operating characteristics.

Some cannot. When arm-specific interim data have been seen by parties outside the DSMB—by sponsor representatives, by commercial teams, by investigators—the contamination of subsequent trial conduct cannot be quantified or corrected. It is possible that nothing changed. It is also possible that enrollment practices, patient counseling, investigator enthusiasm, or data collection procedures were influenced by knowledge that was not supposed to be known. The uncertainty is irreducible.

When an interim analysis is conducted outside the pre-specified plan—at an unplanned information fraction, with an unplanned scope, by an unplanned party—the type I error rate for the primary test is no longer controlled at the nominal level. This is not a regulatory technicality. It is the fundamental promise of the trial design: that the probability of a false positive, across all the analyses the trial will conduct, is no more than 2.5% one-sided. That promise cannot be retroactively restored.
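The loss of control is not hypothetical, and a small null simulation makes it concrete. The sketch below assumes arbitrary illustrative numbers (five looks, 40 observations per look): it tests the same accumulating data repeatedly at the fixed one-sided z = 1.96 boundary, with no alpha spending, and compares the false positive rate to a single planned final test.

```python
import random
import math

def running_z_at_looks(n_looks: int, n_per_look: int, rng: random.Random):
    """Under the null, accumulate standard-normal outcomes and record
    the one-sample z-statistic at each of n_looks interim looks."""
    total, count, zs = 0.0, 0, []
    for _ in range(n_looks):
        for _ in range(n_per_look):
            total += rng.gauss(0.0, 1.0)
            count += 1
        zs.append(total / math.sqrt(count))
    return zs

rng = random.Random(7)
SIMS, LOOKS, PER_LOOK, Z_NOMINAL = 20000, 5, 40, 1.96
hits_final_only = hits_any_look = 0
for _ in range(SIMS):
    zs = running_z_at_looks(LOOKS, PER_LOOK, rng)
    hits_final_only += zs[-1] >= Z_NOMINAL      # one pre-specified final test
    hits_any_look += max(zs) >= Z_NOMINAL       # stop at any look that crosses
print(f"one planned final test: {hits_final_only / SIMS:.3f}")
print(f"five unadjusted looks : {hits_any_look / SIMS:.3f}")
```

The single planned test holds the false positive rate near 0.025; five unadjusted looks at the same nominal boundary push it to roughly 0.07. Every unplanned look is a claim on alpha that the pre-specified plan never budgeted, and once the look has happened, that budget cannot be restored.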

When the DSMB charter is not specific enough to constrain the decisions made at the interim meeting—when the DSMB exercises discretion that the charter did not anticipate and did not limit—the interim decisions are governed by the judgment of the DSMB members at that moment, under those conditions, with those interests. That governance may be excellent. But it is not documented, it is not pre-specified, and it is not auditable. It is the decision that was made by whoever was in the room when the decision had to be made.

These are the failures that cannot be recovered because they are not mistakes in the implementation of the plan. They are failures of the plan itself—failures that became visible only after the plan was tested by real data, real pressures, and real decisions. The guard against them is not better implementation; it is a plan that is specific enough, governed well enough, and documented carefully enough that implementation failures are detectable before they become irreversible.


The connection to what follows

Chapter 5 addresses bias—the systematic distortion of the trial’s primary comparison that arises when randomization, blinding, or allocation concealment fail. The interim analysis plan is a source of bias when it fails: information leakage from the interim to the sponsor creates bias in trial conduct; stopping decisions made outside the pre-specified rules create bias in the final result; documentation gaps create bias in the regulatory review. Chapter 5’s tools for protecting against bias assume that the governance system of Chapter 4 has functioned correctly—that the blind has been maintained, that the interim data have not contaminated the conduct of the trial, and that the trial’s operating characteristics correspond to what was pre-specified.

Chapters 4 and 5 are therefore not sequential in the operational sense. They are simultaneous requirements. The governance of the interim analysis is part of the bias protection system, not a separate design element that follows after bias protection is established. A trial that has good randomization and allocation concealment but poor interim governance is not a well-protected trial. It is a trial that has protected one dimension of its integrity and left another dimension open.


Chapter 4 risk summary

The decision this chapter owns: will the trial examine its accumulating data before completion, under what rules, with what stopping authority, governed by what structure?

The most common mistake: treating the interim analysis plan as a statistical document—a spending function and a set of boundaries—without designing the governance structure that ensures the plan is implemented faithfully. The boundaries control the type I error when they are correctly applied. The governance is what ensures they are correctly applied.

The professional-level risk: the interim analysis that was conducted—or that some party believes was conducted—outside the pre-specified plan. This risk is not primarily about fraud or misconduct. It is about the gap between what the plan specified and what actually happened at the interim meeting, in the information flow between the independent statistician and the DSMB, and in the documentation of the recommendation. When that gap is large enough, the trial’s primary result cannot be fully defended—not because the result is wrong, but because the process that produced it is not fully transparent. That is the professional risk: a result that is probably correct but cannot be completely verified, because the governance record does not establish that the pre-specified plan was the plan that was actually followed.