4.2 Alpha Spending

What is being spent

The type I error rate—the probability of a false positive—is the trial’s most fundamental statistical commitment. Set at 2.5% one-sided (the standard for regulatory submissions), it means that if the treatment has no effect, there is a 2.5% probability that the trial will produce a statistically significant result anyway. This probability is controlled by requiring that the test statistic exceed a threshold that corresponds to the chosen alpha level.

When a trial looks at its accumulating data more than once—at an interim analysis and at the final analysis—each look is an opportunity to reject the null hypothesis. If each look is conducted at the unadjusted alpha level of 2.5%, the overall probability of a false positive across all looks is higher than 2.5%. The exact inflation depends on the number of looks and the correlation structure of the sequential test statistics, but for two independent looks each at 2.5%, the probability of a false positive at either look is 1 − 0.975² ≈ 4.9%, nearly double the nominal type I error. (The positive correlation between sequential test statistics makes the actual inflation somewhat smaller, but still well above the nominal level.)
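The inflation under the simplifying assumption of independent looks can be computed directly. This is a sketch of that worst-case arithmetic, not the correlated group-sequential calculation a trial statistician would actually perform:

```python
# Family-wise type I error across k looks, assuming for illustration
# that the looks are statistically independent. Real sequential test
# statistics are positively correlated, so the true inflation is smaller.
alpha = 0.025  # one-sided per-look level

for looks in (1, 2, 3, 5):
    familywise = 1 - (1 - alpha) ** looks
    print(f"{looks} look(s) at {alpha:.3f}: family-wise error = {familywise:.4f}")
```

Even at this crude level of approximation, the direction of the problem is clear: each unadjusted look compounds the false positive probability, and the compounding grows with the number of looks.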

Alpha spending is the framework for controlling this inflation. Rather than conducting each look at the full alpha, the total alpha budget of 2.5% is allocated—spent—across the planned looks according to a pre-specified spending function. The spending function determines how much alpha is available at each information fraction: how stringent the stopping boundary is at the first interim, how stringent at the second, and how stringent at the final analysis. The cumulative alpha spent reaches the full 2.5% at an information fraction of one, and the family-wise type I error rate is thereby controlled.

This is the statistical purpose of alpha spending. Its design purpose is different: it forces the design team to commit, before enrollment begins, to exactly how the trial will be analyzed at each interim time point. That commitment is not optional. A trial that performs interim analyses without a pre-specified spending plan has consumed alpha from an unaccounted budget, and the resulting family-wise type I error rate is unknown.


Spending functions and what they commit to

The spending function is a mathematical rule that maps the information fraction to the cumulative alpha spent up to that fraction. Three families of spending functions are in common use, and they make different commitments about the shape of the stopping boundaries.

The O’Brien-Fleming spending function spends very little alpha at early information fractions and most of the alpha at or near the final analysis. The resulting boundaries are very stringent early—requiring an extremely large interim effect to stop for efficacy—and approach the unadjusted alpha at the final analysis. The final analysis is conducted at nearly the full nominal alpha level, with minimal penalty for the interim looks.
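One common concrete form is the Lan-DeMets spending function that approximates O'Brien-Fleming boundaries. A minimal sketch of the cumulative spending it implies, using only the Python standard library (the parameterization follows the standard one-sided Lan-DeMets form; the information fractions shown are illustrative):

```python
from statistics import NormalDist

def obf_spent(t: float, alpha: float = 0.025) -> float:
    """Cumulative one-sided alpha spent at information fraction t under
    the Lan-DeMets O'Brien-Fleming-type spending function:
    f(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / t ** 0.5))

for t in (0.25, 0.50, 0.75, 1.00):
    print(f"information fraction {t:.2f}: cumulative alpha spent = {obf_spent(t):.5f}")
```

At half the information, only about 0.15 of one percentage point of the 2.5% budget has been spent; the function reaches the full 2.5% exactly at information fraction one. This is the "spend almost nothing early" shape in numbers.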

The practical implication is that the O’Brien-Fleming plan is conservative about early stopping: only a dramatic interim result—a treatment effect substantially larger than the assumed effect—will trigger an early stop. This conservatism is appropriate when early estimates are expected to be noisy and regression to the mean is a concern, and when the cost of an early stop—the overestimation bias discussed in Section 4.1—is judged to be worse than the cost of continuing. It is less appropriate when the treatment is expected to show an early, large effect and when early stopping carries lower overestimation cost.

The Pocock spending function spends alpha more evenly across information fractions, resulting in boundaries that are approximately constant across looks on the test-statistic scale. The effect is that early stopping is easier—a less extreme interim effect is needed to cross the boundary—but the final analysis is conducted at a more stringent alpha than the nominal level, with a meaningful penalty for the interim looks.

The practical implication is that Pocock boundaries are more sensitive to early efficacy signals and more protective of enrolled patients from continued randomization if the treatment is clearly working early. They are less appropriate when the scientific question requires a definitive final analysis with minimal alpha penalty, and when the clinical community expects the final result to be the evidentiary standard regardless of interim signals.

The flexible approach of Lan and DeMets allows alpha spending to be specified as a continuous function of the information fraction, with particular forms that approximate O'Brien-Fleming or Pocock shapes or occupy the space between them. The key advantage of the Lan-DeMets framework is that the number of interim analyses and their exact timing do not need to be specified in advance—only the spending function. If an interim analysis is added or its timing shifts, the boundaries can be recalculated from the spending function without violating the overall alpha control. This flexibility is valuable in practice, where interim analyses are often triggered by DSMB schedules that do not perfectly match the information fractions assumed at design.
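The operational flexibility can be illustrated with the Lan-DeMets Pocock-type spending function (the O'Brien-Fleming-type form works the same way): if the DSMB meets at information fraction 0.42 rather than the planned 0.50, the cumulative alpha available at that look is simply the spending function evaluated at 0.42. A sketch, with the specific fractions chosen purely for illustration:

```python
from math import e, log

ALPHA = 0.025  # one-sided total budget

def pocock_spent(t: float, alpha: float = ALPHA) -> float:
    """Lan-DeMets Pocock-type spending: f(t) = alpha * ln(1 + (e - 1) * t)."""
    return alpha * log(1 + (e - 1) * t)

planned_fraction, actual_fraction = 0.50, 0.42  # illustrative look timings

print(f"cumulative alpha at planned look: {pocock_spent(planned_fraction):.5f}")
print(f"cumulative alpha at actual look:  {pocock_spent(actual_fraction):.5f}")
# The boundary at the actual look is derived from the second value;
# no amendment to the pre-specified spending function is required.
```

The spending function, not the look schedule, is the pre-specified object; the boundaries follow from wherever the looks actually land.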

The choice among these functions is not primarily statistical. It is a commitment about the relative weighting of early stopping sensitivity versus final analysis integrity, and it should be made based on the clinical characteristics of the trial—the expected shape and timing of the treatment effect, the cost of overestimation bias, the relative importance of early versus final evidence—not on convention or software defaults.


One-sided versus two-sided boundaries

The alpha-spending framework applies separately to the upper and lower tails of the test statistic distribution. The upper boundary—corresponding to the treatment appearing dramatically more effective than control—triggers stopping for efficacy. The lower boundary—corresponding to the treatment appearing dramatically worse than control—triggers stopping for harm or futility.

In a superiority trial, the primary test is typically one-sided: the hypothesis is that the treatment is better than control, and the trial is powered to detect improvement, not to exclude harm. The one-sided alpha of 2.5% is spent on the upper boundary. The lower boundary, if it exists, is a separate stopping rule for harm—often symmetric to the upper boundary in its statistical form but carrying different decision authority and different governance implications.

This asymmetry is important and frequently conflated. An interim result that crosses the upper boundary triggers a recommendation to stop for efficacy, subject to the governance constraints of the DSMB charter. An interim result that crosses the lower boundary triggers a recommendation to stop for harm—a recommendation with different clinical weight and different regulatory implications. The two boundaries are statistically symmetric but clinically different, and the interim analysis plan should treat them as different decision rules with different governance requirements.

For non-inferiority trials, the boundary structure is further complicated by the structure of the non-inferiority hypothesis. The efficacy test in an NI design is one-sided in the direction of the new treatment being no worse than the comparator by more than the margin. The stopping rules must reflect this structure, not the structure of a superiority test, and the spending function must be calibrated to the NI hypothesis.


The information fraction problem

The alpha-spending framework is specified in terms of information fractions—the proportion of the total planned information that has been observed at each interim look. For event-driven time-to-event trials, the information fraction is the fraction of the planned number of events observed. For continuous outcome trials, it is typically the fraction of the planned sample size enrolled and assessed.

The information fraction is the right currency for alpha spending because it reflects the statistical information content of the interim data, not its calendar timing. An interim analysis at 50% information fraction has half the statistical information of the final analysis, regardless of whether that corresponds to 18 months or 36 months of calendar time.

The problem arises when the planned total information—the target event count or the target sample size—is wrong. If the trial was designed for 500 events and will actually accumulate only 400 due to a lower-than-assumed event rate, an interim analysis at 250 events sits at the 50% information fraction of the planned design but at 62.5% of the information the trial will actually accrue. The spending function was calibrated to the planned 500-event design; applying it on that scale to the actual 400-event trial produces boundaries that are not quite right—not catastrophically wrong, but not operating as designed.
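The arithmetic of the mismatch is simple. This sketch uses the Lan-DeMets O'Brien-Fleming-type spending function to show how far the calibration drifts; the function form and the event counts are assumptions for illustration:

```python
from statistics import NormalDist

def obf_spent(t: float, alpha: float = 0.025) -> float:
    """Lan-DeMets O'Brien-Fleming-type cumulative one-sided spending."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / t ** 0.5))

planned_total, actual_total, observed = 500, 400, 250

t_planned = observed / planned_total  # 0.500 on the planned scale
t_actual = observed / actual_total    # 0.625 on the actual scale

print(f"planned-scale fraction {t_planned:.3f}: alpha spent = {obf_spent(t_planned):.5f}")
print(f"actual-scale fraction  {t_actual:.3f}: alpha spent = {obf_spent(t_actual):.5f}")
```

The same 250 events correspond to roughly three times as much cumulative alpha on the actual scale as on the planned one, which is why the recalibration rule discussed below must itself be pre-specified rather than improvised at the interim.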

This is the interface between Chapter 3’s event rate uncertainty and Chapter 4’s alpha spending. When the total information target is misspecified, the spending function produces boundaries that are calibrated to a design that does not correspond to the trial being run. The correction requires either updating the spending function as the true total information becomes clearer—which requires pre-specification of the update rule—or accepting that the operating characteristics will deviate from the planned values, and documenting that deviation.

The practical implication is that interim analysis plans should specify not just the spending function but the rule for recalibrating the boundaries if the total information target changes. This rule must be pre-specified—applied by an independent statistician who does not see the treatment arm results—to prevent the recalibration from being influenced by knowledge of the interim trend.


What the spending function does not control

Alpha spending controls the family-wise type I error rate across the pre-specified interim analyses and the final analysis. It does not control the type I error rate for additional analyses—sensitivity analyses, subgroup analyses, secondary endpoints, exploratory analyses—that are conducted at the same data cut. It does not prevent information leakage from the interim results to parties who can use that information to influence the trial. And it does not protect against the trial being modified—endpoint changed, eligibility expanded, analysis plan revised—after knowledge of the interim trend is available to the design team.

These limitations are not defects of the alpha-spending framework. They are the boundary conditions within which the framework operates, and they define the governance requirements that must complement the statistical boundaries. A trial with a well-specified alpha-spending plan but a poorly specified governance structure—one that does not control who sees the interim results, does not prevent unblinded access from influencing protocol modifications, and does not document what was seen at each interim—is not a trial with controlled type I error in the full sense. It has controlled statistical type I error within the specified tests, and uncontrolled error everywhere else.

The full protection of the type I error rate requires the statistical spending plan and the governance structure to work together. The statistical plan specifies the boundaries. The governance structure ensures that the only decisions made based on interim data are the ones the statistical plan allows.


Reporting the spending plan

The alpha-spending plan must be reported in the protocol and in the statistical analysis plan before the first interim analysis. The report should specify the spending function by name and parameters, the planned information fractions at which interim analyses will be conducted, the stopping boundaries at each planned interim, the final analysis boundary after alpha spending, and the rule for recalibrating boundaries if the total information target changes.

A spending plan that specifies only the function name—“we will use an O’Brien-Fleming spending function”—without reporting the resulting boundaries is not complete. The boundaries must be computed and reported, so that the DSMB, the sponsor, and the regulatory agency can verify that the boundaries were applied correctly at each interim analysis. Discrepancies between the pre-specified boundaries and the boundaries applied at the interim analysis are a finding in a regulatory review, regardless of whether the primary result is positive or negative.

The spending plan is a commitment. It should be treated as one—documented with the same care as the estimand and the sample size justification, reviewed by the same people, and defended with the same rigor if it is questioned. A spending plan adopted because the statistical package defaults to it, without examination of what it commits the trial to, is not a plan. It is a setting.


References: Lan and DeMets, “Discrete Sequential Boundaries for Clinical Trials,” Biometrika 1983; O’Brien and Fleming, “A Multiple Testing Procedure for Clinical Trials,” Biometrics 1979; Pocock, “Group Sequential Methods in the Design and Analysis of Clinical Trials,” Biometrika 1977; Jennison and Turnbull, Group Sequential Methods with Applications to Clinical Trials (2000).