7.4 Non-Inferiority Failure Modes in Adaptive Designs

Why NI and adaptive design compound each other

Chapter 2 introduced non-inferiority design as the context in which the effect measure’s interpretation is most demanding and the margin’s derivation is most consequential. The NI trial is not asking whether the treatment works; it is asking whether the treatment is not too much worse than a comparator whose historical effect provides the evidence base for the margin. The constancy assumption—that the comparator’s historical effect transfers to the current trial—is the load-bearing assumption on which the entire NI argument rests.

Chapter 7 has been examining adaptive design as a context in which the risks of prior chapters are amplified. The amplification when NI design and adaptive design are combined is not additive—it is multiplicative. Every adaptation that changes the enrolled population, the trial duration, the background therapy context, or the analysis timing has the potential to undermine the constancy assumption on which the NI margin is based. When the constancy assumption fails in an adaptive NI design, the result is a trial that formally concludes non-inferiority—the pre-specified boundary was not crossed—while the scientific basis for that conclusion has been eroded by the adaptations that occurred during the trial.

This is the specific failure mode that this section examines: the adaptive NI trial that is formally valid and scientifically misleading.


The constancy assumption under population shift

Section 2.3 established that the NI margin requires two components: M1, the estimated effect of the comparator over placebo from historical trials, and M2, the clinically acceptable fraction of M1 to sacrifice. The constancy assumption is the claim that the comparator’s true effect in the current trial is at least as large as M1—that the historical evidence about the comparator’s effect transfers to the current population and context.
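The fixed-margin derivation that Section 2.3 describes can be sketched in a few lines. The sketch below is illustrative only: the function name `ni_margin` is invented, the numbers are hypothetical, and it assumes the historical comparator-versus-placebo effect is a risk difference summarized with a normal-approximation 95% confidence interval, with half of M1 preserved.

```python
# Illustrative fixed-margin NI calculation. Assumptions (not from the
# text): the historical comparator-vs-placebo effect is a risk
# difference with a normal-approximation 95% CI, and the design
# preserves 50% of M1. All numbers are invented.

def ni_margin(hist_effect, hist_se, preserve_fraction=0.5, z=1.96):
    """Return (M1, margin) for a fixed-margin NI design.

    M1 is taken conservatively as the lower bound of the 95% CI of
    the historical effect (positive = benefit); the NI margin is the
    fraction of M1 that the design is willing to give up.
    """
    m1 = hist_effect - z * hist_se
    margin = (1.0 - preserve_fraction) * m1
    return m1, margin

# Hypothetical historical meta-analysis: the comparator reduces event
# risk by 10 percentage points (SE 2 points) versus placebo.
m1, margin = ni_margin(hist_effect=0.10, hist_se=0.02)
print(f"M1 = {m1:.3f}, NI margin = {margin:.3f}")
```

The conservative choice of the lower confidence bound, rather than the point estimate, is what makes M1 a defensible floor on the comparator's historical effect.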

When an adaptive enrichment shifts the enrolled population toward a biomarker-positive subgroup, the constancy assumption must be re-evaluated for the enriched population. The comparator’s historical effect in the overall population—which was the basis for M1—may not be the same as its effect in the biomarker-positive subgroup, because biomarker-positive patients have a different underlying biology, a different disease trajectory, and potentially a different response to the comparator.

If the comparator’s effect is smaller in the biomarker-positive subgroup than in the overall population, the margin M1 derived from historical overall-population data overstates the comparator’s historical effect in the enriched population. The pre-specified NI margin is too permissive for the enriched population—it allows a larger inferiority than is scientifically justified. A trial that concludes non-inferiority against this permissive margin has shown that the new treatment is not worse than the comparator by more than an amount that was calculated for a different population.

This problem is not resolved by simply noting the enrichment in the trial report. It requires one of three resolutions: recalculating M1 for the biomarker-positive subgroup from historical data stratified by biomarker status; restricting the NI claim to the pre-adaptation population, for which the original M1 is appropriate; or accepting that the NI conclusion applies to the enriched population under a margin that may be too permissive, and acknowledging this limitation explicitly.
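The first of those resolutions, recalculating M1 from biomarker-stratified historical data, can be illustrated with invented numbers. The sketch below assumes the same fixed-margin arithmetic as Section 2.3; every effect size and standard error is hypothetical, chosen only to show how a smaller and less precisely estimated comparator effect in the biomarker-positive stratum yields a much tighter justified margin.

```python
# Hypothetical illustration of recalculating M1 by biomarker stratum.
# All effects and SEs are invented; the point is that a smaller, less
# precisely estimated comparator effect in the biomarker-positive
# stratum supports only a much tighter NI margin.

Z = 1.96  # two-sided 95%

historical = {  # stratum: (risk-difference effect, SE)
    "overall": (0.10, 0.020),
    "biomarker-positive": (0.06, 0.025),
}

margins = {}
for stratum, (effect, se) in historical.items():
    m1 = effect - Z * se          # conservative lower confidence bound
    margins[stratum] = 0.5 * m1   # preserve 50% of M1
    print(f"{stratum}: M1 = {m1:.3f}, margin = {margins[stratum]:.3f}")
```

With these invented inputs the biomarker-positive margin is a small fraction of the overall-population margin, which is exactly the gap that an unexamined pre-specified margin papers over.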

None of these resolutions is costless, and the pre-specification requirement means that the chosen resolution must be specified before the enrichment is triggered—before the adaptation reveals whether the enriched population’s constancy assumption is credible.


The constancy assumption under duration extension

Sample size re-estimation (SSR) in an event-driven NI trial extends the trial's follow-up duration when the control-arm event rate is lower than assumed. The extension may seem operationally straightforward, since it simply allows more time to accumulate the needed events, but it carries a specific constancy assumption problem.

The NI margin was derived from historical trials that had a specific follow-up duration. The comparator’s effect estimated in those trials—the M1—is an effect over that specific duration. If the current trial extends its duration beyond the duration of the historical trials that generated M1, the constancy assumption is being applied to a longer follow-up period than the historical evidence covers.

In many indications, the treatment effect of both the comparator and the new treatment is not constant over time. Active treatments that prevent events in the short term may show waning efficacy over longer follow-up; treatments that require time to achieve their mechanism may show increasing efficacy over longer follow-up. When the follow-up is extended beyond the duration of the historical evidence, the M1 derived from that evidence may not apply, and the margin calculated from it may be too permissive or too conservative for the extended duration.
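A small simulation makes this concrete. The sketch below uses an entirely hypothetical piecewise hazard ratio for the comparator that wanes after month 12; under that assumption, the comparator's absolute effect over 24 months is smaller than over 12 months, so an M1 derived from 12-month historical trials would be too permissive at the extended duration.

```python
import math

# Sketch with invented parameters: the comparator's hazard ratio vs
# placebo wanes after month 12. The absolute comparator effect (the
# quantity behind M1) is then smaller at 24 months than at 12, so a
# margin derived from 12-month historical trials is too permissive
# at the extended duration.

def cum_event_prob(months, base_hazard, hr_at_month):
    """Cumulative event probability under a piecewise-constant,
    possibly time-varying hazard ratio (discrete monthly steps)."""
    cum_hazard = sum(base_hazard * hr_at_month(t) for t in range(months))
    return 1.0 - math.exp(-cum_hazard)

placebo = lambda t: 1.0                    # reference hazard
waning = lambda t: 0.6 if t < 12 else 1.0  # effect disappears after month 12

rd = {}
for horizon in (12, 24):
    p_pbo = cum_event_prob(horizon, 0.02, placebo)
    p_cmp = cum_event_prob(horizon, 0.02, waning)
    rd[horizon] = p_pbo - p_cmp
    print(f"{horizon} months: comparator risk difference = {rd[horizon]:.3f}")
```

Under the opposite assumption, a mechanism that takes time to act, the inequality reverses and the original margin becomes too conservative rather than too permissive; the simulation is agnostic about direction, which is why it belongs in the design-stage evaluation rather than the trial report.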

The standard response—that the constancy assumption is routinely violated and should be evaluated qualitatively rather than tested formally—does not resolve the problem when the adaptation itself introduces the duration extension. In a fixed design, the trial duration is pre-specified, and the constancy assumption is evaluated for that pre-specified duration. In an adaptive NI design that extends the duration based on SSR, the constancy assumption must be evaluated for the extended duration, which was not known at the time of design.

Pre-specification of the constancy assumption evaluation for the extended duration scenario—including what historical data will be used and how the analysis will be modified if the constancy assumption is not credible at the extended duration—is part of the SSR rule specification for adaptive NI designs.


The assay sensitivity problem under adaptation

Chapter 2’s discussion of the NI trial identified assay sensitivity—the trial’s capacity to distinguish active treatments from inactive ones—as the property that makes the NI margin meaningful. A trial with adequate assay sensitivity, run under conditions where the comparator would show its historical effect, demonstrates that a treatment within the NI margin of the comparator is genuinely close to the comparator in efficacy.

Adaptive designs can undermine assay sensitivity in ways that are not always recognized.

An adaptive enrichment that shifts toward patients with better prognosis may reduce the trial’s assay sensitivity. Better-prognosis patients have lower background event rates and a smaller absolute treatment effect, even if the relative effect is similar. A smaller absolute effect means the difference between the arms is smaller in absolute terms, making it harder to distinguish a genuinely effective treatment from a less effective one at the chosen margin. The NI margin—expressed in absolute terms, as a risk difference, or in relative terms, as a hazard ratio—may not have the same discriminatory power in the enriched population as in the original population.
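The arithmetic behind the shrinking absolute effect is simple enough to show directly. The sketch assumes a constant relative risk across populations, which is itself an assumption rather than a fact about any particular indication, and all numbers are hypothetical.

```python
# Simple arithmetic under an assumed constant relative risk. All
# numbers are hypothetical; the point is that the same relative effect
# yields a much smaller absolute effect in a lower-risk population.

relative_risk = 0.80  # assumed identical in both populations

abs_effect = {}
for label, baseline_risk in [("original", 0.20), ("better-prognosis", 0.05)]:
    abs_effect[label] = baseline_risk * (1.0 - relative_risk)
    print(f"{label}: baseline {baseline_risk:.0%}, "
          f"absolute effect {abs_effect[label]:.3f}")
```

A risk-difference margin calibrated to a four-point absolute effect has little discriminatory power in a population where even a fully effective comparator produces only a one-point absolute effect.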

An SSR that extends the trial beyond the original duration may similarly affect assay sensitivity. If the comparator's effect diminishes over longer follow-up, whether from habituation, resistance, or changes in the disease's natural history, the trial at the extended duration has lower assay sensitivity than at the original duration: a less effective comparator separates less clearly from the null, and a margin calibrated to its original effect no longer discriminates as well.

These are not categorical objections to adaptive design in NI trials. They are the specific questions that must be asked—and answered—before the adaptive rule is finalized. If the enrichment or the SSR can be shown, by simulation and by review of relevant historical data, to not materially affect assay sensitivity, the concern is addressed. If it cannot be shown, the adaptation must be modified or the NI conclusion must be bounded to the conditions under which assay sensitivity was maintained.


The switching strategy: from NI to superiority

A common adaptive strategy in NI trials is to pre-specify a switch from an NI primary objective to a superiority objective if the interim data show the new treatment is substantially better than the comparator. The rationale is efficiency: if the new treatment is not merely non-inferior but actually superior, the trial should be capable of claiming the stronger result.

This strategy is statistically valid when pre-specified correctly. A trial that tests superiority after showing non-inferiority, or that tests NI after the superiority test fails, is testing two hypotheses and must control the family-wise type I error across both. Because superiority implies non-inferiority, the hypotheses are nested, and a pre-specified fixed-sequence procedure controls the family-wise error without splitting alpha; any strategy that departs from a fixed sequence must specify the alpha allocation between the NI and superiority tests. The pre-specified switching strategy must state the conditions under which the primary test switches, the testing sequence or alpha allocation, and the decision rule for each possible outcome.

The switching strategy also has an estimand implication. The NI test and the superiority test are estimating the same effect—the treatment versus comparator difference—but interpreting it against different thresholds and for different conclusions. This is not a new estimand; it is the same estimand with two evaluation criteria. The pre-specification must be clear that the estimand does not change at the switch—the population, the variable, the intercurrent event strategy, and the summary measure remain the same—and that only the conclusion rule changes.

What the switching strategy cannot do is allow the choice of NI versus superiority objective to be made based on the interim treatment effect without controlling the type I error for the adaptive selection. A trial that tests NI if the interim effect is weak and superiority if the interim effect is strong—and that claims whichever test succeeds without adjusting for the adaptive selection—has an uncontrolled type I error rate. The switching strategy must pre-specify both tests, both thresholds, and the combined family-wise error rate.
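A minimal sketch of a valid pre-specified rule follows. It reads a single two-sided 95% confidence interval for the treatment-minus-comparator risk difference (lower is better) against both thresholds; because the superiority hypothesis is nested within the NI hypothesis, one interval evaluated in a fixed sequence implements both tests. The margin and estimates are hypothetical.

```python
# Sketch of a pre-specified fixed-sequence NI-then-superiority rule
# using one two-sided 95% CI for the treatment-minus-comparator risk
# difference (lower is better). Margin and estimates are hypothetical.

def ni_then_superiority(estimate, se, margin, z=1.96):
    """Read a single confidence interval against both thresholds.
    NI requires the upper bound below the margin; superiority
    additionally requires it below zero."""
    upper = estimate + z * se
    if upper >= margin:
        return "NI not shown"
    if upper < 0.0:
        return "non-inferior and superior"
    return "non-inferior, not superior"

print(ni_then_superiority(estimate=0.010, se=0.008, margin=0.030))
print(ni_then_superiority(estimate=-0.030, se=0.010, margin=0.030))
```

The rule is fixed before unblinding: which conclusion is claimed depends only on where the interval falls, not on a post-hoc choice of which hypothesis to test.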


Pre-specifying the constancy re-evaluation

The constancy assumption cannot be tested within the trial. It is a claim about the comparator's behavior in the current trial relative to its behavior in historical trials, and no trial randomizes patients between the historical and current contexts. The constancy assumption is evaluated qualitatively, by examining whether the current trial's population, background therapy, and context are similar enough to the historical trials that M1 is transferable.

In an adaptive NI trial, the constancy re-evaluation must occur at the adaptation point. When the adaptation changes the population (enrichment) or the duration (SSR), the design team must re-examine whether the constancy assumption remains credible for the post-adaptation trial. This re-evaluation must use the same criteria that were used to justify the constancy assumption at design: a structured comparison of the pre-adaptation and post-adaptation trial contexts against the historical trials that generated M1.

This re-evaluation cannot be conducted by parties who have access to the interim treatment effect data. It must be conducted by an independent group—which may be the same independent statistician or IDMC that makes the adaptation decision, but acting on the constancy question separately from the adaptation decision—using only the protocol-defined population characteristics, the background therapy information, and the historical evidence base.

Pre-specifying the constancy re-evaluation means specifying, before enrollment, the criteria that will be applied to evaluate whether the constancy assumption remains credible after the adaptation. If the criteria are not pre-specified, the re-evaluation will be conducted post-hoc, using criteria that may be chosen to support the NI conclusion the sponsor wants to make.


What this section demands before proceeding

Adaptive NI design is the intersection of two demanding design frameworks, each of which amplifies the other’s risks. Before this section’s concerns can be addressed in the design, the following must be specified.

The constancy assumption for the original trial population is documented with the specific historical evidence—the trials, the M1 calculation, the credibility evaluation for the current population. The constancy re-evaluation criteria are pre-specified: what will be examined if the population shifts through enrichment or if the duration extends through SSR, by whom, at what point in the adaptation process, and with what consequence for the NI margin or the primary analysis if the constancy is no longer credible.

The assay sensitivity analysis for the adapted trial is documented: what historical data support the conclusion that the adapted population and duration will have adequate assay sensitivity to distinguish effective from ineffective treatments at the pre-specified margin. If the adaptation changes the assay sensitivity, the consequence for the NI margin is pre-specified—either the margin is adjusted for the adapted population, or the claim is bounded to the conditions under which assay sensitivity was maintained.

And the switching strategy, if used, pre-specifies both the NI and superiority tests, the conditions for switching, the alpha allocation, and the combined family-wise error rate.

These requirements are demanding because the combination of adaptive design and NI design is demanding—not because the combination is inherently invalid, but because each framework requires careful pre-specification of assumptions that the other framework can undermine. The demands are proportionate to the risk.


References: D’Agostino et al., “Non-Inferiority Trials: Design Concepts and Issues,” Stat Med 2003; Hung et al., “Adaptive Statistical Analysis Following Sample Size Modification in Clinical Trials,” J Biopharm Stat 2006; EMA Reflection Paper on Methodological Issues in Confirmatory Clinical Trials Planned with an Adaptive Design (2007); ICH E10, Choice of Control Group and Related Issues in Clinical Trials (2000).