2.3 The Non-Inferiority Margin

The question only a non-inferiority trial can answer

A superiority trial asks: does the new treatment produce a better outcome than the comparator? A non-inferiority trial asks a different question: is the new treatment acceptably similar to an active comparator, in the sense that whatever inferiority exists does not outweigh the treatment’s other attributes—its safety profile, its route of administration, its cost, its suitability for patients who cannot tolerate the comparator?

This is a legitimate scientific question, and it arises in legitimate clinical contexts. When an effective treatment exists and withholding it from a control arm would be unethical, a placebo-controlled superiority trial is impossible. When a new treatment offers meaningful advantages in tolerability or convenience, demonstrating that it is not inferior to the standard on the primary efficacy endpoint—combined with demonstrating the other advantages—constitutes a clinically meaningful result. Non-inferiority trials are not a regulatory convenience or a statistical workaround. They answer a real question.

But the non-inferiority question is harder to answer than the superiority question, because it requires specifying in advance what “acceptably similar” means. The non-inferiority margin—call it M—is that specification. It is the largest difference in the primary effect measure, in favor of the comparator, that the trial is willing to accept while still concluding non-inferiority. If the treatment arm falls short of the comparator by no more than M, the trial concludes non-inferiority. If it falls short by more than M, it does not.
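The decision rule described above can be sketched numerically: on a difference scale where higher values are better, non-inferiority is concluded when the one-sided lower confidence bound of the treatment-minus-comparator estimate stays above minus M. The function name and the numbers below are hypothetical, and a normal approximation is assumed; this is an illustration of the logic, not a validated analysis routine.

```python
from statistics import NormalDist

def non_inferior(diff_hat, se, margin, alpha=0.025):
    """Conclude non-inferiority if the one-sided lower confidence
    bound of (treatment - comparator) lies above -margin.
    Normal approximation; higher outcome values assumed better.
    Illustrative sketch only."""
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.96 for alpha = 0.025
    lower = diff_hat - z * se
    return lower > -margin

# Treatment estimated 2 points worse (diff_hat = -0.02), se = 0.015,
# margin M = 0.10: lower bound is about -0.049, which is above -0.10,
# so non-inferiority would be concluded.
```

Note that the same point estimate with a wider standard error, or a smaller margin, flips the conclusion: the rule depends jointly on the estimate, its precision, and M.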

The margin is not a statistical parameter. It is a clinical and ethical judgment, expressed as a number in the units of the effect measure, that the design team—and ultimately the regulatory agency—must accept before the trial begins. The number must be defensible. And it must be earned, not borrowed.


The two-step logic of margin justification

The regulatory standard for non-inferiority margin justification, articulated in FDA guidance and ICH E10, requires reasoning in two steps.

The first step is the M1 step: establishing the effect of the active comparator over placebo. This requires evidence—usually from historical randomized controlled trials—that the comparator produces a meaningful effect. The historical estimate of this effect, discounted to a conservative value such as the lower bound of its confidence interval, defines the ceiling from which the margin is derived. The logic is: if the comparator produced an effect of size E over placebo, and the new treatment is non-inferior to the comparator within margin M, then the new treatment has preserved at least E minus M of the comparator’s effect over what placebo would have produced. This preserved effect is the assurance of efficacy.
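One common way to operationalize the conservative discount is to pool the historical comparator-versus-placebo estimates with a fixed-effect (inverse-variance) weighting and take the lower confidence bound of the pooled estimate as M1. This is one approach among several, and the function name and trial numbers below are hypothetical:

```python
from statistics import NormalDist

def m1_from_history(effects, ses, alpha=0.05):
    """Fixed-effect (inverse-variance) pooling of historical
    comparator-vs-placebo effect estimates; M1 is taken as the
    lower bound of the pooled confidence interval, a common
    conservative choice. `effects` are point estimates (positive =
    comparator better), `ses` their standard errors. Sketch only."""
    weights = [1 / s**2 for s in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return pooled - z * pooled_se  # lower CI bound serves as M1

# Three hypothetical historical trials on the risk-difference scale:
# point estimates 0.12, 0.15, 0.10 with SEs 0.03, 0.04, 0.05 pool to
# about 0.125, and the lower 95% bound (M1) is about 0.083.
```

A wider historical evidence base tightens the pooled interval and raises M1; a thin one drags M1 down, which is exactly the text's point that a weak evidence base produces a small, possibly infeasible margin.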

The second step is the M2 step: determining what fraction of M1 is acceptable to sacrifice for the other benefits the new treatment offers. M2 is always smaller than M1. The proportion preserved is a clinical judgment about how much of the comparator’s established efficacy must be retained to justify the new treatment’s use. For treatments with serious or irreversible outcomes—mortality, stroke, major organ failure—the preserved proportion is typically high: regulators expect 50% or more of M1 to be preserved, and may require higher fractions for outcomes where any loss of efficacy is clinically unacceptable. For outcomes with effective rescue options, or where the new treatment’s advantages are large, the preserved proportion may be smaller.
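The arithmetic of the M2 step is simple even though the judgment behind it is not: preserving fraction p of M1 means the margin is (1 − p) times M1. A hypothetical illustration, with the function name and numbers invented for this sketch:

```python
def m2_margin(m1, preserved_fraction):
    """M2 is the share of M1 the trial is willing to give up:
    preserving fraction p of the comparator's established effect
    means the non-inferiority margin is (1 - p) * M1."""
    return (1 - preserved_fraction) * m1

# With M1 = 0.08 on the risk-difference scale and a requirement to
# preserve 50% of the comparator's effect, the margin is
# m2_margin(0.08, 0.50) = 0.04.
```

The one-line computation is the easy part; defending the choice of p for this indication and this endpoint is the clinical work the surrounding text describes.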

This two-step structure has an important implication: the margin is not negotiable independently of the comparator’s evidence base. A sponsor who sets M to be 20% of the comparator’s effect is claiming that 80% of the comparator’s effect will be preserved. That claim is only defensible if the estimate of the comparator’s effect is itself credible—if the historical trials are recent, conducted in populations similar to the current trial, using endpoints that are still clinically relevant, and analyzed with methods that produce an honest estimate of the effect size.

When the historical evidence base is weak—few trials, small samples, effect estimates with wide confidence intervals—the margin derived from it will be correspondingly uncertain, and a conservatively chosen margin will be small. A sponsor who finds that a rigorous application of the two-step logic produces a margin that makes the trial infeasible—so small that excluding it would require an unattainable sample size—has learned something important: the historical evidence does not support a non-inferiority design. The correct response is to reconsider the design, not to relax the margin.


Who owns the margin

The non-inferiority margin is a clinical judgment expressed as a number. It is not a statistical parameter, and it cannot be determined by statistical analysis alone. It requires a synthesis of historical evidence about the comparator’s effect, clinical reasoning about what magnitude of inferiority is acceptable, and regulatory judgment about what the agency will accept.

This creates an ownership problem that is almost universally mishandled.

In most trial design processes, the margin is assigned to the statistician. The statistician reviews the historical trials, estimates the comparator’s effect, applies a conservative fraction, and proposes a margin. The clinical team accepts the proposal because it is expressed as a number they cannot easily challenge. The regulatory team is consulted, if at all, after the protocol is already drafted.

This sequence is backward. The statistician can estimate M1 from the historical data; that is an appropriate statistical task. But the M2 step—how much of M1 is acceptable to sacrifice?—is a clinical and ethical judgment that belongs to the clinical team and the clinical community that will use the treatment. What level of inferiority, on this endpoint, in this disease, for this patient population, is the profession willing to accept in exchange for the new treatment’s advantages? That question cannot be answered from historical trial data. It requires clinical reasoning, patient input where available, and professional consensus that the statistician is not positioned to substitute for.

When the clinical team has not owned the M2 judgment—when the margin was set by statistical convention or borrowed from a related indication without examination—the margin is indefensible under scrutiny. Not because the number is necessarily wrong, but because the reasoning behind it has not been developed. And when the reasoning has not been developed, it cannot be recovered. The agency’s question—“how did you determine that this magnitude of inferiority was acceptable?”—cannot be answered with “the statistician proposed it.”


The constancy assumption and why it matters

The two-step margin justification assumes that the comparator’s effect in the new trial is the same as its effect in the historical trials from which M1 was estimated. This is the constancy assumption: the treatment effect of the active comparator over placebo would be the same in the current trial as it was in the historical trials, if a placebo arm were included.

The constancy assumption is never literally true, and in many trials it is under serious pressure. The historical trials that established the comparator’s efficacy may have been conducted in different patient populations, with different definitions of the primary endpoint, with different standards of background therapy, or with different disease severities. The effect of the comparator may have changed because the patient population has evolved—better background therapy means that the comparator’s marginal effect is smaller than it was when background therapy was minimal. The endpoint may have been redefined in ways that change the comparator’s estimated effect. The historical control event rate may be substantially different from the current trial’s expected event rate, which matters for absolute effect measures.

When the constancy assumption is not credible—when the historical trials and the current trial are so different that the comparator’s historical effect cannot be expected to transfer—the entire margin justification collapses. The M1 estimate is not an estimate of the comparator’s effect in the current trial; it is an artifact of a different clinical context. No M2 adjustment can repair this.

Evaluating the constancy assumption requires comparing the historical trials to the current trial design: same population, same definition of the endpoint, same background therapy, similar event rates. Where substantial differences exist, they should be documented and their implications for M1 assessed. If the differences are large enough that M1 cannot be credibly estimated, the non-inferiority design should be reconsidered.

This evaluation is not optional. Regulatory agencies now routinely ask for it. A non-inferiority protocol that cannot address the constancy assumption will receive questions that cannot be answered after the trial is complete.


The margin and the effect measure

The non-inferiority margin is expressed in the units of the primary effect measure. A margin of 0.10 on the risk difference scale means something entirely different from a margin of 0.10 on the log hazard ratio scale. A margin defined in the original units of a continuous outcome means something different from a margin defined as a proportion of the standard deviation.

This is why Section 2.1 must precede Section 2.3. The margin cannot be defined before the effect measure is settled, because the margin is an expression in the language of the effect measure. Borrowing a margin from a historical trial that used a different effect measure—common in practice, particularly when the historical evidence base is thin—requires converting the historical margin to the current scale, which requires assumptions about the relationship between the two scales that are themselves empirical claims.

When the effect measure changes between the historical trials and the current design, the margin must be recalculated, not imported. This is not a technicality. It is the difference between a margin that reflects a coherent clinical judgment and a margin that is a number without a referent.
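The scale dependence can be made concrete. A risk-difference margin implies a different relative-risk margin at every assumed comparator event rate, so translating a margin between scales rests on an empirical assumption about that rate, not on an identity. A hypothetical sketch for a harmful event, where a lower rate is better:

```python
def rd_margin_to_rr(rd_margin, control_rate):
    """Translate a risk-difference margin into the risk-ratio margin
    it implies at an assumed comparator event rate. The translation
    holds only at that rate: a margin of rd_margin permits a
    treatment event rate of at most control_rate + rd_margin,
    i.e. a risk ratio of (control_rate + rd_margin) / control_rate."""
    return (control_rate + rd_margin) / control_rate

# At a 20% comparator event rate, a 0.05 risk-difference margin
# corresponds to a risk-ratio margin of 0.25 / 0.20 = 1.25.
# At a 10% rate, the same 0.05 corresponds to 0.15 / 0.10 = 1.50:
# the "same" margin is far more permissive on the relative scale.
```

If the current trial's expected event rate differs from the historical rate, the two scales disagree about what the margin means, which is why the margin must be recalculated rather than imported.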


The margin as a risk allocation

A non-inferiority margin allocates a specific risk: the risk that the new treatment is inferior to the comparator by an amount that is clinically meaningful, but that the trial—by design—will not detect.

If M is large, the trial is easy to run and easy to succeed in. It is also willing to accept substantial inferiority. If a treatment that is truly 15 percentage points worse than the comparator would still be declared non-inferior because M is 20 percentage points, the margin has allocated the risk of that 15-point inferiority to patients who will subsequently receive the treatment—patients who are not in the trial and who did not consent to that risk.

This is the ethical dimension of the non-inferiority margin, and it is rarely discussed explicitly. The margin is a design decision that affects not just the enrolled patients but all future patients who will be treated based on the trial’s result. A margin set generously to make the trial feasible is a margin set at the expense of those future patients’ expected benefit.

The appropriate framing is not “how large can the margin be for this trial to be feasible?” It is “how large can the margin be for the trial’s result to be trustworthy—for a non-inferiority conclusion to genuinely mean that the treatment preserves adequate efficacy?” If the answer to the first question exceeds the answer to the second, the non-inferiority design is not appropriate for this trial at this moment in the treatment’s development. A trial that can be run but whose result cannot be trusted is not a trial worth running.


What this section demands before proceeding

If the trial is a non-inferiority design, three things must be resolved about the margin before Chapter 2 closes and before Chapter 3 addresses sample size.

The historical evidence base for M1 must be identified and assessed for the constancy assumption. The relevant trials, their population, their endpoint definitions, their effect size estimates and confidence intervals, and their comparability to the current design must be documented. If the constancy assumption cannot be credibly made, the design must be reconsidered.

The M2 judgment must be owned by the clinical team, with documented reasoning about what magnitude of inferiority is acceptable for this indication, this patient population, and this trade-off. The reasoning must be expressible in clinical terms—not “we used 50% of M1” but “we judged that a treatment that preserves at least X of the comparator’s established effect is acceptable because Y”—where Y is a clinical argument that a physician or patient could evaluate.

And the margin must be expressed in the units of the primary effect measure. If the effect measure was settled in Section 2.1, the margin can be expressed in those units. If it was not, it cannot be, and the effect measure must be settled first.

A margin that has not been through this process is a number. A margin that has is a commitment—one that can be defended, by name, by the person who owns it, in a regulatory meeting.


References: FDA Guidance for Industry, Non-Inferiority Clinical Trials to Establish Effectiveness (2016); ICH E10, Choice of Control Group and Related Issues in Clinical Trials (2000); D’Agostino et al., “Non-Inferiority Trials,” Drug Inf J 2003; Snapinn and Jiang, “Preserving the Comparator’s Treatment Effect in a Non-Inferiority Trial,” Pharm Stat 2008.