Chapter 5: How Do We Protect Against Bias?

The question Chapter 4 leaves open

Chapter 4 governed the decisions made during the trial. It specified who could see the accumulating data, under what rules, with what authority, and with what documentation. When governance works, the trial’s operating integrity is maintained across time—the blind holds, the interim data stay within the DSMB, and the final analysis reflects the design that was pre-specified.

What Chapter 4 assumed, without stating, is that the comparison between arms is valid. That the patients in the treatment arm and the control arm are exchangeable at baseline. That the difference observed at the primary endpoint reflects the treatment effect and not some systematic difference in who received which treatment, how they were assessed, or how their outcomes were handled.

That assumption is the subject of this chapter. Bias—the systematic distortion of the estimated treatment effect—is not one failure but many. It can enter through the assignment mechanism, through the assessment of outcomes, through the handling of protocol deviations, or through the differential loss of patients from each arm over time. Some forms of bias are obvious and dramatic; most are subtle and invisible, operating at a level below what any single person in the trial observes. Together, they can produce a trial that is statistically significant for a treatment that does not work, or that fails to detect a treatment that does.

The tools against bias are randomization, blinding, allocation concealment, and validation. They are not independent defenses; they are a system, and the system fails when any component fails. Chapter 5 examines each component, what it protects against, and what it cannot protect against.


What bias is, and what it is not

Bias in clinical trials is a systematic error in the estimated treatment effect—an error that does not diminish as the sample size increases. This distinguishes bias from random error, which diminishes with increasing sample size and is addressed by the power calculation. A biased trial with 10,000 patients is more precisely wrong than a biased trial with 1,000 patients. The additional patients add precision to the wrong estimate.

Bias operates at the level of the comparison, not the level of the individual measurement. A measurement that is systematically too high is a biased measurement, but in a randomized trial, if both arms are measured with the same systematic error, the bias in the measurement cancels in the comparison. The comparison is unbiased even if the measurements are not. Conversely, a measurement error that differs between arms—because assessors know which arm each patient is in and adjust their assessments accordingly—introduces bias into the comparison even if each arm’s measurements are individually consistent.
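The cancellation argument can be checked numerically. In the hypothetical sketch below, both arms first receive the same systematic measurement error, and the between-arm difference still recovers the true effect; then an extra error is applied to the treatment arm only, and the comparison shifts by exactly that amount. (Python, illustrative; the effect sizes and error magnitudes are invented.)

```python
import random
from statistics import mean

rng = random.Random(42)
n, true_effect = 5_000, 2.0
treat = [true_effect + rng.gauss(0, 1) for _ in range(n)]
control = [rng.gauss(0, 1) for _ in range(n)]

shared = 3.0  # systematic measurement error applied equally to both arms
diff_shared = mean(x + shared for x in treat) - mean(x + shared for x in control)
# The shared error cancels: diff_shared stays close to the true effect, 2.0.

assessor = 0.5  # extra error applied only to the (unblinded) treatment arm
diff_differential = (mean(x + shared + assessor for x in treat)
                     - mean(x + shared for x in control))
# The differential error does not cancel: the comparison shifts by about 0.5.
```

Each arm's measurements are off by 3.0 in the first case, yet the comparison is unbiased; in the second case the measurements are only 0.5 further off in one arm, and the comparison is biased.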

This distinction matters for design. The protection against bias in clinical trials is not the accuracy of the individual measurements; it is the comparability of the conditions under which measurements are made across arms. Randomization ensures that the arms are comparable at baseline. Blinding ensures that post-randomization conditions—assessment, treatment, dropout—do not systematically differ between arms in ways that are not part of the treatment effect.

When either of these mechanisms fails, the trial’s comparison is distorted. The distortion may be small or large, detectable or invisible, correctable or not. What it is, always, is systematic—favoring one arm over the other in a way that is not attributable to the treatment.


What this chapter covers

Section 5.1 — Randomization Logic examines what randomization actually does and what it does not do. Randomization guarantees that treatment assignment is probabilistically independent of patient characteristics—known and unknown—at the time of assignment. It does not guarantee balance; in small samples, randomization can produce imbalanced groups by chance. And it does not guarantee that post-randomization characteristics will be balanced. If the treatment affects which patients stay in the trial, which patients comply, or which patients are assessed, post-randomization imbalance is not a randomization failure; it is a feature of the treatment effect. But it will distort the primary comparison if it is not accounted for in the estimand and the analysis.
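The chance-imbalance point is easy to quantify by simulation. The sketch below (Python, illustrative; the trial size, prevalence, and threshold are invented for this example) repeatedly randomizes 40 patients 1:1 and records how often a binary prognostic factor with 50% prevalence ends up differing between arms by more than 20 percentage points.

```python
import random

def arm_imbalance(n_patients, prevalence, rng):
    """Simple 1:1 randomization; returns the absolute between-arm
    difference in prevalence of a binary prognostic factor."""
    arms = ([], [])
    for _ in range(n_patients):
        arms[rng.randrange(2)].append(rng.random() < prevalence)
    return abs(sum(arms[0]) / len(arms[0]) - sum(arms[1]) / len(arms[1]))

rng = random.Random(1)
runs = 10_000
frac_imbalanced = sum(arm_imbalance(40, 0.5, rng) > 0.20
                      for _ in range(runs)) / runs
# With only 40 patients, a >20-point imbalance in a 50%-prevalent
# prognostic factor occurs in a substantial fraction of trials by chance.
```

The imbalance is not a flaw in the randomization; it is the sampling variability that stratification (Section 5.2) is designed to control for known factors.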

Section 5.2 — Stratification examines the specific design choice of whether to stratify randomization by pre-specified prognostic factors. Stratification is a tool for ensuring that the known predictors of the primary outcome are balanced between arms even in relatively small samples. It is not without cost: stratification factors must be specified before randomization begins, must be accurately ascertained at the time of randomization, and must be reflected in the primary analysis to capture their efficiency benefit. When stratification factors are misspecified, inaccurately measured, or not reflected in the analysis, stratification adds administrative complexity without the statistical benefit it was intended to provide.
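The mechanics of stratified randomization can be sketched in a few lines. The example below implements permuted blocks within each stratum; it is a hypothetical illustration only—the function name, block size, and stratum labels are invented, and real trials run allocation through a central randomization system, not local code.

```python
import random

def stratified_block_randomizer(strata, block_size=4, seed=0):
    """Permuted-block randomization within each stratum (1:1, arms A and B)."""
    rng = random.Random(seed)
    blocks = {stratum: [] for stratum in strata}

    def assign(stratum):
        if not blocks[stratum]:  # open a fresh shuffled block for this stratum
            block = ["A", "B"] * (block_size // 2)
            rng.shuffle(block)
            blocks[stratum] = block
        return blocks[stratum].pop()

    return assign

assign = stratified_block_randomizer(["high-risk", "low-risk"])
arms = [assign("high-risk") for _ in range(8)]
# Every completed block of 4 within a stratum contains exactly 2 As and 2 Bs,
# so arms stay balanced on the stratification factor as enrollment proceeds.
```

Note what the sketch presupposes: the stratum must be known, and known correctly, at the moment of assignment—which is exactly the ascertainment requirement described above.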

Section 5.3 — Allocation Concealment examines the protection that must exist between the randomization sequence and the moment of treatment assignment. Allocation concealment prevents the person enrolling the patient from knowing the upcoming assignment before the patient is enrolled. Without allocation concealment, randomization can be subverted—enrollers who know the upcoming assignment will enroll or defer patients to ensure their preferred patients receive their preferred treatment. The result is a selection bias that compromises the validity of the randomized comparison as thoroughly as the absence of randomization would.
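How thoroughly unconcealed allocation can subvert a valid randomization sequence can be shown with a simulation. In the hypothetical sketch below (Python, illustrative; the deferral rule and effect sizes are invented), an enroller who can see the upcoming assignment defers sicker candidates whenever the next slot is a treatment slot. The sequence itself remains perfectly random; the comparison does not.

```python
import random
from statistics import mean

rng = random.Random(7)
n = 2_000
sequence = [rng.randrange(2) for _ in range(n)]  # 0 = control, 1 = treatment

concealed = {0: [], 1: []}
subverted = {0: [], 1: []}
for arm in sequence:
    severity = rng.gauss(0, 1)  # baseline prognosis of the next candidate
    concealed[arm].append(severity)
    if arm == 1:
        # An enroller who sees the upcoming assignment defers sicker
        # candidates until a milder one arrives for the treatment slot.
        while severity > 0:
            severity = rng.gauss(0, 1)
    subverted[arm].append(severity)

gap_concealed = mean(concealed[1]) - mean(concealed[0])
gap_subverted = mean(subverted[1]) - mean(subverted[0])
# gap_concealed hovers near zero; gap_subverted is strongly negative:
# the "randomized" treatment arm starts out systematically healthier.
```

The resulting baseline gap is indistinguishable, after the fact, from a treatment effect—which is why the failure compromises the comparison as thoroughly as the absence of randomization would.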

Section 5.4 — Validation examines the specific protections against outcome assessment bias—the systematic distortion of outcome measurements that occurs when the assessor knows, or can infer, which treatment the patient received. Validation in this context refers to the procedures for ensuring that primary outcomes are assessed by people who do not know the treatment assignment, that assessment procedures are specified in enough detail to prevent interpretation bias, and that deviations from the assessment protocol are documented and handled according to pre-specified rules.

Section 5.5 — Open-Label Designs examines the special challenges of trials in which blinding is not feasible—where the treatment itself is visible, or where it is impractical to conceal the assignment from the patient or the clinician. Open-label designs sacrifice the blinding protection but can preserve other protections through endpoint choice, centralized assessment, and pre-specified analysis strategies. The section asks what is lost when blinding is not possible and what compensating designs can recover.


The system, not the components

The chapter’s sections are organized around the specific tools of bias protection. But the chapter’s argument is about the system.

Randomization without allocation concealment is not randomization in the sense that matters—it is a randomization sequence that has been revealed before it can prevent selection bias. Allocation concealment without randomization protects against selection bias at enrollment but cannot protect against the baseline differences that randomization eliminates. Blinding without randomization protects against assessment bias in a comparison that was not valid to begin with. And validation of outcome assessment does not compensate for a blinding failure that occurred six months earlier.

The system works when all components work together. The failure of any component creates a gap through which bias enters, and the remaining components cannot fully compensate. This is why the bias protection design is not a checklist—check randomization, check blinding, check allocation concealment—but a system design: an examination of how the components interact, where the gaps are, and what happens when each component is tested by the realities of trial conduct.


What this chapter is not about

This chapter is not about statistical adjustments for baseline imbalance. Post-hoc covariate adjustment for observed imbalance—however sophisticated—does not substitute for the design protections that prevent imbalance from occurring in the first place. Adjustment corrects only for known, measured confounders; it cannot correct for unmeasured confounders, the very ones that randomization would have controlled for.

It is also not about missing data in the sense of methods for handling outcomes that were not observed. Missing data handling—multiple imputation, mixed-effects models, pattern mixture models—is an analysis-stage tool that addresses a design-stage problem. The design-stage problem is protecting against the differential loss of patients from each arm in ways that are related to the outcome. If that differential loss is prevented by design—through follow-up procedures, through outcome collection from withdrawals, through endpoint selection that minimizes missing data—the missing data problem is smaller. If it is not prevented, the analysis tools address the symptom but not the cause.

The cause of most missing data problems in clinical trials is a bias protection failure: patients dropped out differentially between arms, or in ways related to their prognosis, because the design did not adequately protect against the forces that drive differential dropout. This chapter is about designing against those forces. Chapter 1, in its discussion of intercurrent event strategy, is about how the design commits to handling what dropout occurs despite those protections.


The question this chapter must answer

Every bias protection decision has a cost. Randomization requires a randomization system. Stratification requires pre-specification and accurate ascertainment of stratification factors. Allocation concealment requires central randomization or sealed envelopes. Blinding requires placebo formulations, sham procedures, or central assessment by blinded assessors. Each of these costs is real, and each involves trade-offs with recruitment feasibility, patient acceptability, trial complexity, and budget.

The question this chapter must answer is not whether to incur these costs—they should generally be incurred—but how to make the trade-offs explicit and defensible. When a design element that protects against bias is omitted, the omission should be acknowledged as a design decision with known consequences, not passed over as an operational necessity. The consequences of the omission—what kinds of bias become possible, how large they might be, and what compensating protections are available—should be documented.

That documentation is not primarily for the regulatory agency, though the agency will scrutinize it. It is for the design team: the acknowledgment of a known vulnerability is the prerequisite for designing a compensating protection. A team that does not acknowledge the vulnerability cannot protect against it.