5.5 Open-Label Designs
When blinding is not available
Blinding is the design’s protection against the biases that arise when the treatment assignment is known. But blinding is not always achievable. Some treatments are inherently identifiable—a surgical procedure versus no surgery, a device implanted under imaging guidance versus a sham procedure, an intravenous infusion with visible characteristics versus an oral tablet, a behavioral intervention that requires patient engagement versus no intervention. Some patient populations cannot be masked to treatment without deception that ethics committees will not approve. Some comparators have side effect profiles so distinctive that placebo matching is impossible.
In these situations, the trial must be conducted open-label: the patient knows which treatment they are receiving, the clinician knows, and the assessor—if they receive clinical information about the patient—also knows. The blinding protection is absent, and the bias risks that blinding would prevent are real.
The appropriate response to this situation is not to abandon the trial. Open-label trials have produced clinically important evidence in many indications—surgical trials, device trials, behavioral intervention trials, and open-label extensions of blinded trials have all contributed to the evidence base for clinical practice. The appropriate response is to design the trial to compensate for the absent blinding protection, using the tools that remain available.
What open-label designs are at risk for
When the treatment assignment is known, three forms of bias that blinded designs would otherwise prevent become possible.
Patient-reported outcome bias is the most serious risk in open-label trials where the primary endpoint is patient-reported. Patients who know they are receiving the active treatment expect to improve; their reports of symptoms, function, and quality of life are influenced by that expectation. The placebo response—the improvement that occurs because of the belief that one is receiving a beneficial treatment—is not a measurement error; it is a real biological and psychological phenomenon. But when the comparison between arms includes patients in the active arm who have benefited from both the treatment and the placebo response, and patients in the control arm who have not, the difference between arms reflects both effects. The open-label trial cannot separate them.
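A toy simulation can make the conflation concrete. All numbers here—effect sizes, variability, sample size—are illustrative assumptions, not estimates from any real trial:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # patients per arm (assumed)

drug_effect = 2.0       # assumed biological improvement on a symptom scale
knowledge_effect = 1.5  # assumed expectation-driven (placebo) improvement

# Open-label: only the active arm knows it is receiving the treatment,
# so its patient-reported scores carry both components
active = rng.normal(drug_effect + knowledge_effect, 3.0, n)
control = rng.normal(0.0, 3.0, n)

observed_diff = active.mean() - control.mean()
print(f"observed between-arm difference: {observed_diff:.2f}")
print(f"true biological effect:          {drug_effect:.2f}")
```

The observed difference is real, but it is the sum of two components, and nothing in the open-label data distinguishes the biological effect from the knowledge effect.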
The design response is endpoint selection. If a patient-reported primary outcome can be replaced by an objective outcome—a laboratory measurement, an imaging finding, an event adjudicated by a blinded committee from clinical documentation—the placebo response is removed from the primary comparison. The objective outcome may be less sensitive to the treatment’s effect than the patient-reported outcome; this is a cost, but it is the cost of trading sensitivity for bias protection.
Assessment bias occurs when clinicians who know the treatment assignment make clinical decisions—ordering additional tests, adjusting concurrent medications, classifying clinical events—differently across arms. A clinician who believes the patient in the active arm is improving may be more likely to classify an ambiguous event as an adverse effect of the control treatment, or less likely to classify it as an adverse effect of the active treatment. These classification differences are subtle, largely unconscious, and difficult to prevent when the treatment assignment is known and the clinician has strong prior beliefs about the treatment’s effect.
Dropout differential occurs when patients in the control arm, knowing they are not receiving what they believe to be the better treatment, withdraw from the trial at higher rates than patients in the active arm. When dropout is informative—when the patients who drop out are sicker, or less responsive, or more burdened—the control arm from which the dropouts have left is no longer representative of the originally enrolled population. The comparison at the primary endpoint is between the treated arm and a selected control arm survivor population.
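The selection effect can be sketched with a deliberately null scenario—a treatment with no effect at all—in which informative control-arm dropout alone produces an apparent between-arm difference. The dropout mechanism and all parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000  # patients per arm (assumed)

# Construct a true null: the treatment has zero effect
active = rng.normal(0.0, 1.0, n)
control = rng.normal(0.0, 1.0, n)

# Informative dropout in the control arm: patients with worse outcomes
# (lower scores) are more likely to withdraw before the endpoint
p_dropout = 1.0 / (1.0 + np.exp(2.0 * control))
completers = control[rng.random(n) > p_dropout]

naive_diff = active.mean() - completers.mean()
print(f"completer comparison under a true null: {naive_diff:.2f}")
```

The surviving control arm is healthier than the enrolled one, so a comparison against completers shows a spurious deficit for the active arm despite the zero true effect.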
Compensating designs for open-label trials
The tools available to compensate for absent blinding vary by the form of bias they address, and each involves a trade-off.
Blinded endpoint adjudication separates the assessment of the primary outcome from the clinical management of the patient. A central adjudication committee reviews clinical documentation—imaging, laboratory results, hospital records—without knowledge of the treatment assignment, and classifies events according to a pre-specified adjudication charter. This mechanism protects against assessment bias for adjudicable endpoints. It does not protect against patient-reported outcome bias, because the patient’s self-report is generated with knowledge of the assignment. And it does not protect against dropout differential, because the dropout has already occurred before the adjudicator reviews the remaining patients’ outcomes.
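A minimal sketch of the mechanism, assuming (purely for illustration) an identical true event rate in both arms and an unblinded assessor who unconsciously downgrades ten percent of active-arm events:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000  # adjudicable clinical events reviewed per arm (assumed)

true_rate = 0.20  # assumed event rate, identical in both arms by construction
event_active = rng.random(n) < true_rate
event_control = rng.random(n) < true_rate

# Unblinded assessment: knowing the assignment, a fraction of borderline
# events in the active arm are downgraded to non-events
downgraded = rng.random(n) < 0.10
unblinded_active = event_active & ~downgraded

# Blinded adjudication applies the same charter to both arms, so the
# symmetric classification is recovered
print(f"unblinded: active {unblinded_active.mean():.3f} "
      f"vs control {event_control.mean():.3f}")
print(f"blinded:   active {event_active.mean():.3f} "
      f"vs control {event_control.mean():.3f}")
```

Under unblinded assessment the active arm’s apparent event rate is deflated; blinded adjudication removes the asymmetry because the classification rule cannot depend on the assignment.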
The effectiveness of blinded adjudication depends on the degree to which the clinical documentation itself reveals the treatment assignment. Imaging reports that include description of treatment-related changes, hospital notes that mention the patient’s treatment, and laboratory values that are characteristic of the active treatment can unblind the adjudicator through the documentation they review. The adjudication process must be designed to remove or redact this information from the documentation reviewed by the committee—a more operationally demanding requirement than is usually anticipated.
Objective endpoints replace subjective clinical assessments with measurements that do not require clinical judgment. A trial of a treatment for arrhythmia can use a device-recorded continuous ECG as the primary endpoint rather than a clinician’s classification of arrhythmia severity; a trial of a treatment for bone density loss can use DEXA scan measurement rather than a clinician’s assessment of fracture risk. Objective endpoints are not influenced by the assessor’s knowledge of the treatment assignment, because the measurement is automated rather than judged.
The limitation is that objective endpoints may not capture the treatment effects that matter most to patients. A treatment that reduces arrhythmia burden on a continuous ECG but does not reduce symptoms may or may not improve quality of life; the ECG endpoint measures the mechanism, not the consequence. In open-label trials where symptoms are the primary concern, the conflict between bias protection and clinical relevance is not easily resolved by endpoint substitution.
Placebo run-in and enrichment are sometimes used to reduce dropout differential by enrolling only patients who demonstrate the willingness and ability to adhere to the trial regimen before randomization. A run-in period in which all patients receive placebo or a standardized treatment, and only those who complete the run-in are randomized, reduces the dropout differential in the randomized phase at the cost of a more selected, less generalizable enrolled population. This is an enrichment strategy that changes the estimand—the trial is now estimating the treatment effect in the population that completed the run-in, not in the population initially eligible—and the change must be reflected in the design documentation.
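The estimand shift is easy to see in a toy calculation. Assume (for illustration only) that the treatment effect is larger in patients able to adhere, and that only adherent patients complete the run-in:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000  # initially eligible patients (assumed)

# Assumed: the treatment effect is larger in adherent patients
adherent = rng.random(n) < 0.6
effect = np.where(adherent, 3.0, 0.5)

# Run-in: only adherent patients complete it and are randomized
print(f"mean effect, initially eligible: {effect.mean():.2f}")        # ~2.0
print(f"mean effect, run-in completers:  {effect[adherent].mean():.2f}")  # 3.0
```

The randomized phase estimates the effect among run-in completers, which under these assumptions is half again as large as the effect in the initially eligible population—the same treatment, a different estimand.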
The estimand for open-label trials
The estimand is particularly consequential for open-label designs because the choice of intercurrent event strategy determines what the trial is claiming to show about a treatment whose benefits and risks cannot be separated from the patient’s knowledge of receiving it.
Under a treatment policy estimand, the open-label trial estimates the effect of receiving the treatment in a clinical context where patients and clinicians know the assignment—which is the natural context of clinical use. The result includes the placebo component of the benefit, which may be a legitimate part of the treatment’s clinical value if it is present in practice. This is defensible but requires honesty: the treatment policy estimand from an open-label trial does not separate the biological effect from the knowledge effect, and the label should acknowledge this.
Under a hypothetical estimand—what would the treatment effect be if patients did not know which treatment they were receiving?—the open-label trial cannot produce a direct estimate. The hypothetical estimand requires a counterfactual condition that the open-label trial did not implement. Estimating it requires modeling assumptions about the size of the knowledge-related component of the benefit, which cannot be validated from the trial’s own data.
This is why endpoint choice is so important in open-label trials: objective endpoints are less influenced by knowledge of the assignment than subjective ones, and therefore come closer to the hypothetical estimand. If the trial’s intended claim is about the biological effect of the treatment rather than its contextual effect including the patient’s knowledge, the primary endpoint should be as objective as possible.
Open-label extensions
A specific and common form of open-label data arises in the open-label extension phase that follows a blinded controlled trial. In these extensions, patients who completed the blinded phase are offered continued treatment, usually open-label, to generate long-term safety and effectiveness data.
Open-label extension data have a specific vulnerability: the patients who enter the extension are not representative of the patients who were randomized. Patients who completed the blinded phase and chose to enter the extension are patients who tolerated the treatment well enough to complete the controlled phase and who were willing to continue. Patients who dropped out during the blinded phase—often for tolerability reasons or lack of response—are not in the extension. The extension population is therefore a selected survivor population, and estimates of long-term safety and effectiveness derived from it may be systematically favorable.
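The survivor selection can be sketched numerically. Assume, purely for illustration, heterogeneous long-term benefit across the randomized population and an extension-entry probability that rises with benefit:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000  # patients randomized to active treatment (assumed)

# Assumed heterogeneous long-term benefit across the randomized cohort
benefit = rng.normal(1.0, 1.5, n)

# Entry into the extension correlates with benefit: poor responders and
# poorly tolerating patients drop out before the extension begins
enters = rng.random(n) < 1.0 / (1.0 + np.exp(-benefit))

all_mean = benefit.mean()
ext_mean = benefit[enters].mean()
print(f"mean benefit, all randomized:     {all_mean:.2f}")
print(f"mean benefit, extension entrants: {ext_mean:.2f}")
```

The extension cohort’s mean benefit exceeds the full randomized cohort’s under any entry mechanism of this form, which is the systematic favorability described above—and no reanalysis of the extension data alone can recover the full-cohort value.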
This selection bias is intrinsic to open-label extension designs and cannot be removed by analysis. The appropriate response is to acknowledge it in the design and in the reporting, and to be explicit about what the extension data can and cannot show. Extension data are appropriate evidence for characterizing the safety profile in long-term users, for examining whether the treatment effect is durable, and for informing dosing decisions in patients who tolerate the treatment well. They are not appropriate evidence for estimating the treatment effect in the full enrolled population or for making claims about what would happen if all patients who started the treatment continued it.
What this section demands before proceeding
The decision to use an open-label design—when blinding is not feasible—must be documented as a design choice with known consequences, not as an operational necessity that escaped scrutiny. The documentation should identify which forms of bias the open-label design introduces, what compensating design elements are used to address each form, and what residual bias remains after the compensating elements are in place.
The compensating elements must be specified before enrollment begins. Blinded adjudication requires a finalized adjudication charter before the first event; central assessment by blinded assessors requires the assessors to be identified and trained before enrollment; objective endpoint selection requires the endpoint to be pre-specified in the estimand and the protocol.
And the estimand must reflect the open-label context. If the treatment policy estimand is used—estimating the effect of receiving the treatment in the context where the assignment is known—the result should be interpreted as inclusive of the knowledge component of the benefit. If the endpoint choice is designed to minimize the knowledge component, this should be stated as a design intent in the protocol.
An open-label trial is not a compromised trial. It is a trial that has confronted a specific set of bias risks honestly and designed to manage them explicitly. The management is imperfect—some bias remains—but the imperfection is known and documented, and the result is interpretable within the limits that the design acknowledges.
References: Wood et al., “Empirical Evidence of Bias in Treatment Effect Estimates in Controlled Trials with Different Interventions and Outcomes,” BMJ 2008; Hrobjartsson et al., “Observer Bias in Randomised Clinical Trials with Binary Outcomes,” BMJ 2012; Noseworthy et al., “The Impact of Blinding on the Results of a Randomized, Placebo-Controlled Multiple Sclerosis Clinical Trial,” Neurology 1994; Boutron et al., “Reporting and Interpretation of Randomized Controlled Trials with Statistically Nonsignificant Results for Primary Outcomes,” JAMA 2010.