
Module 4 - Assessing the Risk of Bias in Clinical Trials

Assessing Risk of Bias in Systematic Reviews and Meta-Analyses

Systematic reviews and meta-analyses aim to synthesize evidence from multiple studies to provide a comprehensive answer to a research question. A critical step in this process is assessing the risk of bias in the individual studies included. This assessment, often corresponding to steps nine and ten of a systematic review, involves developing forms for bias assessment and data extraction, followed by the assessment itself.

Why Assess Bias in Individual Studies?

The primary reason for assessing bias is to ensure the inclusion of high-quality studies. When conducting a systematic review and meta-analysis, the goal is to summarize findings from studies with a low risk of bias in their methodologies. This focus on “risk of bias” rather than “study quality” is deliberate.

Risk of Bias vs. Study Quality: While some may equate “risk of bias” with “study quality,” the latter term can be misleading. “Study quality” refers to the methods used to conduct a study, which are largely unobservable. What we can assess from a journal article or public database (like ClinicalTrials.gov) is how well the study was reported. Therefore, our assessment focuses on the internal validity of a study, which reflects the minimization of bias.

Impact of Bias on Summary Results: Bias in individual studies can significantly impact the summary results of a meta-analysis. If biases exist, the quantitative findings from the meta-analysis may be skewed. We must assume that bias could affect our results, hence the rigorous attention to identifying and mitigating it.

Understanding Assessable and Unassessable Elements of Study Conduct

While we aim to assess the internal validity of a study, much of the actual conduct remains hidden:

  • Assessable Elements (via Study Report):

    • Internal validity (minimization of bias).
    • External validity (generalizability or applicability of findings).
    • Relevance and originality.
    • Adherence to ethical constraints (e.g., IRB approval).
  • Unassessable Elements (from Study Report):

    • Protocol violations.
    • Quality of record-keeping.
    • Accuracy of forms.
    • Frequency of data entry errors.
    • Fidelity of study procedures (e.g., whether visual acuity was measured exactly as reported).
    • Whether the report accurately reflects the actual study conduct.
    • Falsification or fabrication of data.

Our assessment of bias primarily concerns the elements of internal validity that can be inferred from the study report.

Key Types of Bias in Intervention Studies

Three main types of bias are typically assessed in intervention studies:

  1. Selection Bias: This refers to bias in how treatment was assigned. It primarily encompasses two aspects:

    • Random Sequence Generation:

      • Purpose: Ensures that intervention groups are comparable by accounting for both known and unknown confounding variables. In sufficiently large studies, randomization theoretically prevents systematic differences between groups, thus avoiding differential selection (selection bias).
      • Historical Evidence: Comparisons dating back to 1983 (e.g., of coronary artery bypass surgery in randomized trials vs. quasi-experimental/observational studies) found that non-randomized designs tended to show larger effect sizes, suggesting selection bias was at play. However, across topics the difference between observational studies and randomized trials has no consistent direction or magnitude, so no universal “multiplier” can be applied to correct non-randomized estimates.
      • Methods with Low Risk of Bias (Unpredictable Assignment):
        • Random numbers table.
        • Computer random number generator.
        • Stratified or block randomization.
        • Coin toss (if properly conducted without cheating).
      • Methods with High Risk of Bias (Predictable or Non-Random Assignment):
        • Quasi-random (Predictable):
          • Assignment by date of birth (even/odd year).
          • Day of visit.
          • Patient ID (odd/even number).
          • Alternation.
        • Non-random:
          • Patient, participant, or clinician choice (e.g., clinicians steering healthier patients into the new treatment group, a natural human tendency).
    • Allocation Concealment:

      • Purpose: Prevents those enrolling participants from knowing the upcoming assignment, thus protecting against selection bias at the point of allocation. It is distinct from masking/blinding.
      • Example of Inadequate Concealment: A visible list of randomized assignments (e.g., C, C, D, C…) allows researchers to predict the next assignment, potentially leading to manipulation (e.g., switching patient order to ensure a desired assignment).
      • Impact: Studies have shown that inadequate or unclear allocation concealment tends to lead to more favorable effects for the experimental treatment. This subversion of randomization is a significant concern. The impact is particularly pronounced for subjective outcomes rather than objective ones like all-cause mortality.
      • Methods with Low Risk of Bias:
        • Central allocation (e.g., telephone, internet, central pharmacy).
        • Sequentially numbered opaque envelopes.
        • Sequentially numbered identical drug containers.
      • Methods with High Risk of Bias:
        • Predictable sequence (e.g., known to staff in advance based on day of week).
        • Lacking safeguards (e.g., non-opaque envelopes that can be held up to light).
        • Any non-random sequence.
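The low-risk generation methods above can be illustrated in code. This is a minimal sketch of permuted-block randomization (a form of the stratified/block randomization the list mentions); the block size, arm labels, and function name are arbitrary choices for illustration, not from the module:

```python
import random

def block_randomization(n_participants, block_size=4, arms=("A", "B"), seed=None):
    """Generate an allocation sequence using permuted blocks.

    Each block contains an equal number of assignments to every arm,
    shuffled so the order within a block is unpredictable.
    """
    assert block_size % len(arms) == 0, "block size must be a multiple of the number of arms"
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # unpredictable order within the block
        sequence.extend(block)
    return sequence[:n_participants]

# The full sequence is produced before enrollment begins and must then be
# concealed (e.g., held by a central pharmacy), never posted where staff
# enrolling participants can see the upcoming assignments.
print(block_randomization(8, seed=42))
```

Note that generating an unpredictable sequence (this step) and concealing it from enrolling staff (allocation concealment) are separate safeguards; a perfectly random sequence still permits selection bias if the list is visible.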
  2. Information Bias (Masking/Blinding): This refers to systematic error in measurement or classification of information, leading to inaccurate data collection.

    • Purpose: To ensure that all recorded information is accurate and unbiased. In randomized trials, masking prevents differential expectations and behaviors among participants, providers, and outcome assessors that could influence reported outcomes. For example, a patient knowing they receive placebo might report more side effects, or an unmasked assessor might subtly influence an outcome measurement.
    • Who can be masked:
      • The patient/participant.
      • The healthcare provider/person giving care.
      • The outcome assessor.
    • Challenges in Reporting: Terms like “single-masked” or “double-masked” are often vague and do not clearly specify who was masked. It’s also difficult to tell if masking was broken or if guessing was possible (e.g., due to different tastes/appearances of interventions).
    • Situations Where Masking is Difficult/Impossible: Some studies, such as surgical trials where one group receives surgery and another does not, make it impossible to mask the provider or patient. However, it may still be possible to mask the outcome assessors (e.g., for eye surgery outcomes after scars are no longer visible). The feasibility of masking should be considered.
    • Low Risk of Bias:
      • Masking was adequately described, and it’s unlikely to have been broken.
      • Incomplete masking, but the outcome is unlikely to be influenced (e.g., all-cause mortality in an unmasked surgical study).
    • High Risk of Bias:
      • Not masked/blinded.
      • Masking was easily broken or compromised.
      • The outcome is likely to be influenced by lack of masking (e.g., pain in an unmasked study).
    • Impact: Studies without double-blinding tend to show a more favorable effect for the experimental treatment. Similar to allocation concealment, this effect is more pronounced for subjective outcomes (e.g., quality of life, patient-reported outcomes) than for objective ones like all-cause mortality.
  3. Bias in the Analysis: This category addresses factors within the study’s analysis that can introduce bias.

    • Losses to Follow-up: High rates of participants dropping out can bias results if those who drop out differ systematically from those who remain. There is no simple rule for an acceptable percentage; whether a 10%, 25%, or 50% loss is worrisome depends on the context (e.g., how frequent the outcome is and whether losses differ between arms).
    • Non-compliance and Withdrawals: Participants who do not adhere to their assigned treatment or withdraw entirely can also bias the analysis, especially if they are simply removed from the analysis.
    • Changes in Outcome Measure / Selective Outcome Reporting:
      • Post-hoc Outcome Definition: This occurs when investigators change the primary outcome or the time point of measurement after seeing the study results. For instance, selecting one pain measure over another because it shows a statistically significant difference.
      • Type 1 Error: Specifying too many outcomes or changing them post-hoc increases the risk of a Type 1 error (finding a statistically significant difference when none truly exists) by chance alone.
      • Evidence: A 2011 analysis of ClinicalTrials.gov data by Deborah Zarin showed a wide range in the number of primary (1-71) and secondary (0-122) outcome “domains” specified per trial, far exceeding what would be considered appropriate. Moreover, outcome descriptions were often vague (e.g., “anxiety” instead of “proportion of participants with a change of greater than or equal to 11 points at 12 weeks from baseline on the Hamilton Anxiety Rating Scale”).
      • Desired Outcome Detail: To assess potential bias and combine data in a meta-analysis, systematic reviewers need detailed information on outcomes, including:
        • Domain: The broad outcome (e.g., anxiety, visual acuity, pain).
        • How it was measured: The specific instrument (e.g., Beck Anxiety Inventory).
        • Metric: The form of measurement used for analysis (e.g., change from baseline, final (end) value, time to event).
        • Aggregation: How data are summarized (e.g., categorical vs. continuous, proportion vs. mean).
        • Time point: When the outcome was measured (e.g., 12 weeks, 1 year).
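The Type 1 error problem described above can be made concrete. If each outcome is tested independently at α = 0.05 (an idealization, since real outcomes are often correlated), the chance of at least one false-positive finding across k outcomes is 1 − (1 − α)^k. The function name below is my own; 71 is the maximum primary-outcome count from the Zarin analysis cited above:

```python
def familywise_error_rate(k, alpha=0.05):
    """Probability of at least one false positive across k independent tests,
    each performed at significance level alpha."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20, 71):
    print(f"{k:3d} outcomes -> P(at least one false positive) = {familywise_error_rate(k):.2f}")
```

With 71 independent tests at α = 0.05, the chance of at least one spurious “significant” result exceeds 97%, which is why post-hoc outcome switching and long unprespecified outcome lists are treated as high risk of bias.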

Displaying and Managing Risk of Bias in Systematic Reviews

  • AVOID QUALITY SCORES: It is critical to never use quality scores (e.g., assigning a numerical score from 1-4 based on meeting certain criteria). These scores are unreliable, not validated, and their meaning is arbitrary.
  • Component-Based Assessment: The recommended approach is to report the assessment of each individual risk of bias component. Tools like the Cochrane Risk of Bias Summary display how each study was assessed for specific elements (e.g., sequence generation, allocation concealment, blinding, incomplete outcome data). This allows readers to understand the specific strengths and weaknesses of each included study.
  • Sensitivity Analysis: To test the robustness of the meta-analysis results, perform sensitivity analyses. This involves re-running the analysis after excluding studies based on certain risk of bias criteria (e.g., excluding studies at high risk of bias, studies with more than 15% loss to follow-up, or unpublished studies). If the conclusions remain consistent, it strengthens confidence in the findings.
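The sensitivity analysis described above can be sketched as follows. The studies, effect sizes, standard errors, and risk-of-bias labels are invented for illustration, and a simple inverse-variance fixed-effect pool stands in for whatever meta-analysis model a review actually uses:

```python
def pooled_effect(studies):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1 / s["se"] ** 2 for s in studies]
    total = sum(weights)
    estimate = sum(w * s["effect"] for w, s in zip(weights, studies)) / total
    return estimate, total ** -0.5

# Hypothetical studies: effect (e.g., a mean difference), its standard error,
# and the overall judgment from the component-based risk of bias assessment.
studies = [
    {"id": "Trial A", "effect": -0.40, "se": 0.10, "rob": "low"},
    {"id": "Trial B", "effect": -0.35, "se": 0.15, "rob": "low"},
    {"id": "Trial C", "effect": -0.90, "se": 0.20, "rob": "high"},
]

full, _ = pooled_effect(studies)
restricted, _ = pooled_effect([s for s in studies if s["rob"] == "low"])
print(f"All studies:      {full:+.2f}")
print(f"Low risk only:    {restricted:+.2f}")
# If the two estimates lead to the same conclusion, confidence in the
# primary result is strengthened; if they diverge, the high-risk studies
# may be driving the summary effect.
```

The same re-run-and-compare pattern applies to the other exclusion criteria mentioned (loss to follow-up thresholds, unpublished studies).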

Reporting Guidelines for Primary Studies

The quality of risk of bias assessment in a systematic review heavily relies on the transparency of reporting in the primary studies. Two key guidelines promote this:

  • CONSORT Statement: (Consolidated Standards of Reporting Trials) - Provides guidelines for reporting parallel-group randomized trials, including the crucial requirement for a flow diagram illustrating patient progression (randomized, received treatment, lost to follow-up, included in analysis).
  • STROBE Statement: (STrengthening the Reporting of OBservational studies in Epidemiology) - Offers detailed guidelines for reporting observational studies.

Adherence to these statements, while not guaranteeing flawless study conduct, significantly improves the ability of systematic reviewers to assess the risk of bias.

Conclusion

While precisely defining “study quality” remains challenging, assessing the risk of bias within individual studies is paramount for ensuring the legitimacy and validity of systematic reviews and meta-analyses. We cannot truly know how a study was carried out, only how it was reported. However, clues from design elements, particularly those related to internal validity, are crucial.

By reporting the assessment of individual bias components (e.g., how treatment assignment was done, if it was concealed, who was masked, how comparison groups and outcomes were defined) rather than using unreliable quality scores, systematic reviews can transparently account for potential biases. This diligent assessment helps avoid the “garbage in, garbage out” problem, ensuring that the synthesis reflects robust evidence and allows for appropriate interpretation of the findings. This field is constantly evolving, emphasizing its critical role in evidence-based research.

Core Concepts

  • Risk of Bias: The potential for a study’s methods to lead to a systematically distorted estimate of an intervention’s effect.
  • Selection Bias: Bias that arises from how participants are assigned to intervention groups, leading to systematic differences between groups.
  • Random Sequence Generation: The process of producing an unpredictable sequence for allocating participants to intervention groups in a trial.
  • Allocation Concealment: The procedure that prevents those involved in a trial from knowing which intervention a participant will be allocated to before they are enrolled.
  • Information Bias (Masking/Blinding): Bias arising from systematic differences in the way information is collected, measured, or recorded between intervention groups.
  • Bias in Analysis: Bias that can occur in a study due to issues in how data is handled or outcomes are reported, such as missing data, non-compliance, or changes in outcome measures post-hoc.
  • Quality Scores: Numerical summaries used historically to assess the methodological rigor or quality of studies, but are now discouraged in systematic reviews due to their lack of validity and reliability.

Concept Details and Examples

Risk of Bias

Explanation: The potential for methodological flaws in a study to systematically distort its results, leading to an over- or under-estimation of an intervention’s true effect. It’s distinct from general ‘study quality,’ which is broader, and focuses specifically on methodological design and conduct that could introduce systematic error. Assessing risk of bias helps determine the trustworthiness of individual study findings.

Examples:

  1. A clinical trial comparing two drugs where participants know which drug they are receiving might overestimate the effectiveness of the new drug due to participant expectations (information bias).
  2. A study where researchers can predict the next treatment allocation might preferentially assign healthier patients to the experimental group, making the new treatment appear more effective than it is (selection bias).

Common Pitfalls/Misconceptions: Often confused with ‘study quality’; it specifically refers to methodological flaws that could lead to systematic error, not general adherence to good research practices or ethical conduct.

Selection Bias

Explanation: Occurs when there are systematic differences between the baseline characteristics of the groups being compared. This often arises from inadequate generation of the randomization sequence or lack of allocation concealment, allowing those involved in assigning participants to influence group composition. It can lead to an imbalance in known and unknown confounding variables, distorting the observed effect.

Examples:

  1. A study assigning patients to treatment A on even days and treatment B on odd days. A researcher could wait for a patient with a specific characteristic (e.g., severe symptoms) to enroll on a day that gives them their preferred treatment.
  2. If a new drug is being tested, and clinicians decide which patients receive it based on their perceived likelihood of success, healthier patients might disproportionately end up in the new drug group.

Common Pitfalls/Misconceptions: Often confused with allocation concealment, but selection bias is the result of inadequate random sequence generation and/or allocation concealment.

Random Sequence Generation

Explanation: This is the process used to create an unpredictable sequence of allocations to interventions, ensuring that each participant has an equal chance of being assigned to any of the groups. Proper random sequence generation minimizes selection bias by preventing foreknowledge of treatment assignments. It’s ideally performed by a computer or through methods like random number tables.

Examples:

  1. Using a computer program to generate a list of “A” or “B” assignments for 200 participants before the trial begins, ensuring a truly random order.
  2. Flipping a coin for each participant to decide group assignment, assuming no cheating occurs.

Common Pitfalls/Misconceptions: Misconception that “alternation” (e.g., every other patient) or assignment based on birth dates or clinic visit days constitutes true randomization; these are predictable and prone to selection bias.

Allocation Concealment

Explanation: Refers to the procedures that prevent study investigators and participants from knowing the upcoming assignment of a participant to an intervention group until the participant has been irrevocably enrolled in the study. It protects the integrity of the random sequence generation and prevents selection bias by ensuring that assignment cannot be influenced by any knowledge of the next allocation.

Examples:

  1. Using a central pharmacy or coordinating center where the next treatment assignment is only revealed via phone or a secure web system after a participant is confirmed eligible and enrolled.
  2. Sequentially numbered, opaque, sealed envelopes where the next assignment is only opened after the participant has consented and met all inclusion criteria.

Common Pitfalls/Misconceptions: Often confused with “blinding” or “masking.” Allocation concealment happens before intervention assignment, preventing selection bias. Blinding happens after assignment, preventing information bias.

Information Bias (Masking/Blinding)

Explanation: Arises from systematic differences in the way data on outcomes or exposures are obtained from study participants, researchers, or outcome assessors. Masking (or blinding) aims to prevent this by keeping participants, healthcare providers, and outcome assessors unaware of the assigned interventions. This prevents their knowledge from influencing their behavior, reporting, or assessment of outcomes.

Examples:

  1. In a pain relief study, if a patient knows they are receiving a placebo, they might be more likely to report a less favorable pain outcome than if they were unaware (patient masking).
  2. An outcome assessor measuring visual acuity in an eye surgery trial, if unmasked, might push a patient harder on the eye chart if they know the patient received the control treatment (outcome assessor masking).

Common Pitfalls/Misconceptions: Stating “single-masked” or “double-masked” is vague; it’s crucial to specify who was masked (e.g., participant, provider, outcome assessor). Sometimes masking is not feasible (e.g., surgery), but efforts should still be made to mask outcome assessors.

Bias in Analysis

Explanation: Refers to systematic errors introduced during the statistical analysis phase of a study, often due to issues like incomplete outcome data, non-compliance with the assigned intervention, or post-hoc changes to outcome measures. Proper handling of these issues, such as using intention-to-treat analysis for randomized trials and pre-specifying outcomes, is crucial to maintain the integrity of results.

Examples:

  1. Losses to follow-up: If patients who experience worse outcomes in the experimental group disproportionately drop out, the analysis might show an artificially positive effect for that group.
  2. Selective outcome reporting: An investigator measures five different depression scales but only reports the one that shows a statistically significant improvement for the intervention group, even if it wasn’t the pre-specified primary outcome.

Common Pitfalls/Misconceptions: Misconception that balanced missing data or similar baseline characteristics of dropouts guarantee no bias; the direction and magnitude of bias from missing data are inherently unknown.

Quality Scores

Explanation: Numeric scales or checklists that assign a single summary score to a study, ostensibly reflecting its methodological quality or risk of bias. These scores are highly discouraged in modern systematic reviews because they are often not validated, lack transparency regarding weighting of different elements, and fail to provide actionable insights into specific biases present.

Examples:

  1. A scale that assigns 1 point for randomization, 1 for blinding, 1 for allocation concealment, summing to a total score out of 3 or 4.
  2. A specialized scale for observational studies that assigns points for sample size, response rate, and statistical adjustment, leading to a “quality rating.”

Common Pitfalls/Misconceptions: Lack of validity (what does a score of 7 out of 10 truly mean?), poor reliability (different reviewers might assign different scores), and obscuring specific biases by rolling them into a single number. The recommendation is to report individual risk of bias components.

Application Scenario

A researcher is conducting a systematic review to determine the effectiveness of a new behavioral therapy for chronic back pain compared to standard physical therapy. They plan to include both randomized controlled trials and prospective cohort studies.

The researcher will apply the lesson’s concepts by critically assessing each included study for its risk of bias. For RCTs, they will evaluate random sequence generation and allocation concealment (selection bias), and masking of participants, providers, and outcome assessors (information bias), alongside checking for intention-to-treat analysis and pre-specified outcomes (bias in analysis). For observational studies, they will consider how exposure and outcomes were defined and measured to identify potential information bias, and how comparison groups were selected to identify selection bias. Finally, instead of using a quality score, they will present a detailed table showing the assessment of each bias domain for every study, performing sensitivity analyses based on high-risk of bias studies.

Quiz

  1. Multiple Choice: Which of the following is not a primary type of bias discussed in the context of individual studies for systematic reviews? a) Selection bias b) Publication bias c) Information bias d) Bias in analysis

  2. True/False: Allocation concealment refers to preventing participants and researchers from knowing which intervention was assigned after enrollment.

  3. Short Answer: Why are “quality scores” for studies no longer recommended in systematic reviews?

  4. Scenario-based: In a clinical trial for a new antidepressant, patients are assigned to either the drug or placebo group based on their medical record number (even numbers get drug, odd numbers get placebo). What type of bias is this most likely to introduce, and why?

  5. Short Answer: Name two groups of people who should ideally be “masked” or “blinded” in a randomized clinical trial to prevent information bias.


Answers

  1. Correct Answer: b) Publication bias (Publication bias refers to the selective publication of studies, often based on the direction or strength of their findings, which is a bias at the review level, not within an individual study’s methods.)

  2. Correct Answer: False. (Allocation concealment prevents knowledge of the upcoming assignment before enrollment, ensuring the integrity of randomization and preventing selection bias. Blinding/masking happens after enrollment to prevent information bias.)

  3. Correct Answer: Quality scores are problematic because they are often not validated, lack reliability (different reviewers might assign different scores), and obscure specific biases by rolling them into a single, uninterpretable number. They don’t provide clear actionable insights into the methodological flaws.

  4. Correct Answer: This method is likely to introduce selection bias. While seemingly systematic, it is predictable. Researchers or clinicians could potentially manipulate patient enrollment (e.g., delay enrolling a patient until their medical record number allows assignment to a preferred group) based on their characteristics, leading to systematic differences between the treatment groups that are not due to the intervention itself.

  5. Correct Answer: Two groups that should be masked are:

    1. Participants (patients): To prevent their expectations or behaviors from influencing reported outcomes.
    2. Outcome Assessors: To prevent their knowledge of the intervention from influencing how they collect or interpret outcome data. (Other correct answers include: Healthcare providers/Caregivers)