Source: Journal of Evaluation in Clinical Practice
Originally posted on Quora May 25, 2023
Three biases and the data required to assess them in real-world studies.
CASE-COUNTING WINDOW BIAS
The pivotal covid-19 vaccine trials used a primary endpoint of lab-confirmed, symptomatic covid-19.8-11 Not all covid-19 cases, however, counted towards the estimate of vaccine efficacy. Investigators did not begin counting cases until participants were at least 14 days (7 days for Pfizer) past completion of the dosing regimen, a timepoint public health officials subsequently termed “fully vaccinated.”12 The trial protocols gave no rationale for excluding cases that occurred before the start of this “case-counting window” (and the legitimacy of excluding post-randomization events has long been debated13); however, one Pfizer post-marketing document states that in the early period post-vaccination, “the vaccine has not had sufficient time to stimulate the immune system.”14
In randomised trials, applying the “fully vaccinated” case-counting window to both vaccine and placebo arms is easy. In cohort studies, however, the window can be applied only to the vaccinated group: unvaccinated people receive no placebo shots, so there is no second dose from which to start counting 14 days. This asymmetry, in which the case-counting window removes early cases from the vaccinated group but not from the unvaccinated group, biases the estimate in the vaccine's favour. As a result, a completely ineffective vaccine can appear substantially effective: 48% effective in the example shown in Table 1. (The placebo data in Table 1 come from the Pfizer Phase III randomised trial and serve as the assumed case counts for the unvaccinated group in a counterfactual observational study running simultaneously; this setup illustrates the potential size of case-counting window bias in a real-world setting, as well as why this bias does not arise in a randomised trial.)
Table 1. How the asymmetric application of case counting windows can bias observational studies.
- a The hypothetical vaccine is assumed to have zero efficacy.
- b The placebo case count includes cases reported up to Day 84 in Pfizer's pivotal vaccine trial. This number, and the (unequal) number of participants at risk, are taken from the actual reported trial results (see Pfizer's cumulative incidence graph, p. 30).15 The vaccine case rate is fictitious and matches the placebo rate to carry out calculations on a hypothetical vaccine with zero efficacy. We follow the standard convention of the 14-day case-counting window, which translates to Day 36 (14 days following Dose 2, which is given 21 days after Dose 1).
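A short Python sketch makes the mechanism behind Table 1 concrete. It is a toy under stated assumptions, not the paper's calculation: the daily case counts are invented (a constant 1 case per day in each group of 1,000, so efficacy is zero by construction), and `apparent_ve` is a hypothetical helper name. Days are 1-indexed as in footnote b, so a window starting on Day 36 drops Days 1 to 35.

```python
def apparent_ve(daily_cases_vax, daily_cases_unvax, n_vax, n_unvax,
                window_start_day, window_applies_to_unvax=False):
    """Return 1 - (vaccinated risk / unvaccinated risk).

    daily_cases_*[d - 1] holds the (hypothetical) case count on Day d,
    where Day 1 is the day of the first dose. Vaccinated cases are counted
    only from window_start_day onward; unvaccinated cases are counted from
    Day 1 unless window_applies_to_unvax is True (the trial-like, symmetric case).
    """
    start = window_start_day - 1  # convert a 1-indexed day to a list index
    counted_vax = sum(daily_cases_vax[start:])
    counted_unvax = sum(daily_cases_unvax[start:] if window_applies_to_unvax
                        else daily_cases_unvax)
    risk_vax = counted_vax / n_vax
    risk_unvax = counted_unvax / n_unvax
    return 1 - risk_vax / risk_unvax


# Hypothetical zero-efficacy scenario: 1,000 people per group, 1 case per day
# in each group for 84 days, and a case-counting window opening on Day 36.
vax = [1] * 84
unvax = [1] * 84
print(round(apparent_ve(vax, unvax, 1000, 1000, 36), 3))  # asymmetric: 0.417
print(round(apparent_ve(vax, unvax, 1000, 1000, 36,
                        window_applies_to_unvax=True), 3))  # symmetric: 0.0
```

With a constant daily rate the asymmetric window yields 1 − 49/84 ≈ 42%, while applying the window to both groups, as a trial does, yields 0%. The toy does not reproduce the 48% in Table 1, which rests on Pfizer's actual placebo counts and unequal group sizes.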
We are aware of just one observational study that addressed case-counting window bias: it used matching and assigned a pseudo-enrolment date to the unvaccinated member of each matched pair of vaccinated and unvaccinated persons. While matching mitigates case-counting window bias, it introduces an artificial and severe age imbalance between the unvaccinated and vaccinated groups: the matched subset under-represented patients aged ≥ 70 years by 50% and over-represented patients aged ≤ 40 years by 50%. (This occurred because the propensity to receive the vaccine depends strongly on age. The number of one-to-one matched pairs of elderly patients is therefore capped by the number of unvaccinated elderly people, while the number of matched pairs of younger patients is capped by the number of vaccinated young people.)
In retrospective studies using large population samples, we propose a simple adjustment that can correct for case-counting window bias. The case rate from vaccination to the start of the case-counting window can be observed in the vaccinated group and applied to the unvaccinated group to estimate how many unvaccinated cases to exclude before computing the ratio of case rates. This adjustment preserves the case-counting window, while assuming the vaccine is completely ineffective before the window starts. Applied to the zero-efficacy example in Table 1, the adjustment returns the vaccine effectiveness estimate to zero. A similar strategy has proved useful in influenza treatment analyses.
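The argument about upper bounds on matched pairs can be checked with a small sketch. It assumes, for illustration only, that pairs are formed within coarse age groups, so each group yields min(vaccinated, unvaccinated) pairs; the counts and the helper `matched_age_shares` are hypothetical, not data from the study described.

```python
def matched_age_shares(vaccinated, unvaccinated):
    """Compare age shares in the full cohort with shares among 1:1 matched pairs.

    vaccinated / unvaccinated map an age group to a head count. Within each
    age group the number of one-to-one pairs cannot exceed the smaller count,
    so pairs = min(vaccinated, unvaccinated) (an illustrative assumption).
    """
    groups = list(vaccinated)
    total_people = sum(vaccinated.values()) + sum(unvaccinated.values())
    pairs = {g: min(vaccinated[g], unvaccinated[g]) for g in groups}
    total_pairs = sum(pairs.values())
    cohort_share = {g: (vaccinated[g] + unvaccinated[g]) / total_people for g in groups}
    matched_share = {g: pairs[g] / total_pairs for g in groups}
    return cohort_share, matched_share


# Hypothetical counts in which vaccine uptake rises steeply with age.
vaccinated = {"<=40": 400, "41-69": 700, ">=70": 900}
unvaccinated = {"<=40": 600, "41-69": 300, ">=70": 100}
cohort, matched = matched_age_shares(vaccinated, unvaccinated)
for group in vaccinated:
    print(f"{group}: cohort {cohort[group]:.3f}, matched pairs {matched[group]:.3f}")
```

With these invented counts, people aged ≤ 40 make up one third of the cohort but half of the matched pairs, while those aged ≥ 70 shrink from one third to one eighth.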
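The proposed adjustment can be sketched as follows, under the same zero-efficacy assumption the text uses. The helper `window_adjusted_ve` and the daily counts are hypothetical; the pre-window risk is measured in the vaccinated group and scaled to the size of the unvaccinated group to estimate how many unvaccinated cases to remove.

```python
def window_adjusted_ve(daily_cases_vax, daily_cases_unvax, n_vax, n_unvax,
                       window_start_day):
    """Sketch of the pre-window adjustment (hypothetical helper).

    daily_cases_*[d - 1] is the case count on Day d (Day 1 = first dose).
    Assumes the vaccine has no effect before the window opens, so the
    vaccinated group's pre-window risk also applies to the unvaccinated.
    """
    start = window_start_day - 1
    # Risk of a case before the window opens, observed in the vaccinated group.
    pre_window_risk = sum(daily_cases_vax[:start]) / n_vax
    # Expected number of unvaccinated cases falling in that same early period.
    expected_unvax_excluded = pre_window_risk * n_unvax
    counted_vax = sum(daily_cases_vax[start:])
    counted_unvax = sum(daily_cases_unvax) - expected_unvax_excluded
    return 1 - (counted_vax / n_vax) / (counted_unvax / n_unvax)


# Hypothetical zero-efficacy inputs: 84 days of follow-up, window from Day 36,
# and an unvaccinated group twice as large with twice as many daily cases.
print(round(window_adjusted_ve([1] * 84, [2] * 84, 1000, 2000, 36), 6))
```

The adjusted estimate comes out at zero (up to floating-point rounding). Because the sketch assumes the vaccine does nothing before the window opens, it returns zero only when that premise holds.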
AGE BIAS
Age is perhaps the most influential risk factor in medicine, affecting nearly every health outcome. Thus, great care must be taken in studies comparing vaccinated and unvaccinated people to ensure that the groups are balanced by age. Otherwise, differences in outcomes that are explained, at least partially, by age can be mistaken for vaccine effects, producing inaccurate estimates of vaccine effectiveness.
In trials, randomisation helps ensure statistically identical age distributions in vaccinated and unvaccinated groups, so that the average vaccine efficacy estimate is unbiased, even if vaccine efficacy and/or infection rates differ across age groups (see Figure 2A).
In real life, however, vaccination status is not randomly assigned (see Figure 2B). Although vaccination rates are high in many countries, the vaccinated remain, on average, older and less healthy than the unvaccinated, because vaccines were prioritised for older and higher-risk people. Individuals also self-select for vaccination regardless of policy.
Because covid-19-related risks (of infection, disease, and complications) also vary by age, this age imbalance can confound the estimate of vaccine effectiveness. To illustrate this, consider the REACT-1 study.18 This study conducts monthly PCR testing for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) on a random sample of England's population. In June–July 2021 (the most recent data available), SARS-CoV-2 positivity rates varied considerably by age (from 1.7 to 15.6 positives per 1000 individuals), with higher rates among people under 25 years of age (see Figure 2C).
REACT-1 also reports vaccination status. As seen in Figure 2B, almost half of the unvaccinated group was aged between 5 and 12, while the most common age group among the vaccinated was 45–54 years. The details differ, but age imbalance is present in every observational data set.
To understand the impact of age bias, consider a hypothetical vaccine with zero efficacy. If the vaccine were completely ineffective, the vaccinated and unvaccinated groups' case rates should be statistically identical within each age group (Figure 2D). But age bias in observational data alters the age-weighted case rates of the vaccinated and unvaccinated groups, producing different overall infection rates by vaccination status. Because older people recorded lower infection rates, the age-weighted case rate of the (older) vaccinated group was 5.5 per 1000, while that of the (younger) unvaccinated group was 11.2 per 1000 (Figure 2C). The resulting vaccine effectiveness, one minus the ratio of these case rates (1 − 5.5/11.2 ≈ 51%), reflects the interaction between the differing age distributions and the correlation of covid-19 incidence with age. The vaccine appears 51% effective even though, by assumption, it is completely ineffective. (Note that the direction of the age bias would reverse if older age groups had experienced higher case rates during the study period.)
A viable adjustment method for this instance of Simpson's paradox19 induced by age bias should shift the 51% estimate back to zero. Simpson's paradox describes the situation in which aggregated and disaggregated analyses of the same data lead to contradictory findings, a common phenomenon in real-world data. Many observational studies include an age term in regression models in an attempt to correct for this age bias.4, 20, 21 However, a meta-analysis of influenza vaccine studies found that standard regression adjustments do not sufficiently correct for the variety and magnitude of biases.22
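The age-weighted case rates and the resulting 51% can be reproduced arithmetically. In the sketch below, the three age groups, their case rates, and both age distributions are hypothetical (they are not the REACT-1 figures); the only values taken from the text are the 5.5 and 11.2 per 1000 rates in the last line. `age_weighted_rate` and `ve_from_rates` are illustrative helper names.

```python
def age_weighted_rate(age_shares, rates_per_1000):
    """Average of age-specific case rates (per 1000), weighted by the
    group's own age distribution (shares are assumed to sum to 1)."""
    return sum(age_shares[g] * rates_per_1000[g] for g in age_shares)


def ve_from_rates(rate_vax, rate_unvax):
    """Vaccine effectiveness as one minus the ratio of case rates."""
    return 1 - rate_vax / rate_unvax


# Hypothetical age-specific rates shared by both groups (zero efficacy),
# with younger people at higher risk, as in the period described above.
rates = {"young": 15.0, "middle": 6.0, "old": 2.0}
unvax_shares = {"young": 0.6, "middle": 0.3, "old": 0.1}
vax_shares = {"young": 0.1, "middle": 0.4, "old": 0.5}

r_unvax = age_weighted_rate(unvax_shares, rates)  # about 11.0 per 1000
r_vax = age_weighted_rate(vax_shares, rates)      # about 4.9 per 1000
print(f"toy example: {ve_from_rates(r_vax, r_unvax):.0%}")    # 55%
# The same formula applied to the age-weighted rates reported in the text:
print(f"reported rates: {ve_from_rates(5.5, 11.2):.0%}")      # 51%
```

In both cases an ineffective vaccine looks protective purely because the vaccinated skew towards low-incidence ages.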
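One standard way to undo a Simpson's paradox when age is the only confounder is direct age standardization: compare age-specific rates weighted by a common standard population. This is a technique introduced here for illustration, not the authors' method, and the counts below are hypothetical, built so that each age group has identical rates in both groups (zero efficacy). `crude_and_standardized_ve` is an illustrative helper name.

```python
def crude_and_standardized_ve(strata):
    """strata maps an age group to (vax_cases, vax_n, unvax_cases, unvax_n).

    Crude VE pools all ages. Standardized VE weights each group's age-specific
    rates by a common standard population (here, the combined cohort), so the
    vaccinated and unvaccinated are compared on the same age mix.
    """
    vax_cases = sum(s[0] for s in strata.values())
    vax_n = sum(s[1] for s in strata.values())
    unvax_cases = sum(s[2] for s in strata.values())
    unvax_n = sum(s[3] for s in strata.values())
    crude = 1 - (vax_cases / vax_n) / (unvax_cases / unvax_n)

    total_n = vax_n + unvax_n
    weights = {g: (s[1] + s[3]) / total_n for g, s in strata.items()}
    std_vax = sum(weights[g] * (s[0] / s[1]) for g, s in strata.items())
    std_unvax = sum(weights[g] * (s[2] / s[3]) for g, s in strata.items())
    standardized = 1 - std_vax / std_unvax
    return crude, standardized


# Hypothetical counts with identical age-specific rates in both groups
# (15, 6 and 2 cases per 1000), i.e. a vaccine with zero efficacy.
strata = {
    "young": (15, 1000, 90, 6000),
    "middle": (24, 4000, 18, 3000),
    "old": (10, 5000, 2, 1000),
}
crude, standardized = crude_and_standardized_ve(strata)
print(f"crude: {crude:.0%}, age-standardized: {standardized:.0%}")
```

The pooled comparison suggests 55% effectiveness while the standardized comparison returns 0%. The toy succeeds only because age is measured exactly and is the sole source of imbalance; real data rarely offer that, which is consistent with the caution about regression adjustment above.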
BACKGROUND INFECTION RATE BIAS
From December 2020, the rapid rollout of vaccines, particularly in wealthier nations (Figure 1), coincided with a period of plunging infection rates. However, determining how much of this decline is attributable to vaccines is far from straightforward. Indeed, the considerable variation in case decline across countries, such as the time lag observed in Israel (by far the quickest to reach 50% vaccinated, relative to the UK and the United States), defies simple explanation (Figure 1, timepoint “B”). The sharp drop in infections complicates estimating vaccine effectiveness from observational data in a manner similar to age bias. The risk of virus exposure was considerably higher in January 2021 than in April 2021, so exposure time was not balanced between unvaccinated and vaccinated individuals. Exposure time for the unvaccinated group was heavily weighted towards the early months of 2021, while the vaccinated group showed the opposite pattern. This imbalance is inescapable in the real world because of the timing of the vaccination rollout.
In addition, unlike trial participants, individuals in “real-world” studies do not stay in a single analysis subgroup throughout the study period: each person is unvaccinated from the first day of the study until the day of vaccination (or until the end of the study, if the person remains unvaccinated). Instead of crudely categorising individuals as either “vaccinated” or “unvaccinated,” many observational studies split each person's exposure time into an unvaccinated period followed, for those who get vaccinated, by a vaccinated period.4-6 This technique is essential where the vast majority of the population becomes vaccinated, to avoid losing a comparison population. However, it injects a strong bias into the analysis subgroups, because unvaccinated exposure time is heavily skewed towards the early part of the study while vaccinated exposure time is skewed towards the end.
For a hypothetical vaccine with zero efficacy, the case rates for the vaccinated and unvaccinated should be equal during each week of the study period. Indeed, in RCTs, changes in the background infection rate do not bias estimates of vaccine efficacy, because by design the vaccine and placebo arms follow a synchronized dosing schedule that keeps exposure (at-risk) time balanced, even as infection rates change.
But background infection rate bias can push estimates of vaccine effectiveness in “real-world” studies far from 0%, even for an ineffective vaccine. For example, consider infection rate data from an actual observational study of Danish nursing home residents,20 in which infection rates declined rapidly as vaccines were rolled out (from 12 per 1000 residents in December 2020 to almost 0 during the last 2 weeks of the study). With these data, a hypothetically ineffective vaccine appears 67% effective, an illusion created chiefly because unvaccinated people were preferentially exposed to the earlier weeks of higher background infection rates (Figure 3). We note that the direction of this bias would reverse if the background infection rate had steadily risen during the study period (i.e., vaccinating into a wave rather than out of one).
The Danish study was one of the first “real-world” studies to recognise this background infection rate bias. To address it, the researchers added a “calendar time” adjustment term to their Cox regression model, which reduced their estimate of vaccine effectiveness from 96% to 64%.20 However, as with age bias, we believe that regression adjustment is unlikely to fully correct this type of imbalance. Because the regression equation was not published, we could not make a more definitive assessment.
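The person-time splitting described above can be sketched in Python. The record layout (`Person` with `vaccination_day` and `case_day`), the helper `split_person_time`, and the choice to stop follow-up at a person's first case are illustrative assumptions; no case-counting window is applied here, which keeps the example focused on how exposure time is divided.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    vaccination_day: Optional[int]  # study day of vaccination; None if never vaccinated
    case_day: Optional[int]         # study day of first positive test; None if no case


def split_person_time(people, study_days):
    """Split each person's follow-up into unvaccinated, then vaccinated, person-days.

    Days run from 0 to study_days - 1, and follow-up stops after a person's
    first case. Returns (unvax_days, unvax_cases, vax_days, vax_cases).
    """
    unvax_days = vax_days = unvax_cases = vax_cases = 0
    for p in people:
        # Last day of follow-up (exclusive): the day after the case, or study end.
        end = p.case_day + 1 if p.case_day is not None else study_days
        switch = p.vaccination_day if p.vaccination_day is not None else study_days
        switch = min(switch, end)  # vaccination after the case adds no vaccinated time
        unvax_days += switch
        vax_days += end - switch
        if p.case_day is not None:
            if p.vaccination_day is not None and p.case_day >= p.vaccination_day:
                vax_cases += 1
            else:
                unvax_cases += 1
    return unvax_days, unvax_cases, vax_days, vax_cases


people = [
    Person(vaccination_day=30, case_day=None),    # vaccinated, never infected
    Person(vaccination_day=None, case_day=10),    # unvaccinated case
    Person(vaccination_day=30, case_day=50),      # case after vaccination
    Person(vaccination_day=None, case_day=None),  # unvaccinated throughout
]
print(split_person_time(people, study_days=100))  # (171, 1, 91, 1)
```

Because most people are unvaccinated early and vaccinated later, the unvaccinated person-days produced by this split fall mostly in the early part of a study, which is the source of the imbalance this section describes.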
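The mechanism can be reproduced with expected (noise-free) case counts. In this sketch the weekly background rates and vaccination fractions are invented and only loosely mimic a declining epidemic; they are not the Danish study's data, and `background_rate_bias` is a hypothetical helper. Both groups share the same weekly rate, so the vaccine has zero efficacy by construction, and exact `Fraction` arithmetic avoids rounding noise.

```python
from fractions import Fraction


def background_rate_bias(weekly_rates, vaccinated_fraction, population):
    """Crude (pooled person-time) VE and within-week VE for a zero-efficacy vaccine.

    weekly_rates[w]: background cases per person-week in week w, the same for
    vaccinated and unvaccinated people. vaccinated_fraction[w]: share of the
    population counted as vaccinated during week w. Uses expected counts.
    """
    unvax_pt = vax_pt = unvax_cases = vax_cases = Fraction(0)
    weekly_ve = []
    for rate, frac in zip(weekly_rates, vaccinated_fraction):
        u_pt = population * (1 - frac)  # unvaccinated person-weeks this week
        v_pt = population * frac        # vaccinated person-weeks this week
        u_cases = rate * u_pt           # expected cases: same rate in both groups
        v_cases = rate * v_pt
        unvax_pt += u_pt
        vax_pt += v_pt
        unvax_cases += u_cases
        vax_cases += v_cases
        if u_pt > 0 and v_pt > 0 and rate > 0:
            # Within one week both groups face the same background rate.
            weekly_ve.append(1 - (v_cases / v_pt) / (u_cases / u_pt))
    crude = 1 - (vax_cases / vax_pt) / (unvax_cases / unvax_pt)
    return crude, weekly_ve


# Hypothetical declining epidemic (cases per person-week) and rollout.
rates = [Fraction(12, 1000), Fraction(8, 1000), Fraction(4, 1000), Fraction(1, 1000)]
fractions = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
crude, weekly = background_rate_bias(rates, fractions, 1000)
print(f"crude VE: {float(crude):.0%}")
print("within-week VE:", [float(v) for v in weekly])
```

The pooled estimate is about 61%, while every within-week comparison gives exactly 0%, as expected for a zero-efficacy vaccine. Reversing `rates` to mimic vaccinating into a rising wave makes the crude estimate negative, in line with the reversal noted above.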