Vaccine Effectiveness Simulation

Assumptions

From the New York Times: New Pfizer Results: Coronavirus Vaccine Is Safe and 95% Effective.

n <- 4.4e4  # number of volunteers
r_c <- 162  # number of events in control
r_t <- 8    # number of events in vaccine group

NYT reports a 44 thousand person trial with half of the people going to treatment and half to control. They further report that 162 people developed COVID in the control group and 8 were in the vaccine group.

What is the probability that the vaccine is effective and what is the uncertainty in that probability? The Pfizer protocol defines vaccine effectiveness as follows:

\[ \text{VE} = 1 - \frac{p_{t}}{p_{c}} \] Here \(p_{t}\) is infection rate in vaccinated group and \(p_{c}\) is the rate in the control group.

Model

Also, let’s assume that we have no prior beliefs about the effectiveness rate and so our model is as follows: \[ \begin{align} p_{c} \sim \textsf{beta}(1, 1) \\ p_{t} \sim \textsf{beta}(1, 1) \\ y_{c} \sim \textsf{binomial}(n_{c},p_{c}) \\ y_{t} \sim \textsf{binomial}(n_{t},p_{t}) \\ \end{align} \] The treatment effect and \(VE\) can be computed directly from this model.

\[ \begin{align} \text{effect} = p_{t} - p_{c} \\ \text{VE} = 1 - \frac{p_{t}}{p_{c}} \end{align} \]

The effect will have a distribution and to get the probability of the effect (this is different from VE), we sum up the negative area. For this problem we do not need Stan, but I am including it here to show how easy it is to specify this model, once we write down the math above.

data {
  int<lower=1> r_c; // num events, control
  int<lower=1> r_t; // num events, treatment
  int<lower=1> n_c; // num cases, control
  int<lower=1> n_t; // num cases, treatment
  int<lower=1> a;   // prior a for beta(a, b)
  int<lower=1> b;   // prior b for beta(a, b)
}
parameters {
  real<lower=0, upper=1> p_c; // binomial p for control
  real<lower=0, upper=1> p_t; // binomial p for treatment 
}
model {
  p_c ~ beta(a, b); // prior for control
  p_t ~ beta(a, b); // prior for treatment
  r_c ~ binomial(n_c, p_c); // likelihood for control
  r_t ~ binomial(n_t, p_t); // likelihood for treatment
}
generated quantities {
  real effect   = p_t - p_c;      // treatment effect
  real VE       = 1 - p_t / p_c;  // vaccine effectiveness
  real log_odds = log(p_t / (1 - p_t)) -
                  log(p_c / (1 - p_c));
}

Running the model

Let’s run this model from R and make a few plots.

library(cmdstanr)
library(posterior)
library(ggplot2)

# first we get the data ready for Stan
d <- list(r_c = r_c, r_t = r_t, n_c = n/2, 
          n_t = n/2, a = 1, b = 1) # beta(1,1) -> uniform prior

# compile the model
mod <- cmdstan_model("vaccine.stan")

## Model executable is up to date!

# fit the model with MCMC
fit <- mod$sample(
  data = d,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)

## Running MCMC with 4 parallel chains...
## 
## Chain 1 finished in 0.2 seconds.
## Chain 2 finished in 0.2 seconds.
## Chain 3 finished in 0.2 seconds.
## Chain 4 finished in 0.2 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 0.2 seconds.
## Total execution time: 0.3 seconds.

# parameter summary
fit$summary()

## # A tibble: 6 x 10
##   variable     mean   median      sd     mad       q5      q95  rhat ess_bulk
##   <chr>       <dbl>    <dbl>   <dbl>   <dbl>    <dbl>    <dbl> <dbl>    <dbl>
## 1 lp__     -1.04e+3 -1.04e+3 1.03e+0 7.04e-1 -1.04e+3 -1.04e+3  1.00    1645.
## 2 p_c       7.40e-3  7.40e-3 5.78e-4 5.86e-4  6.48e-3  8.40e-3  1.00    3209.
## 3 p_t       4.06e-4  3.91e-4 1.37e-4 1.32e-4  2.08e-4  6.54e-4  1.00    3137.
## 4 effect   -7.00e-3 -6.99e-3 5.95e-4 6.03e-4 -8.01e-3 -6.05e-3  1.00    3217.
## 5 VE        9.45e-1  9.47e-1 1.92e-2 1.84e-2  9.10e-1  9.72e-1  1.00    3124.
## 6 log_odds -2.97e+0 -2.95e+0 3.59e-1 3.51e-1 -3.59e+0 -2.42e+0  1.00    3123.
## # … with 1 more variable: ess_tail <dbl>

# extract the draws
draws <- fit$draws()

# Convert to data frame
draws <- posterior::as_draws_df(draws)
head(draws)

## # A draws_df: 6 iterations, 1 chains, and 6 variables
##    lp__    p_c     p_t  effect   VE log_odds
## 1 -1041 0.0072 0.00045 -0.0068 0.94     -2.8
## 2 -1041 0.0075 0.00034 -0.0071 0.95     -3.1
## 3 -1044 0.0084 0.00072 -0.0077 0.91     -2.5
## 4 -1043 0.0065 0.00027 -0.0063 0.96     -3.2
## 5 -1043 0.0068 0.00026 -0.0065 0.96     -3.2
## 6 -1042 0.0078 0.00029 -0.0075 0.96     -3.3
## # ... hidden meta-columns {'.chain', '.iteration', '.draw'}

# look at the distribution of the effect
ggplot(draws, aes(x = effect*1e4)) +
  geom_density(fill = "blue", alpha = .2) +
  expand_limits(y = 0) + theme_minimal() +
  xlab("Size of the effect") +
    ggtitle("Reduciton in infections on treatment per 10,000 people")

# Probability that there is an effect is the negative mass of the effect 
# distribution; more negative favors treatment -- there is no question
# that there is an effect
round(mean(draws$effect < 0), 2)

## [1] 1

ggplot(draws, aes(x = log_odds)) +
  geom_density(fill = "blue", alpha = .2) +
  expand_limits(y = 0) + theme_minimal() +
  xlab("Log odds") + 
  ggtitle("Log odds of the treatment effect.
More negative, less likely to get infected on treatment")

label_txt <- paste("median =", round(median(draws$VE), 2))
ggplot(draws, aes(x = VE)) +
  geom_density(fill = "blue", alpha = .2) +
  expand_limits(y = 0) + theme_minimal() +
  geom_vline(xintercept = median(draws$VE), size = 0.2) +
  annotate("text", x = 0.958, y = 10, label = label_txt, size = 3) +
  xlab("Vaccine effectiveness") +
  ggtitle("Pfizer study protocol defines VE = 1 - Pt/Pc")

quant <- round(quantile(draws$VE, probs = c(0.05, 0.5, 0.95)), 2)

So if we observe 162 infections in placebo group and 8 in the vaccinated group, the vaccine could be considered to be about 0.95 effective with the likely effectiveness anywhere between 0.91 and 0.97 which represents a median and 90% quantile interval. We should insist that reporters produce uncertainty estimates and the reporters in turn should insist that companies provide them.

Vaccine Effectiveness Simulation

Eric Novik

17 Nov 2020

Assumptions

Model

Running the model