Assumptions

From the New York Times: New Pfizer Results: Coronavirus Vaccine Is Safe and 95% Effective.

n <- 4.4e4  # number of volunteers
r_c <- 162  # number of events in control
r_t <- 8    # number of events in vaccine group

NYT reports a 44 thousand person trial with half of the people going to treatment and half to control. They further report that 162 people developed COVID in the control group and 8 were in the vaccine group.

What is the probability that the vaccine is effective and what is the uncertainty in that probability? The Pfizer protocol defines vaccine effectiveness as follows:

\[ \text{VE} = 1 - \frac{p_{t}}{p_{c}} \] Here \(p_{t}\) is infection rate in vaccinated group and \(p_{c}\) is the rate in the control group.

Model

Also, let’s assume that we have no prior beliefs about the effectiveness rate and so our model is as follows: \[ \begin{align} p_{c} \sim \textsf{beta}(1, 1) \\ p_{t} \sim \textsf{beta}(1, 1) \\ y_{c} \sim \textsf{binomial}(n_{c},p_{c}) \\ y_{t} \sim \textsf{binomial}(n_{t},p_{t}) \\ \end{align} \] The treatment effect and \(VE\) can be computed directly from this model.

\[ \begin{align} \text{effect} = p_{t} - p_{c} \\ \text{VE} = 1 - \frac{p_{t}}{p_{c}} \end{align} \]

The effect will have a distribution and to get the probability of the effect (this is different from VE), we sum up the negative area. For this problem we do not need Stan, but I am including it here to show how easy it is to specify this model, once we write down the math above.

data {
  int<lower=1> r_c; // num events, control
  int<lower=1> r_t; // num events, treatment
  int<lower=1> n_c; // num cases, control
  int<lower=1> n_t; // num cases, treatment
  int<lower=1> a;   // prior a for beta(a, b)
  int<lower=1> b;   // prior b for beta(a, b)
}
parameters {
  real<lower=0, upper=1> p_c; // binomial p for control
  real<lower=0, upper=1> p_t; // binomial p for treatment 
}
model {
  p_c ~ beta(a, b); // prior for control
  p_t ~ beta(a, b); // prior for treatment
  r_c ~ binomial(n_c, p_c); // likelihood for control
  r_t ~ binomial(n_t, p_t); // likelihood for treatment
}
generated quantities {
  real effect   = p_t - p_c;      // treatment effect
  real VE       = 1 - p_t / p_c;  // vaccine effectiveness
  real log_odds = log(p_t / (1 - p_t)) -
                  log(p_c / (1 - p_c));
}

Running the model

Let’s run this model from R and make a few plots.

library(cmdstanr)
library(posterior)
library(ggplot2)

# first we get the data ready for Stan
d <- list(r_c = r_c, r_t = r_t, n_c = n/2, 
          n_t = n/2, a = 1, b = 1) # beta(1,1) -> uniform prior

# compile the model
mod <- cmdstan_model("vaccine.stan")
## Model executable is up to date!
# fit the model with MCMC
fit <- mod$sample(
  data = d,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)
## Running MCMC with 4 parallel chains...
## 
## Chain 1 finished in 0.2 seconds.
## Chain 2 finished in 0.2 seconds.
## Chain 3 finished in 0.2 seconds.
## Chain 4 finished in 0.2 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 0.2 seconds.
## Total execution time: 0.3 seconds.
# parameter summary
fit$summary()
## # A tibble: 6 x 10
##   variable     mean   median      sd     mad       q5      q95  rhat ess_bulk
##   <chr>       <dbl>    <dbl>   <dbl>   <dbl>    <dbl>    <dbl> <dbl>    <dbl>
## 1 lp__     -1.04e+3 -1.04e+3 1.03e+0 7.04e-1 -1.04e+3 -1.04e+3  1.00    1645.
## 2 p_c       7.40e-3  7.40e-3 5.78e-4 5.86e-4  6.48e-3  8.40e-3  1.00    3209.
## 3 p_t       4.06e-4  3.91e-4 1.37e-4 1.32e-4  2.08e-4  6.54e-4  1.00    3137.
## 4 effect   -7.00e-3 -6.99e-3 5.95e-4 6.03e-4 -8.01e-3 -6.05e-3  1.00    3217.
## 5 VE        9.45e-1  9.47e-1 1.92e-2 1.84e-2  9.10e-1  9.72e-1  1.00    3124.
## 6 log_odds -2.97e+0 -2.95e+0 3.59e-1 3.51e-1 -3.59e+0 -2.42e+0  1.00    3123.
## # … with 1 more variable: ess_tail <dbl>
# extract the draws
draws <- fit$draws()

# Convert to data frame
draws <- posterior::as_draws_df(draws)
head(draws)
## # A draws_df: 6 iterations, 1 chains, and 6 variables
##    lp__    p_c     p_t  effect   VE log_odds
## 1 -1041 0.0072 0.00045 -0.0068 0.94     -2.8
## 2 -1041 0.0075 0.00034 -0.0071 0.95     -3.1
## 3 -1044 0.0084 0.00072 -0.0077 0.91     -2.5
## 4 -1043 0.0065 0.00027 -0.0063 0.96     -3.2
## 5 -1043 0.0068 0.00026 -0.0065 0.96     -3.2
## 6 -1042 0.0078 0.00029 -0.0075 0.96     -3.3
## # ... hidden meta-columns {'.chain', '.iteration', '.draw'}
# look at the distribution of the effect
ggplot(draws, aes(x = effect*1e4)) +
  geom_density(fill = "blue", alpha = .2) +
  expand_limits(y = 0) + theme_minimal() +
  xlab("Size of the effect") +
    ggtitle("Reduciton in infections on treatment per 10,000 people")

# Probability that there is an effect is the negative mass of the effect 
# distribution; more negative favors treatment -- there is no question
# that there is an effect
round(mean(draws$effect < 0), 2)
## [1] 1
ggplot(draws, aes(x = log_odds)) +
  geom_density(fill = "blue", alpha = .2) +
  expand_limits(y = 0) + theme_minimal() +
  xlab("Log odds") + 
  ggtitle("Log odds of the treatment effect.
More negative, less likely to get infected on treatment")

label_txt <- paste("median =", round(median(draws$VE), 2))
ggplot(draws, aes(x = VE)) +
  geom_density(fill = "blue", alpha = .2) +
  expand_limits(y = 0) + theme_minimal() +
  geom_vline(xintercept = median(draws$VE), size = 0.2) +
  annotate("text", x = 0.958, y = 10, label = label_txt, size = 3) +
  xlab("Vaccine effectiveness") +
  ggtitle("Pfizer study protocol defines VE = 1 - Pt/Pc")

quant <- round(quantile(draws$VE, probs = c(0.05, 0.5, 0.95)), 2)

So if we observe 162 infections in placebo group and 8 in the vaccinated group, the vaccine could be considered to be about 0.95 effective with the likely effectiveness anywhere between 0.91 and 0.97 which represents a median and 90% quantile interval. We should insist that reporters produce uncertainty estimates and the reporters in turn should insist that companies provide them.