library(epitools) # provides riskratio.wald() for risk ratios and confidence intervals
library(tidyverse)
theme_set(theme_classic())

Analyze 2x2 Categorical Vaccine Data

On 9 November 2020, Pfizer announced in a press release that it had evidence of a vaccine candidate that was more than 90% effective. I thought it would be interesting to do a simple categorical data analysis in R to illustrate how one might estimate effectiveness in a simple scenario.

https://www.pfizer.com/news/press-release/press-release-detail/pfizer-and-biontech-announce-vaccine-candidate-against

Pfizer indicated the vaccine candidate was found to be more than 90% effective in preventing COVID-19 in participants without evidence of prior SARS-CoV-2 infection in the first interim efficacy analysis. The analysis evaluated 94 confirmed cases of COVID-19 in trial participants, and the study enrolled 43,538 participants. Safety and additional efficacy data continue to be collected, but I thought it would be interesting to do a quick study to help us understand what might be going on. Note that the actual analysis probably uses more complicated models with stratification, perhaps a Poisson model or another generalized linear model, to get estimates; for simplicity, we will make some assumptions and illustrate.

Efficacy is defined as: \[E = 1 -\frac{ARV}{ARU}\] where \(ARV\) is the attack rate in the vaccinated group (the proportion of vaccinated participants who get sick) and \(ARU\) is the attack rate in the unvaccinated group. Assume equal numbers are randomized to the treatment and control conditions, so that 21,769 people are in each group and the ratio of attack rates equals the ratio of case counts. If \(v\) of the 94 cases are in the vaccinated group, then \(E = 1 - \frac{v}{94 - v}\), and setting \(E = 0.90\) gives \(v = 9.4/1.1 \approx 8.5\). For this illustration we use a nearby round split of 10 cases in the vaccinated group and 84 in the unvaccinated group, which corresponds to an efficacy of about 88%, a little below the reported figure.
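
The algebra can be checked with a few lines of base R. This is a minimal sketch under the equal-allocation assumption; the variable names are illustrative and do not come from any package.

```r
# Assumption: both arms have the same size, so the attack-rate ratio
# ARV / ARU reduces to the ratio of case counts.
total_cases <- 94
target_efficacy <- 0.90

# E = 1 - v / (total - v)  =>  v = total * (1 - E) / (2 - E)
vaccinated_cases <- total_cases * (1 - target_efficacy) / (2 - target_efficacy)
vaccinated_cases

# Efficacy implied by the 10 / 84 split used in the rest of this post
implied_efficacy <- 1 - 10 / 84
implied_efficacy
```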

# The code below uses the epitools package
# The first line creates the contingency table (rows: unvaccinated, vaccinated; columns: non-cases, cases); the second line prints it so you can check the orientation
RRtable<-matrix(c(21685,21759,84,10),nrow = 2, ncol = 2)
RRtable
##       [,1] [,2]
## [1,] 21685   84
## [2,] 21759   10
# The next line asks R to compute the RR and 95% confidence interval
rrout <- riskratio.wald(RRtable)
rrout
## $data
##           Outcome
## Predictor  Disease1 Disease2 Total
##   Exposed1    21685       84 21769
##   Exposed2    21759       10 21769
##   Total       43444       94 43538
## 
## $measure
##           risk ratio with 95% C.I.
## Predictor   estimate      lower    upper
##   Exposed1 1.0000000         NA       NA
##   Exposed2 0.1190476 0.06181477 0.229271
## 
## $p.value
##           two-sided
## Predictor    midp.exact fisher.exact   chi.square
##   Exposed1           NA           NA           NA
##   Exposed2 4.440892e-16 9.702399e-16 2.159265e-14
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
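# 1 - RR gives the estimated efficacy; subtracting flips the interval, so the "lower" column now holds the upper bound and vice versa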
1-rrout$measure
##           risk ratio with 95% C.I.
## Predictor   estimate     lower    upper
##   Exposed1 0.0000000        NA       NA
##   Exposed2 0.8809524 0.9381852 0.770729
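
Because subtracting from 1 reverses the order of the interval bounds, it can help to compute the efficacy interval explicitly. The sketch below reproduces the Wald interval in base R from the same counts, assuming the usual log-scale standard error for a risk ratio, and then reports the efficacy bounds in the correct order. The variable names are illustrative.

```r
# Counts from the table above (equal group sizes are an assumption of this post)
cases_vaccinated <- 10
cases_unvaccinated <- 84
n_per_group <- 21769

# Risk ratio of vaccinated relative to unvaccinated
rr <- (cases_vaccinated / n_per_group) / (cases_unvaccinated / n_per_group)

# Standard error of log(RR) under the normal (Wald) approximation
se_log_rr <- sqrt(1 / cases_vaccinated - 1 / n_per_group +
                  1 / cases_unvaccinated - 1 / n_per_group)

z <- qnorm(0.975)  # two-sided 95% interval
rr_ci <- exp(log(rr) + c(-1, 1) * z * se_log_rr)

# Efficacy = 1 - RR; reverse the bounds so lower < upper
efficacy <- c(estimate = 1 - rr, lower = 1 - rr_ci[2], upper = 1 - rr_ci[1])
round(efficacy, 4)
```

The point estimate and bounds agree with the riskratio.wald() output above, with the efficacy interval running from roughly 0.77 to 0.94.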

Bar Chart

cases <- tibble(
  treatment = c("Not Vaccinated", "Vaccinated"),
  Cases = c(84, 10)
)
ggplot(data = cases, mapping = aes(x = treatment, y = Cases, fill = treatment)) +
  geom_col() +
  ggtitle("Cases of Covid-19 Assuming Equal Numbers in Each Group") +
  xlab(" ") + ylab("Number of Cases") + theme(legend.position = "none")