library(epitools) # provides riskratio.wald() for risk ratios and confidence intervals
library(tidyverse)
theme_set(theme_classic())
In a press release dated 9 November 2020, Pfizer indicated they had evidence of a vaccine candidate that was found to be more than 90% effective. I thought it would be interesting to do a simple categorical data analysis using R to illustrate how one might estimate effectiveness in a simple scenario.
Pfizer indicated the vaccine candidate was found to be more than 90% effective in preventing COVID-19 in participants without evidence of prior SARS-CoV-2 infection in the first interim efficacy analysis. The analysis evaluated 94 confirmed cases of COVID-19 in trial participants, and the study enrolled 43,538 participants. Safety and additional efficacy data continue to be collected, but I thought it would be interesting to do a quick study to help us understand what might be going on. Note that they probably have more complicated models with stratification and might be using a Poisson model or another generalized linear model to get estimates, but for simplicity's sake we will make some assumptions and illustrate.
Efficacy is defined as: \[E = 1 -\frac{ARV}{ARU}\] where \(ARV\) is the attack rate in the vaccinated group (the proportion of vaccinated participants who get sick) and \(ARU\) is the attack rate in the unvaccinated group. If we assume equal numbers were randomized to the treatment and control conditions, the group sizes cancel and \(ARV/ARU\) is just the ratio of case counts. With \(v\) vaccinated cases out of 94 in total, we need \(v/(94 - v) = 1 - E\), and for \(E = 0.90\) this gives \(v = 94 \times 0.1/1.1 \approx 8.5\). The analysis below uses 10 cases in the vaccinated group and 84 in the unvaccinated group, with 21,769 people in each group. This is a slightly more conservative split that corresponds to an efficacy of \(1 - 10/84 \approx 0.88\).
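As a quick check of the algebra, here is a minimal sketch in R. The helper names `vaccinated_cases()` and `implied_efficacy()` are made up for this illustration, and both assume equal group sizes so that the ratio of attack rates reduces to the ratio of case counts.

```r
# Hypothetical helpers for illustration; both assume equal-sized groups.

# Vaccinated cases v that solve v / (total - v) = 1 - efficacy
vaccinated_cases <- function(total_cases, efficacy) {
  ratio <- 1 - efficacy
  total_cases * ratio / (1 + ratio)
}

# Efficacy implied by a given split of cases between the two groups
implied_efficacy <- function(cases_vaccinated, cases_unvaccinated) {
  1 - cases_vaccinated / cases_unvaccinated
}

v_exact <- vaccinated_cases(94, 0.90)  # cases needed for exactly 90% efficacy
e_split <- implied_efficacy(10, 84)    # efficacy implied by the 10 / 84 split

v_exact
e_split
```

With 94 cases, an efficacy of exactly 0.90 corresponds to about 8.5 vaccinated cases, while a 10 / 84 split corresponds to an efficacy of about 0.88.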
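# The code below uses the epitools package
# The first line below creates the contingency table; the second line prints the table so you can check the orientation
# matrix() fills column by column: rows are unvaccinated then vaccinated, columns are non-cases then cases
RRtable <- matrix(c(21685, 21759, 84, 10), nrow = 2, ncol = 2)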
RRtable
## [,1] [,2]
## [1,] 21685 84
## [2,] 21759 10
# The next line asks R to compute the RR and 95% confidence interval
rrout <- riskratio.wald(RRtable)
rrout
## $data
## Outcome
## Predictor Disease1 Disease2 Total
## Exposed1 21685 84 21769
## Exposed2 21759 10 21769
## Total 43444 94 43538
##
## $measure
## risk ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed1 1.0000000 NA NA
## Exposed2 0.1190476 0.06181477 0.229271
##
## $p.value
## two-sided
## Predictor midp.exact fisher.exact chi.square
## Exposed1 NA NA NA
## Exposed2 4.440892e-16 9.702399e-16 2.159265e-14
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
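To see where these numbers come from, the Wald interval can be reproduced by hand. This is a sketch of the standard log-scale normal approximation for a risk ratio, not a copy of the epitools source, and the variable names are my own.

```r
# Counts assumed in this post (equal group sizes)
cases_vacc   <- 10
n_vacc       <- 21769
cases_unvacc <- 84
n_unvacc     <- 21769

# Risk ratio: attack rate in vaccinated over attack rate in unvaccinated
rr <- (cases_vacc / n_vacc) / (cases_unvacc / n_unvacc)

# Standard error of log(RR) under the normal approximation
se_log_rr <- sqrt(1 / cases_vacc - 1 / n_vacc +
                  1 / cases_unvacc - 1 / n_unvacc)

# 95% interval built on the log scale, then back-transformed
z  <- qnorm(0.975)
ci <- exp(log(rr) + c(-1, 1) * z * se_log_rr)

round(c(estimate = rr, lower = ci[1], upper = ci[2]), 4)
```

The result should agree with the `Exposed2` row of `$measure` above: an estimate of about 0.119 with limits of about 0.062 and 0.229.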
# Efficacy is 1 - RR; note that subtracting from 1 also swaps the lower and upper limits
1-rrout$measure
## risk ratio with 95% C.I.
## Predictor estimate lower upper
## Exposed1 0.0000000 NA NA
## Exposed2 0.8809524 0.9381852 0.770729
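Because efficacy is one minus the risk ratio, the interval limits change places: the upper limit of the risk ratio becomes the lower limit of efficacy. A small sketch that relabels them, using the risk ratio values printed earlier (hard-coded here so the snippet runs on its own):

```r
# Risk ratio row for the vaccinated group, copied from the riskratio.wald() output
rr <- c(estimate = 0.1190476, lower = 0.06181477, upper = 0.229271)

# Efficacy = 1 - RR; the upper RR limit becomes the lower efficacy limit
efficacy <- c(
  estimate = 1 - rr[["estimate"]],
  lower    = 1 - rr[["upper"]],
  upper    = 1 - rr[["lower"]]
)

# Express as percentages
round(100 * efficacy, 1)
```

Under the assumed 10 / 84 split, this gives an estimated efficacy of about 88%, with a 95% interval from roughly 77% to 94%.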
cases <- tibble(
  treatment = c("Not Vaccinated", "Vaccinated"),
  Cases = c(84, 10)
)
ggplot(data = cases, mapping = aes(x = treatment, y = Cases, fill = treatment)) +
  geom_col() +
  ggtitle("Cases of Covid-19 Assuming Equal Numbers in Each Group") +
  xlab(" ") + ylab("Number of Cases") + theme(legend.position = "none")
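The introduction mentioned that the real analysis might use a Poisson model or another generalized linear model. As a rough sketch of what that could look like with the same assumed counts, the following fits a Poisson regression with the group size as an offset. It has no stratification or follow-up time and is not a reconstruction of Pfizer's actual analysis; the data frame `trial` and its column names are made up for this example.

```r
# Same assumed counts as above; group sizes enter through the offset
trial <- data.frame(
  group = factor(c("Not Vaccinated", "Vaccinated"),
                 levels = c("Not Vaccinated", "Vaccinated")),
  cases = c(84, 10),
  n     = c(21769, 21769)
)

# Poisson regression of case counts on group, with log(n) as the offset
fit <- glm(cases ~ group + offset(log(n)), family = poisson, data = trial)

# Rate ratio (vaccinated vs. not vaccinated) and Wald 95% CI on the log scale
est <- coef(summary(fit))["groupVaccinated", ]
rate_ratio <- exp(est[["Estimate"]])
ci <- exp(est[["Estimate"]] + c(-1, 1) * qnorm(0.975) * est[["Std. Error"]])

# Efficacy = 1 - rate ratio, with the limits reordered
1 - c(estimate = rate_ratio, lower = ci[2], upper = ci[1])
```

With only two groups and equal denominators, the estimated rate ratio is again 10/84, so the efficacy estimate matches the risk ratio approach; the interval differs slightly because the Poisson standard error is computed differently.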