Does receiving the Pfizer–BioNTech COVID-19 vaccine significantly reduce the likelihood of developing COVID-19 among adolescents aged 12–15 compared with receiving a placebo?
##Introduction
This study uses the biontech_adolescents dataset from OpenIntro, which contains 2,260 observations and 2 variables. Each observation represents an adolescent aged 12–15 who participated in a Phase 3 randomized clinical trial conducted in the United States. The two variables included in the dataset are group, which indicates whether the participant received the Pfizer–BioNTech COVID-19 vaccine or a placebo, and outcome, which records whether the participant developed COVID-19 during the study period. The dataset was sourced from Pfizer and BioNTech’s public announcement of trial results on March 31, 2021, and is publicly available through OpenIntro.
This topic was chosen due to the importance of understanding vaccine effectiveness in younger populations, particularly during the COVID-19 pandemic when vaccination decisions for adolescents were widely debated. Evaluating clinical trial data using statistical methods such as logistic regression allows for an evidence-based assessment of whether vaccination significantly reduces the likelihood of infection. This analysis provides insight into the real-world implications of vaccination for adolescents and supports informed public health decision-making.
Before conducting the logistic regression analysis, the biontech_adolescents dataset will be cleaned and prepared using data-wrangling techniques. First, the dataset will be inspected for missing values to ensure data completeness; no missing values are expected, but this step confirms data quality. Next, only the relevant variables (group and outcome) will be selected for analysis. The data will then be checked to confirm that only the expected study groups (vaccine and placebo) and outcomes (COVID-19 and no COVID-19) are present. A new binary variable will be created to represent COVID-19 status in a format suitable for logistic regression, where 1 indicates developing COVID-19 and 0 indicates not developing COVID-19. Additionally, the treatment group variable will be recoded into a binary indicator distinguishing vaccinated participants from those receiving a placebo. Finally, summary statistics will be generated to confirm the distribution of COVID-19 cases across study groups before fitting the logistic regression model.
##Chunk 1 - Load Packages and Data
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
bio_data <- read.csv("biontech_adolescents.csv")
biontech_adolescents <- bio_data
##Chunk 2 - Check for NAs
biontech_adolescents %>%
  summarise(
    missing_group = sum(is.na(group)),
    missing_outcome = sum(is.na(outcome))
  )
## missing_group missing_outcome
## 1 0 0
##Chunk 3 - Select Relevant Variables
clean_data <- biontech_adolescents %>%
  select(group, outcome)
##Chunk 4 - Recode Variables for Logistic Regression
clean_data <- clean_data %>%
  mutate(
    covid = ifelse(outcome == "COVID-19", 1, 0),
    vaccine = ifelse(group == "vaccine", 1, 0)
  )
##Chunk 5 - Summarize COVID-19 Cases by Group
clean_data %>%
  count(group, outcome)
## group outcome n
## 1 placebo COVID-19 18
## 2 placebo no COVID-19 1111
## 3 vaccine no COVID-19 1131
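The counts above already contain the core comparison. As a minimal sketch, the observed proportion of COVID-19 cases in each group can be computed directly from them. The names `cases`, `totals`, `risk`, and `rrr` are illustrative and not part of the original analysis, and the vaccine group is given 0 cases because no vaccine/COVID-19 row appears in the output.

```r
# Counts copied from the count(group, outcome) output above.
# The vaccine group has no "COVID-19" row, so its case count is taken as 0.
cases  <- c(placebo = 18, vaccine = 0)
totals <- c(placebo = 18 + 1111, vaccine = 1131)

# Observed proportion of participants who developed COVID-19 in each group
risk <- cases / totals
round(risk, 4)

# Observed relative risk reduction: 1 - (risk in vaccine / risk in placebo)
rrr <- 1 - risk[["vaccine"]] / risk[["placebo"]]
rrr
```

These are descriptive quantities only; they do not by themselves carry a measure of uncertainty.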
Because the research question examines whether receiving the Pfizer–BioNTech COVID-19 vaccine reduces the likelihood of developing COVID-19 (a binary outcome: COVID-19 vs. no COVID-19), logistic regression was selected as the appropriate statistical method. Logistic regression models the log odds of an event occurring and is well suited for analyzing binary response variables. In this analysis, COVID-19 status was coded as 1 for participants who developed COVID-19 and 0 otherwise. The primary explanatory variable was vaccination status, coded as 1 for vaccinated participants and 0 for those who received a placebo.
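Written out, the model described above takes the standard logistic regression form. The symbols $p$, $\beta_0$, and $\beta_1$ are notation introduced here for illustration; they do not appear in the original analysis.

$$
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot \text{vaccine}, \qquad p = P(\text{covid} = 1)
$$

Under this form, $\exp(\beta_0)$ is the odds of developing COVID-19 in the placebo group (vaccine = 0), and $\exp(\beta_1)$ is the odds ratio comparing vaccinated participants with placebo recipients.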
##Chunk 1 - Logistic Regression Model
log_model <- glm(covid ~ vaccine,
                 data = clean_data,
                 family = binomial)
summary(log_model)
##
## Call:
## glm(formula = covid ~ vaccine, family = binomial, data = clean_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.1226 0.2376 -17.35 <2e-16 ***
## vaccine -17.4434 869.2281 -0.02 0.984
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 209.84 on 2259 degrees of freedom
## Residual deviance: 184.71 on 2258 degrees of freedom
## AIC: 188.71
##
## Number of Fisher Scoring iterations: 20
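The very large standard error on the vaccine coefficient makes the Wald z test in this summary hard to rely on. One common alternative is a likelihood ratio test based on the two deviances printed above. The following is a minimal sketch using those printed values; the variable names are illustrative, and the chi-squared approximation is itself only approximate when one cell count is zero.

```r
# Deviances copied from the summary(log_model) output above
null_dev  <- 209.84   # intercept-only model, 2259 degrees of freedom
resid_dev <- 184.71   # model including vaccine, 2258 degrees of freedom

# Likelihood ratio statistic: the drop in deviance from adding vaccine
lr_stat <- null_dev - resid_dev

# Upper-tail chi-squared p-value on 1 degree of freedom
lr_p <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

lr_stat
lr_p
```

With the fitted model object available, `anova(log_model, test = "Chisq")` should report the same comparison.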
##Chunk 2 - Odds Ratios
exp(coef(log_model))
## (Intercept) vaccine
## 1.620162e-02 2.657156e-08
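An odds ratio this close to zero arises because no vaccinated participant developed COVID-19, so the fitted coefficient is not a stable estimate. As a hedged cross-check that does not depend on the logistic fit, Fisher's exact test can be applied to the 2x2 table of counts. This is a swapped-in technique, not part of the original analysis, and `trial_counts` and `ft` are illustrative names.

```r
# 2x2 table built from the counts reported by count(group, outcome);
# the vaccine/COVID-19 cell is 0 because that row is absent from the output.
trial_counts <- matrix(
  c(18, 1111,
     0, 1131),
  nrow = 2, byrow = TRUE,
  dimnames = list(group   = c("placebo", "vaccine"),
                  outcome = c("COVID-19", "no COVID-19"))
)

# Fisher's exact test of association between group and outcome
ft <- fisher.test(trial_counts)

ft$p.value    # exact two-sided p-value
ft$conf.int   # confidence interval for the placebo-vs-vaccine odds ratio
```

Because one cell is zero, one end of the confidence interval is unbounded; the p-value is the more informative output here.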
##Chunk 3 - Confusion Matrix
clean_data$pred_prob <- predict(log_model, type = "response")
clean_data$pred_class <- ifelse(clean_data$pred_prob >= 0.5, 1, 0)
table(Predicted = clean_data$pred_class,
      Actual = clean_data$covid)
## Actual
## Predicted 0 1
## 0 2242 18
##Chunk 4 - Accuracy, Sensitivity, and Specificity
conf_matrix <- table(
  Predicted = clean_data$pred_class,
  Actual = clean_data$covid
)
conf_matrix
## Actual
## Predicted 0 1
## 0 2242 18
# Overall accuracy
mean(clean_data$pred_class == clean_data$covid)
## [1] 0.9920354
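The confusion matrix has only a `Predicted = 0` row, so sensitivity and specificity cannot be read off by indexing a missing row. The sketch below rebuilds the predicted and actual classes from the counts shown above and fixes the factor levels so that the empty row is kept. The object names are illustrative and not taken from the original code.

```r
# Rebuild actual and predicted classes from the confusion matrix counts
actual    <- c(rep(1, 18), rep(0, 2242))
predicted <- rep(0, length(actual))   # the 0.5 cutoff never predicts a case

# Fixing the factor levels keeps the empty "Predicted = 1" row in the table
cm <- table(
  Predicted = factor(predicted, levels = c(0, 1)),
  Actual    = factor(actual,    levels = c(0, 1))
)

sensitivity <- cm["1", "1"] / sum(cm[, "1"])   # true positives / actual cases
specificity <- cm["0", "0"] / sum(cm[, "0"])   # true negatives / actual non-cases
accuracy    <- sum(diag(cm)) / sum(cm)         # correct predictions / all

c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy)
```

Sensitivity is 0 and specificity is 1 here: the accuracy of about 99.2% comes entirely from the 2,242 non-cases, and none of the 18 cases is identified at this cutoff.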
##Chunk 5 - ROC Curve and AUC
library(pROC)
## Warning: package 'pROC' was built under R version 4.5.2
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
roc_obj <- roc(clean_data$covid, clean_data$pred_prob)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_obj, col = "pink", main = "ROC Curve for Logistic Regression Model")
auc(roc_obj)
## Area under the curve: 0.7522
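The AUC can be reproduced by hand, which helps in interpreting it. With a single binary predictor the model produces only two distinct predicted probabilities, so the AUC reduces to counting case/non-case pairs. The sketch below uses the counts from the earlier summary and illustrative variable names.

```r
# Counts from the count(group, outcome) output
cases_placebo    <- 18     # all COVID-19 cases are in the placebo group
controls_placebo <- 1111   # non-cases with the same predicted probability as the cases
controls_vaccine <- 1131   # non-cases with a lower predicted probability

# AUC = share of (case, non-case) pairs in which the case has the higher
# predicted probability, counting ties as one half.
n_pairs <- cases_placebo * (controls_placebo + controls_vaccine)
auc_manual <- (cases_placebo * controls_vaccine +
               0.5 * cases_placebo * controls_placebo) / n_pairs

round(auc_manual, 4)
```

The result matches the 0.7522 reported by `auc(roc_obj)`. About half of the non-cases are tied with the cases, which limits how high the AUC can be for this model.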
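The data show that receiving the Pfizer–BioNTech COVID-19 vaccine sharply reduced the likelihood of developing COVID-19 among adolescents aged 12–15 in this trial: 18 of 1,129 placebo recipients (about 1.6%) developed COVID-19, compared with 0 of 1,131 vaccinated participants. The model's intercept reproduces the placebo-group odds (exp(-4.1226) ≈ 0.0162, a probability of about 1.6%).
The vaccine coefficient needs careful reading. Because no vaccinated participant developed COVID-19, the data are quasi-completely separated and the maximum likelihood estimate of this coefficient is not finite. The reported estimate (-17.44), its very large standard error (869.23), and the 20 Fisher scoring iterations are symptoms of the fitting algorithm drifting toward negative infinity. The Wald p-value of 0.984 is therefore unreliable and should not be read as a lack of effect, and the odds ratio of about 2.7e-08 is best described as indistinguishable from zero rather than as a precise estimate. The drop in deviance from 209.84 to 184.71 (25.13 on 1 degree of freedom) is a more trustworthy indication that vaccination status is strongly associated with the outcome.
The classification metrics add little to this conclusion. With a 0.5 cutoff, the model predicts that no participant develops COVID-19, so the overall accuracy of approximately 99.2% simply equals the share of non-cases (2,242 of 2,260), and none of the 18 cases is identified. The AUC of 0.7522 indicates moderate rather than excellent discrimination, which is expected from a single binary predictor: the model can separate vaccinated non-cases from cases, but it cannot distinguish cases from non-cases within the placebo group. The absence of COVID-19 cases among vaccinated participants is strong evidence of the vaccine's protective effect, but it is also a limitation of the standard logistic regression fit, whose Wald-based inference breaks down under separation.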
##Conclusion
The analysis of the biontech_adolescents dataset provides strong evidence that the Pfizer–BioNTech COVID-19 vaccine was highly effective in preventing COVID-19 among adolescents aged 12–15 during the trial. None of the 1,131 vaccinated participants developed COVID-19, compared with 18 of the 1,129 placebo recipients, and the logistic regression reflects this with an estimated odds ratio that is effectively zero. The model's accuracy (~99.2%) should be interpreted with caution, because it matches the proportion of participants who did not develop COVID-19, and the AUC of 0.7522 indicates only moderate discrimination. The evidence for the vaccine's effect therefore rests on the difference in case counts between the groups and the accompanying drop in deviance (from 209.84 to 184.71), not on the classification metrics. These findings directly answer the research question: vaccination was associated with a substantially lower likelihood of infection in this population.
The implications of these results for public health are clear: vaccinating adolescents provides substantial protection against COVID-19, supporting policies that encourage vaccination in younger age groups.
##Future Directions
Future research could explore vaccine effectiveness over longer follow-up periods, evaluate protection against emerging variants, or include additional predictors, such as prior infection or demographic factors, to refine risk models. Further analysis could also incorporate larger, more diverse samples to confirm these findings across different populations and settings.
##References
- OpenIntro. biontech_adolescents Dataset. OpenIntro, 2021, https://www.openintro.org/data/index.php?data=biontech_adolescents. Accessed 15 Dec. 2025.