# Read the OpenIntro smallpox data (one row per individual)
smallpox <- read.csv("smallpox.csv")
library(dplyr)
library(tidyverse)

Research Question: Is there a significant association between inoculation status and mortality?

Introduction

The data set I’ll be using for this project is named smallpox. It is a sample of 6,224 individuals from the year 1721 and contains two variables: inoculated (yes/no) and result (died/lived). Inoculation was the practice of deliberately infecting someone with material from a mild smallpox case; it was an early form of immunization and a precursor to vaccination.

I chose this topic because vaccination has come a long way throughout history, yet its effectiveness is still debated today. A lot of misinformation is spread about vaccines, so looking at actual data may help show how effective early methods of immunization were and what that history can tell us about vaccination in the modern era.

Data Set

“Smallpox Vaccine Results.” Data Sets, www.openintro.org/data/index.php?data=smallpox. Accessed 13 Dec. 2025.

Data Analysis

For my data analysis, I’ll first clean the data set, making sure there are no missing (NA) values. Next, I’ll summarize the data set. Finally, I’ll create variables for the mortality rate of the inoculated group and of the not-inoculated group.

Clean data set

# Drop any rows with a missing value in either variable
smallpox_clean <- smallpox |>
  filter(!is.na(result), !is.na(inoculated))

Summary of data set

summary(smallpox_clean)
##     result           inoculated       
##  Length:6224        Length:6224       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character

Count how many in total and how many died in each group.

# Split by inoculation status; nrow() gives the number of individuals in each subset
inoculated_group <- smallpox_clean |>
  filter(inoculated == "yes")
total_inoc <- nrow(inoculated_group)

died_inoc <- inoculated_group |>
  filter(result == "died") |>
  nrow()

not_inoculated_group <- smallpox_clean |>
  filter(inoculated == "no")
total_no <- nrow(not_inoculated_group)

died_no <- not_inoculated_group |>
  filter(result == "died") |>
  nrow()
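The four filter-and-count steps above can also be done in a single grouped pass. Below is a minimal base-R sketch of that idea. So that it runs without smallpox.csv, it assumes a stand-in data frame (the hypothetical name `smallpox_demo`) rebuilt row by row from the counts in the observed table reported later in this analysis (844 died and 5136 lived among the not inoculated; 6 died and 238 lived among the inoculated).

```r
# Hypothetical stand-in for smallpox_clean, rebuilt from the observed counts
# (assumption: one row per individual, same column names as smallpox.csv)
smallpox_demo <- data.frame(
  inoculated = rep(c("no", "no", "yes", "yes"), times = c(844, 5136, 6, 238)),
  result     = rep(c("died", "lived", "died", "lived"), times = c(844, 5136, 6, 238))
)

# Group size for each inoculation status
group_total <- table(smallpox_demo$inoculated)

# Number of deaths for each inoculation status (sum of TRUE values)
group_died <- tapply(smallpox_demo$result == "died", smallpox_demo$inoculated, sum)

# Mortality rate per group; both objects are ordered "no", "yes"
mortality <- group_died / as.vector(group_total)

group_total
group_died
round(mortality, 4)
```

With real data, `smallpox_clean` would replace `smallpox_demo`. The grouped version avoids creating one intermediate data frame per subset.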

Calculate mortality rate

mortality_inoculated <- died_inoc / total_inoc
mortality_inoculated
## [1] 0.02459016
mortality_not_inoculated <- died_no / total_no
mortality_not_inoculated
## [1] 0.1411371
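Written out with the death counts and group totals from the observed table (6 of 244 inoculated individuals died, versus 844 of 5,980 not inoculated), the two rates and their difference are:

```latex
\hat{p}_{\text{inoc}} = \frac{6}{244} \approx 0.0246, \qquad
\hat{p}_{\text{not inoc}} = \frac{844}{5980} \approx 0.1411, \qquad
\hat{p}_{\text{not inoc}} - \hat{p}_{\text{inoc}} \approx 0.1165
```

That is a gap of about 11.7 percentage points in mortality between the two groups.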

Statistical Analysis (Hypothesis Testing)

For this stage of the project, I used a chi-squared test of independence for three reasons: both variables are categorical, the sample size is large, and my research question is about association rather than prediction.
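The chi-squared approximation is usually considered reliable when every expected cell count (not observed count) is at least 5. This is worth checking because the inoculated group has only 6 deaths. Below is a minimal base-R sketch of that check. It assumes the observed table is typed in by hand from the counts shown below (the name `observed` is hypothetical) so that it runs without smallpox.csv.

```r
# Observed 2x2 table, typed in from the counts in this report
observed <- matrix(
  c(844, 6, 5136, 238),
  nrow = 2,
  dimnames = list(inoculated = c("no", "yes"), result = c("died", "lived"))
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)

# Rule of thumb for the chi-squared approximation: all expected counts >= 5
all(expected >= 5)
```

The smallest expected count is in the inoculated/died cell, at roughly 33. This is comfortably above 5, so the test is appropriate even though the observed count in that cell is only 6. The same matrix is available from `chisq.test(observed)$expected`.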

# Contingency table: rows are inoculated (no/yes), columns are result (died/lived)
observed_dataset <- table(smallpox_clean$inoculated, smallpox_clean$result)
observed_dataset
##      
##       died lived
##   no   844  5136
##   yes    6   238

Hypotheses

\(H_0\): Inoculation status and mortality are not associated with each other (they are independent).

\(H_a\): Inoculation status and mortality are associated with each other.
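For reference, here is the standard textbook form of the statistic that `chisq.test()` computes for a 2 × 2 table with Yates' continuity correction. It compares each observed count \(O_{ij}\) with the count \(E_{ij}\) expected if \(H_0\) were true, where \(N = 6224\) is the total sample size:

```latex
E_{ij} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{N}, \qquad
X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{\left(\lvert O_{ij} - E_{ij}\rvert - \tfrac{1}{2}\right)^2}{E_{ij}}, \qquad
\text{df} = (2-1)(2-1) = 1
```

Large values of \(X^2\) mean the observed counts are far from what independence predicts, which is evidence against \(H_0\).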

chisq.test(observed_dataset)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  observed_dataset
## X-squared = 26.026, df = 1, p-value = 3.369e-07

Results

Because this is a 2 × 2 table, the test has 1 degree of freedom. The statistic X-squared = 26.026 gives a very small p-value of 3.369e-07. This p-value is far below any conventional significance level (such as 0.05), so we reject the null hypothesis. There is significant evidence that inoculation status is associated with mortality.
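To see where X-squared = 26.026 comes from, the Yates-corrected statistic and its p-value can be recomputed by hand. Below is a minimal base-R sketch. It assumes the observed counts are typed in from the table above (the names `observed`, `expected`, `x2`, and `dof` are hypothetical). The fixed 0.5 correction matches `chisq.test()` here because every |O - E| is far larger than 0.5.

```r
# Observed 2x2 table, typed in from the counts in this report
observed <- matrix(
  c(844, 6, 5136, 238),
  nrow = 2,
  dimnames = list(inoculated = c("no", "yes"), result = c("died", "lived"))
)

# Expected counts under H0 (independence)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Yates-corrected Pearson statistic: subtract 0.5 from each |O - E| before squaring
x2 <- sum((abs(observed - expected) - 0.5)^2 / expected)

# Degrees of freedom and upper-tail chi-squared probability
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

c(X_squared = x2, df = dof, p_value = p_value)

# Cross-check against the built-in test
chisq.test(observed)$statistic
```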

Create a bar graph showing the proportion of individuals who died and who lived, by inoculation status.

# position = "fill" scales each bar to 1, so the segments show within-group proportions
ggplot(smallpox_clean, aes(x = inoculated, fill = result)) +
  geom_bar(position = "fill") +
  labs(
    title = "Proportion of Mortality by Inoculation",
    x = "Inoculated",
    y = "Proportion"
  ) +
  scale_fill_manual(
    values = c("died" = "#EE9572", "lived" = "#E0EEE0")
  ) +
  theme_minimal()

Results

The bar graph shows the proportion of people who died and who lived in each group: not inoculated (left) and inoculated (right). The not-inoculated bar has a larger proportion of deaths (about 14.1%) than the inoculated bar (about 2.5%). The chi-squared test showed that this difference is statistically significant, given the low p-value. The gap might look small on the graph, but with a sample of 6,224 individuals, a difference of this size is very unlikely to be due to chance alone.
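The p-value shows that the association is unlikely to be chance, but not how large it is. Two common effect-size measures for a 2 × 2 table are the risk ratio and the odds ratio. These were not part of the original analysis. Below is a minimal base-R sketch that computes both from the observed counts, typed in by hand so that it runs standalone (the names `risk`, `risk_ratio`, and `odds_ratio` are hypothetical).

```r
# Observed 2x2 table, typed in from the counts in this report
observed <- matrix(
  c(844, 6, 5136, 238),
  nrow = 2,
  dimnames = list(inoculated = c("no", "yes"), result = c("died", "lived"))
)

# Mortality (risk of death) in each group
risk <- observed[, "died"] / rowSums(observed)

# Risk ratio: mortality if inoculated relative to mortality if not inoculated
risk_ratio <- risk[["yes"]] / risk[["no"]]

# Odds ratio: odds of death if inoculated / odds of death if not inoculated
odds_ratio <- (observed["yes", "died"] / observed["yes", "lived"]) /
  (observed["no", "died"] / observed["no", "lived"])

round(c(risk_ratio = risk_ratio, odds_ratio = odds_ratio), 3)
```

Both ratios are well below 1. In this sample, mortality among the inoculated was roughly 17% of mortality among the not inoculated. These are observational records rather than a randomized experiment, so the ratios describe an association, not a measured causal effect.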

Conclusion

In conclusion, there is significant evidence of an association between inoculation and mortality. The very small p-value from the chi-squared test of independence is the evidence that lets us reject the null hypothesis. The data analysis stage also showed that mortality was higher in the not-inoculated group (about 14.1%) than in the inoculated group (about 2.5%). This answers our research question and suggests that immunization, from early inoculation to modern vaccination, can help save lives.

Possible next steps would be to analyze newer data on immunization and vaccination, or data sets that track inoculation over many years, since this data set covers only the year 1721.

Sources

“Smallpox Vaccine Results.” Data Sets, www.openintro.org/data/index.php?data=smallpox. Accessed 13 Dec. 2025.

“A Brief History of Vaccination.” World Health Organization, World Health Organization, www.who.int/news-room/spotlight/history-of-vaccination/a-brief-history-of-vaccination. Accessed 13 Dec. 2025.