Final Project Template

Part 1 - Introduction

Measles, also known as rubeola, is a highly contagious illness caused by a virus. It is an airborne disease, which means that it spreads through the air when an infected person coughs, sneezes, or breathes. Measles presents with flu-like symptoms and a widespread rash. It can make a person severely ill and can be life-threatening. Measles is now making a comeback because of declining childhood vaccination rates, especially after the COVID-19 pandemic. Because the disease is so contagious, it can easily cross borders with people who travel.

The data, which come from the TidyTuesday project, report vaccination rates for 32 U.S. states along with exemption rates for medical, religious, or personal reasons. The overall idea of our project is to compare vaccination and exemption percentages across states.

Based on the data we have from the TidyTuesday project, our initial focus involved identifying the percentage of measles vaccinations across different states. This required examining the relevant columns within the dataset that detailed the vaccination rates, specifically for the Measles, Mumps, and Rubella (MMR) vaccine, as measles is one component of this combined shot. By filtering and grouping the data by state, we were able to calculate and compare the proportions of students who had received the MMR vaccination in each of the represented states. This allowed us to gain a preliminary understanding of the variation in vaccination coverage levels across the regions included in the dataset. Further analysis of the state-level vaccination percentages enabled us to identify states with particularly high or low measles vaccination rates. This comparison is crucial for understanding which areas might be more susceptible to measles outbreaks due to lower levels of herd immunity. Additionally, by looking at the provided exemption data (medical, religious, and personal), we could begin to explore potential correlations between exemption rates and overall vaccination coverage within each state. This initial exploration of the data sets the stage for a more in-depth investigation into the factors influencing vaccination rates and the potential public health implications.
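The state-level comparison described above can be sketched as follows. This is a minimal illustration on a made-up data frame, not the project's actual computation: the values and state labels are hypothetical, and it assumes that the `mmr` column holds each school's MMR vaccination rate, with negative values marking rates that were not reported. Base R is used so the sketch runs without extra packages.

```r
# Hypothetical school-level records; the real data come from tuesdata$measles
schools <- data.frame(
  state = c("A", "A", "A", "B", "B"),
  mmr   = c(95, 85, -1, 99, 97)   # -1 is assumed to mean "not reported"
)

# Treat negative rates as missing before averaging
schools$mmr[schools$mmr < 0] <- NA

# Mean MMR rate per state; the formula interface drops rows with NA
mmr_by_state <- aggregate(mmr ~ state, data = schools, FUN = mean)
mmr_by_state
```

Recoding the sentinel before averaging matters because a single -1 would otherwise pull a state's mean down.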

#knitr::opts_chunk$set(echo = TRUE)
# Load necessary libraries
library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggmosaic)
library(tidytuesdayR)


# Load the TidyTuesday data for February 25, 2020
tuesdata <- tt_load(2020, week = 9)

## ---- Compiling #TidyTuesday Information for 2020-02-25 ----

## --- There is 1 file available ---
## 
## 
## ── Downloading files ───────────────────────────────────────────────────────────
## 
##   1 of 1: "measles.csv"

measles_clean <- tuesdata$measles

# Clean variables
measles_clean <- measles_clean %>%
  mutate(
    state = as.factor(state),
    xmed = as.numeric(xmed),
    xrel = as.numeric(xrel),
    xper = as.numeric(xper)
  )

# Summarize exemption rates
# Summarize medical exemption rate
med_exemp <- measles_clean %>%
  group_by(state) %>%
  summarize(avg_percent_medical_except = mean(xmed, na.rm = TRUE)) %>%
  arrange(desc(avg_percent_medical_except))
med_exemp

## # A tibble: 32 × 2
##    state         avg_percent_medical_except
##    <fct>                              <dbl>
##  1 Michigan                            6.89
##  2 North Dakota                        5.92
##  3 Idaho                               5.05
##  4 Ohio                                4.60
##  5 Rhode Island                        4.14
##  6 Oklahoma                            3.99
##  7 Maine                               3.55
##  8 New Jersey                          3.15
##  9 Washington                          2.97
## 10 Massachusetts                       2.87
## # ℹ 22 more rows

# Summarize religious exemption rate
rel_exemp <- measles_clean %>%
  group_by(state) %>%
  summarize(avg_percent_rel_except = mean(xrel, na.rm = TRUE)) %>%
  arrange(desc(avg_percent_rel_except))
rel_exemp

## # A tibble: 32 × 2
##    state          avg_percent_rel_except
##    <fct>                           <dbl>
##  1 Florida                             1
##  2 Illinois                            1
##  3 New Jersey                          1
##  4 New York                            1
##  5 North Carolina                      1
##  6 Tennessee                           1
##  7 Virginia                            1
##  8 Washington                          1
##  9 Arizona                           NaN
## 10 Arkansas                          NaN
## # ℹ 22 more rows

# Summarize personal exemption rate
per_exemp <- measles_clean %>%
  group_by(state) %>%
  summarize(avg_percent_personal_except = mean(xper, na.rm = TRUE)) %>%
  arrange(desc(avg_percent_personal_except))
per_exemp

## # A tibble: 32 × 2
##    state        avg_percent_personal_except
##    <fct>                              <dbl>
##  1 Maine                               9.79
##  2 Colorado                            7.85
##  3 Arizona                             7.65
##  4 Wisconsin                           7.54
##  5 Washington                          6.73
##  6 North Dakota                        6.28
##  7 Minnesota                           5.95
##  8 Utah                                4.62
##  9 Oregon                              3.82
## 10 Arkansas                          NaN   
## # ℹ 22 more rows

# Plot for medical exemption rates
ggplot(med_exemp, aes(x = reorder(state, -avg_percent_medical_except), y = avg_percent_medical_except)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Average Medical Exemption Rate by State", x = "State", y = "Average Medical Exemption Rate (%)") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_bar()`).

# Plot for religious exemption rates
ggplot(rel_exemp, aes(x = reorder(state, -avg_percent_rel_except), y = avg_percent_rel_except)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Average Religious Exemption Rate by State", x = "State", y = "Average Religious Exemption Rate (%)") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_bar()`).

# Plot for personal exemption rates
ggplot(per_exemp, aes(x = reorder(state, -avg_percent_personal_except), y = avg_percent_personal_except)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Average Personal Exemption Rate by State", x = "State", y = "Average Personal Exemption Rate (%)") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

## Warning: Removed 23 rows containing missing values or values outside the scale range
## (`geom_bar()`).

# QQ plots for each exemption type
# QQ plot for xmed
qqnorm(measles_clean$xmed, main = "QQ Plot of xmed", col = "coral3")
qqline(measles_clean$xmed, col = "black")

# QQ plot for xrel
qqnorm(measles_clean$xrel, main = "QQ Plot of xrel", col = "pink")
qqline(measles_clean$xrel, col = "black")

# QQ plot for xper
qqnorm(measles_clean$xper, main = "QQ Plot of xper", col = "blue4")
qqline(measles_clean$xper, col = "black")

# Run Kruskal-Wallis tests
kruskal_med <- kruskal.test(xmed ~ overall, data = measles_clean)
print(kruskal_med)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  xmed by overall
## Kruskal-Wallis chi-squared = 13050, df = 1319, p-value < 2.2e-16

kruskal_per <- kruskal.test(xper ~ overall, data = measles_clean)
print(kruskal_per)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  xper by overall
## Kruskal-Wallis chi-squared = 3551.1, df = 1405, p-value < 2.2e-16

kruskal_rel <- kruskal.test(xrel ~ overall, data = measles_clean)
print(kruskal_rel)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  xrel by overall
## Kruskal-Wallis chi-squared = NaN, df = 31, p-value = NA

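# Boxplot for xmed
# Note: overall is continuous, so ggplot2 warns and does not split the data into separate boxes
ggplot(measles_clean, aes(x = overall, y = xmed)) +
  geom_boxplot(fill = "skyblue") +
  labs(title = "xmed by Overall Vaccination Rate", x = "Overall Vaccination Rate (%)", y = "xmed")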

## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?

## Warning: Removed 45122 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

# Boxplot for xper
ggplot(measles_clean, aes(x = overall, y = xper)) +
  geom_boxplot(fill = "salmon") +
  labs(title = "xper by Overall Vaccination Rate", x = "Overall Vaccination Rate (%)", y = "xper")

## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?

## Warning: Removed 57560 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

# Boxplot for xrel
ggplot(measles_clean, aes(x = overall, y = xrel)) +
  geom_boxplot(fill = "lightgreen") +
  labs(title = "xrel by Overall Vaccination Rate", x = "Overall Vaccination Rate (%)", y = "xrel")

## Warning: Continuous x aesthetic
## ℹ did you forget `aes(group = ...)`?

## Warning: Removed 66004 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

# Box plot: combined exemption rate per state
is.numeric(measles_clean$xrel)

## [1] TRUE

is.numeric(measles_clean$xper)

## [1] TRUE

is.numeric(measles_clean$xmed)

## [1] TRUE

# Sum the three exemption columns; with na.rm = TRUE, rows with no reported values sum to 0
measles_clean$allexceptions <- rowSums(measles_clean[, c("xmed", "xrel", "xper")], na.rm = TRUE)
measles_subset <- measles_clean[ , c(-12, -13, -14)]
ggplot(measles_clean, aes(x = state, y = allexceptions)) +
  geom_boxplot(fill = "lightgreen") +
  labs(title = "Distribution of Combined Exemption Rates by State", x = "State", y = "Combined Exemption Rate (%)") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

Part 2 - Exploring the Data (Descriptive Statistics)

Summarizing Exemption Rates: The data cover average vaccine exemption rates for medical, religious, and personal reasons across 32 U.S. states. Medical exemptions were the most widely reported (24 of the 32 states), with Michigan having the highest average rate at 6.89%, followed by North Dakota (5.92%) and Idaho (5.05%). Religious exemptions were reported in only eight states, each with an average of exactly 1. Because every reported value is identical, this column appears to act as a flag rather than a measured percentage. Personal exemptions were reported in nine states and were highest in Maine (9.79%), Colorado (7.85%), and Arizona (7.65%). The remaining states have no data for these exemption types (shown as NaN in the tables), likely because of differences in state laws or reporting practices. Overall, the summaries show that exemption rates and the types of exemptions reported vary considerably across states. States that permit personal or religious exemptions may have higher opt-out rates, which could increase the risk of disease outbreaks. Standardizing data reporting and reviewing exemption policies may help improve vaccine coverage and public health outcomes.

QQ Plots: The QQ plots are drawn from the school-level values rather than from the state averages. The QQ plot for "xmed" (medical exemptions) shows a positively skewed distribution: most schools have low medical exemption rates, while a few outliers have notably higher rates. This suggests variability in medical exemptions, possibly influenced by local factors. Similarly, the plot for "xper" (personal exemptions) has a heavy right tail, so most schools have low personal exemption rates but some are significant outliers with higher rates. These outliers may reflect variation driven by cultural or political factors. In contrast, the "xrel" (religious exemptions) plot shows no variability, with all points at a single value. This makes it a less informative variable for analyzing vaccination exemption patterns.

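The constant average of 1 for religious exemptions and the NaN entries in the summary tables have a mechanical explanation that a small sketch can make concrete. The vectors below are made up, and the sketch rests on an assumption that the output suggests but this report does not confirm: that `xrel` is stored as a TRUE/NA flag before `as.numeric()` is applied.

```r
# Hypothetical xrel values for one state: a logical flag, not a percentage
xrel_flag <- c(TRUE, NA, TRUE, NA)

# as.numeric() maps TRUE to 1 and keeps NA as NA
xrel_num <- as.numeric(xrel_flag)

# The mean of the non-missing values is therefore exactly 1
mean_reported <- mean(xrel_num, na.rm = TRUE)

# A state with no reported values has nothing to average, which gives NaN
mean_unreported <- mean(c(NA_real_, NA_real_), na.rm = TRUE)
```

If the assumption holds, the religious exemption column records whether an exemption was reported, and its averages should not be read as percentages.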
Box Plot: Combined Exemption Rates per State: The plotted variable, allexceptions, is the row-wise sum of xmed, xrel, and xper, so the box plots show the distribution of combined exemption rates across the 32 states rather than vaccination rates. Most values are low, with the majority of data points clustered near the bottom of the y-axis, often below 20%. Because the sum is computed with na.rm = TRUE, records with no reported exemptions are counted as 0, which pulls the medians toward zero. There is still notable variability both within and between states. Several states, such as Arizona, Arkansas, California, Florida, Illinois, Iowa, Maine, Massachusetts, Michigan, Minnesota, Montana, New Jersey, New York, North Carolina, Ohio, Texas, Utah, Vermont, Virginia, Washington, and Wisconsin, show a wider spread, as indicated by larger interquartile ranges (the height of the boxes) and the presence of outliers (individual dots above and below the whiskers). This suggests greater heterogeneity in exemption rates within these states. On the other hand, states like Colorado, Connecticut, Idaho, Missouri, North Dakota, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Dakota, and Tennessee exhibit narrower boxes and fewer outliers, indicating more consistent, generally low combined exemption rates. No state shows a particularly high median (the horizontal line within each box): most medians are below 10%, with many close to 0%. The outliers on the higher end suggest that, although most combined exemption rates are low, some schools have much higher rates. In summary, combined exemption rates are generally low but vary considerably both within and between states. This variability may be influenced by factors such as local policies, healthcare access, and community attitudes towards vaccination.
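How missing values enter the combined exemption rate can be shown with a short sketch. The data frame below is hypothetical, and the second approach is only one possible alternative, not something the analysis above used: it keeps a row as NA when none of the three exemption columns was reported.

```r
# Hypothetical exemption columns for three schools
exempt <- data.frame(
  xmed = c(2.0, NA, NA),
  xrel = c(NA, NA, 1.0),
  xper = c(3.5, NA, NA)
)

# With na.rm = TRUE, a row with no reported values sums to 0
total_zero_fill <- rowSums(exempt, na.rm = TRUE)

# Alternative: keep such rows as NA so they are not plotted as 0% exemptions
all_missing <- rowSums(!is.na(exempt)) == 0
total_keep_na <- ifelse(all_missing, NA, total_zero_fill)
```

With the second version, `geom_boxplot()` would drop the all-missing rows instead of drawing them at zero, which may give a more faithful picture of the reported rates.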

Part 3 - Statistical Test (Inferential Statistics)

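Kruskal-Wallis Test: The Kruskal-Wallis tests compare each exemption rate across groups defined by the "overall" column (the overall vaccination rate), because the formulas have the form xmed ~ overall. Each distinct value of "overall" is treated as its own group, which is why the tests have 1319 ("xmed") and 1405 ("xper") degrees of freedom. Both tests give p-values less than 2.2×10^−16, indicating that medical and personal exemption rates differ significantly across these groups. Because the grouping variable is not state, these results do not by themselves test state-to-state differences. The state-level summaries in Part 2 suggest such differences, but a test with state as the grouping variable would be needed to confirm them. The test for "xrel" (religious exemptions) returned an invalid result (NaN) because every reported value is the same, so all ranks are tied and the statistic cannot be computed. In conclusion, medical and personal exemptions show considerable variability, while the religious exemption column carries no usable variation, warranting further exploration of the factors behind these patterns.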
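A Kruskal-Wallis test of state-to-state differences would put state on the right-hand side of the formula. The sketch below shows the idea on a small made-up data set. The state labels and values are hypothetical, and no result for the real data is implied.

```r
# Hypothetical school-level medical exemption rates in three states
toy <- data.frame(
  state = factor(rep(c("A", "B", "C"), each = 5)),
  xmed  = c(0.5, 1.0, 1.2, 0.8, 0.9,
            3.1, 2.8, 3.5, 4.0, 3.3,
            6.2, 5.9, 7.1, 6.5, 6.8)
)

# The grouping variable on the right of ~ is state, so the null hypothesis
# is that xmed has the same distribution in every state
kw_state <- kruskal.test(xmed ~ state, data = toy)

# Degrees of freedom equal the number of groups minus one
kw_state$parameter
kw_state$p.value
```

Applied to the project data, the corresponding call would be `kruskal.test(xmed ~ state, data = measles_clean)`, which with 32 states would have at most 31 degrees of freedom.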

Part 4 - Conclusion

There are meaningful differences in measles vaccine exemption rates between U.S. states, and even within states. These differences could have important public health implications, as areas with lower vaccination rates are more vulnerable to measles outbreaks. One limitation of this analysis is that few values were reported for each exemption reason, which made it hard to analyze each variable separately, so we combined them to make the most of the available data. After concluding this research, we would like to know what the data look like now, since there are more outbreaks in the world today.

References

Cleveland Clinic. (2025, February 28). What is measles – do people still get it? https://my.clevelandclinic.org/health/diseases/8584-measles
Harmon, J. (2020). Tidytuesday/data/2020/2020-02-25/readme.md at main · rfordatascience/tidytuesday. GitHub. https://github.com/rfordatascience/tidytuesday/blob/main/data/2020/2020-02-25/readme.md
Centers for Disease Control and Prevention. (n.d.). Child immunization. https://www.cdc.gov
The World Bank. (n.d.). World Bank data on immunization. https://www.worldbank.org
Centers for Disease Control and Prevention. (n.d.). General immunization. https://www.cdc.gov
Wall Street Journal. (n.d.). Measles coverage. https://www.wsj.com