0.1 Introduction

Vaccination rates are an essential metric in understanding public health trends. This analysis uses data from the Open Disease API and a local CSV dataset (us_vaccination_data.csv) to explore vaccination trends in the United States from December 2020 onward. The goal is to assess how vaccination coverage has evolved over time and to identify factors that might influence these trends.

0.2 Methods

0.2.1 Data Import

# Load Required Libraries
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

# Load the dataset
vaccination_data <- read.csv("us_vaccination_data.csv")

# Use the correct date column
vaccination_data <- vaccination_data %>%
  mutate(date = as.Date(date.1, format = "%m/%d/%y")) %>%  # Convert `date.1` to Date
  select(date, totalPerHundred) %>%  # Keep relevant columns
  rename(coverage = totalPerHundred)  # Rename for consistency

# Ensure the `coverage` column is numeric
vaccination_data$coverage <- as.numeric(vaccination_data$coverage)

# Preview the data
head(vaccination_data)
##         date coverage
## 1 2020-12-01        0
## 2 2020-12-02        0
## 3 2020-12-03        0
## 4 2020-12-04        0
## 5 2020-12-05        0
## 6 2020-12-06        0
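
Both `as.Date()` with an explicit format and `as.numeric()` return `NA` when a value cannot be parsed, so a quick count of missing values after import can catch problems early. The following is a minimal sketch on a small made-up data frame (`toy_raw` is hypothetical and only reuses the column names from the CSV; its values are not from the real dataset).

```r
# Toy stand-in for the raw CSV (illustrative values, not real data)
toy_raw <- data.frame(
  date.1 = c("12/01/20", "12/02/20", "not a date"),
  totalPerHundred = c("0", "0.5", "n/a"),
  stringsAsFactors = FALSE
)

# Apply the same conversions used in the import step
parsed_date <- as.Date(toy_raw$date.1, format = "%m/%d/%y")
parsed_coverage <- suppressWarnings(as.numeric(toy_raw$totalPerHundred))

# Count values that failed to parse and became NA
n_bad_dates <- sum(is.na(parsed_date))
n_bad_coverage <- sum(is.na(parsed_coverage))
cat("Unparsed dates:", n_bad_dates, "\n")
cat("Unparsed coverage values:", n_bad_coverage, "\n")
```

A non-zero count on the real data would suggest that the date format string or the source column needs another look before any statistics are computed.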

0.3 Results

0.3.1 Descriptive Statistics

# Calculate summary statistics
summary_stats <- vaccination_data %>%
  summarize(mean_coverage = mean(coverage, na.rm = TRUE),
            sd_coverage = sd(coverage, na.rm = TRUE),
            min_coverage = min(coverage, na.rm = TRUE),
            max_coverage = max(coverage, na.rm = TRUE))
summary_stats
##   mean_coverage sd_coverage min_coverage max_coverage
## 1      84.18551    83.72416            0          203

0.3.2 Visualization

0.3.2.2 Bar Chart: Monthly Vaccination Coverage

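vaccination_data <- vaccination_data %>%
  mutate(month = format(date, "%Y-%m"))

# Aggregate to one value per month; plotting the daily rows directly would
# stack (sum) them within each month. Taking the monthly maximum is an assumption.
monthly_coverage <- vaccination_data %>%
  group_by(month) %>%
  summarize(coverage = max(coverage, na.rm = TRUE))

ggplot(monthly_coverage, aes(x = month, y = coverage)) +
  geom_col(fill = "steelblue") +
  # Values exceed 100, so the axis is labeled per hundred rather than as a percentage
  labs(title = "Monthly Vaccination Coverage in the US", x = "Month", y = "Coverage (total per hundred)") +
  scale_x_discrete(breaks = function(x) x[seq(1, length(x), by = 6)]) +
  theme_minimal() +  # Apply the complete theme first so it does not reset the tweak below
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))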

0.3.3 Hypothesis Testing

0.3.3.1 Pre-2021 vs Post-2021 Coverage

pre_2021 <- vaccination_data %>% filter(date < as.Date("2021-01-01"))
post_2021 <- vaccination_data %>% filter(date >= as.Date("2021-01-01"))

# Perform a t-test
t_test <- t.test(pre_2021$coverage, post_2021$coverage)
t_test
## 
##  Welch Two Sample t-test
## 
## data:  pre_2021$coverage and post_2021$coverage
## t = -38.993, df = 1447.2, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -90.17777 -81.53927
## sample estimates:
##  mean of x  mean of y 
##  0.1290323 85.9875519
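
As a reading aid for the output above, the following sketch reproduces the Welch t statistic and its degrees of freedom by hand and compares them with `t.test()`. The vectors `x` and `y` are made-up illustrative samples, not the vaccination data.

```r
# Illustrative samples (not the vaccination data)
x <- c(0.0, 0.1, 0.2, 0.1, 0.3)
y <- c(60, 75, 90, 85, 100, 110)

# Welch t statistic: difference in means over the unpooled standard error
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)
t_manual <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Compare with the built-in test (Welch is the default, var.equal = FALSE)
res <- t.test(x, y)
print(all.equal(unname(res$statistic), t_manual))
print(all.equal(unname(res$parameter), df_manual))
```

In the output above, the same two quantities appear as t = -38.993 and df = 1447.2.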

0.3.4 Linear Model

0.3.4.1 Predicting Coverage Over Time

vaccination_data <- vaccination_data %>%
  mutate(days_since_start = as.numeric(date - min(date)))

model <- lm(coverage ~ days_since_start, data = vaccination_data)
summary(model)
## 
## Call:
## lm(formula = coverage ~ days_since_start, data = vaccination_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -147.87  -52.60  -26.14   72.00  131.84 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      147.867778   3.912817   37.79   <2e-16 ***
## days_since_start  -0.086290   0.004591  -18.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 75.23 on 1475 degrees of freedom
## Multiple R-squared:  0.1932, Adjusted R-squared:  0.1927 
## F-statistic: 353.3 on 1 and 1475 DF,  p-value: < 2.2e-16
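
To help read the coefficient table, the sketch below fits the same kind of model to a small synthetic series and shows how to extract the slope, express it per year, and check the share of variance explained. The data frame `toy` is generated here purely for illustration and is not the vaccination data.

```r
# Synthetic series for illustration only (not the vaccination data)
toy <- data.frame(days_since_start = 0:99)
# Linear trend of 0.5 per day plus a small deterministic wiggle
toy$coverage <- 10 + 0.5 * toy$days_since_start + sin(toy$days_since_start)

toy_model <- lm(coverage ~ days_since_start, data = toy)

# The slope is the estimated change in coverage per day
slope_per_day <- unname(coef(toy_model)["days_since_start"])
slope_per_year <- slope_per_day * 365

# R-squared is the share of variance explained by the linear trend
r_squared <- summary(toy_model)$r.squared

cat("Change per day:", round(slope_per_day, 3), "\n")
cat("Change per year:", round(slope_per_year, 1), "\n")
cat("R-squared:", round(r_squared, 3), "\n")
```

Applied to the fitted model above, the slope of -0.086290 per day corresponds to roughly -31.5 per year, and the R-squared of 0.1932 means the straight line accounts for only about 19% of the variation in coverage.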

0.4 Discussion

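The results describe vaccination coverage in the United States from December 2020 onward. The Welch t-test shows a statistically significant difference between mean coverage before 2021 (0.13) and from 2021 onward (85.99), with a 95% confidence interval for the difference of -90.18 to -81.54 (p < 2.2e-16); the rise is likely due to the COVID-19 vaccination campaigns. The linear model does not show a steady increase, however: the estimated slope is negative (-0.086 per day) and the model explains only about 19% of the variance (R-squared = 0.19), so a straight line is a poor description of this series. Both results should be read with caution, because daily values of a time series are not independent observations, which the t-test and the linear model both assume.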

0.4.1 Limitations

  • The dataset begins in December 2020 and may not include all historical data.
  • API constraints might limit detailed analysis for specific vaccines or demographics.

0.4.2 Future Work

Further analyses could examine demographic factors influencing vaccination rates or the impact of state-level policies on coverage.