Vaccination rates are an essential metric for understanding public health trends. This analysis uses data from the Open Disease API, together with a local dataset (`us_vaccination_data.csv`), to explore vaccination trends in the United States since late 2020. The goal is to assess how vaccination coverage has evolved over time and to identify factors that might influence these trends.
# Load Required Libraries
library(dplyr)
library(ggplot2)
# Load the dataset
vaccination_data <- read.csv("us_vaccination_data.csv")
# Parse the `date.1` column (month/day/two-digit-year strings) into Dates
vaccination_data <- vaccination_data %>%
  mutate(date = as.Date(date.1, format = "%m/%d/%y")) %>% # Convert `date.1` to Date
  select(date, totalPerHundred) %>%                        # Keep only the columns used below
  rename(coverage = totalPerHundred)                       # Doses per 100 people (can exceed 100)
# Ensure the `coverage` column is numeric
vaccination_data$coverage <- as.numeric(vaccination_data$coverage)
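Because `totalPerHundred` appears to be a running total (an assumption based on its name and on values that start at 0), a quick sanity check after parsing can catch dates that failed to parse and days on which the total goes down. The base-R sketch below runs on a small made-up data frame; `check_cumulative()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: sanity-check a parsed date column and a series that is
# ASSUMED to be cumulative (a running total should never decrease).
check_cumulative <- function(dates, values) {
  keep <- !is.na(dates)                  # ignore rows whose date failed to parse
  ord <- order(dates[keep])              # compare neighbours in date order
  v <- values[keep][ord]
  list(
    unparsed_dates = sum(is.na(dates)),  # rows where as.Date() returned NA
    missing_values = sum(is.na(values)),
    decreases = sum(diff(v) < 0, na.rm = TRUE)  # day-to-day drops in the total
  )
}

# Made-up example data (not from the real dataset)
toy <- data.frame(
  date.1 = c("12/01/20", "12/02/20", "12/03/20", "12/04/20", "not a date"),
  totalPerHundred = c(0, 0.5, 1.2, 0, 1.5)
)
toy$date <- as.Date(toy$date.1, format = "%m/%d/%y")  # non-matching strings become NA
res <- check_cumulative(toy$date, toy$totalPerHundred)
res$unparsed_dates  # 1
res$decreases       # 1 (1.2 -> 0 on 2020-12-04)
```

On the real data, the same call would be `check_cumulative(vaccination_data$date, vaccination_data$coverage)`.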
# Preview the data
head(vaccination_data)
## date coverage
## 1 2020-12-01 0
## 2 2020-12-02 0
## 3 2020-12-03 0
## 4 2020-12-04 0
## 5 2020-12-05 0
## 6 2020-12-06 0
# Calculate summary statistics
summary_stats <- vaccination_data %>%
  summarize(mean_coverage = mean(coverage, na.rm = TRUE),
            sd_coverage = sd(coverage, na.rm = TRUE),
            min_coverage = min(coverage, na.rm = TRUE),
            max_coverage = max(coverage, na.rm = TRUE))
summary_stats
## mean_coverage sd_coverage min_coverage max_coverage
## 1 84.18551 83.72416 0 203
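If `coverage` is a running total (an assumption, as above), its overall mean and standard deviation mostly reflect how many days the series spends at each level. An end-of-year snapshot is one simpler summary. The base-R sketch below uses made-up values; with the real data, the same `aggregate()` call would run on `vaccination_data` after sorting it by `date`.

```r
# Sketch (base R, made-up data): last value observed in each calendar year
toy <- data.frame(
  date = as.Date(c("2022-12-31", "2020-12-30", "2020-12-31", "2021-06-30", "2021-12-31")),
  coverage = c(190, 0.1, 0.2, 90, 150)
)
toy <- toy[order(toy$date), ]           # sort so "last" means "latest date"
toy$year <- format(toy$date, "%Y")      # calendar year as a string
year_end <- aggregate(coverage ~ year, data = toy,
                      FUN = function(v) v[length(v)])  # last value within each year
year_end
```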
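# Daily series; values can exceed 100 (maximum 203), so the axis shows doses per 100 people, not a percentage
ggplot(vaccination_data, aes(x = date, y = coverage)) +
  geom_line(color = "blue") +
  labs(title = "Vaccination Trends in the US", x = "Date", y = "Doses per 100 people") +
  theme_minimal()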
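# Add a year-month label to each row
vaccination_data <- vaccination_data %>%
  mutate(month = format(date, "%Y-%m"))
# Each row holds a running total, so stacking every day of a month (as
# geom_bar(stat = "identity") does) would add ~30 totals together.
# Keep the last value observed in each month instead.
monthly_coverage <- vaccination_data %>%
  arrange(date) %>%
  group_by(month) %>%
  summarize(coverage = last(coverage))
ggplot(monthly_coverage, aes(x = month, y = coverage)) +
  geom_col(fill = "steelblue") +
  labs(title = "Monthly Vaccination Coverage in the US", x = "Month", y = "Doses per 100 people") +
  scale_x_discrete(breaks = function(x) x[seq(1, length(x), by = 6)]) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))  # after theme_minimal() so it is not overridden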
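# Split the series at 1 January 2021 (the data start on 2020-12-01,
# so the first group covers December 2020 only)
pre_2021 <- vaccination_data %>% filter(date < as.Date("2021-01-01"))
post_2021 <- vaccination_data %>% filter(date >= as.Date("2021-01-01"))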
# Perform a t-test
t_test <- t.test(pre_2021$coverage, post_2021$coverage)
t_test
##
## Welch Two Sample t-test
##
## data: pre_2021$coverage and post_2021$coverage
## t = -38.993, df = 1447.2, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -90.17777 -81.53927
## sample estimates:
## mean of x mean of y
## 0.1290323 85.9875519
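The output is labelled "Welch Two Sample t-test", which is what `t.test()` runs by default (`var.equal = FALSE`): it does not assume the two groups share a variance. To show what the reported `t` and `df` are, the sketch below computes Welch's statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with `t.test()`. The samples are random made-up numbers, and `welch_by_hand()` is a hypothetical helper.

```r
# Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom,
# computed by hand and compared with stats::t.test().
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df)
}

set.seed(1)
x <- rnorm(31, mean = 0.1, sd = 0.3)   # made-up "pre" sample
y <- rnorm(200, mean = 85, sd = 60)    # made-up "post" sample
manual <- welch_by_hand(x, y)
builtin <- t.test(x, y)                # Welch test (var.equal = FALSE by default)
all.equal(unname(manual["t"]), unname(builtin$statistic))
all.equal(unname(manual["df"]), unname(builtin$parameter))
```

Both comparisons should return `TRUE`. The fractional `df = 1447.2` in the output comes from this same formula.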
# Predictor: days elapsed since the first observation
vaccination_data <- vaccination_data %>%
  mutate(days_since_start = as.numeric(date - min(date, na.rm = TRUE)))
model <- lm(coverage ~ days_since_start, data = vaccination_data)
summary(model)
##
## Call:
## lm(formula = coverage ~ days_since_start, data = vaccination_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -147.87 -52.60 -26.14 72.00 131.84
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 147.867778 3.912817 37.79 <2e-16 ***
## days_since_start -0.086290 0.004591 -18.80 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 75.23 on 1475 degrees of freedom
## Multiple R-squared: 0.1932, Adjusted R-squared: 0.1927
## F-statistic: 353.3 on 1 and 1475 DF, p-value: < 2.2e-16
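Written out, the fitted line reads as follows, with coefficients rounded from the regression output. Multiplying the daily slope by 365 gives the implied change per year. This is simple arithmetic on the reported estimate, not an additional model result.

```latex
\begin{aligned}
\widehat{\text{coverage}}_i &= 147.87 - 0.0863\,\text{days\_since\_start}_i, \qquad R^2 = 0.193 \\
\hat\beta_1 \times 365 &= -0.086290 \times 365 \approx -31.5 \ \text{coverage units per year}
\end{aligned}
```

A negative yearly slope is implausible for a cumulative total, so the fit is worth reading alongside the daily plot.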
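The results describe U.S. vaccination trends since December 2020. Coverage rose sharply after 2020, most likely because of the COVID-19 vaccination campaign: mean coverage was about 86 doses per 100 people from January 2021 onward, compared with about 0.13 in December 2020, and the Welch t-test finds this difference statistically significant (p < 2.2e-16). Because the pre-2021 group contains only the days of December 2020, when the campaign had barely begun, and consecutive daily values are strongly autocorrelated, the test mainly confirms that coverage was near zero before 2021 rather than measuring the size of the rise.

The linear model does not show a general increase. Its slope is negative (about -0.086 doses per 100 people per day) and it explains only about 19% of the variance (R² = 0.19). For a running total that should never decrease, a negative slope suggests the values drop substantially late in the record, for example if reporting stopped and missing days were recorded as 0; this should be checked before the model is interpreted. In any case, a straight line is a poor fit for a curve that rises steeply and then flattens or falls.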
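A quick way to see what drives a negative slope in a series that should only grow is to locate its peak and count the observations that fall below it afterwards. The base-R sketch below does this on made-up data; `summarize_peak()` is a hypothetical helper, and treating any post-peak drop as a data problem assumes that `coverage` is cumulative.

```r
# Hypothetical helper: locate the peak of a series assumed to be cumulative
# and count how many later observations fall below that peak.
summarize_peak <- function(dates, values) {
  ord <- order(dates)
  dates <- dates[ord]
  values <- values[ord]
  i_peak <- which.max(values)                  # first occurrence of the maximum
  after <- values[seq_along(values) > i_peak]  # observations after the peak
  list(
    peak_date = dates[i_peak],
    peak_value = values[i_peak],
    n_after = length(after),
    n_below_peak_after = sum(after < values[i_peak], na.rm = TRUE)
  )
}

# Made-up series: rises, plateaus, then drops to 0 (e.g. reporting stopped)
d <- seq(as.Date("2021-01-01"), by = "month", length.out = 8)
v <- c(10, 60, 120, 180, 200, 200, 0, 0)
peak <- summarize_peak(d, v)
peak$peak_date           # "2021-05-01"
peak$n_below_peak_after  # 2
```

On the real data, `summarize_peak(vaccination_data$date, vaccination_data$coverage)` would show whether, and how often, the series falls after its maximum of 203.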
Further analyses could examine demographic factors influencing vaccination rates or the impact of state-level policies on coverage.