The goal of this project is to explore the relationship between COVID-19 vaccinations and confirmed COVID-19 cases in different countries. My research question is: How does the number of COVID-19 vaccinations administered relate to the number of confirmed COVID-19 cases in different countries?
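The dataset I am using is from Our World in Data (https://ourworldindata.org/covid-vaccinations). It includes information on vaccination progress and COVID-19 case numbers for countries around the world. The main variables I am using are:
- location: the country name
- date: the date of each observation
- total_vaccinations: the cumulative number of COVID-19 vaccine doses administered
- people_fully_vaccinated: the cumulative number of people who have received all doses of the initial vaccination protocol
- total_cases: the cumulative number of confirmed COVID-19 cases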
By using this dataset, I can see how vaccination numbers compare to total COVID-19 cases in countries such as the United States, India, Brazil, Canada, and the United Kingdom.
In this section, I explore the relationship between COVID-19 vaccinations and confirmed cases across different countries. I focus on the United States, India, Brazil, Canada, and the United Kingdom to see whether countries that administered more vaccinations tend to have fewer confirmed COVID-19 cases.
# Load dplyr and ggplot2 for analysis
library(dplyr)
library(ggplot2)
# Assumes covid_data already holds the Our World in Data COVID-19 dataset
# Filter data for selected countries and keep only needed columns
selected_countries <- c("United States", "India", "Brazil", "Canada", "United Kingdom")
covid_filtered <- covid_data %>%
  filter(location %in% selected_countries) %>%
  select(location, date, total_vaccinations, people_fully_vaccinated, total_cases)
# Remove rows with missing values
covid_filtered <- na.omit(covid_filtered)
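The na.omit() call above drops a row if any selected column is missing, not only the vaccination or case columns. The sketch below uses made-up toy values (not OWID data) to show this: a row with known vaccinations and cases is still removed because people_fully_vaccinated is NA.

```r
# Made-up toy data (not OWID values) with the same columns as covid_filtered.
# Vaccination columns are NA on days before a rollout was reported.
toy <- data.frame(
  location = c("A", "A", "A", "B", "B"),
  date = c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-01", "2021-01-02"),
  total_vaccinations = c(NA, 100, 250, 50, NA),
  people_fully_vaccinated = c(NA, NA, 20, 10, 15),
  total_cases = c(1000, 1100, 1200, 500, 520)
)

# na.omit() drops every row that has an NA in ANY column, so row 2 is lost
# even though its total_vaccinations and total_cases are both known
kept <- na.omit(toy)
print(kept)

# complete.cases() flags the same rows and makes the count easy to report
cat("Rows kept:", sum(complete.cases(toy)), "of", nrow(toy), "\n")
```

If people_fully_vaccinated is not needed for a particular plot, passing only the columns in use to complete.cases() (for example complete.cases(covid_filtered$total_vaccinations, covid_filtered$total_cases)) would keep more rows.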
# Show summary of key columns
summary(covid_filtered)
## location date total_vaccinations
## Length:4101 Length:4101 Min. :2.688e+04
## Class :character Class :character 1st Qu.:8.649e+07
## Mode :character Mode :character Median :3.515e+08
## Mean :6.774e+08
## 3rd Qu.:6.757e+08
## Max. :2.207e+09
## people_fully_vaccinated total_cases
## Min. : 1 Min. : 495343
## 1st Qu.: 31450822 1st Qu.: 8618240
## Median :148534163 Median : 30543908
## Mean :279473829 Mean : 32132335
## 3rd Qu.:230457960 3rd Qu.: 44690492
## Max. :951990527 Max. :103343569
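Because total_vaccinations and total_cases are cumulative, a cross-country comparison is easier to read from each country's most recent row than from every daily row. The following is a minimal sketch on made-up toy rows; the summary above shows that date is stored as character, so it is converted with as.Date() before taking the maximum.

```r
suppressPackageStartupMessages(library(dplyr))

# Made-up toy rows (not OWID values) with cumulative columns like covid_filtered
toy <- data.frame(
  location = c("A", "A", "B", "B", "B"),
  date = c("2021-03-01", "2021-03-02", "2021-03-01", "2021-03-02", "2021-03-03"),
  total_vaccinations = c(100, 300, 50, 80, 120),
  total_cases = c(1000, 1050, 400, 420, 450)
)

# Keep each country's most recent row: for cumulative counts this is the
# country's running total at the end of the observed period
latest <- toy %>%
  mutate(date = as.Date(date)) %>%
  group_by(location) %>%
  filter(date == max(date)) %>%
  ungroup() %>%
  arrange(location)

print(latest)
```

Applied after na.omit(), this returns each country's latest complete row, which may be earlier than the last date in the raw dataset.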
# Create a new variable for vaccination-to-case ratio
covid_filtered <- covid_filtered %>%
  mutate(vax_case_ratio = total_vaccinations / total_cases)
# Display the first few rows to check the new variable
head(covid_filtered)
## location date total_vaccinations people_fully_vaccinated total_cases
## 398 Brazil 2021-02-05 3401383 1962 9118513
## 399 Brazil 2021-02-06 3553681 19677 9118513
## 400 Brazil 2021-02-07 3605538 25688 9447165
## 401 Brazil 2021-02-08 3820207 33616 9447165
## 402 Brazil 2021-02-09 4120332 50655 9447165
## 403 Brazil 2021-02-10 4406835 80760 9447165
## vax_case_ratio
## 398 0.3730195
## 399 0.3897215
## 400 0.3816529
## 401 0.4043760
## 402 0.4361448
## 403 0.4664717
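As a quick arithmetic check of vax_case_ratio, the sketch below recomputes it for two Brazil rows copied from the head() output above. A value below 1 means the country had administered fewer vaccine doses than it had recorded cumulative cases on that date.

```r
suppressPackageStartupMessages(library(dplyr))

# Two Brazil rows copied from the head() output above (rows 398 and 403)
brazil <- data.frame(
  location = c("Brazil", "Brazil"),
  date = c("2021-02-05", "2021-02-10"),
  total_vaccinations = c(3401383, 4406835),
  total_cases = c(9118513, 9447165)
)

# Same formula as the mutate() step: doses administered per confirmed case
brazil <- brazil %>%
  mutate(vax_case_ratio = total_vaccinations / total_cases)

print(round(brazil$vax_case_ratio, 7))
```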
# Create a scatter plot of total vaccinations vs total cases
ggplot(covid_filtered, aes(x = total_vaccinations, y = total_cases, color = location)) +
  geom_point(alpha = 0.6) +
  labs(title = "COVID-19 Vaccinations vs. Confirmed Cases",
       x = "Total Vaccinations",
       y = "Total Cases",
       color = "Country") +
  theme_minimal()
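The summary above shows total_vaccinations ranging from about 2.7e+04 to 2.2e+09, so on linear axes the smaller countries' points are squeezed near the origin. One possible variant, sketched here on made-up toy values, puts both axes on a log10 scale with scale_x_log10() and scale_y_log10(); the same two lines could be added to the plot of covid_filtered.

```r
library(ggplot2)

# Made-up toy data (not OWID values) spanning several orders of magnitude
toy <- data.frame(
  location = rep(c("A", "B"), each = 3),
  total_vaccinations = c(3e4, 3e6, 3e8, 1e5, 1e7, 2e9),
  total_cases = c(5e5, 2e6, 9e6, 8e5, 4e7, 1e8)
)

# Log10 axes spread out values that differ by orders of magnitude
p <- ggplot(toy, aes(x = total_vaccinations, y = total_cases, color = location)) +
  geom_point(alpha = 0.6) +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "COVID-19 Vaccinations vs. Confirmed Cases (log scales)",
       x = "Total Vaccinations (log10 scale)",
       y = "Total Cases (log10 scale)",
       color = "Country") +
  theme_minimal()

print(p)
```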
## Conclusion and Future Directions
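From the analysis, it appears that countries with more COVID-19 vaccinations may have had better control over confirmed COVID-19 cases, but this result should be read with caution. Both total_vaccinations and total_cases are cumulative counts that rise over time and grow with population size, so a plot of the two totals mixes the effect of vaccination with the effects of time, population, testing, and other factors. The scatter plot shows the distribution and relative differences among the selected countries.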
Future research could explore additional variables such as population size, vaccine types, and timing of vaccine rollout to better understand the relationship between vaccinations and case trends.
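One way to account for population size is to convert the totals into per-capita rates before comparing countries. The sketch below assumes a population column; the Our World in Data file is believed to include one, along with precomputed columns such as total_vaccinations_per_hundred and total_cases_per_million, but the column names should be checked before use. The toy values are made up.

```r
suppressPackageStartupMessages(library(dplyr))

# Made-up toy rows; `population` is an ASSUMED column name for this sketch
toy <- data.frame(
  location = c("A", "B"),
  total_vaccinations = c(2e8, 5e7),
  total_cases = c(4e7, 5e6),
  population = c(1e8, 2e7)
)

# Normalize cumulative totals by population so countries of different
# sizes can be compared on the same scale
per_capita <- toy %>%
  mutate(
    vaccinations_per_hundred = total_vaccinations / population * 100,
    cases_per_million = total_cases / population * 1e6
  )

print(per_capita)
```

In this toy example, country A has four times as many total vaccinations as country B, yet B has more doses per hundred people (250 versus 200), which is the kind of reversal that raw totals can hide.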