Introduction

The goal of this project is to explore the relationship between COVID-19 vaccinations and confirmed COVID-19 cases in different countries. My research question is: How does the number of COVID-19 vaccinations administered relate to the number of confirmed COVID-19 cases in different countries?

The dataset I am using is from Our World in Data (https://ourworldindata.org/covid-vaccinations). It includes information on vaccination progress and COVID-19 case numbers for countries around the world. The main variables I am using are location, date, total_vaccinations, people_fully_vaccinated, and total_cases.

By using this dataset, I can see how vaccination numbers compare to total COVID-19 cases in countries such as the United States, India, Brazil, Canada, and the United Kingdom.
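
The report does not show how covid_data was read in, so the following is only a minimal sketch of that step. It assumes the data arrive as a CSV with the five columns named above; the inline rows are made up for illustration, and the file path in the comment is hypothetical. Because the summary output later in the report lists date as a character column, the sketch also converts it with as.Date().

```r
# Illustrative only: a few made-up rows standing in for the real CSV export.
csv_text <- "location,date,total_vaccinations,people_fully_vaccinated,total_cases
Brazil,2021-02-05,100,10,1000
Brazil,2021-02-06,,12,1010
Canada,2021-02-05,50,5,400"

# With a real download, replace `text = csv_text` with the path to the file,
# e.g. read.csv("path/to/download.csv") (hypothetical path).
covid_data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse dates so they sort and plot as dates rather than as text.
covid_data$date <- as.Date(covid_data$date)

# Confirm that the five variables used in the analysis are present.
needed <- c("location", "date", "total_vaccinations",
            "people_fully_vaccinated", "total_cases")
stopifnot(all(needed %in% names(covid_data)))

str(covid_data)
```

Empty fields in numeric columns are read as NA, which is what the later na.omit() step removes.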

Data Analysis

In this section, I will explore the relationship between COVID-19 vaccinations and confirmed cases across different countries. I will focus on the United States, India, Brazil, Canada, and the United Kingdom to see whether countries that have administered more vaccinations tend to report fewer COVID-19 cases.
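
One way to make this country comparison concrete is to look at each country's most recent row side by side. The sketch below is not part of the original analysis; it uses a small made-up data frame (the name toy and its values are hypothetical) with the same column names as the dataset.

```r
library(dplyr)

# Made-up numbers for illustration only; not real Our World in Data values.
toy <- data.frame(
  location = c("A", "A", "B", "B"),
  date = as.Date(c("2021-03-01", "2021-03-02", "2021-03-01", "2021-03-02")),
  total_vaccinations = c(100, 150, 400, 500),
  total_cases = c(1000, 1100, 800, 820)
)

# Keep each country's most recent row, then sort by vaccinations (highest first).
latest <- toy %>%
  group_by(location) %>%
  filter(date == max(date)) %>%
  ungroup() %>%
  arrange(desc(total_vaccinations))

print(latest)
```

A table like this gives one row per country, which is easier to compare than the full time series.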

# Load dplyr and ggplot2 for analysis
library(dplyr)
library(ggplot2)
# Filter data for selected countries and keep only needed columns
# (covid_data is assumed to have been loaded from the Our World in Data file beforehand)
selected_countries <- c("United States", "India", "Brazil", "Canada", "United Kingdom")
covid_filtered <- covid_data %>%
  filter(location %in% selected_countries) %>%
  select(location, date, total_vaccinations, people_fully_vaccinated, total_cases)

# Remove rows with missing values
covid_filtered <- na.omit(covid_filtered)

# Show summary of key columns
summary(covid_filtered) 
##    location             date           total_vaccinations 
##  Length:4101        Length:4101        Min.   :2.688e+04  
##  Class :character   Class :character   1st Qu.:8.649e+07  
##  Mode  :character   Mode  :character   Median :3.515e+08  
##                                        Mean   :6.774e+08  
##                                        3rd Qu.:6.757e+08  
##                                        Max.   :2.207e+09  
##  people_fully_vaccinated  total_cases       
##  Min.   :        1       Min.   :   495343  
##  1st Qu.: 31450822       1st Qu.:  8618240  
##  Median :148534163       Median : 30543908  
##  Mean   :279473829       Mean   : 32132335  
##  3rd Qu.:230457960       3rd Qu.: 44690492  
##  Max.   :951990527       Max.   :103343569
# Create a new variable for vaccination-to-case ratio
covid_filtered <- covid_filtered %>%
  mutate(vax_case_ratio = total_vaccinations / total_cases)

# Display the first few rows to check the new variable
head(covid_filtered)
##     location       date total_vaccinations people_fully_vaccinated total_cases
## 398   Brazil 2021-02-05            3401383                    1962     9118513
## 399   Brazil 2021-02-06            3553681                   19677     9118513
## 400   Brazil 2021-02-07            3605538                   25688     9447165
## 401   Brazil 2021-02-08            3820207                   33616     9447165
## 402   Brazil 2021-02-09            4120332                   50655     9447165
## 403   Brazil 2021-02-10            4406835                   80760     9447165
##     vax_case_ratio
## 398      0.3730195
## 399      0.3897215
## 400      0.3816529
## 401      0.4043760
## 402      0.4361448
## 403      0.4664717
# Create a scatter plot of total vaccinations vs total cases
ggplot(covid_filtered, aes(x = total_vaccinations, y = total_cases, color = location)) +
  geom_point(alpha = 0.6) +
  labs(title = "COVID-19 Vaccinations vs. Confirmed Cases",
       x = "Total Vaccinations",
       y = "Total Cases",
       color = "Country") +
  theme_minimal()
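
The scatter plot could be complemented by a single number per country. As a rough sketch that is not part of the original analysis, the code below computes a Spearman rank correlation between total vaccinations and total cases within each country, using made-up values in a hypothetical data frame called toy. Because both columns are cumulative totals, a strongly positive value is expected and should be read as descriptive only.

```r
library(dplyr)

# Made-up cumulative totals for two hypothetical countries.
toy <- data.frame(
  location = rep(c("A", "B"), each = 4),
  total_vaccinations = c(10, 20, 30, 40, 5, 15, 25, 35),
  total_cases = c(100, 110, 125, 130, 50, 80, 85, 120)
)

# Rank correlation between vaccinations and cases, computed per country.
cor_by_country <- toy %>%
  group_by(location) %>%
  summarise(
    n_days = n(),
    spearman = cor(total_vaccinations, total_cases, method = "spearman"),
    .groups = "drop"
  )

print(cor_by_country)
```

With the real data, the same pipeline could be applied to covid_filtered in place of toy.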

Conclusion and Future Directions

From the analysis, countries with higher vaccination totals appear to have better control over confirmed COVID-19 cases, although this comparison is based on cumulative totals for five countries, and other factors may also influence case numbers. The scatter plot shows how total vaccinations and total cases are distributed across the selected countries and how the countries differ from one another.

Future research could explore additional variables such as population size, vaccine types, and timing of vaccine rollout to better understand the relationship between vaccinations and case trends.
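
Two of these ideas, population size and rollout timing, can be sketched briefly. The example below assumes a population column is available for each country (it is not among the columns selected in this report, so it is a hypothetical addition) and uses made-up values. It scales vaccinations and cases by population and finds the first date on which each country reports a vaccination total.

```r
library(dplyr)

# Made-up values; `population` is a hypothetical extra column.
toy <- data.frame(
  location = c("A", "A", "B", "B"),
  date = as.Date(c("2021-01-10", "2021-01-11", "2021-02-01", "2021-02-02")),
  total_vaccinations = c(1000, 3000, 500, 900),
  total_cases = c(20000, 21000, 4000, 4100),
  population = c(1e6, 1e6, 2e5, 2e5)
)

# Scale totals by population so that countries of different sizes are comparable.
per_capita <- toy %>%
  mutate(
    vax_per_100 = 100 * total_vaccinations / population,
    cases_per_100k = 1e5 * total_cases / population
  )

# Rollout timing: first date with a reported vaccination total in each country.
rollout_start <- toy %>%
  filter(!is.na(total_vaccinations)) %>%
  group_by(location) %>%
  summarise(first_vax_date = min(date), .groups = "drop")

print(per_capita)
print(rollout_start)
```

Plotting vax_per_100 against cases_per_100k would give a version of the scatter plot that is not dominated by the largest countries.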

References