The purpose of this analysis is to investigate relationships between COVID-19 cases, deaths, and vaccination coverage using quantitative methods.
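Each section follows the same steps: derive a variable, plot the relationship, test the correlation, and estimate a confidence interval for a mean.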
All variables analyzed are continuous numeric variables.
library(dplyr)

# Deaths per case: new deaths relative to new cases (both smoothed, per million)
covid_clean <- covid_clean %>%
  mutate(death_to_case_ratio =
           new_deaths_smoothed_per_million /
           new_cases_smoothed_per_million)

The death_to_case_ratio captures severity relative to
infections. While raw death counts increase with cases, this ratio
provides insight into mortality burden conditional on infection
levels.
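One practical detail of a ratio like this is what happens when the denominator is zero. The sketch below uses a small invented data frame (`toy`, not the real `covid_clean`) to show how plain division behaves in R and one possible guard using `dplyr::if_else()`. The column names `ratio_raw` and `ratio_safe` are hypothetical and used only for illustration.

```r
library(dplyr)

# Toy data standing in for covid_clean; values are invented for illustration.
toy <- tibble(
  new_cases_smoothed_per_million  = c(120, 0, 45, 0),
  new_deaths_smoothed_per_million = c(2.4, 0, 0.9, 0.1)
)

toy <- toy %>%
  mutate(
    # Plain division: x / 0 gives Inf, and 0 / 0 gives NaN
    ratio_raw = new_deaths_smoothed_per_million /
      new_cases_smoothed_per_million,
    # Guarded version: NA wherever no cases were reported
    ratio_safe = if_else(
      new_cases_smoothed_per_million > 0,
      new_deaths_smoothed_per_million / new_cases_smoothed_per_million,
      NA_real_
    )
  )

print(toy)
```

Whether to keep, drop, or recode such rows depends on the analysis; marking them `NA` simply keeps `Inf` and `NaN` out of later summaries.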
library(ggplot2)

# Scatterplot of new deaths against new cases with a fitted linear trend
ggplot(covid_clean,
       aes(x = new_cases_smoothed_per_million,
           y = new_deaths_smoothed_per_million)) +
  geom_point(alpha = 0.3, color = "steelblue") +
  geom_smooth(method = "lm", se = TRUE, color = "darkred") +
  labs(title = "Relationship Between New COVID Cases and Deaths",
       subtitle = "Smoothed values per million population",
       x = "New Cases per Million",
       y = "New Deaths per Million") +
  theme_minimal()

The scatterplot reveals a clear positive linear association between new cases and new deaths.
Key observations:
This pattern is epidemiologically logical: deaths occur as a consequence of infections, though healthcare capacity and demographic differences introduce variability.
cor_test_1 <- cor.test(covid_clean$new_cases_smoothed_per_million,
                       covid_clean$new_deaths_smoothed_per_million)
cor_test_1

##
## Pearson's product-moment correlation
##
## data: covid_clean$new_cases_smoothed_per_million and covid_clean$new_deaths_smoothed_per_million
## t = 153.51, df = 41600, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5951768 0.6074461
## sample estimates:
## cor
## 0.6013469
The Pearson correlation coefficient quantifies the strength and direction of the linear relationship.
Here r ≈ 0.60 (95% CI: 0.595 to 0.607), a moderate positive correlation that is consistent with the scatterplot: deaths tend to rise with cases.
Because r² ≈ 0.36, cases account for only about 36% of the variance in deaths in a linear sense, which leaves meaningful variability that warrants further
investigation.
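For reference, the reported test statistic can be reproduced from the correlation itself. The block below states the standard sample Pearson formula and the usual t statistic for testing zero correlation, then plugs in the values from the output above (r = 0.6013, df = n − 2 = 41600). The symbols x_i and y_i are notation introduced here for the two variables; the arithmetic is rounded.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}

t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}
  \approx \frac{0.6013 \times \sqrt{41600}}{\sqrt{1 - 0.6013^2}}
  \approx \frac{0.6013 \times 203.96}{0.7990}
  \approx 153.5
```

This agrees with the reported t = 153.51. With this many observations, even a moderate correlation produces a very large t statistic, so the p-value speaks to the existence of a linear association rather than its strength.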
##
## One Sample t-test
##
## data: covid_clean$new_deaths_smoothed_per_million
## t = 123.45, df = 41601, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 2.043921 2.109869
## sample estimates:
## mean of x
## 2.076895
The 95% confidence interval is:
(2.04, 2.11)
This means:
We are 95% confident that the true global mean number of new COVID deaths per million people lies between 2.04 and 2.11.
Important interpretation elements:
This inference generalizes beyond the observed dataset to the broader global population represented by the data.
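The call that produced the t-test output is not shown in this section. As a minimal sketch of how such an interval arises, the code below builds a 95% confidence interval for a mean by hand and compares it with `t.test()`. It runs on simulated data (`x` is an invented vector, not the real column), so the numbers it prints will differ from the output above.

```r
# Simulated stand-in for a numeric column; not the real data.
set.seed(42)
x <- rexp(500, rate = 0.5)

n      <- length(x)
se     <- sd(x) / sqrt(n)           # standard error of the mean
t_crit <- qt(0.975, df = n - 1)     # two-sided 95% critical value

# Interval built by hand: mean plus or minus t_crit standard errors
ci_manual <- mean(x) + c(-1, 1) * t_crit * se

# Interval reported by the built-in one-sample t-test
ci_ttest <- as.numeric(t.test(x)$conf.int)

print(ci_manual)
print(ci_ttest)
```

The same logic applies to the reported output: the standard error is roughly 2.0769 / 123.45 ≈ 0.0168, and 1.96 × 0.0168 ≈ 0.033 on either side of the mean gives approximately 2.044 to 2.110.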
vax_gap represents the remaining percentage of a
population that is not fully vaccinated.
This provides a more interpretable measure of how far countries are from complete vaccination coverage.
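The code that creates vax_gap does not appear in this section. Given the definition, it could be derived along these lines; this is a sketch on invented values (`toy` is a hypothetical stand-in for `covid_clean`), not necessarily the original code.

```r
library(dplyr)

# Toy values standing in for people_fully_vaccinated_per_hundred (invented).
toy <- tibble(
  people_fully_vaccinated_per_hundred = c(0, 12.5, 48.3, 76.9, 100)
)

# Remaining share of the population that is not fully vaccinated
toy <- toy %>%
  mutate(vax_gap = 100 - people_fully_vaccinated_per_hundred)

print(toy)
```

For coverage values between 0 and 100, the gap also stays between 0 and 100.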
ggplot(covid_clean,
       aes(x = people_fully_vaccinated_per_hundred,
           y = vax_gap)) +
  geom_point(alpha = 0.3, color = "darkgreen") +
  geom_smooth(method = "lm", se = TRUE, color = "black") +
  labs(title = "Vaccination Coverage vs Remaining Vaccination Gap",
       x = "Fully Vaccinated per Hundred",
       y = "Vaccination Gap (%)") +
  theme_minimal()

The scatterplot shows an almost perfectly straight negative linear relationship.
This is expected because:
vax_gap = 100 − vaccination_rate
Therefore:
Any deviation from a perfect line may reflect reporting inconsistencies or rounding differences.
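The straight line follows from the definition of the Pearson correlation. Writing X for the vaccination rate and G = 100 − X for the gap (symbols introduced here for brevity), and assuming X is not constant:

```latex
\operatorname{Cov}(X, G) = \operatorname{Cov}(X, 100 - X) = -\operatorname{Var}(X),
\qquad
\operatorname{Var}(G) = \operatorname{Var}(100 - X) = \operatorname{Var}(X)

r_{XG} = \frac{\operatorname{Cov}(X, G)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(G)}}
       = \frac{-\operatorname{Var}(X)}{\operatorname{Var}(X)}
       = -1
```

The constant 100 shifts the values without affecting covariance or variance, and the negative sign on X flips the direction of the relationship.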
cor_test_2 <- cor.test(covid_clean$people_fully_vaccinated_per_hundred,
                       covid_clean$vax_gap)
cor_test_2

##
## Pearson's product-moment correlation
##
## data: covid_clean$people_fully_vaccinated_per_hundred and covid_clean$vax_gap
## t = -9678578008, df = 41600, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -1 -1
## sample estimates:
## cor
## -1
Because vax_gap is a decreasing linear transformation of people_fully_vaccinated_per_hundred, the correlation is -1 by construction, and the output reports exactly that.
A correlation near -1 confirms:
##
## One Sample t-test
##
## data: covid_clean$people_fully_vaccinated_per_hundred
## t = 286.89, df = 41601, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 25.22555 25.57260
## sample estimates:
## mean of x
## 25.39907
The 95% confidence interval estimates the true mean percentage of fully vaccinated individuals per hundred people globally.
The interval is:
(25.23, 25.57)
This implies:
We are 95% confident that the true global mean vaccination coverage lies between 25.23 and 25.57 per hundred.
The upper bound is far below 100, which confirms that
global vaccination remains incomplete.
The interval is narrow because the sample is large (n = 41,602). Its width reflects the precision of the estimated mean, not inequality in vaccine
distribution across countries; assessing inequality requires the spread of the data itself, such as the standard deviation or the range across countries.
This inference applies to the broader global population represented in the dataset, not just the observed sample.
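The width of a confidence interval for a mean is governed by the sample standard deviation s and the sample size n. As a rough worked example using the reported interval and n = 41602, and taking the critical value as approximately 1.96 for this many degrees of freedom, the implied standard deviation can be backed out:

```latex
\text{CI} = \bar{x} \pm t^{*} \frac{s}{\sqrt{n}},
\qquad
\text{half-width} = \frac{25.57260 - 25.22555}{2} \approx 0.1735

s \approx \frac{0.1735 \times \sqrt{41602}}{1.96}
  \approx \frac{0.1735 \times 203.97}{1.96}
  \approx 18.1
```

The interval itself spans only about 0.35 points, while the implied standard deviation is about 18 points per hundred. Variation between observations therefore shows up in s, and the large n is what makes the interval narrow.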
This analysis demonstrates: