The purpose of this analysis is to investigate whether COVID-19 case fatality rates differ across continents and whether vaccination coverage helps explain variation in fatality rates.
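This data dive includes a one-way ANOVA of case fatality rate by continent and a simple linear regression of case fatality rate on vaccination coverage.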
All response variables analyzed are continuous numeric variables.
The response variable selected is case_fatality_rate.
This variable represents the proportion of confirmed cases that result in death. It is one of the most important indicators of COVID-19 severity and is highly relevant to policymakers and public health officials.
The explanatory variable selected is continent.
Continent is a categorical variable that may capture structural differences such as healthcare systems, demographic composition, and policy responses.
library(ggplot2)  # provides ggplot(), geom_boxplot(), labs(), and the theme functions

ggplot(covid_anova,
       aes(x = continent,
           y = case_fatality_rate,
           fill = continent)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Case Fatality Rate by Continent",
       x = "Continent",
       y = "Case Fatality Rate") +
  theme_minimal() +
  theme(legend.position = "none")

The boxplots show noticeable differences in median fatality rates across continents. Some continents show both greater variability and higher median fatality rates than others.
This suggests that geographic region may influence COVID-19 severity, which we formally test using ANOVA.
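The call that produced the ANOVA table is not shown in this write-up. A minimal sketch of how such a table is typically obtained with aov(), assuming the covid_anova data frame used for the boxplot. The fallback data and the continent labels are hypothetical placeholders, included only so the snippet runs on its own, and they will not reproduce the reported numbers:

```r
# Hypothetical stand-in data, used only if covid_anova is not already loaded.
# The continent labels and values below are placeholders, not the real data.
if (!exists("covid_anova")) {
  set.seed(1)
  continents <- c("Africa", "Asia", "Europe",
                  "North America", "Oceania", "South America")
  covid_anova <- data.frame(
    continent = factor(rep(continents, each = 50)),
    case_fatality_rate = runif(300, min = 0, max = 0.05)
  )
}

# One-way ANOVA: does mean case fatality rate differ by continent?
anova_model <- aov(case_fatality_rate ~ continent, data = covid_anova)
anova_table <- summary(anova_model)
print(anova_table)
```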
##                Df Sum Sq Mean Sq F value Pr(>F)
## continent       5   0.61 0.12283   27.41 <2e-16 ***
## Residuals   39573 177.31 0.00448
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If the p-value is less than 0.05, we reject the null hypothesis and conclude that fatality rates differ significantly by continent.
If the p-value is greater than 0.05, we fail to reject the null hypothesis, meaning there is not enough statistical evidence to conclude that fatality rates differ across continents.
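For reference, the hypotheses behind this decision rule can be written out. The table reports 5 degrees of freedom for continent, which implies six continent groups. The subscripts are only labels and do not assume any particular ordering of continents:

\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_6 \qquad H_A: \text{at least one } \mu_i \text{ differs from the others} \]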
From the ANOVA table:
• The F-statistic is 27.41.
• The p-value is less than 2e-16.
• Since the p-value is less than 0.05, we reject the null hypothesis.
• There is strong statistical evidence that mean case fatality rates are not equal across continents.
• At least one continent has a significantly different mean fatality rate.
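The ANOVA figures can be cross-checked by hand, and the same table yields an effect size. The sketch below recomputes F from the mean squares and adds eta-squared, which is not reported in the original output and is included here only as a supplementary measure. Because the printed sums of squares are rounded, the results are approximate:

```r
# Values copied from the printed ANOVA table (rounded as shown there).
ss_continent <- 0.61
ss_residual  <- 177.31
ms_continent <- 0.12283
ms_residual  <- 0.00448
df_continent <- 5
df_residual  <- 39573

# F is the ratio of the between-group to the within-group mean square.
f_stat <- ms_continent / ms_residual

# The upper tail of the F distribution gives the p-value.
p_value <- pf(f_stat, df_continent, df_residual, lower.tail = FALSE)

# Eta-squared: share of total variation attributable to continent.
eta_sq <- ss_continent / (ss_continent + ss_residual)

print(round(c(F = f_stat, eta_squared = eta_sq), 4))
print(p_value < 2e-16)
```

With these rounded inputs, eta-squared comes out near 0.003, so continent accounts for well under 1% of the variation in case fatality rate even though the F-test is highly significant.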
The continuous explanatory variable selected is people_fully_vaccinated_per_hundred.
Vaccination coverage is expected to reduce severe outcomes, so we anticipate a negative linear relationship with case fatality rate.
ggplot(covid_reg,
       aes(x = people_fully_vaccinated_per_hundred,
           y = case_fatality_rate)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Fatality Rate vs Vaccination Coverage",
       x = "People Fully Vaccinated per Hundred",
       y = "Case Fatality Rate") +
  theme_minimal()

The scatterplot shows a generally linear pattern. As vaccination coverage increases, fatality rates tend to decrease.
This supports fitting a linear regression model.
lm_model <- lm(case_fatality_rate ~ people_fully_vaccinated_per_hundred,
               data = covid_reg)
summary(lm_model)
##
## Call:
## lm(formula = case_fatality_rate ~ people_fully_vaccinated_per_hundred,
##     data = covid_reg)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.03263 -0.02073 -0.01276  0.00157  2.25341
##
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)                          3.263e-02  5.881e-04   55.48   <2e-16 ***
## people_fully_vaccinated_per_hundred -1.868e-04  1.855e-05  -10.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06696 on 39577 degrees of freedom
## Multiple R-squared:  0.002557, Adjusted R-squared:  0.002532
## F-statistic: 101.4 on 1 and 39577 DF,  p-value: < 2.2e-16
The regression equation takes the form:
\[ \hat{y} = \beta_0 + \beta_1 x \]
β₀ (Intercept): the predicted fatality rate when vaccination coverage is 0%.
β₁ (Slope): the change in predicted fatality rate for each 1 percentage point increase in vaccination coverage.
If β₁ is negative and statistically significant, this indicates that higher vaccination coverage is associated with lower fatality rates.
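Substituting the coefficient estimates from the summary output into this form gives the fitted line, where x is people_fully_vaccinated_per_hundred and ŷ is the predicted case_fatality_rate:

\[ \hat{y} = 0.03263 - 0.0001868\,x \]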
• When vaccination coverage is 0%, the predicted case fatality rate is 0.03263, or 3.263%.
• In other words, a population with no fully vaccinated individuals would have an expected fatality rate of about 3.26% according to this model.
• For every 1 percentage point increase in vaccination coverage, the predicted case fatality rate decreases by 0.0001868.
• That equals a decrease of 0.01868 percentage points in fatality rate per 1 percentage point increase in vaccination coverage.
If vaccination coverage increases by 10 percentage points, the predicted fatality rate decreases by 10 × 0.0001868 = 0.001868, a reduction of 0.1868 percentage points.
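The same arithmetic can be carried out for several coverage levels at once. This is a sketch that uses only the coefficient estimates printed in the regression summary. The helper name predict_cfr and the chosen coverage values are illustrative assumptions, and the slope interval uses a normal approximation rather than the exact output of confint():

```r
# Coefficient estimates copied from the regression summary.
intercept <- 3.263e-02
slope     <- -1.868e-04
slope_se  <- 1.855e-05

# Hypothetical helper: predicted case fatality rate at a given coverage
# (people fully vaccinated per hundred).
predict_cfr <- function(coverage) intercept + slope * coverage

coverage <- c(0, 10, 50, 100)
predictions <- data.frame(
  coverage          = coverage,
  predicted_cfr     = predict_cfr(coverage),
  predicted_cfr_pct = 100 * predict_cfr(coverage)
)
print(predictions)

# Approximate 95% interval for the slope (normal approximation, large n).
slope_ci <- slope + c(-1, 1) * qnorm(0.975) * slope_se
print(slope_ci)
```

Both ends of the approximate interval are negative, consistent with the significant negative slope. Predictions at coverage levels outside the range observed in the data are extrapolations and should be read with caution.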
The negative slope and p-value < 2e-16 mean:
• Vaccination coverage has a statistically significant negative association with fatality rate.
The R-squared value represents the proportion of variation in case fatality rate explained by vaccination coverage.
If R-squared is modest, this suggests that other variables such as median age, hospital capacity, or GDP per capita may also influence fatality rates.
R-squared (R² = 0.002557)
• Vaccination coverage explains about 0.26% of the variation in case fatality rates.
• Over 99.7% of the variation is left unexplained by this model and may be related to other factors.
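The reported fit statistics can be checked against each other using standard identities for simple linear regression. The sketch below uses only numbers printed in the summary output, so small discrepancies from rounding are expected:

```r
# Values copied from the regression summary.
f_stat   <- 101.4
df_model <- 1
df_resid <- 39577
t_slope  <- -10.07

# R-squared recovered from the F-statistic and its degrees of freedom.
r_squared <- (f_stat * df_model) / (f_stat * df_model + df_resid)

# In simple regression, the correlation carries the sign of the slope.
r <- sign(t_slope) * sqrt(r_squared)

# In simple regression, the squared slope t-statistic equals F.
t_squared <- t_slope^2

print(round(c(r_squared = r_squared, r = r, t_squared = t_squared), 4))
```

The implied correlation is about -0.05, a very weak linear association. The very small p-value largely reflects the sample size of nearly 40,000 observations rather than a strong relationship.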
From this analysis:
• This data dive demonstrates how ANOVA and linear regression can provide insight into global public health outcomes.
• Both continent and vaccination coverage show statistically significant associations with COVID-19 case fatality rates worldwide.
• Vaccination coverage alone explains only about 0.26% of the variation in fatality rates, so statistical significance here should not be read as a strong relationship.
• Interpret ANOVA results in context.
    If p-value < 0.05: reject H0 → fatality rates differ significantly by continent.
    If p-value ≥ 0.05: fail to reject H0 → not enough evidence to conclude differences exist.
    Practical meaning: geographic region may influence COVID severity due to healthcare, demographic, or policy differences.
• Select a continuous explanatory variable that may influence the response.
• Build a linear regression model using the selected continuous predictor.
• Interpret regression coefficients in context.
• Evaluate model fit.