Introduction

This week's (Week 7) analysis applies two formal hypothesis-testing frameworks to evaluate COVID-19 outcomes using global data.

Neyman–Pearson Framework
(Explicit alpha, power, minimum detectable effect, sample size justification)

First, under the Neyman–Pearson framework, we test whether countries with high vaccination coverage (≥50% fully vaccinated) have lower COVID-19 death rates than countries with lower coverage. With α = 0.05 and power = 0.80, a power analysis confirms that the sample is large enough. The results do not support the hypothesis: the one-sided p-value is 1, and the mean death rate is higher in the high-vaccination group. This pattern plausibly reflects confounding by time rather than an effect of vaccination.

Fisher’s Significance Testing Framework
(p-value based evidence assessment)

Second, using Fisher’s Significance Testing framework, we examine whether stricter government policies (higher stringency index) are associated with differences in COVID-19 reproduction rates. The p-value indicates strong evidence that reproduction rates differ between high- and low-stringency periods, although the difference is small in magnitude.

Overall, the findings are mixed. Policy stringency is significantly, though weakly, associated with lower reproduction rates. The vaccination comparison does not show the hypothesized reduction in death rates. In both cases, further analysis would be required to establish causality.

Data Preparation

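# Packages: tidyverse (dplyr, tidyr, ggplot2), pwr (power analysis),
# car (Levene's test), effsize (Cohen's d)
library(tidyverse)
library(pwr)
library(car)
library(effsize)

covid <- read.csv("covid_combined_groups.csv")

# Keep the analysis variables and drop rows with any missing value
covid_clean <- covid %>%
  select(continent,
         total_deaths_per_million,
         people_fully_vaccinated_per_hundred,
         reproduction_rate,
         stringency_index) %>%
  drop_na()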

Hypothesis 1

Vaccination Coverage & Death Rates

(Neyman–Pearson Framework)

Research Question

Do countries with ≥50% full vaccination coverage experience lower COVID-19 death rates than countries below 50%?

Group Definition

covid_clean <- covid_clean %>%
  mutate(vax_group =
           ifelse(people_fully_vaccinated_per_hundred >= 50,
                  "High Vaccination",
                  "Low Vaccination"))

Main Variable: total_deaths_per_million (continuous)
Group A: High Vaccination
Group B: Low Vaccination

Hypotheses

\[ H_0: \mu_{High} = \mu_{Low} \]

\[ H_A: \mu_{High} < \mu_{Low} \]

Test Design (Intentional Choices)

Alpha = 0.05
A 5% Type I error (false positive) rate is the conventional choice in epidemiology.
Power = 0.80
We want an 80% probability of detecting a reduction at least as large as the minimum effect size.
Minimum Effect Size (Cohen’s d = 0.5)
A medium effect is taken to represent a practically meaningful mortality reduction.

Sample Size Requirement

pwr.t.test(d = 0.5,
           power = 0.8,
           sig.level = 0.05,
           type = "two.sample")

## 
##      Two-sample t test power calculation 
## 
##               n = 63.76561
##               d = 0.5
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

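Our dataset (N > 40,000 observations) greatly exceeds the roughly 64 observations per group that the calculation requires. The calculation is two-sided while H_A is one-sided, so this requirement is conservative.

Taken at face value, the test therefore has ample power to detect an effect of d = 0.5. One caveat applies: the rows are repeated observations of the same countries over time rather than independent countries. The effective sample size is therefore smaller than the row count.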
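H_A is one-sided (μ_High < μ_Low), so the sample-size calculation can be matched to it through the `alternative` argument of `pwr.t.test()`. The sketch below assumes the pwr package is installed. It compares the two-sided requirement reported above with the one-sided one. The magnitude d = 0.5 is entered as positive with `alternative = "greater"`, because for a one-sided test the required n depends only on the size of the effect in the hypothesized direction.

```r
library(pwr)

# Two-sided calculation, as in the original analysis
two_sided <- pwr.t.test(d = 0.5, power = 0.8, sig.level = 0.05,
                        type = "two.sample", alternative = "two.sided")

# One-sided calculation matching a directional alternative hypothesis
one_sided <- pwr.t.test(d = 0.5, power = 0.8, sig.level = 0.05,
                        type = "two.sample", alternative = "greater")

# Required observations per group, rounded up to whole units
ceiling(c(two_sided = two_sided$n, one_sided = one_sided$n))
```

The one-sided requirement is smaller. The two-sided figure of about 64 per group is therefore an upper bound for this design.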

Assumption Checks

leveneTest(total_deaths_per_million ~ vax_group,
           data = covid_clean)

## Levene's Test for Homogeneity of Variance (center = median)
##          Df F value    Pr(>F)    
## group     1  754.13 < 2.2e-16 ***
##       41600                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

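Levene’s test strongly rejects equal variances across the two groups (F = 754.13, p < 2.2e-16). The Welch t-test, which R’s t.test() uses by default, does not assume equal variances and is therefore the appropriate choice. Given the large sample sizes, the Central Limit Theorem makes the test approximately valid even if death rates are not normally distributed. It does not, however, address the dependence between repeated observations of the same country.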
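A small simulation shows the difference between Welch's and Student's tests when variances are unequal. The data below are synthetic; the group sizes and spreads are hypothetical and chosen only to mimic unequal variances and unequal group sizes. They are not the COVID-19 data. By default, `t.test()` applies Welch's correction (`var.equal = FALSE`). Setting `var.equal = TRUE` gives Student's pooled-variance test instead.

```r
set.seed(42)

# Synthetic groups: a small, highly variable group and a large, tight group
small_noisy <- rnorm(50,  mean = 0, sd = 10)
large_tight <- rnorm(500, mean = 0, sd = 1)

values <- c(small_noisy, large_tight)
group  <- factor(rep(c("A", "B"), times = c(50, 500)))

# Default: Welch's t-test (no equal-variance assumption)
welch   <- t.test(values ~ group)
# Student's t-test with a pooled variance estimate
student <- t.test(values ~ group, var.equal = TRUE)

# Welch's degrees of freedom are fractional and far below n1 + n2 - 2
c(welch_df = unname(welch$parameter), student_df = unname(student$parameter))
```

Student's df is fixed at n1 + n2 - 2 = 548. Welch's df is driven mostly by the small, noisy group, which mirrors the fractional df = 4775.9 in the output below.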

Two-Sample T-Test

test1 <- t.test(total_deaths_per_million ~ vax_group,
                data = covid_clean,
                alternative = "less")

test1

## 
##  Welch Two Sample t-test
## 
## data:  total_deaths_per_million by vax_group
## t = 59.967, df = 4775.9, p-value = 1
## alternative hypothesis: true difference in means between group High Vaccination and group Low Vaccination is less than 0
## 95 percent confidence interval:
##      -Inf 867.8972
## sample estimates:
## mean in group High Vaccination  mean in group Low Vaccination 
##                      1374.5352                       529.8126
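The p-value of 1 follows directly from the reported statistics. With `alternative = "less"`, `t.test()` tests whether the mean of the first factor level (High Vaccination, first alphabetically) minus the mean of the second is below zero. The p-value is then the lower-tail probability of the t distribution. The sketch below uses only the t statistic, degrees of freedom and group means printed above.

```r
# Values copied from the Welch t-test output above
t_stat    <- 59.967
df        <- 4775.9
mean_high <- 1374.5352
mean_low  <- 529.8126

# Estimated difference (High minus Low): positive, i.e. opposite to H_A
diff_high_low <- mean_high - mean_low

# One-sided p-value for H_A: mu_High < mu_Low (lower tail)
p_less <- pt(t_stat, df = df)

# For comparison only: the opposite one-sided alternative (upper tail)
p_greater <- pt(t_stat, df = df, lower.tail = FALSE)

c(difference = diff_high_low, p_less = p_less, p_greater = p_greater)
```

The t statistic is large and positive, so almost all of the distribution lies below it. As a result, `p_less` is 1 to printed precision. Switching to the opposite alternative after seeing the data would not be a valid Neyman–Pearson test, because the direction must be fixed in advance.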

Effect Size (Cohen’s d)

effect1 <- cohen.d(total_deaths_per_million ~ vax_group,
                   data = covid_clean)

effect1

## 
## Cohen's d
## 
## d estimate: 1.16197 (large)
## 95 percent confidence interval:
##    lower    upper 
## 1.128965 1.194975

Interpretation of Effect Size

d ≈ 0.2 → Small
d ≈ 0.5 → Medium
d ≥ 0.8 → Large

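Practical importance depends on both the magnitude and the direction of the effect. Here d ≈ 1.16 is large, but its positive sign means the High Vaccination group has the higher mean death rate. That is the opposite of the direction specified in H_A.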
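Cohen's d with a pooled standard deviation can also be computed directly. The helper below is hypothetical (`pooled_cohens_d` is not part of any package). It sketches the standard formula d = (mean₁ − mean₂) / s_pooled. To our understanding, this matches what `effsize::cohen.d()` reports under its defaults (pooled SD, no Hedges correction).

```r
# Hypothetical helper: Cohen's d using a pooled standard deviation
pooled_cohens_d <- function(x, y) {
  n_x <- length(x)
  n_y <- length(y)
  # Pooled SD weights each group's variance by its degrees of freedom
  s_pooled <- sqrt(((n_x - 1) * var(x) + (n_y - 1) * var(y)) /
                     (n_x + n_y - 2))
  (mean(x) - mean(y)) / s_pooled
}

# Toy example: means differ by 2 and each group has SD 1
pooled_cohens_d(c(3, 4, 5), c(1, 2, 3))  # → 2
```

The sign follows the order of the groups. This is why a positive d in the output above means the first level (High Vaccination) has the larger mean.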

Visualization (Faceted by Continent)

ggplot(covid_clean,
       aes(x = vax_group,
           y = total_deaths_per_million,
           fill = vax_group)) +
  geom_boxplot(alpha = 0.7) +
  facet_wrap(~continent) +
  labs(title = "Death Rates by Vaccination Group Across Continents",
       x = "Vaccination Group",
       y = "Total Deaths per Million") +
  theme_minimal()

Insight

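Under the Neyman–Pearson decision rule, we reject H₀ only if p < α = 0.05. Here p = 1, so we fail to reject H₀.

The data provide no evidence that higher vaccination coverage is associated with lower mortality in this comparison. The sample means point the other way (1374.5 vs 529.8 deaths per million), and the one-sided 95% confidence interval (-Inf, 867.9) includes zero.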

Insight, Significance, and Further Investigation

Insight Gathered

The analysis does not show lower COVID-19 death rates in countries with higher vaccination coverage (≥50% fully vaccinated). The one-sided Welch t-test gives p = 1. Under the Neyman–Pearson framework, with the predefined α = 0.05 and power = 0.80, we therefore fail to reject the null hypothesis. The mean death rate is in fact higher in the high-vaccination group (1374.5 vs 529.8 deaths per million).

This should not be read as evidence that vaccination increases mortality. total_deaths_per_million is a cumulative measure, and vaccination coverage rose over time. Observations in the high-vaccination group therefore plausibly come from later dates, after more deaths had already accumulated. Time is a likely confounder that this two-group comparison cannot separate from vaccination.

Significance

The sample is large and the test was adequately powered, so the failure to reject H₀ is not a matter of too little data. The one-sided confidence interval (-Inf, 867.9) includes zero, which is consistent with that decision.

The effect size needs the same care. Cohen’s d ≈ 1.16 is large, but it describes a difference in the unexpected direction that is likely driven by the time confound. It should not be interpreted as a real-world effect of vaccination. More generally, in large datasets even small differences can become statistically significant, which is why effect sizes are reported alongside p-values.

From a public health perspective, the question remains important. Even moderate reductions in mortality can translate into thousands of lives saved, which is why a design that controls for time is worth pursuing.

Further Questions

This comparison did not support the hypothesized association and is likely confounded by time, so several questions remain:
- Does the relationship vary by continent or income level?
- How does age distribution (median age, % aged 65+) influence the relationship?
- Could healthcare infrastructure (hospital beds per thousand) moderate the effect?
- Is there a lag effect between vaccination rollout and mortality reduction?
- Future multivariate regression modeling, with time included as a covariate, would help control for confounding variables and better isolate the role of vaccination.
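One confounder worth checking before any regression is time itself. `total_deaths_per_million` accumulates over time, and vaccination coverage also rose over time. The simulation below is entirely synthetic and hypothetical: three imaginary countries, steadily rising cumulative deaths and vaccination coverage, and no causal link between the two. It shows how such a setup produces a gap between vaccination groups even though vaccination has no effect.

```r
set.seed(1)

days <- 1:600
countries <- c("A", "B", "C")

# Synthetic panel: cumulative deaths and vaccination coverage both rise
# with time; deaths do NOT depend on vaccination
sim <- do.call(rbind, lapply(countries, function(ctry) {
  data.frame(
    country = ctry,
    day = days,
    # cumulative deaths per million: non-decreasing by construction
    total_deaths_per_million = cumsum(rpois(length(days), lambda = 2)),
    # coverage starts rising at day 200 and reaches 100% at day 600
    people_fully_vaccinated_per_hundred = pmin(100, pmax(0, (days - 200) / 4))
  )
}))

sim$vax_group <- ifelse(sim$people_fully_vaccinated_per_hundred >= 50,
                        "High Vaccination", "Low Vaccination")

# Mean cumulative deaths by group: High is larger purely because of timing
group_means <- tapply(sim$total_deaths_per_million, sim$vax_group, mean)
group_means
```

In this simulation, the High Vaccination rows are simply the later days, so they carry more accumulated deaths.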

Hypothesis 2

Stringency Index & Reproduction Rate

(Fisher’s Significance Testing Framework)

Research Question

Does the reproduction rate differ between periods of high and low policy stringency?

Group Definition

covid_clean <- covid_clean %>%
  mutate(stringency_group =
           ifelse(stringency_index >= median(stringency_index),
                  "High Stringency",
                  "Low Stringency"))

Main Variable: reproduction_rate
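The `>=` median split has one property worth checking. Every value tied at the median goes to the High Stringency group, so if many observations share the median value, the groups can be unbalanced. The toy sketch below uses hypothetical index values.

```r
# Hypothetical stringency values with heavy ties at the median (50)
stringency <- c(20, 30, 50, 50, 50, 50, 70, 80)
med <- median(stringency)  # 50

group <- ifelse(stringency >= med, "High Stringency", "Low Stringency")
table(group)  # High Stringency: 6, Low Stringency: 2
```

On the real data, `table(covid_clean$stringency_group)` would show whether the split is close to 50/50.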

Hypotheses

\[ H_0: \mu_{High} = \mu_{Low} \]

\[ H_A: \mu_{High} \ne \mu_{Low} \]

Two-Sample T-Test

test2 <- t.test(reproduction_rate ~ stringency_group,
                data = covid_clean)

test2

## 
##  Welch Two Sample t-test
## 
## data:  reproduction_rate by stringency_group
## t = -17.065, df = 41153, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group High Stringency and group Low Stringency is not equal to 0
## 95 percent confidence interval:
##  -0.05836360 -0.04633797
## sample estimates:
## mean in group High Stringency  mean in group Low Stringency 
##                      1.047635                      1.099985

Effect Size

effect2 <- cohen.d(reproduction_rate ~ stringency_group,
                   data = covid_clean)

effect2

## 
## Cohen's d
## 
## d estimate: -0.1674349 (negligible)
## 95 percent confidence interval:
##      lower      upper 
## -0.1866881 -0.1481817

Visualization (Faceted)

ggplot(covid_clean,
       aes(x = stringency_group,
           y = reproduction_rate,
           fill = stringency_group)) +
  geom_boxplot(alpha = 0.7) +
  facet_wrap(~continent) +
  labs(title = "Reproduction Rate by Policy Stringency",
       x = "Stringency Group",
       y = "Reproduction Rate (R)") +
  theme_minimal()

Interpretation (Fisher Framework)

Under Fisher’s approach:

The p-value is the probability, assuming H₀ is true, of observing a difference at least as extreme as the one found. Smaller values indicate stronger evidence against H₀.
Fisher treated the p-value as a graded measure of evidence. In his framework, p < 0.05 is a conventional benchmark rather than a fixed decision rule.
The dataset is longitudinal, with repeated observations of the same countries, so observations are not independent. The standard errors and p-value therefore likely overstate the precision of the comparison.
Using Fisher’s Significance Testing framework, we evaluated whether reproduction rates differ between periods of high and low policy stringency.
The Welch two-sample t-test indicates a statistically significant difference in mean reproduction rates between groups (t = -17.07, p < 0.001). The 95% confidence interval for the mean difference [-0.058, -0.046] excludes zero, providing strong evidence against the null hypothesis.
The extremely small p-value largely reflects the very large sample size.
The effect size (d ≈ -0.17, labeled negligible by effsize) indicates that the reduction in mean reproduction rate, about 0.05 (1.048 vs 1.100), is small.
However, limitations include:
- Reverse causality (policies often tighten in response to outbreaks)
- Timing lag effects
- Cross-country reporting differences
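For the two-sided alternative, the p-value is the probability under H₀ of a t statistic at least as far from zero as the observed one. The sketch below computes it from the statistic and degrees of freedom reported in the Hypothesis 2 output.

```r
# Values copied from the Welch t-test output for Hypothesis 2
t_stat <- -17.065
df     <- 41153

# Two-sided p-value: probability in both tails beyond |t|
p_two_sided <- 2 * pt(-abs(t_stat), df = df)
p_two_sided
```

The result is far below 2.2e-16, the smallest value R prints by default. This is why the output shows `p-value < 2.2e-16` rather than an exact number.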

Insight, Significance, and Further Investigation

Insight Gathered

The analysis reveals a statistically significant difference in reproduction rates between high- and low-stringency policy periods. Specifically, higher policy stringency is associated with lower average reproduction rates.
This suggests that stricter government interventions are linked to reduced viral transmission.

Significance

The p-value (p < 0.001) provides strong evidence against the null hypothesis under Fisher’s framework. However, the effect size (Cohen’s d ≈ -0.17) indicates that the magnitude of this reduction is small.
This distinction is critical. The statistical evidence is extremely strong due to the large sample size, but the practical magnitude of the difference is modest. Nonetheless, even small reductions in reproduction rate can meaningfully alter outbreak trajectories when scaled across entire populations.
Thus, while the effect is not large in standardized terms, it may still be epidemiologically relevant.
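A back-of-the-envelope calculation shows why a small difference in R can still matter. The calculation rests on a deliberately naive assumption: each generation of infections multiplies by R, with no change in immunity or behavior. Under that assumption, the case multiplier after g generations is R^g. The sketch uses the two group means reported above and a hypothetical horizon of 10 generations.

```r
# Group means of the reproduction rate from the t-test output above
r_high <- 1.047635  # High Stringency
r_low  <- 1.099985  # Low Stringency

generations <- 10   # hypothetical horizon, chosen for illustration

# Naive compounding: cases scale by R each generation
growth_high <- r_high^generations
growth_low  <- r_low^generations

c(high = growth_high, low = growth_low, ratio = growth_low / growth_high)
```

After ten generations, the low-stringency mean implies roughly 2.6 times the starting caseload, versus roughly 1.6 times under high stringency. This illustrates compounding and is not a forecast.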

Further Questions

Several important questions remain:

- Are policy effects immediate, or is there a delayed impact on reproduction rate?
- Do compliance levels differ across regions, affecting effectiveness?
- Is stringency more effective in certain demographic or economic contexts?
- Could behavioral adaptation reduce the long-term effectiveness of strict policies?
Future analysis incorporating time-lagged variables or regression modeling could provide deeper insight into causal pathways.
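A lagged variable is one way to probe the timing questions above. It compares each day's reproduction rate with stringency from some days earlier, within each country. The sketch below uses base R `ave()` on a tiny hypothetical panel. The column names `country` and `day`, the helper `lag_by`, and the 2-day lag are all assumptions for illustration; the cleaned data above keeps neither a country nor a date column.

```r
# Tiny hypothetical panel: two countries, four days each, sorted by day
panel <- data.frame(
  country          = rep(c("A", "B"), each = 4),
  day              = rep(1:4, times = 2),
  stringency_index = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Hypothetical helper: shift a vector forward by k positions, padding with NA
lag_by <- function(v, k) {
  if (length(v) <= k) return(rep(NA_real_, length(v)))
  c(rep(NA_real_, k), head(v, -k))
}

# Lag stringency by 2 days within each country (rows must be in date order)
panel$stringency_lag2 <- ave(panel$stringency_index, panel$country,
                             FUN = function(v) lag_by(v, 2))
panel
```

Grouping by country prevents one country's values from leaking into another's first rows.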

Overall Conclusions

Hypothesis 1 (Neyman–Pearson)

We intentionally controlled Type I and Type II error risk.
The test did not support the hypothesis (p = 1). The high-vaccination group had a higher mean cumulative death rate, a pattern plausibly explained by confounding with time rather than by vaccination itself.

Hypothesis 2 (Fisher)

There is statistically significant evidence that higher policy stringency is associated with lower reproduction rates, although the effect is small (d ≈ -0.17). Causal inference requires further modeling.

Strong Statistical vs Practical Significance

One of the most important insights from this analysis is the distinction between statistical and practical significance. A small p-value indicates that a difference as large as the one observed would be unlikely if the null hypothesis were true. It is not the probability that the difference arose by chance. Practical significance, by contrast, reflects whether the difference is large enough to matter in real-world decision-making. Given the large sample size in this dataset, very small differences produce extremely small p-values. Effect sizes and confidence intervals therefore provide essential context for interpreting the real-world importance of the findings.
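The point about large samples can be made exact. For two equal groups of size n with a pooled-SD effect size d, Student's t statistic is t = d·√(n/2). With d held fixed, the p-value therefore shrinks steadily as n grows. The sketch uses d = 0.17, close to the magnitude found for Hypothesis 2; the group sizes are hypothetical.

```r
d <- 0.17                        # fixed standardized difference
n_per_group <- c(20, 100, 500, 2000, 20000)

# Student's t for equal groups: difference / (s_pooled * sqrt(2 / n))
t_stat <- d * sqrt(n_per_group / 2)
df <- 2 * n_per_group - 2

# Two-sided p-values for each sample size
p_values <- 2 * pt(-abs(t_stat), df = df)
data.frame(n_per_group, t_stat, p_values)
```

The effect size is identical in every row. Only the sample size changes, yet the p-value moves from unremarkable to vanishingly small.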

Week 7 | Data Dive — Hypothesis Testing

Krish Shah

February 24, 2026
