ANOVA (Analysis of Variance) tests whether the means of three or more groups are significantly different from each other. Despite the name, it works by comparing variances: specifically, it compares the variance between groups to the variance within groups.
If the between-group variance is much larger than the within-group variance, we have evidence that at least one group mean differs from the others.
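In symbols, the comparison can be sketched as below. The notation is an assumption of this sketch, not the post's own: $k$ groups, $n_j$ observations in group $j$, $N$ observations in total, $\bar{y}_j$ the mean of group $j$, and $\bar{y}$ the overall mean.

```latex
\[
F = \frac{\mathrm{SS}_{\text{between}} / (k - 1)}{\mathrm{SS}_{\text{within}} / (N - k)},
\qquad
\mathrm{SS}_{\text{between}} = \sum_{j=1}^{k} n_j \,(\bar{y}_j - \bar{y})^2,
\qquad
\mathrm{SS}_{\text{within}} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2
\]
```

Under the null hypothesis that all group means are equal (and the usual ANOVA assumptions), $F$ follows an F distribution with $k - 1$ and $N - k$ degrees of freedom, so values far above 1 count as evidence against it.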
We’ll use the Gapminder dataset to ask: Does life expectancy differ across continents?
In other words, does where you’re born affect how long you can expect to live?
library(tidyverse)
library(gapminder)
library(gt)
library(gtExtras)
The gapminder dataset contains life expectancy, GDP per capita, and population data for 142 countries spanning from 1952 to 2007. We’ll focus on the most recent year: 2007.
gapminder_2007 <- gapminder |>
  filter(year == 2007)
glimpse(gapminder_2007)
Rows: 142
Columns: 6
$ country <fct> "Afghanistan", "Albania", "Algeria", "Angola", "Argentina", …
$ continent <fct> Asia, Europe, Africa, Africa, Americas, Oceania, Europe, Asi…
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, …
$ lifeExp <dbl> 43.828, 76.423, 72.301, 42.731, 75.320, 81.235, 79.829, 75.6…
$ pop <int> 31889923, 3600523, 33333216, 12420476, 40301927, 20434176, 8…
$ gdpPercap <dbl> 974.5803, 5937.0295, 6223.3675, 4797.2313, 12779.3796, 34435…
Before running the test, let’s visualise the distribution of life expectancy across continents.
continent_colors <- c(
  "Africa" = "#e63946",
  "Americas" = "#f4a261",
  "Asia" = "#2a9d8f",
  "Europe" = "#457b9d",
  "Oceania" = "#9c89b8"
)
gapminder_2007 |>
  ggplot(aes(x = fct_reorder(continent, lifeExp),
             y = lifeExp,
             fill = continent)) +
  geom_boxplot(alpha = 0.7,
               outlier.shape = NA,
               width = 0.5) +
  geom_jitter(aes(colour = continent),
              width = 0.15,
              alpha = 0.6,
              size = 2) +
  stat_summary(fun = mean,
               geom = "point",
               shape = 23,
               size = 3,
               fill = "white",
               colour = "black") +
  scale_fill_manual(values = continent_colors) +
  scale_colour_manual(values = continent_colors) +
  labs(
    title = "Life Expectancy by Continent (2007)",
    subtitle = "Each point represents a country; diamond shows the mean",
    x = NULL,
    y = "Life Expectancy (years)"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    legend.position = "none",
    panel.grid.major.x = element_blank(),
    plot.title = element_text(face = "bold"),
    plot.title.position = "plot",
    axis.text.x = element_text(face = "bold", size = 12)
  )
The visualisation reveals dramatic differences: African countries cluster around 50–55 years, while European countries cluster around 75–80 years.
gapminder_2007 |>
  group_by(continent) |>
  summarise(
    Countries = n(),
    Mean = mean(lifeExp),
    SD = sd(lifeExp),
    Min = min(lifeExp),
    Max = max(lifeExp),
    .groups = "drop"
  ) |>
  arrange(desc(Mean)) |>
  gt() |>
  fmt_number(columns = c(Mean, SD),
             decimals = 1) |>
  fmt_number(columns = c(Min, Max),
             decimals = 1) |>
  data_color(
    columns = Mean,
    palette = c("#e63946", "#f4a261", "#2a9d8f", "#457b9d"),
    domain = c(50, 80)
  ) |>
  cols_align(align = "center",
             columns = -continent) |>
  tab_header(
    title = md("**Life Expectancy by Continent (2007)**"),
    subtitle = "Summary statistics for each continent"
  ) |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) |>
  tab_options(
    table.font.size = px(14),
    heading.align = "left"
  )
Life Expectancy by Continent (2007)
Summary statistics for each continent
| continent | Countries | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Oceania | 2 | 80.7 | 0.7 | 80.2 | 81.2 |
| Europe | 30 | 77.6 | 3.0 | 71.8 | 81.8 |
| Americas | 25 | 73.6 | 4.4 | 60.9 | 80.7 |
| Asia | 33 | 70.7 | 8.0 | 43.8 | 82.6 |
| Africa | 52 | 54.8 | 9.6 | 39.6 | 76.4 |
The gap between Oceania and Europe (roughly 78 to 81 years) and Africa (about 55 years) is stark: roughly 23 to 26 years of life expectancy.
aov(lifeExp ~ continent, data = gapminder_2007) |>
  summary()
            Df Sum Sq Mean Sq F value Pr(>F)
continent    4  13061    3265   59.71 <2e-16 ***
Residuals  137   7491      55
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA output shows:
Df (Degrees of freedom): Between groups = 4 (five continents minus one); Within groups = 137 (142 countries minus 5 groups).
Sum Sq (Sum of Squares): Measures the variability attributed to each source. The between-group sum of squares (13061) is large relative to the within-group sum of squares (7491), even though it is spread over only 4 degrees of freedom.
Mean Sq (Mean Square): Sum of squares divided by degrees of freedom: 3265 between groups and 55 within groups. Dividing by the degrees of freedom puts the two on a comparable scale.
F value: The test statistic (59.71). This is the ratio of the between-group mean square to the within-group mean square. Large values indicate that group means differ more than expected by chance.
Pr(>F): The p-value (< 2e-16). Extremely small, far below 0.05.
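To make the columns above concrete, the same quantities can be rebuilt by hand. This is a minimal sketch in base R, not the post's own code: the object names (`gm07`, `ss_between`, `ss_within`, and so on) are illustrative, and the result should agree with the `aov()` table (about 13061, 7491 and 59.71).

```r
library(gapminder)

# Same 2007 subset as in the post, built with base R only
gm07 <- subset(gapminder, year == 2007)

grand_mean  <- mean(gm07$lifeExp)
group_means <- tapply(gm07$lifeExp, gm07$continent, mean)
group_sizes <- tapply(gm07$lifeExp, gm07$continent, length)

# Between-group sum of squares: continent means around the grand mean,
# weighted by the number of countries in each continent
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group sum of squares: each country around its own continent mean
# (ave() returns the group mean repeated for every row of that group)
ss_within <- sum((gm07$lifeExp - ave(gm07$lifeExp, gm07$continent))^2)

df_between <- length(group_means) - 1            # groups minus one
df_within  <- nrow(gm07) - length(group_means)   # countries minus groups

# F is the ratio of the two mean squares; the p-value is the upper tail
f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(ss_between = ss_between, ss_within = ss_within, f_value = f_value), 2)
p_value
```

Working through the arithmetic this way shows why the F value is so large: the between-group mean square is spread over only 4 degrees of freedom, while the within-group sum of squares is divided by 137.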
Conclusion: We reject the null hypothesis that all continent means are equal. There is a statistically significant difference in mean life expectancy across continents in 2007. Where you’re born is strongly associated with how long you can expect to live.
Note: ANOVA tells us that groups differ, but not which groups. Post-hoc tests (like Tukey’s HSD) would identify specific pairwise differences.
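As a follow-up to the note above, here is a minimal sketch of Tukey's HSD using `TukeyHSD()` from base R's stats package. The object names (`gm07`, `fit`, `tukey`, `pairwise`) are illustrative and not from the original post.

```r
library(gapminder)

gm07 <- subset(gapminder, year == 2007)

# Fit the same one-way model as above, then compare every pair of continents
fit   <- aov(lifeExp ~ continent, data = gm07)
tukey <- TukeyHSD(fit, which = "continent", conf.level = 0.95)

# One row per pair: difference in means, 95% interval bounds, adjusted p-value
pairwise <- as.data.frame(tukey$continent)
pairwise[order(pairwise$`p adj`), ]
```

With five continents there are ten pairwise comparisons. Pairs involving Oceania should be read with care, since that group contains only two countries and its intervals will be wide.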