Bill length predicts body mass (r = 0.60). That’s clear and significant. But add flipper length to the model and the bill-length coefficient collapses — from β = 0.595 to a non-significant β = 0.041. Did bill length suddenly stop mattering? No. Something subtler is happening: bill length’s effect on body mass is being carried by flipper length, not blocked by it.
This post walks through that story in two complementary ways — multiple regression to quantify the coefficient collapse, and mediation analysis to formally decompose why it happens.
# Drop rows with NAs in our three variables, then standardise
penguins_clean <- na.omit(penguins[, c("body_mass_g", "flipper_length_mm", "bill_length_mm")])
penguins_s <- as.data.frame(scale(penguins_clean))Standardising (z-scoring) means every coefficient below is in standard-deviation units, so comparisons across models are apples-to-apples.
Fitting two models isolates the effect:
model_simple <- lm(body_mass_g ~ bill_length_mm, data = penguins_s)
model_multiple <- lm(body_mass_g ~ bill_length_mm + flipper_length_mm, data = penguins_s)
summary(model_simple)##
## Call:
## lm(formula = body_mass_g ~ bill_length_mm, data = penguins_s)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.19724 -0.55736 0.04064 0.57648 2.04109
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.793e-16 4.352e-02 0.00 1
## bill_length_mm 5.951e-01 4.358e-02 13.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8048 on 340 degrees of freedom
## Multiple R-squared: 0.3542, Adjusted R-squared: 0.3523
## F-statistic: 186.4 on 1 and 340 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = body_mass_g ~ bill_length_mm + flipper_length_mm,
## data = penguins_s)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.35986 -0.35624 -0.04002 0.30448 1.60552
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.003e-15 2.657e-02 0.000 1.000
## bill_length_mm 4.117e-02 3.526e-02 1.168 0.244
## flipper_length_mm 8.442e-01 3.526e-02 23.939 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4914 on 339 degrees of freedom
## Multiple R-squared: 0.76, Adjusted R-squared: 0.7585
## F-statistic: 536.6 on 2 and 339 DF, p-value: < 2.2e-16
coef_df <- bind_rows(
tidy(model_simple, conf.int = TRUE) %>% mutate(model = "Simple: body_mass ~\nbill_length"),
tidy(model_multiple, conf.int = TRUE) %>% mutate(model = "Multiple: body_mass ~\nbill_length + flipper_length")
) %>%
filter(term == "bill_length_mm") %>%
mutate(
significant = p.value < 0.05,
model = factor(model, levels = c(
"Simple: body_mass ~\nbill_length",
"Multiple: body_mass ~\nbill_length + flipper_length"
)),
label = paste0("β = ", round(estimate, 3),
"\np ", ifelse(significant, "< 0.05", "= 0.332"))
)
ggplot(coef_df, aes(x = model, y = estimate, color = significant)) +
geom_hline(yintercept = 0, linetype = "dashed", color = "grey60", linewidth = 0.7) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.12, linewidth = 0.9) +
geom_point(size = 5) +
geom_text(aes(label = label), vjust = -2.25, size = 3.8, fontface = "bold", show.legend = FALSE) +
scale_color_manual(
values = c("TRUE" = "#08519c", "FALSE" = "#cb181d"),
labels = c("TRUE" = "Significant", "FALSE" = "Not significant"),
name = NULL
) +
scale_y_continuous(limits = c(-0.15, 0.85)) +
labs(
x = NULL,
y = "Coefficient for bill_length_mm (95% CI)",
title = "Figure 2 — The coefficient collapse",
subtitle = "Bill length's effect on body mass drops from 0.595 to 0.041 once flipper length enters the model",
caption = "Data: palmerpenguins | Standardised variables"
) +
theme_blog +
theme(legend.position = "bottom")Regression can show that the coefficient collapsed. It cannot, on its own, tell us why. That requires thinking about causal structure — which is where mediation comes in.
Baron & Kenny’s framework decomposes the total effect of X → Y into:
| Path | Description |
|---|---|
| c | Total effect of X on Y (no mediator) |
| a | Effect of X on mediator M |
| b | Effect of M on Y, controlling for X |
| c′ | Direct effect of X on Y, controlling for M |
| a × b | Indirect (mediated) effect |
Here: X = bill length · M = flipper length · Y = body mass.
model_c <- lm(body_mass_g ~ bill_length_mm, data = penguins_s) # total
model_a <- lm(flipper_length_mm ~ bill_length_mm, data = penguins_s) # path a
model_bc <- lm(body_mass_g ~ bill_length_mm + flipper_length_mm, data = penguins_s) # paths b & c'a <- coef(model_a)["bill_length_mm"]
b <- coef(model_bc)["flipper_length_mm"]
c <- coef(model_c)["bill_length_mm"]
c_p <- coef(model_bc)["bill_length_mm"]
cat(sprintf(
" path c (total effect, bill → body_mass): %.3f\n
path a (bill_length → flipper_length): %.3f\n
path b (flipper_length → body_mass | bill_length): %.3f\n
path c' (direct: bill_length → body_mass | flipper): %.3f\n
indirect effect a×b (mediated via flipper_length): %.3f\n
proportion mediated (a×b / c): %.1f%%\n",
c, a, b, c_p, a * b, 100 * (a * b) / c
))## path c (total effect, bill → body_mass): 0.595
##
## path a (bill_length → flipper_length): 0.656
##
## path b (flipper_length → body_mass | bill_length): 0.844
##
## path c' (direct: bill_length → body_mass | flipper): 0.041
##
## indirect effect a×b (mediated via flipper_length): 0.554
##
## proportion mediated (a×b / c): 93.1%
93% of bill length’s association with body mass is carried through flipper length. The direct path c′ is near-zero and non-significant — classic full mediation.
a_r <- round(a, 3)
b_r <- round(b, 3)
c_r <- round(c, 3)
cp_r <- round(c_p, 3)
ab_r <- round(a * b, 3)
ggplot() +
xlim(0, 10) + ylim(0, 6) +
coord_fixed() +
# Boxes
annotate("rect", xmin = 0.3, xmax = 2.7, ymin = 2.8, ymax = 4.2,
fill = "#deebf7", color = "#08519c", linewidth = 0.8) +
annotate("rect", xmin = 3.9, xmax = 6.5, ymin = 4.2, ymax = 5.7,
fill = "#feedde", color = "#d94801", linewidth = 0.8) +
annotate("rect", xmin = 7.3, xmax = 9.7, ymin = 2.8, ymax = 4.2,
fill = "#e5f5e0", color = "#238b45", linewidth = 0.8) +
# Labels
annotate("text", x = 1.5, y = 3.7, label = "Bill length", fontface = "bold", size = 4.2) +
annotate("text", x = 1.5, y = 3.2, label = "X", color = "grey40", size = 3.8) +
annotate("text", x = 5.2, y = 5.15, label = "Flipper length", fontface = "bold", size = 4.2) +
annotate("text", x = 5.2, y = 4.62, label = "M (mediator)", color = "grey40", size = 3.8) +
annotate("text", x = 8.5, y = 3.7, label = "Body mass", fontface = "bold", size = 4.2) +
annotate("text", x = 8.5, y = 3.2, label = "Y", color = "grey40", size = 3.8) +
# Arrow a: X → M
annotate("segment", x = 2.7, xend = 3.85, y = 3.9, yend = 4.6,
arrow = arrow(length = unit(0.25, "cm"), type = "closed"),
color = "#d94801", linewidth = 1.1) +
annotate("text", x = 3.1, y = 4.45, label = paste0("a = ", a_r),
color = "#d94801", fontface = "bold", size = 4, hjust = 1) +
# Arrow b: M → Y
annotate("segment", x = 6.55, xend = 7.25, y = 4.6, yend = 3.9,
arrow = arrow(length = unit(0.25, "cm"), type = "closed"),
color = "#d94801", linewidth = 1.1) +
annotate("text", x = 7.1, y = 4.45, label = paste0("b = ", b_r),
color = "#d94801", fontface = "bold", size = 4, hjust = 0) +
# Arrow c': X → Y (direct)
annotate("segment", x = 2.75, xend = 7.25, y = 3.1, yend = 3.1,
arrow = arrow(length = unit(0.25, "cm"), type = "closed"),
color = "#238b45", linewidth = 1.1) +
annotate("text", x = 5.0, y = 2.75,
label = paste0("c' = ", cp_r, " (direct effect, n.s.)"),
color = "#238b45", fontface = "bold", size = 3.8) +
# Summary text
annotate("text", x = 5.0, y = 1.9,
label = paste0("Indirect effect (a \u00d7 b) = ", a_r, " \u00d7 ", b_r, " = ", ab_r),
color = "#d94801", fontface = "bold", size = 4.2) +
annotate("text", x = 5.0, y = 1.35,
label = paste0("Total effect (c) = ", c_r,
" | Proportion mediated = ",
round(100 * (a * b) / c, 1), "%"),
color = "grey30", size = 3.8) +
labs(
title = "Mediation path diagram with coefficients",
subtitle = "93% of bill length's effect on body mass is carried through flipper length",
caption = "Data: palmerpenguins | Standardised variables"
) +
theme_void(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 15, margin = margin(b = 6)),
plot.subtitle = element_text(color = "grey40", size = 11, margin = margin(b = 4)),
plot.caption = element_text(color = "grey50", size = 9, margin = margin(t = 8)),
plot.background = element_rect(fill = "white", color = NA),
plot.margin = margin(15, 15, 15, 15)
)The {mediation} package tests the indirect effect (a × b) via bootstrapping — no distributional assumptions required.
set.seed(42)
med_out <- mediate(
model.m = model_a,
model.y = model_bc,
treat = "bill_length_mm",
mediator = "flipper_length_mm",
boot = TRUE,
sims = 1000
)
summary(med_out)##
## Causal Mediation Analysis
##
## Nonparametric Bootstrap Confidence Intervals with the Percentile Method
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME 0.553939 0.472349 0.637712 <2e-16 ***
## ADE 0.041170 -0.023057 0.104997 0.214
## Total Effect 0.595110 0.508056 0.692992 <2e-16 ***
## Prop. Mediated 0.930819 0.832400 1.040169 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sample Size Used: 342
##
##
## Simulations: 1000
ACME (Average Causal Mediation Effect) = the indirect effect (a × b): how much of bill length’s effect on body mass travels through flipper length.
ADE (Average Direct Effect) = the direct effect (c′).
Total Effect = ACME + ADE = path c.
The bootstrap 95% CI for ACME excludes zero, confirming that the mediated path is statistically reliable.
Partial regression plots make the story visual. Each plot shows what’s left of a predictor — and of body mass — after removing the other predictor’s linear influence.
# Residualise bill and body mass after removing flipper's effect
resid_bill_on_flipper <- residuals(lm(bill_length_mm ~ flipper_length_mm, data = penguins_s))
resid_body_on_flipper <- residuals(lm(body_mass_g ~ flipper_length_mm, data = penguins_s))
# Residualise flipper and body mass after removing bill's effect
resid_flip_on_bill <- residuals(lm(flipper_length_mm ~ bill_length_mm, data = penguins_s))
resid_body_on_bill <- residuals(lm(body_mass_g ~ bill_length_mm, data = penguins_s))
partial_df <- tibble(
resid_bill = resid_bill_on_flipper,
resid_flip = resid_flip_on_bill,
resid_bm_1 = resid_body_on_flipper,
resid_bm_2 = resid_body_on_bill
)
slope_bill <- round(coef(lm(resid_bm_1 ~ resid_bill, data = partial_df))["resid_bill"], 3)
slope_flip <- round(coef(lm(resid_bm_2 ~ resid_flip, data = partial_df))["resid_flip"], 3)
p_bill <- ggplot(partial_df, aes(x = resid_bill, y = resid_bm_1)) +
geom_point(alpha = 0.4, color = "#cb181d", size = 1.8) +
geom_smooth(method = "lm", se = TRUE, color = "#99000d", linewidth = 0.9) +
annotate("text", x = Inf, y = Inf, hjust = 1.1, vjust = 1.5,
label = paste0("Partial slope = ", slope_bill, " (n.s.)"),
color = "#99000d", fontface = "bold", size = 4) +
labs(
x = "Bill length | Flipper length (residuals)",
y = "Body mass | Flipper length (residuals)",
title = "Bill length \u2192 Body mass",
subtitle = "After removing flipper length from both variables"
) +
theme_blog
p_flip <- ggplot(partial_df, aes(x = resid_flip, y = resid_bm_2)) +
geom_point(alpha = 0.4, color = "#238b45", size = 1.8) +
geom_smooth(method = "lm", se = TRUE, color = "#006d2c", linewidth = 0.9) +
annotate("text", x = Inf, y = Inf, hjust = 1.1, vjust = 1.5,
label = paste0("Partial slope = ", slope_flip, " ***"),
color = "#006d2c", fontface = "bold", size = 4) +
labs(
x = "Flipper length | Bill length (residuals)",
y = "Body mass | Bill length (residuals)",
title = "Flipper length \u2192 Body mass",
subtitle = "After removing bill length from both variables"
) +
theme_blog
p_bill + p_flip +
plot_annotation(
title = "Partial regression plots",
subtitle = "Left: bill length has no unique effect on body mass once flipper is accounted for\nRight: flipper length retains a strong unique effect on body mass",
caption = "Data: palmerpenguins | Standardised variables",
theme = theme(
plot.title = element_text(face = "bold", size = 14),
plot.subtitle = element_text(color = "grey40", size = 11),
plot.background = element_rect(fill = "white", color = NA)
)
)The left panel is essentially a scatter of noise — confirming that bill length has no unique variance to offer once flipper length has been accounted for. The right panel shows a strong, tight relationship — flipper length is doing the real work.
| Simple regression | Multiple regression | Mediation | |
|---|---|---|---|
| Bill length β | 0.595 ✓ | 0.041 ✗ | c′ = 0.041 (direct, n.s.) |
| Flipper length β | — | 0.844 ✓ | b = 0.844 |
| Indirect effect | — | — | a × b = 0.554 ✓ |
| Proportion mediated | — | — | 93.1% |
Multiple regression tells you the coefficient collapses; mediation analysis tells you the effect didn’t disappear — it just got rerouted. Bill length predicts flipper length, which in turn drives body mass. The indirect path accounts for 93% of the total effect, and the bootstrap confidence interval confirms it is statistically significant.
In plain language: bigger-billed penguins tend to have longer flippers, and longer flippers predict heavier bodies. Once you know a penguin’s flipper length, knowing the bill length adds very little extra predictive value for body mass.
Analysis by Rahul Venugopal | Data: {palmerpenguins} | All variables standardised