In Tuesday’s lecture, we went from one predictor to many. The key idea: each coefficient in a multivariate model describes an association holding everything else constant — comparing people who are similar on all the other variables in the model.
Today, you’ll build up a multivariate model step by step using the CCHS data, practice distinguishing sliders from switches, translate coefficients into substantive, real-world quantities, and apply the interpretation checklist to a published regression table.
library(tidyverse)
library(broom)
cchs <- read_csv("data/cchs_teaching.csv")
Same reverse-coding as last practice, plus we’ll prepare a few categorical variables with informative labels.
cchs <- cchs %>%
mutate(
# Reverse-code so higher = better
mental_health = 6 - GEN_05,
general_health = 6 - GEN_01,
low_stress = 6 - GEN_10,
belonging = 5 - GEN_20,
life_satisfaction = LSM_01,
# Label sex
sex = ifelse(DHH_SEX == 1, "Male", "Female"),
# Label age groups
age_group = case_when(
DHHGAGE == 1 ~ "12-17",
DHHGAGE == 2 ~ "18-34",
DHHGAGE == 3 ~ "35-49",
DHHGAGE == 4 ~ "50-64",
DHHGAGE == 5 ~ "65+"
),
age_group = factor(age_group, levels = c("18-34", "35-49", "50-64", "65+")),
# Label income groups
income_group = case_when(
INCDGHH == 1 ~ "< $20k",
INCDGHH == 2 ~ "$20-39k",
INCDGHH == 3 ~ "$40-59k",
INCDGHH == 4 ~ "$60-79k",
INCDGHH == 5 ~ "$80k+"
),
income_group = factor(income_group,
levels = c("< $20k", "$20-39k", "$40-59k", "$60-79k", "$80k+")),
# Label food security
food_security = case_when(
FSCDVHF2 == 0 ~ "Secure",
FSCDVHF2 == 1 ~ "Marginally insecure",
FSCDVHF2 == 2 ~ "Moderately insecure",
FSCDVHF2 == 3 ~ "Severely insecure"
),
food_security = factor(food_security,
levels = c("Secure", "Marginally insecure",
"Moderately insecure", "Severely insecure"))
)
We’ll filter to adults (18+) with valid responses on the variables we need.
analysis_data <- cchs %>%
filter(
life_satisfaction <= 10,
belonging >= 1 & belonging <= 4,
mental_health >= 1 & mental_health <= 5,
!is.na(sex),
age_group != "12-17",
!is.na(income_group),
!is.na(food_security)
)
cat("Analysis sample:", nrow(analysis_data), "adults\n")
## Analysis sample: 60069 adults
Last practice, we found that stronger community belonging is associated with higher life satisfaction. Let’s fit that model again as our baseline.
m1 <- lm(life_satisfaction ~ belonging, data = analysis_data)
tidy(m1) %>%
mutate(across(where(is.numeric), ~ round(.x, 4)))
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 5.10 0.0249 205. 0
## 2 belonging 0.884 0.0087 101. 0
cat("R-squared:", round(summary(m1)$r.squared, 4), "\n")
## R-squared: 0.1461
But belonging isn’t the only thing that matters. What if people with better mental health also feel more belonging and report higher satisfaction? Mental health could be a confounder. Let’s add it.
m2 <- lm(life_satisfaction ~ belonging + mental_health, data = analysis_data)
tidy(m2) %>%
mutate(across(where(is.numeric), ~ round(.x, 4)))
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 2.88 0.0276 104. 0
## 2 belonging 0.562 0.008 69.8 0
## 3 mental_health 0.864 0.0065 132. 0
cat("R-squared:", round(summary(m2)$r.squared, 4), "\n")
## R-squared: 0.3391
cat("Belonging coefficient in Model 1:", round(coef(m1)[2], 3), "\n")
## Belonging coefficient in Model 1: 0.884
cat("Belonging coefficient in Model 2:", round(coef(m2)[2], 3), "\n")
## Belonging coefficient in Model 2: 0.562
cat("Change:", round(coef(m2)[2] - coef(m1)[2], 3), "\n")
## Change: -0.322
ANSWER KEY
The belonging coefficient shrank substantially, from 0.884 in Model 1 to 0.562 in Model 2 — a drop of 0.322, or about 36%. It got smaller when mental health was added.
Part of what looked like a “belonging” pattern was actually mental health. In Model 1, the belonging coefficient was absorbing the influence of mental health — people with stronger belonging also tend to have better mental health, and both are associated with higher life satisfaction. Without mental health in the model, belonging’s coefficient was inflated because it was picking up mental health’s contribution too. This is exactly the confounder logic from lecture: mental health influences both belonging and life satisfaction, so leaving it out biased the belonging coefficient upward.
“Holding mental health constant, a one-unit increase in community belonging (e.g., from ‘Somewhat Weak’ to ‘Somewhat Strong’) is associated with a 0.562-point increase in life satisfaction on the 0–10 scale, on average.” This comes from the
belongingrow in the Model 2 output, where the estimate is 0.562.R-squared jumped from 0.146 to 0.339. This means Model 1 explained about 15% of the variation in life satisfaction, while Model 2 explains about 34%. Mental health is a powerful predictor — adding it nearly doubled the explained variation. The remaining 66% is still unexplained, meaning many other factors (income, relationships, health, life events) affect life satisfaction beyond belonging and mental health.
Now let’s build a fuller model, adding predictors one at a time — watching what changes at each step, just like we did in lecture with the GPA example.
m3 <- lm(life_satisfaction ~ belonging + mental_health + sex, data = analysis_data)
tidy(m3) %>%
mutate(
across(c(estimate, std.error, statistic), ~ round(.x, 4)),
p.value = ifelse(p.value < 0.0001, "< 0.0001", as.character(round(p.value, 4)))
)
## # A tibble: 4 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 (Intercept) 2.93 0.0279 105. < 0.0001
## 2 belonging 0.560 0.008 69.6 < 0.0001
## 3 mental_health 0.870 0.0065 133. < 0.0001
## 4 sexMale -0.162 0.0129 -12.5 < 0.0001
m4 <- lm(life_satisfaction ~ belonging + mental_health + sex + income_group,
data = analysis_data)
tidy(m4) %>%
mutate(
across(c(estimate, std.error, statistic), ~ round(.x, 4)),
p.value = ifelse(p.value < 0.0001, "< 0.0001", as.character(round(p.value, 4)))
)
## # A tibble: 8 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 (Intercept) 2.49 0.0432 57.7 < 0.0001
## 2 belonging 0.560 0.008 69.8 < 0.0001
## 3 mental_health 0.86 0.0065 132. < 0.0001
## 4 sexMale -0.174 0.0129 -13.5 < 0.0001
## 5 income_group$20-39k 0.352 0.0394 8.93 < 0.0001
## 6 income_group$40-59k 0.394 0.039 10.1 < 0.0001
## 7 income_group$60-79k 0.439 0.0395 11.1 < 0.0001
## 8 income_group$80k+ 0.581 0.0364 16.0 < 0.0001
m5 <- lm(life_satisfaction ~ belonging + mental_health + sex + income_group + age_group,
data = analysis_data)
tidy(m5) %>%
mutate(
across(c(estimate, std.error, statistic), ~ round(.x, 4)),
p.value = ifelse(p.value < 0.0001, "< 0.0001", as.character(round(p.value, 4)))
)
## # A tibble: 11 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 (Intercept) 2.61 0.0449 58.2 < 0.0001
## 2 belonging 0.554 0.008 68.9 < 0.0001
## 3 mental_health 0.857 0.0066 130. < 0.0001
## 4 sexMale -0.176 0.0129 -13.7 < 0.0001
## 5 income_group$20-39k 0.301 0.0397 7.58 < 0.0001
## 6 income_group$40-59k 0.357 0.0392 9.12 < 0.0001
## 7 income_group$60-79k 0.414 0.0396 10.5 < 0.0001
## 8 income_group$80k+ 0.579 0.0365 15.9 < 0.0001
## 9 age_group35-49 -0.167 0.0213 -7.85 < 0.0001
## 10 age_group50-64 -0.174 0.0203 -8.59 < 0.0001
## 11 age_group65+ -0.001 0.0198 -0.052 0.9585
Notice something interesting: the age_group65+
coefficient is essentially zero (−0.001, p = 0.96). After adjusting for
mental health, belonging, income, and sex, adults aged 65 and older are
no less satisfied than 18–34 year-olds. The raw age
differences we might have expected largely disappear once we account for
these other factors.
glance(m5) %>%
select(r.squared, adj.r.squared, sigma, nobs) %>%
mutate(across(where(is.numeric), ~ round(.x, 4)))
## # A tibble: 1 × 4
## r.squared adj.r.squared sigma nobs
## <dbl> <dbl> <dbl> <dbl>
## 1 0.347 0.347 1.56 60069
glance() gives us model-level statistics in one tidy
row. r.squared is the proportion of variation explained,
sigma is the residual standard error (roughly, how far off
the model’s predictions are on average), and nobs is the
sample size.
# Extract key info from each model
models <- list(m1, m2, m3, m4, m5)
model_names <- paste("Model", 1:5)
comparison <- tibble(
Model = model_names,
Belonging = map_dbl(models, ~ round(coef(.x)["belonging"], 3)),
R_squared = map_dbl(models, ~ round(summary(.x)$r.squared, 4)),
N = map_dbl(models, nobs)
)
comparison
## # A tibble: 5 × 4
## Model Belonging R_squared N
## <chr> <dbl> <dbl> <dbl>
## 1 Model 1 0.884 0.146 60069
## 2 Model 2 0.562 0.339 60069
## 3 Model 3 0.56 0.341 60069
## 4 Model 4 0.56 0.345 60069
## 5 Model 5 0.554 0.347 60069
ANSWER KEY
The belonging coefficient shrank dramatically from Model 1 (0.884) to Model 2 (0.562) when mental health was added. After that, it barely changed: 0.560 in Model 3, 0.560 in Model 4, 0.554 in Model 5. The big drop tells us that a substantial portion of belonging’s apparent effect was really mental health acting as a confounder. But the stability from Model 2 onward tells us something important: the remaining association of about 0.55 per unit is robust. It doesn’t go away when we add sex, income, or age. This suggests belonging has a genuine, independent association with life satisfaction that isn’t explained away by these other factors.
The biggest jump in R-squared was from Model 1 (0.146) to Model 2 (0.339) — adding mental health nearly doubled the explained variation. Mental health explains so much additional variation because it is strongly associated with life satisfaction (as we saw in last week’s practice, r = 0.54) and captures something quite different from belonging. The other additions (sex, income, age) each added only small increments, moving R-squared from 0.339 to 0.347. This tells us that mental health is by far the most powerful individual predictor in this model.
The sex coefficient is a switch (categorical predictor). The reference group is Female (because R uses alphabetical order, and “Female” comes before “Male”). The coefficient of −0.162 means: “Holding belonging and mental health constant, men report life satisfaction about 0.16 points lower than women, on average, on the 0–10 scale.” This comes from the
sexMalerow in Model 3.There are four income rows because income has five categories, and R creates one coefficient for each non-reference category. The reference category is < $20k (the first level we set in the factor). The coefficient for
income_group$80k+is 0.581, which means: “Holding belonging, mental health, and sex constant, adults in households earning $80k or more report life satisfaction about 0.58 points higher than those in households earning less than $20k, on average.” Notice the income coefficients increase monotonically (0.352, 0.394, 0.439, 0.581) — a clear gradient where higher income groups report progressively higher satisfaction compared to the lowest group.
The distinction between sliders and switches matters because you cannot compare their coefficients at face value. A slider of 0.5 and a switch of 0.5 mean very different things if the slider spans a 10-point range.
cat("=== Model 5 Predictors ===\n\n")
## === Model 5 Predictors ===
cat("SLIDERS (continuous/ordinal):\n")
## SLIDERS (continuous/ordinal):
cat(" belonging: 1 (Very Weak) to 4 (Very Strong) — range = 3\n")
## belonging: 1 (Very Weak) to 4 (Very Strong) — range = 3
cat(" mental_health: 1 (Poor) to 5 (Excellent) — range = 4\n\n")
## mental_health: 1 (Poor) to 5 (Excellent) — range = 4
cat("SWITCHES (categorical):\n")
## SWITCHES (categorical):
cat(" sex: Male vs. Female (reference)\n")
## sex: Male vs. Female (reference)
cat(" income_group: 4 groups vs. < $20k (reference)\n")
## income_group: 4 groups vs. < $20k (reference)
cat(" age_group: 3 groups vs. 18-34 (reference)\n")
## age_group: 3 groups vs. 18-34 (reference)
From lecture:
“Holding everything else constant, a one-unit increase in [X] is associated with a b-hat change in [Y], on average.”
“Holding everything else constant, [Y] is b-hat units higher/lower in [category] vs. [reference], on average.”
Using the output from Model 5, write a full interpretation sentence for each of the following. Use the correct template (slider or switch) and name the units.
belonging coefficientmental_health coefficientsexMale coefficientincome_group$80k+ coefficient (vs. the
reference)age_group65+ coefficient (vs. the reference) — pay
special attention here. What does it mean when a coefficient is
essentially zero and the p-value is very large?ANSWER KEY
Belonging (slider): “Holding mental health, sex, income, and age constant, a one-unit increase in community belonging (e.g., from ‘Somewhat Weak’ to ‘Somewhat Strong’) is associated with a 0.554-point increase in life satisfaction on the 0–10 scale, on average.” (From the
belongingrow: estimate = 0.554.)Mental health (slider): “Holding belonging, sex, income, and age constant, a one-unit increase in self-rated mental health (e.g., from ‘Good’ to ‘Very Good’) is associated with a 0.857-point increase in life satisfaction on the 0–10 scale, on average.” (From the
mental_healthrow: estimate = 0.857.)Sex (switch): “Holding belonging, mental health, income, and age constant, men report life satisfaction about 0.176 points lower than women, on average, on the 0–10 scale.” (From the
sexMalerow: estimate = −0.176. The reference group is Female.)Income $80k+ (switch): “Holding belonging, mental health, sex, and age constant, adults in households earning $80,000 or more report life satisfaction about 0.579 points higher than those in households earning less than $20,000, on average, on the 0–10 scale.” (From the
income_group$80k+row: estimate = 0.579. The reference group is < $20k.)Age 65+ (switch): “Holding belonging, mental health, sex, and income constant, adults aged 65 and older report life satisfaction that is essentially no different from adults aged 18–34 (coefficient = −0.001, p = 0.96).” The coefficient is so close to zero that it is indistinguishable from no difference at all, and the very large p-value (0.96) confirms there is no evidence of a gap. This is substantively interesting: it means the raw age differences in life satisfaction that we might observe are not about age itself — they are explained by the other variables in the model (mental health, belonging, income, sex). Once you compare older and younger adults who are similar on these factors, age doesn’t matter.
From lecture: coefficients only matter when you can feel the size. A slope of 0.5 means nothing until you know how far people actually vary on that predictor.
m5_coefs <- coef(m5)
# Belonging: range is 1 to 4 (3 units)
belonging_range <- 3
belonging_effect <- round(m5_coefs["belonging"] * belonging_range, 2)
# Mental health: range is 1 to 5 (4 units)
mh_range <- 4
mh_effect <- round(m5_coefs["mental_health"] * mh_range, 2)
cat("=== Full-Range Effects ===\n\n")
## === Full-Range Effects ===
cat("Belonging (1 to 4, range = 3):\n")
## Belonging (1 to 4, range = 3):
cat(" 3 ×", round(m5_coefs["belonging"], 3), "=", belonging_effect, "points on life satisfaction\n\n")
## 3 × 0.554 = 1.66 points on life satisfaction
cat("Mental health (1 to 5, range = 4):\n")
## Mental health (1 to 5, range = 4):
cat(" 4 ×", round(m5_coefs["mental_health"], 3), "=", mh_effect, "points on life satisfaction\n\n")
## 4 × 0.857 = 3.43 points on life satisfaction
sd_ls <- round(sd(analysis_data$life_satisfaction), 2)
cat("SD of life satisfaction:", sd_ls, "\n\n")
## SD of life satisfaction: 1.93
cat("Belonging full-range effect as fraction of SD:",
round(belonging_effect / sd_ls, 2), "SD\n")
## Belonging full-range effect as fraction of SD: 0.86 SD
cat("Mental health full-range effect as fraction of SD:",
round(mh_effect / sd_ls, 2), "SD\n")
## Mental health full-range effect as fraction of SD: 1.78 SD
What does the model predict for specific kinds of people?
profiles <- tibble(
label = c(
"Young woman, strong belonging, excellent MH, high income",
"Young woman, weak belonging, poor MH, low income",
"Older man, moderate belonging, good MH, middle income"
),
belonging = c(4, 1, 2),
mental_health = c(5, 1, 3),
sex = c("Female", "Female", "Male"),
income_group = factor(c("$80k+", "< $20k", "$40-59k"),
levels = levels(analysis_data$income_group)),
age_group = factor(c("18-34", "18-34", "50-64"),
levels = levels(analysis_data$age_group))
)
profiles$predicted <- round(predict(m5, newdata = profiles), 2)
profiles %>% select(label, predicted)
## # A tibble: 3 × 2
## label predicted
## <chr> <dbl>
## 1 Young woman, strong belonging, excellent MH, high income 9.7
## 2 Young woman, weak belonging, poor MH, low income 4.03
## 3 Older man, moderate belonging, good MH, middle income 6.3
ANSWER KEY
Mental health has a larger full-range effect: 3.43 points (4 units × 0.857) compared to belonging’s 1.66 points (3 units × 0.554). Mental health’s full-range impact is about twice as large — roughly 1.77 points more. Even though belonging’s per-unit coefficient (0.554) is not dramatically smaller than mental health’s (0.857), mental health has a wider range (4 units vs. 3), which compounds the difference. This is exactly why you can’t compare slider coefficients at face value without considering their ranges.
“Moving from the worst to the best self-rated mental health, holding everything else constant, is associated with about 3.4 points on a 0–10 satisfaction scale.” That is large — it represents about 1.78 standard deviations of the outcome, and more than a third of the entire 0–10 scale. By Cohen’s conventional benchmarks (0.2 SD = small, 0.5 SD = medium, 0.8 SD = large), both full-range effects qualify as large: belonging sits right at the threshold (0.86 SD) and mental health is more than twice that (1.78 SD). To put it in concrete terms: the predicted gap between someone with “Poor” mental health and someone with “Excellent” mental health is roughly the difference between scoring a 5 (middling) and scoring an 8.4 (very satisfied). Mental health is the single most powerful predictor in the model.
The gap between the most advantaged profile (9.70) and the least advantaged (4.03) is 5.67 points — more than half the entire 0–10 scale. What drives it? All three factors contribute, but the arithmetic makes it clear. The belonging difference (4 vs. 1 = 3 units × 0.554 = 1.66 points), the mental health difference (5 vs. 1 = 4 units × 0.857 = 3.43 points), and the income difference ($80k+ vs. < $20k = 0.579 points) together account for 1.66 + 3.43 + 0.58 = 5.67 points. Mental health contributes the most (about 60% of the gap), belonging contributes about 29%, and income about 10%. The sex and age differences between these two profiles are zero (both are young women in the reference groups).
“0.85 per unit” is abstract — you don’t know whether 0.85 is a lot or a little unless you know how far people actually vary on the predictor and what the outcome scale looks like. “3.4 points across the full range” is concrete: you can immediately picture where someone starts and ends on the satisfaction scale. It tells you what the coefficient means for a real person comparing the best and worst scenarios, not just for a hypothetical one-unit nudge that may not correspond to any real-world difference. Substantive interpretation translates statistics into stories about people.
From lecture, every variable you add is a claim about how the world works.
Food insecurity is strongly associated with lower life satisfaction. Let’s add it.
m6 <- lm(life_satisfaction ~ belonging + mental_health + sex + income_group +
age_group + food_security, data = analysis_data)
tidy(m6) %>%
filter(str_detect(term, "food|belonging|mental|income")) %>%
mutate(
across(c(estimate, std.error, statistic), ~ round(.x, 4)),
p.value = ifelse(p.value < 0.0001, "< 0.0001", as.character(round(p.value, 4)))
)
## # A tibble: 9 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 belonging 0.541 0.008 67.9 < 0.0001
## 2 mental_health 0.815 0.0066 123. < 0.0001
## 3 income_group$20-39k 0.194 0.0394 4.93 < 0.0001
## 4 income_group$40-59k 0.200 0.039 5.14 < 0.0001
## 5 income_group$60-79k 0.232 0.0395 5.88 < 0.0001
## 6 income_group$80k+ 0.35 0.0366 9.55 < 0.0001
## 7 food_securityMarginally insecure -0.327 0.0316 -10.4 < 0.0001
## 8 food_securityModerately insecure -0.535 0.0255 -21.0 < 0.0001
## 9 food_securitySeverely insecure -1.01 0.0323 -31.4 < 0.0001
cat("Income ($80k+ vs < $20k) in Model 5:",
round(coef(m5)["income_group$80k+"], 3), "\n")
## Income ($80k+ vs < $20k) in Model 5: 0.579
cat("Income ($80k+ vs < $20k) in Model 6:",
round(coef(m6)["income_group$80k+"], 3), "\n\n")
## Income ($80k+ vs < $20k) in Model 6: 0.35
cat("R-squared Model 5:", round(summary(m5)$r.squared, 4), "\n")
## R-squared Model 5: 0.3471
cat("R-squared Model 6:", round(summary(m6)$r.squared, 4), "\n")
## R-squared Model 6: 0.3613
ANSWER KEY
The income coefficient ($80k+ vs. < $20k) shrank substantially, from 0.579 in Model 5 to 0.350 in Model 6 — a drop of about 40%. This happened because income and food security are strongly related: higher-income households are far less likely to experience food insecurity. When food security enters the model, it absorbs some of the variation that was previously attributed to income. Part of the “income” effect was really operating through food security.
This is genuinely ambiguous, and the ambiguity is the point:
Confounder argument: Food security could be a confounder if it has causes beyond just income (e.g., food deserts, disability, family structure) that independently affect life satisfaction. In that case, including it gives a cleaner estimate of income’s direct effect.
Mediator argument: Food security is plausibly a mediator — one of the mechanisms through which income affects life satisfaction. Having more money → being able to afford food → higher satisfaction. If this is the pathway, then adjusting for food security blocks the channel and the income coefficient underestimates the total effect of income.
The answer changes the interpretation: in Model 5, the income coefficient captures the total association of income with satisfaction (including through food security). In Model 6, it captures only the direct association of income — whatever income does for satisfaction that is not about food security. Both are valid; they just answer different questions.
Model 5 tells the story: “How much does income matter overall for life satisfaction, including all the pathways through which money helps?” Model 6 tells the story: “How much does income matter beyond food security — what does money buy you that isn’t about putting food on the table?” The income coefficient is smaller in Model 6 not because Model 5 was wrong, but because the two models are answering different questions.
These tools are not academic exercises. Published researchers use exactly the same sliders, switches, and “holding constant” logic.
Below is a coefficient plot with values drawn from published Canadian wellbeing research. The coefficients are illustrative — they reflect realistic patterns from studies using large national surveys like the CCHS, but are not a direct reproduction of any single paper. The outcome is life satisfaction on a 0–10 scale (same as ours). The researchers asked: what predicts how satisfied Canadians are with their lives, and how large are the effects?
# Coefficient plot based on typical findings from Canadian wellbeing research
# (e.g., Helliwell, Barrington-Leigh et al., Canadian Journal of Economics)
pub_data <- tibble(
term = c("Self-rated health\n(1-5 scale, slider)",
"Log household income\n(slider)",
"Unemployed\n(vs. employed)",
"Married\n(vs. single)",
"Separated/divorced\n(vs. single)",
"Widowed\n(vs. single)",
"Volunteer regularly\n(vs. not)",
"Trust in neighbours\n(1-5 scale, slider)",
"Female\n(vs. male)",
"Age 45-64\n(vs. 25-44)",
"Age 65+\n(vs. 25-44)",
"Immigrant\n(vs. Canadian-born)"),
estimate = c(0.62, 0.35, -0.55, 0.44, -0.50, -0.25, 0.22, 0.30, 0.06, -0.15, 0.35, 0.10),
se = c(0.03, 0.04, 0.08, 0.04, 0.06, 0.07, 0.04, 0.03, 0.03, 0.04, 0.05, 0.05)
) %>%
mutate(
ci_lo = estimate - 1.96 * se,
ci_hi = estimate + 1.96 * se,
significant = !(ci_lo <= 0 & ci_hi >= 0),
term = factor(term, levels = rev(term))
)
ggplot(pub_data, aes(x = estimate, y = term, color = significant)) +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey70", linewidth = 1) +
geom_segment(aes(x = ci_lo, xend = ci_hi), linewidth = 1.3) +
geom_point(size = 4) +
scale_color_manual(values = c("TRUE" = "#D64045", "FALSE" = "#94a3b8"),
labels = c("CI excludes zero", "CI crosses zero")) +
labs(
title = "What Predicts Life Satisfaction in Canada?",
subtitle = "Illustrative coefficients based on patterns from Canadian wellbeing research\n(Helliwell & Barrington-Leigh, 2010 and related studies) | Outcome: life satisfaction 0-10",
x = "Coefficient (change in life satisfaction)", y = NULL
) +
theme_minimal() +
theme(legend.position = "bottom", legend.title = element_blank(),
axis.text.y = element_text(size = 11))
Apply the checklist from lecture to this table. Work through each question:
ANSWER KEY
The outcome is life satisfaction, measured on a 0–10 scale. Each coefficient represents the predicted change in satisfaction (in points on that scale) associated with a one-unit change in the predictor (for sliders) or the difference vs. the reference group (for switches), holding all other variables constant.
Two sliders:
Self-rated health (0.62 per unit, range 1–5): 4 units × 0.62 = 2.48 points. Moving from the worst to the best self-rated health predicts about 2.5 points higher satisfaction — a quarter of the entire scale and roughly 1.3 SD of the outcome, a large effect by Cohen’s benchmarks.
Trust in neighbours (0.30 per unit, range 1–5): 4 units × 0.30 = 1.20 points. The full range of neighbourhood trust predicts about 1.2 points — meaningful but smaller than health.
Two switches:
Unemployed (−0.55, reference = employed): “Holding everything else constant, unemployed Canadians report life satisfaction about 0.55 points lower than employed Canadians, on average.” This is a direct comparison: the reference group is people who are employed.
Married (+0.44, reference = single): “Holding everything else constant, married Canadians report life satisfaction about 0.44 points higher than single (never-married) Canadians, on average.”
At face value, self-rated health (0.62) looks only slightly larger than married (0.44). But health is a slider and marriage is a switch. The slider spans 4 units, so its full-range effect is 4 × 0.62 = 2.48 points. The marriage switch is a one-time comparison: married vs. single = 0.44 points. Health’s full predicted impact is about 5.6 times larger than the married/single gap. You cannot compare them at face value because the slider coefficient describes a per-unit rate while the switch coefficient describes the total group difference. You need to scale the slider across its range first.
The female coefficient (0.06) is technically significant — its CI [0.00, 0.12] barely excludes zero. But it is not important in any practical sense. A 0.06-point difference on a 10-point scale is negligible — about 3% of a standard deviation. This is the same lesson as the sex/mental health model from last week’s practice: with 60,000+ respondents, even trivially small differences can be statistically significant. “Significant” means unlikely to be zero; “important” means large enough to matter. These are different questions.
Income and employment status are both in the model because they capture different aspects of economic circumstances. Income measures material resources; employment measures daily structure, social connection, and identity. Unemployment has effects beyond lost income — social isolation, loss of purpose, stigma. They are best understood as independent confounders of each other and of life satisfaction, not as one mediating the other. If employment were purely a mediator of income (income → job → satisfaction), then including it would block the income pathway. But the unemployment coefficient (−0.55) is large after controlling for income, which suggests unemployment affects satisfaction through channels other than just money. Both belong in the model.
Here is the complete output from our Model 5. Write a short paragraph (4–6 sentences) that a non-statistician could understand, covering:
tidy(m5) %>%
mutate(
across(c(estimate, std.error, statistic), ~ round(.x, 4)),
p.value = ifelse(p.value < 0.0001, "< 0.0001", as.character(round(p.value, 4)))
)
## # A tibble: 11 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 (Intercept) 2.61 0.0449 58.2 < 0.0001
## 2 belonging 0.554 0.008 68.9 < 0.0001
## 3 mental_health 0.857 0.0066 130. < 0.0001
## 4 sexMale -0.176 0.0129 -13.7 < 0.0001
## 5 income_group$20-39k 0.301 0.0397 7.58 < 0.0001
## 6 income_group$40-59k 0.357 0.0392 9.12 < 0.0001
## 7 income_group$60-79k 0.414 0.0396 10.5 < 0.0001
## 8 income_group$80k+ 0.579 0.0365 15.9 < 0.0001
## 9 age_group35-49 -0.167 0.0213 -7.85 < 0.0001
## 10 age_group50-64 -0.174 0.0203 -8.59 < 0.0001
## 11 age_group65+ -0.001 0.0198 -0.052 0.9585
cat("R-squared:", round(summary(m5)$r.squared, 4), "\n")
## R-squared: 0.3471
cat("SD of life satisfaction:", round(sd(analysis_data$life_satisfaction), 2), "\n")
## SD of life satisfaction: 1.93
Write your paragraph here. Use the templates, scale the sliders, name the units, and remember: the goal is to make the numbers mean something to someone who has never taken a statistics course.
ANSWER KEY
Here is a model response:
“Using data from over 60,000 Canadian adults in the 2022 Canadian Community Health Survey, we examined what predicts life satisfaction on a 0–10 scale. The strongest predictor is self-rated mental health: moving from the lowest to the highest rating is associated with about 3.4 points higher satisfaction, holding other factors constant — that’s more than a third of the entire scale. Community belonging matters too, though less dramatically: the gap between the weakest and strongest sense of belonging predicts about 1.7 points. Higher household income is associated with modestly higher satisfaction (about 0.6 points comparing $80k+ to under $20k), and men report about 0.2 points lower satisfaction than women. Interestingly, after accounting for these factors, adults aged 65+ are no less satisfied than 18–34 year-olds — the apparent age gap disappears. Altogether, the model explains about 35% of the variation in life satisfaction, meaning that most of what makes people satisfied or dissatisfied — relationships, personality, life events, physical health, and much more — is not captured here. These results describe patterns, not causes: we cannot say that improving someone’s mental health would cause their satisfaction to rise by 3.4 points, only that the two tend to go together.”
What makes this a good paragraph: It names the outcome and its units, prioritizes predictors by substantive size (not p-values), translates coefficients into full-range effects, notes the surprising null finding for age, acknowledges what the model misses (65% unexplained), and explicitly flags the correlation-vs-causation distinction. It avoids jargon like “coefficient,” “p-value,” and “R-squared” in favor of language a general reader can follow.
ANSWER KEY
These are reflection questions, so answers will vary. Here are model responses:
“Holding constant means we’re comparing people who are alike on everything else in the model — like asking, ‘among people with the same mental health, income, sex, and age, does belonging still matter?’” The key idea is that we’re making apples-to-apples comparisons rather than comparing people who differ on many things at once.
Open-ended, but common surprises include: the belonging coefficient dropping by 36% when mental health was added (many students expect coefficients to be stable); the 65+ age group showing zero difference after adjustment (most students expect older adults to be less satisfied); and the income coefficient dropping by 40% when food security was added (the mediator/confounder distinction is new territory). Any of these are good answers if the student explains why it was surprising.
Open-ended, but strong answers tend to focus on either item 4 (“For sliders: what is the range?”) or item 6 (“How big is the effect relative to the SD of the outcome?”). Item 4 prevents the most common beginner mistake: assuming a coefficient of 0.5 is “small” when the predictor spans 10 units. Item 6 prevents the second most common mistake: declaring a coefficient “significant” without asking whether it matters. The checklist item about why variables were included (item 8) is also a strong answer — it’s the most conceptually deep and guards against the “throw everything in” approach.
Multivariate regression fits the equation: \(\widehat{y} = b_0 + b_1 x_1 + b_2 x_2 + \ldots\) Each coefficient describes the association with Y holding all other predictors constant.
“Holding constant” means comparing people who are similar on everything else in the model. It approximates the within-group slope.
Sliders (continuous predictors): interpret per one-unit change, but always consider the range. Multiply coefficient by range for the full predicted gap.
Switches (categorical predictors): interpret as the predicted difference between the named category and the reference group.
Substantive interpretation means translating coefficients into real-world quantities: full-range effects, comparisons to the outcome SD, and predicted profiles for specific kinds of people.
Model thinking: not every variable belongs. Confounders sharpen estimates. Mediators block mechanisms. The reason you include a variable shapes what the model finds.
| Function | What it does | Example |
|---|---|---|
lm() |
Fit a linear regression | lm(y ~ x1 + x2, data = df) |
tidy() |
Clean coefficient table | tidy(model) |
glance() |
Model-level summary | glance(model) |
coef() |
Extract coefficients | coef(model) |
predict() |
Predicted values for new data | predict(model, newdata = df) |
factor() |
Set reference category | factor(x, levels = c("ref", ...)) |
case_when() |
Recode with labels | case_when(x == 1 ~ "Low", ...) |
Remember: The goal is not to add as many predictors as possible. It is to build a model that tells a clear, defensible story — and to interpret it in terms a real person would understand.