- Introduce multi-level linear regression models
- Why / when are these types of models are important
- How do they differ structurally (and conceptually) from non-hierarchical (flat) linear models
# Required packages to install / load library(tidyverse) library(modelr) library(lme4) # package used to run multilevel models using lmer, glmer library(lmerTest) # obtain p-values for lmer models
lme4::sleepstudy data published in Belenky et al. (2003).
See
?lme4::sleepstudy
glimpse(sleepstudy)
Rows: 180 Columns: 3 $ Reaction <dbl> 249.5600, 258.7047, 250.8006, 321.4398, 356.8519, 414.6901, 3… $ Days <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0… $ Subject <fct> 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 309, 309, 3…
Modelling the data as coming from
\[ y_i \sim \mathcal{N}(\mu, \sigma^2) \]
m_lm <- lm(Reaction ~ 1, data = sleepstudy)
returns the mean \(\hat\mu\)
coef(m_lm)
(Intercept) 298.5079
and the standard deviation \(\hat\sigma\)
sigma(m_lm)
[1] 56.32876
This model assumes, incorrectly, that all observations are independent! Why “incorrectly”?
Also, it assumes that all data come from the same normal distribution.
The independence violation can be by-passed by modelling each participant individually.
For example participant 308:
sleepstudy_308 <- filter(sleepstudy, Subject == 308) m_lm_308 <- lm(Reaction ~ 1, data = sleepstudy_308)
Compare the model coefficients \(\hat\mu_\text{[308]}\) and \(\hat\sigma_\text{[308]}\) of participant 308 to the model coefficients of participant 309 using another model. Which participant is faster? Which participant is less consistent in their response times?
Participant 308: average reaction time is 342.13383 with a standard deviation of 79.8217633.
Participant 309: average reaction time is 215.23298 with a standard deviation of 10.8121933.
m_lmer <- lmer(outcome ~ 1 + ( 1 | random intercepts), data = data)
Fixed effects: predictor variables are added as usual
Random effects: + ( ... | ... ).
1Task: Run an lmer model for the sleepdata data with Reaction as outcome and random intercepts for each participant.
summary(m_lmer)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: Reaction ~ 1 + (1 | Subject)
Data: sleepstudy
REML criterion at convergence: 1904.3
Scaled residuals:
Min 1Q Median 3Q Max
-2.4983 -0.5501 -0.1476 0.5123 3.3446
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 1278 35.75
Residual 1959 44.26
Number of obs: 180, groups: Subject, 18
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 298.51 9.05 17.00 32.98 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The mean of the distribution of all subject means (the meta / hyper model): try also fixef(m_lmer) which returns 298.508.
summary(m_lmer)$coef
Estimate Std. Error df t value Pr(>|t|) (Intercept) 298.5079 9.049936 17 32.98453 7.47567e-17
Compare to the coefficient of the flat model.
summary(m_lm)$coef
Estimate Std. Error t value Pr(>|t|) (Intercept) 298.5079 4.198498 71.09874 3.774665e-133
Note the size of the standard error.
VarCorr(m_lmer)
Groups Name Std.Dev. Subject (Intercept) 35.754 Residual 44.259
Residual: intra (within) group variability showing that each participant’s reaction time has a standard deviation of 44.259. Same as:
sigma(m_lmer)
[1] 44.25907
VarCorr(m_lmer)
Groups Name Std.Dev. Subject (Intercept) 35.754 Residual 44.259
Subject and (Intercept) is the inter (between) group variability showing how much the participants’ reaction times vary around the overall mean (i.e. intercept).
In other words the subject means come from a normal distribution with mean 298.508 (fixed effects intercept) and a standard deviation of 35.753.
VarCorr(m_lmer)
Groups Name Std.Dev. Subject (Intercept) 35.754 Residual 44.259
(d_ranefs <- coef(m_lmer)$Subject)
(Intercept) 308 336.3371 309 226.2981 310 239.9712 330 302.5951 331 307.9840 332 306.1335 333 313.8130 334 295.7280 335 256.5062 337 365.4614 349 278.8472 350 311.5970 351 291.2152 352 332.2509 369 305.0345 370 292.6061 371 295.4523 372 315.3113
So the SD of the subject intercepts should be close to 35.754.
summarise(d_ranefs, sd = sd(`(Intercept)`))
sd 1 33.29383
In any normal distribution approximately \(\pm 1.96 \times \sigma\) contains around 95% of the area under the curve.
On the long run, 95% of the participant means are within this range:
intercept_mean <- fixef(m_lmer) # get the fixed effects 'means' for the normal distributions intercept_SD <- 35.753 # SD of intercept derived from random effects component
intercept_mean - qnorm(0.975) * intercept_SD # 1.96 SD below the mean
[1] 228.4333
intercept_mean + qnorm(0.975) * intercept_SD # 1.96 SD above the mean
[1] 368.5825
By-subject deviations from the mean:
coef(m_lmer)$Subject
(Intercept) 308 336.3371 309 226.2981 310 239.9712 330 302.5951 331 307.9840 332 306.1335 333 313.8130 334 295.7280
So far we didn’t make any assumptions about the effect of Days on reaction times.
library(psyntur) scatterplot(data = sleepstudy, x = Days, y = Reaction, best_fit_line = TRUE)
Task: To test whether there is an overall effect of sleep deprivation on reactions times add Days as a predictor to the next model (everything else as before). Call your model m_lmer_days.
summary(m_lmer_days)$coef
Estimate Std. Error df t value Pr(>|t|) (Intercept) 251.40510 9.7467163 22.8102 25.79383 2.241351e-18 Days 10.46729 0.8042214 161.0000 13.01543 6.412601e-27
The Days line shows the overall sleep deprivation effect.
You might not see the p-value if you didn’t use lmerTest.
But check out the sleep-deprivation effects for each participant!
library(psyntur) scatterplot(data = sleepstudy, x = Days, y = Reaction, best_fit_line = TRUE) + facet_wrap(~Subject)
This shows a separate linear regression Reaction ~ Days for each participant.
We can model the variability for both intercepts and slopes (deprivation effect).
Task: Create a new model and add random slopes for the sleep deprivation effect (each participant has their own sleep deprivation effect). Everything else should be as m_lmer_days. Call your model m_lmer_days_rs.
Change (1 | Subject) by adding Days to 1 (or replace 1 with Days). When predictors or random slopes are added we can omit 1 if we want.
Compare the model coefficients (in particular their standard error and t-value).
# With by-participant random slope adjustment summary(m_lmer_days_rs)$coef
Estimate Std. Error df t value Pr(>|t|) (Intercept) 251.40510 6.824597 16.99973 36.838090 1.171558e-17 Days 10.46729 1.545790 16.99998 6.771481 3.263824e-06
# Without by-participant random slope adjustment summary(m_lmer_days)$coef
Estimate Std. Error df t value Pr(>|t|) (Intercept) 251.40510 9.7467163 22.8102 25.79383 2.241351e-18 Days 10.46729 0.8042214 161.0000 13.01543 6.412601e-27
Compare the model coefficients (in particular their standard error and t-value).
# With by-participant random slope adjustment summary(m_lmer_days_rs)$coef
Estimate Std. Error df t value Pr(>|t|) (Intercept) 251.40510 6.824597 16.99973 36.838090 1.171558e-17 Days 10.46729 1.545790 16.99998 6.771481 3.263824e-06
(Intercept): baseline reaction times at day 0.Days is the slope, so change in average reaction time for each day, i.e. the average sleep deprivation effect for target population.Intercept and slope show the population level effects, not the participant-level effects.
VarCorr(m_lmer_days_rs)
Groups Name Std.Dev. Corr
Subject (Intercept) 24.7407
Days 5.9221 0.066
Residual 25.5918
New here is the Days line, and the correlation Corr.
Corr”VarCorr(m_lmer_days_rs)
Groups Name Std.Dev. Corr
Subject (Intercept) 24.7407
Days 5.9221 0.066
Residual 25.5918
VarCorr(m_lmer_days_rs)
Groups Name Std.Dev. Corr
Subject (Intercept) 24.7407
Days 5.9221 0.066
Residual 25.5918
New here is the Days line, and the correlation Corr.
Days is the between-subject variability in the average sleep deprivation effect around the slope of fixed effects.fixef(m_lmer_days)
(Intercept) Days 251.40510 10.46729
# Sleep deprivation effect is assumed to be constant coef(m_lmer_days)$Subject
(Intercept) Days 308 292.1888 10.46729 309 173.5556 10.46729 310 188.2965 10.46729 330 255.8115 10.46729 331 261.6213 10.46729 332 259.6263 10.46729 333 267.9056 10.46729 334 248.4081 10.46729 335 206.1230 10.46729 337 323.5878 10.46729 349 230.2089 10.46729 350 265.5165 10.46729 351 243.5429 10.46729 352 287.7835 10.46729 369 258.4415 10.46729 370 245.0424 10.46729 371 248.1108 10.46729 372 269.5209 10.46729
VarCorr(m_lmer_days_rs)
Groups Name Std.Dev. Corr
Subject (Intercept) 24.7407
Days 5.9221 0.066
Residual 25.5918
# Sleep deprivation effect is assumed to vary across participants. coef(m_lmer_days_rs)$Subject
(Intercept) Days 308 253.6637 19.6662617 309 211.0064 1.8476053 310 212.4447 5.0184295 330 275.0957 5.6529356 331 273.6654 7.3973743 332 260.4447 10.1951090 333 268.2456 10.2436499 334 244.1725 11.5418676 335 251.0714 -0.2848792 337 286.2956 19.0955511 349 226.1949 11.6407181 350 238.3351 17.0815038 351 255.9830 7.4520239 352 272.2688 14.0032871 369 254.6806 11.3395008 370 225.7921 15.2897709 371 252.2122 9.4791297 372 263.7197 11.7513080
Sleep deprivation effect is assumed to vary but all participants come from a distribution with a constant mean starting reaction time.
m_lmer_days_rs_2 <- lmer(Reaction ~ Days + (0 + Days|Subject), data = sleepstudy)
fixef(m_lmer_days_rs_2)
(Intercept) Days 251.40510 10.46729
coef(m_lmer_days_rs_2)$Subject
(Intercept) Days 308 251.4051 20.0866918 309 251.4051 -4.2326711 310 251.4051 -0.8189202 330 251.4051 9.1273878 331 251.4051 10.6754843 332 251.4051 11.5352979 333 251.4051 12.7430088 334 251.4051 10.4774867 335 251.4051 -0.4337385 337 251.4051 24.3577325 349 251.4051 7.9069248 350 251.4051 15.2012144 351 251.4051 8.1041559 352 251.4051 17.1349527 369 251.4051 11.8340809 370 251.4051 11.5298510 371 251.4051 9.5898795 372 251.4051 13.5923281
Earlier we calculate the 95% confidence interval for the intercept (that contains 95% of the participant means).
Task: For model m_lmer_days_rs, calculate the 95% confidence interval for the sleep deprivation effect. You will need the slope coefficient from the fixed effects using fixef and the standard deviation of the sleep deprivation slopes from the random effects using VarCorr.
Hint: The correct values are -1.14 for the 2.5% lower bound of the confidence interval and 22.074 for the 97.5% upper bound.
Load data published in Martin et al. (2010):
d_martin <- read_csv("https://raw.githubusercontent.com/jensroes/bristol-ws-2025/refs/heads/main/data/martin-etal-2010-exp3a.csv")
glimpse(d_martin)
Rows: 1,036 Columns: 4 $ item <dbl> 11, 11, 11, 11, 11, 11, 25, 25, 25, 25, 25, 37, 37, 37, 37, 5, … $ ppt <dbl> 10, 9, 11, 6, 7, 2, 10, 9, 11, 7, 2, 10, 9, 6, 2, 10, 9, 11, 6,… $ rt <dbl> 1055, 2010, 461, 977, 1152, 1079, 1194, 1267, 1160, 684, 1367, … $ nptype <chr> "conjoined", "conjoined", "conjoined", "conjoined", "conjoined"…
nptype is short for noun phrase type, so a language unit like “The cat and the hat” (conjoined) vs “The cat” (simple).
Average over by participant (collapse across items):
d_martin_ppt <- summarise(d_martin, rt = mean(rt), .by = c(ppt, nptype)) arrange(d_martin_ppt, ppt)
# A tibble: 24 × 3
ppt nptype rt
<dbl> <chr> <dbl>
1 1 conjoined 1382.
2 1 simple 1323.
3 2 conjoined 1077.
4 2 simple 1108.
5 3 conjoined 1162.
6 3 simple 1092.
7 4 conjoined 1047.
8 4 simple 1021.
9 5 conjoined 1108.
10 5 simple 1057.
# ℹ 14 more rows
Task: Model reaction times rt with predictor for nptype and
ppt.ppt and random slopes for nptype.Check fixed effects and random effects for differences. What effect did the random slopes have on the estimates?
NB. This model is equivalent to a repeated measures ANOVA.
Multi-level regression models as opposed to standard ANOVAs can be extended to includes various random errors using the format (1|randomeffect_1) + (1|randomeffect_2).
Task: Using the unaggregated format of the data
glimpse(d_martin)
Rows: 1,036 Columns: 4 $ item <dbl> 11, 11, 11, 11, 11, 11, 25, 25, 25, 25, 25, 37, 37, 37, 37, 5, … $ ppt <dbl> 10, 9, 11, 6, 7, 2, 10, 9, 11, 7, 2, 10, 9, 6, 2, 10, 9, 11, 6,… $ rt <dbl> 1055, 2010, 461, 977, 1152, 1079, 1194, 1267, 1160, 684, 1367, … $ nptype <chr> "conjoined", "conjoined", "conjoined", "conjoined", "conjoined"…
model reaction times rt with predictor for nptype and
ppt and items itemppt and items item with random slope adjustments for nptype for both random intercept termsCheck fixed effects and random effects for differences. What effect did the random slopes have on the estimates?
Belenky, G., Wesensten, N. J., Thorne, D. R., Thomas, M. L., Sing, H. C., Redmond, D. P., Russo, M. B., & Balkin, T. J. (2003). Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: A sleep dose-response study. Journal of Sleep Research, 12(1), 1–12.
Martin, R. C., Crowther, J. E., Knight, M., Tamborello II, F. P., & Yang, C.-L. (2010). Planning in sentence production: Evidence for the phrase as a default planning scope. Cognition, 116(2), 177–192.