Session objectives

  • Introduce multi-level linear regression models
  • Why / when are these types of models are important
  • How do they differ structurally (and conceptually) from non-hierarchical (flat) linear models

What are multi-level data?

  • Data are often multilevel, i.e. hierarchical, nested, clustered.
  • Lower level units of analysis belong to higher-level units of analysis.
    • e.g., students may belong to specific school units (assuming both data points are captured)
  • Repeated measure designs: observations are nested within participants.
  • Longitudinal data are “multilevel”: measures across different time points are nested within individuals.

Packages

# Required packages to install / load
library(tidyverse)
library(modelr)
library(lme4) # package used to run multilevel models using lmer, glmer
library(lmerTest) # obtain p-values for lmer models

Reaction time and sleep deprivation

lme4::sleepstudy data published in Belenky et al. (2003).

See

?lme4::sleepstudy

Reaction times and sleep deprivation

glimpse(sleepstudy)
Rows: 180
Columns: 3
$ Reaction <dbl> 249.5600, 258.7047, 250.8006, 321.4398, 356.8519, 414.6901, 3…
$ Days     <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0…
$ Subject  <fct> 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 309, 309, 3…
  • 18 ppts sleep deprived for 9 days
  • Reaction times recorded on days 0 - 9 (10 measurements per ppt)
  • To what extent does sleep deprivation affects affect reaction times?

Normal linear regression model

Modelling the data as coming from

\[ y_i \sim \mathcal{N}(\mu, \sigma^2) \]

m_lm <- lm(Reaction ~ 1, data = sleepstudy)

returns the mean \(\hat\mu\)

coef(m_lm)
(Intercept) 
   298.5079 

and the standard deviation \(\hat\sigma\)

sigma(m_lm)
[1] 56.32876

This model assumes, incorrectly, that all observations are independent! Why “incorrectly”?

Also, it assumes that all data come from the same normal distribution.

Normal linear regression model

The independence violation can be by-passed by modelling each participant individually.

For example participant 308:

sleepstudy_308 <- filter(sleepstudy, Subject == 308)
m_lm_308 <- lm(Reaction ~ 1, data = sleepstudy_308)

Compare the model coefficients \(\hat\mu_\text{[308]}\) and \(\hat\sigma_\text{[308]}\) of participant 308 to the model coefficients of participant 309 using another model. Which participant is faster? Which participant is less consistent in their response times?

Normal linear regression model

Participant 308: average reaction time is 342.13383 with a standard deviation of 79.8217633.

Participant 309: average reaction time is 215.23298 with a standard deviation of 10.8121933.

By-subject reaction times

Model assumptions

  • The models assume independence of observations and that all the data points were sampled from the same normal distribution.
  • In other words all participants should be roughly similar to each other.
  • This is true for the individual participant models but not plausible for the model with all participant data.
  • For example, the estimated average reaction time for participant 308 is 342.13383 but the average of the model with all participants was 298.5078917. Participant 309 was different from both.
  • Instead it’s likely that each participant has their own normal distribution.

What do we do?

  • Each participants reaction times can be described as a normal distribution with a mean and a standard deviation.
  • Each other these participants comes from a population with a mean and standard deviation (some participants are slower, some are faster).
  • There are thus two sources of variability in the reaction:
    • Each participant might be some times faster or slower (within subject variability)
    • Some participants are slower, some are faster (between subject variability)
  • Each distribution is essentially a model; modelling the variability of these distributions essentially means we are running a “model of models” which is what a multi-level model is.

Multi-level normal model: Syntax

m_lmer <- lmer(outcome ~ 1 + ( 1 | random intercepts), data = data)

Fixed effects: predictor variables are added as usual

Random effects: + ( ... | ... ).

  • On the right: random intercepts variable (for subgroups in the data).
  • On the left: random slopes variable (subgroup specific changes in the outcome)
  • To omit random slopes, replace it with 1

Task: Run an lmer model for the sleepdata data with Reaction as outcome and random intercepts for each participant.

Multi-level normal model: Assumption

  • Each subject’s reaction are drawn from a normal distributions with a mean and a standard deviation.
  • These normal distributions come from a hyper / meta normal distribution also with a mean and a standard deviation.
  • For data that are not assumed to come from a normal distribution (binary outcomes, count), there are generalised multi-level models.

Outcome: random and fixed effects

summary(m_lmer)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: Reaction ~ 1 + (1 | Subject)
   Data: sleepstudy

REML criterion at convergence: 1904.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.4983 -0.5501 -0.1476  0.5123  3.3446 

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 1278     35.75   
 Residual             1959     44.26   
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error     df t value Pr(>|t|)    
(Intercept)   298.51       9.05  17.00   32.98   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multi-level normal model: Fixed effects

The mean of the distribution of all subject means (the meta / hyper model): try also fixef(m_lmer) which returns 298.508.

summary(m_lmer)$coef
            Estimate Std. Error df  t value    Pr(>|t|)
(Intercept) 298.5079   9.049936 17 32.98453 7.47567e-17

Compare to the coefficient of the flat model.

summary(m_lm)$coef
            Estimate Std. Error  t value      Pr(>|t|)
(Intercept) 298.5079   4.198498 71.09874 3.774665e-133

Note the size of the standard error.

Multi-level normal model: Random effects

VarCorr(m_lmer)
 Groups   Name        Std.Dev.
 Subject  (Intercept) 35.754  
 Residual             44.259  

Residual: intra (within) group variability showing that each participant’s reaction time has a standard deviation of 44.259. Same as:

sigma(m_lmer)
[1] 44.25907

Multi-level normal model: Random effects

VarCorr(m_lmer)
 Groups   Name        Std.Dev.
 Subject  (Intercept) 35.754  
 Residual             44.259  

Subject and (Intercept) is the inter (between) group variability showing how much the participants’ reaction times vary around the overall mean (i.e. intercept).

In other words the subject means come from a normal distribution with mean 298.508 (fixed effects intercept) and a standard deviation of 35.753.

Multi-level normal model: Random effects

VarCorr(m_lmer)
 Groups   Name        Std.Dev.
 Subject  (Intercept) 35.754  
 Residual             44.259  
(d_ranefs <- coef(m_lmer)$Subject)
    (Intercept)
308    336.3371
309    226.2981
310    239.9712
330    302.5951
331    307.9840
332    306.1335
333    313.8130
334    295.7280
335    256.5062
337    365.4614
349    278.8472
350    311.5970
351    291.2152
352    332.2509
369    305.0345
370    292.6061
371    295.4523
372    315.3113

So the SD of the subject intercepts should be close to 35.754.

summarise(d_ranefs, sd = sd(`(Intercept)`))
        sd
1 33.29383

What does this mean?

In any normal distribution approximately \(\pm 1.96 \times \sigma\) contains around 95% of the area under the curve.

On the long run, 95% of the participant means are within this range:

intercept_mean <- fixef(m_lmer) # get the fixed effects 'means' for the normal distributions 
intercept_SD <- 35.753 # SD of intercept derived from random effects component 
intercept_mean - qnorm(0.975) * intercept_SD  # 1.96 SD below the mean 
[1] 228.4333
intercept_mean + qnorm(0.975) * intercept_SD  # 1.96 SD above the mean
[1] 368.5825

What does this mean?

By-subject deviations from the mean:

coef(m_lmer)$Subject
    (Intercept)
308    336.3371
309    226.2981
310    239.9712
330    302.5951
331    307.9840
332    306.1335
333    313.8130
334    295.7280

By-subject sleep depriviation effects

So far we didn’t make any assumptions about the effect of Days on reaction times.

library(psyntur)
scatterplot(data = sleepstudy, x = Days, y = Reaction, best_fit_line = TRUE)

Overall sleep depriviation effects

Task: To test whether there is an overall effect of sleep deprivation on reactions times add Days as a predictor to the next model (everything else as before). Call your model m_lmer_days.

Overall sleep depriviation effects

summary(m_lmer_days)$coef
             Estimate Std. Error       df  t value     Pr(>|t|)
(Intercept) 251.40510  9.7467163  22.8102 25.79383 2.241351e-18
Days         10.46729  0.8042214 161.0000 13.01543 6.412601e-27

The Days line shows the overall sleep deprivation effect.

You might not see the p-value if you didn’t use lmerTest.

By-subject sleep depriviation effects

But check out the sleep-deprivation effects for each participant!

library(psyntur)
scatterplot(data = sleepstudy, x = Days, y = Reaction, best_fit_line = TRUE) +
  facet_wrap(~Subject)

This shows a separate linear regression Reaction ~ Days for each participant.

By-subject sleep depriviation effects

  • Again, reaction times are not independent by cluster within subject.
  • For all participants, reaction times were collect on consecutive time points.
  • Size of effect on reaction times varies across participants (one is even getting faster!).

By-subject sleep depriviation effects

We can model the variability for both intercepts and slopes (deprivation effect).

Task: Create a new model and add random slopes for the sleep deprivation effect (each participant has their own sleep deprivation effect). Everything else should be as m_lmer_days. Call your model m_lmer_days_rs.

Change (1 | Subject) by adding Days to 1 (or replace 1 with Days). When predictors or random slopes are added we can omit 1 if we want.

Overall sleep deprivation effect

Compare the model coefficients (in particular their standard error and t-value).

# With by-participant random slope adjustment
summary(m_lmer_days_rs)$coef
             Estimate Std. Error       df   t value     Pr(>|t|)
(Intercept) 251.40510   6.824597 16.99973 36.838090 1.171558e-17
Days         10.46729   1.545790 16.99998  6.771481 3.263824e-06
# Without by-participant random slope adjustment
summary(m_lmer_days)$coef
             Estimate Std. Error       df  t value     Pr(>|t|)
(Intercept) 251.40510  9.7467163  22.8102 25.79383 2.241351e-18
Days         10.46729  0.8042214 161.0000 13.01543 6.412601e-27

Overall sleep deprivation effect: Fixed effect

Compare the model coefficients (in particular their standard error and t-value).

# With by-participant random slope adjustment
summary(m_lmer_days_rs)$coef
             Estimate Std. Error       df   t value     Pr(>|t|)
(Intercept) 251.40510   6.824597 16.99973 36.838090 1.171558e-17
Days         10.46729   1.545790 16.99998  6.771481 3.263824e-06
  • Coefficients table for a linear regression
  • (Intercept): baseline reaction times at day 0.
  • Days is the slope, so change in average reaction time for each day, i.e. the average sleep deprivation effect for target population.

Overall sleep deprivation effect: random effects

Intercept and slope show the population level effects, not the participant-level effects.

VarCorr(m_lmer_days_rs)
 Groups   Name        Std.Dev. Corr 
 Subject  (Intercept) 24.7407       
          Days         5.9221  0.066
 Residual             25.5918       

New here is the Days line, and the correlation Corr.

Correlation “Corr

VarCorr(m_lmer_days_rs)
 Groups   Name        Std.Dev. Corr 
 Subject  (Intercept) 24.7407       
          Days         5.9221  0.066
 Residual             25.5918       
  • Correlation between the random intercepts and the random slopes, i.e. individual subject intercepts and individual subject slopes.
  • This correlation is very low, so there the slopes and intercepts are more or less unrelated.
  • Positive correlation: Deprivation slowdown effect is stronger for slower participants compared to faster participants.
  • Negative correlation: Deprivation slowdown effect is stronger for faster participants compared to slower participants.
  • Perfect corrections are red flags!

Overall sleep deprivation effect: random effects

VarCorr(m_lmer_days_rs)
 Groups   Name        Std.Dev. Corr 
 Subject  (Intercept) 24.7407       
          Days         5.9221  0.066
 Residual             25.5918       

New here is the Days line, and the correlation Corr.

  • Days is the between-subject variability in the average sleep deprivation effect around the slope of fixed effects.

Constant and random intercepts and slopes

Constant and random intercepts and slopes

Random intercepts and constant slopes

fixef(m_lmer_days)
(Intercept)        Days 
  251.40510    10.46729 
# Sleep deprivation effect is assumed to be constant
coef(m_lmer_days)$Subject
    (Intercept)     Days
308    292.1888 10.46729
309    173.5556 10.46729
310    188.2965 10.46729
330    255.8115 10.46729
331    261.6213 10.46729
332    259.6263 10.46729
333    267.9056 10.46729
334    248.4081 10.46729
335    206.1230 10.46729
337    323.5878 10.46729
349    230.2089 10.46729
350    265.5165 10.46729
351    243.5429 10.46729
352    287.7835 10.46729
369    258.4415 10.46729
370    245.0424 10.46729
371    248.1108 10.46729
372    269.5209 10.46729

Random intercepts and random slopes

VarCorr(m_lmer_days_rs)
 Groups   Name        Std.Dev. Corr 
 Subject  (Intercept) 24.7407       
          Days         5.9221  0.066
 Residual             25.5918       
# Sleep deprivation effect is assumed to vary across participants.
coef(m_lmer_days_rs)$Subject
    (Intercept)       Days
308    253.6637 19.6662617
309    211.0064  1.8476053
310    212.4447  5.0184295
330    275.0957  5.6529356
331    273.6654  7.3973743
332    260.4447 10.1951090
333    268.2456 10.2436499
334    244.1725 11.5418676
335    251.0714 -0.2848792
337    286.2956 19.0955511
349    226.1949 11.6407181
350    238.3351 17.0815038
351    255.9830  7.4520239
352    272.2688 14.0032871
369    254.6806 11.3395008
370    225.7921 15.2897709
371    252.2122  9.4791297
372    263.7197 11.7513080

Constant intercepts and random slopes

Sleep deprivation effect is assumed to vary but all participants come from a distribution with a constant mean starting reaction time.

m_lmer_days_rs_2 <- lmer(Reaction ~ Days + (0 + Days|Subject), data = sleepstudy)
fixef(m_lmer_days_rs_2)
(Intercept)        Days 
  251.40510    10.46729 
coef(m_lmer_days_rs_2)$Subject
    (Intercept)       Days
308    251.4051 20.0866918
309    251.4051 -4.2326711
310    251.4051 -0.8189202
330    251.4051  9.1273878
331    251.4051 10.6754843
332    251.4051 11.5352979
333    251.4051 12.7430088
334    251.4051 10.4774867
335    251.4051 -0.4337385
337    251.4051 24.3577325
349    251.4051  7.9069248
350    251.4051 15.2012144
351    251.4051  8.1041559
352    251.4051 17.1349527
369    251.4051 11.8340809
370    251.4051 11.5298510
371    251.4051  9.5898795
372    251.4051 13.5923281

95% inner range of the normal distribution

Earlier we calculate the 95% confidence interval for the intercept (that contains 95% of the participant means).

Task: For model m_lmer_days_rs, calculate the 95% confidence interval for the sleep deprivation effect. You will need the slope coefficient from the fixed effects using fixef and the standard deviation of the sleep deprivation slopes from the random effects using VarCorr.

Hint: The correct values are -1.14 for the 2.5% lower bound of the confidence interval and 22.074 for the 97.5% upper bound.

Exercise

Load data published in Martin et al. (2010):

d_martin <- read_csv("https://raw.githubusercontent.com/jensroes/bristol-ws-2025/refs/heads/main/data/martin-etal-2010-exp3a.csv")
glimpse(d_martin)
Rows: 1,036
Columns: 4
$ item   <dbl> 11, 11, 11, 11, 11, 11, 25, 25, 25, 25, 25, 37, 37, 37, 37, 5, …
$ ppt    <dbl> 10, 9, 11, 6, 7, 2, 10, 9, 11, 7, 2, 10, 9, 6, 2, 10, 9, 11, 6,…
$ rt     <dbl> 1055, 2010, 461, 977, 1152, 1079, 1194, 1267, 1160, 684, 1367, …
$ nptype <chr> "conjoined", "conjoined", "conjoined", "conjoined", "conjoined"…

nptype is short for noun phrase type, so a language unit like “The cat and the hat” (conjoined) vs “The cat” (simple).

Exercise

Average over by participant (collapse across items):

d_martin_ppt <- summarise(d_martin, rt = mean(rt), .by = c(ppt, nptype))
arrange(d_martin_ppt, ppt)
# A tibble: 24 × 3
     ppt nptype       rt
   <dbl> <chr>     <dbl>
 1     1 conjoined 1382.
 2     1 simple    1323.
 3     2 conjoined 1077.
 4     2 simple    1108.
 5     3 conjoined 1162.
 6     3 simple    1092.
 7     4 conjoined 1047.
 8     4 simple    1021.
 9     5 conjoined 1108.
10     5 simple    1057.
# ℹ 14 more rows

Exercise 1

Task: Model reaction times rt with predictor for nptype and

  1. random intercepts for participants ppt.
  2. random intercepts for participants ppt and random slopes for nptype.

Check fixed effects and random effects for differences. What effect did the random slopes have on the estimates?

NB. This model is equivalent to a repeated measures ANOVA.

Exercise 2

Multi-level regression models as opposed to standard ANOVAs can be extended to includes various random errors using the format (1|randomeffect_1) + (1|randomeffect_2).

Task: Using the unaggregated format of the data

glimpse(d_martin)
Rows: 1,036
Columns: 4
$ item   <dbl> 11, 11, 11, 11, 11, 11, 25, 25, 25, 25, 25, 37, 37, 37, 37, 5, …
$ ppt    <dbl> 10, 9, 11, 6, 7, 2, 10, 9, 11, 7, 2, 10, 9, 6, 2, 10, 9, 11, 6,…
$ rt     <dbl> 1055, 2010, 461, 977, 1152, 1079, 1194, 1267, 1160, 684, 1367, …
$ nptype <chr> "conjoined", "conjoined", "conjoined", "conjoined", "conjoined"…

model reaction times rt with predictor for nptype and

  1. random intercepts for participants ppt and items item
  2. random intercepts for participants ppt and items item with random slope adjustments for nptype for both random intercept terms

Check fixed effects and random effects for differences. What effect did the random slopes have on the estimates?

References

Belenky, G., Wesensten, N. J., Thorne, D. R., Thomas, M. L., Sing, H. C., Redmond, D. P., Russo, M. B., & Balkin, T. J. (2003). Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: A sleep dose-response study. Journal of Sleep Research, 12(1), 1–12.

Martin, R. C., Crowther, J. E., Knight, M., Tamborello II, F. P., & Yang, C.-L. (2010). Planning in sentence production: Evidence for the phrase as a default planning scope. Cognition, 116(2), 177–192.