Linear Mixed-Effects Models

============================================================================================================ PREVIOUS: https://rpubs.com/sherloconan/1184363

Getting Started

Statistical Model (using sum coding) for Two-Way Repeated-Measures Design

\[\begin{align} \tag{1} y_{ijkr}=&\ \overbrace{\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}}^{\text{fixed effects}}+\overbrace{s_k+(\alpha s)_{ik}+(\beta s)_{jk}+(\alpha\beta s)_{ijk}}^{\text{random effects}}+\epsilon_{ijkr}, \\ \\ &\text{for}\ \ i=1,\dotsb,a;\ \ j=1,\dotsb,b;\ \ k=1,\dotsb,n;\ \ r=1,\dotsb,R, \end{align}\]

where \(\textstyle\sum_{i=1}^a \alpha_i=0\), \(\quad\textstyle\sum_{j=1}^b \beta_j=0\), \(\quad\textstyle\sum_{i=1}^a (\alpha\beta)_{ij}=\sum_{j=1}^b (\alpha\beta)_{ij}=0\), \(\quad\epsilon_{ijkr}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_{\epsilon}^2)\),

and \(\textbf{s}=\left(s_k,\ (\alpha s)_{1k},\dotsb,(\alpha s)_{ak},\ \ (\beta s)_{1k},\dotsb,(\beta s)_{bk},\ (\alpha\beta s)_{11k},\dotsb,\ (\alpha\beta s)_{abk}\right)^\top\sim\boldsymbol{\mathcal{N}}(\boldsymbol{0},\ \boldsymbol{\Sigma})\).

Random intercepts, \(s_k\): individual differences in the mean across all conditions.
Random slopes, \((\alpha s)_{ik}\) or \((\beta s)_{jk}\): individual differences in the direction and size of a main effect \(A\) or \(B\).
Random interactions, \((\alpha\beta s)_{ijk}\) for \(R>1\): individual differences in the direction and size of an interaction effect between \(A\) and \(B\).
Correlations between random effects. Do individual differences in one condition predict individual differences in another?

Regression Model (using dummy coding) for a \(2\times3\) Repeated-Measures Design

\(\textbf{y}=\textbf{X}\boldsymbol{\gamma}+\textbf{Zu}+\boldsymbol{\epsilon}\)

\[\begin{align} \tag{2} y_{ijkr}&=(\gamma_{00}+u_{00k}) \\ &+(\gamma_{10}+u_{10k})\cdot D_{A_2} \\ &+(\gamma_{01}+u_{01k})\cdot D_{B_2}+(\gamma_{02}+u_{02k})\cdot D_{B_3} \\ &+(\gamma_{11}+u_{11k})\cdot D_{A_2 B_2}+(\gamma_{12}+u_{12k})\cdot D_{A_2 B_3} \\ &+\epsilon_{ijkr} \end{align}\]

There are \(ab-1=5\) dummy variables, \(D\)’s, in Eq. 2.

From Eq. 1 to Eq. 2,

		\(B_1\)	\(B_2\)	\(B_3\)
\(A_1\)		\(\mu+\alpha_1+\beta_1+(\alpha\beta)_{11}\) \(+s_k+(\alpha s)_{1k}+(\beta s)_{1k}+(\alpha\beta s)_{11k}\) \(=\) \(\gamma_{00}+u_{00k}\)	\(\mu+\alpha_1+\beta_2+(\alpha\beta)_{12}\) \(+s_k+(\alpha s)_{1k}+(\beta s)_{2k}+(\alpha\beta s)_{12k}\) \(=\) \(\gamma_{00}+u_{00k}+\gamma_{01}+u_{01k}\)	\(\mu+\alpha_1+\beta_3+(\alpha\beta)_{13}\) \(+s_k+(\alpha s)_{1k}+(\beta s)_{3k}+(\alpha\beta s)_{13k}\) \(=\) \(\gamma_{00}+u_{00k}+\gamma_{02}+u_{02k}\)
\(A_2\)		\(\mu+\alpha_2+\beta_1+(\alpha\beta)_{21}\) \(+s_k+(\alpha s)_{2k}+(\beta s)_{1k}+(\alpha\beta s)_{21k}\) \(=\) \(\gamma_{00}+u_{00k}+\gamma_{10}+u_{10k}\)	\(\mu+\alpha_2+\beta_2+(\alpha\beta)_{22}\) \(+s_k+(\alpha s)_{2k}+(\beta s)_{2k}+(\alpha\beta s)_{22k}\) \(=\) \(\gamma_{00}+u_{00k}+\gamma_{10}+u_{10k}\) \(\gamma_{01}+u_{01k}+\gamma_{11}+u_{11k}\)	\(\mu+\alpha_2+\beta_3+(\alpha\beta)_{23}\) \(+s_k+(\alpha s)_{2k}+(\beta s)_{3k}+(\alpha\beta s)_{23k}\) \(=\) \(\gamma_{00}+u_{00k}+\gamma_{10}+u_{10k}\) \(\gamma_{02}+u_{02k}+\gamma_{12}+u_{12k}\)

Specifically,

\[\begin{equation} \begin{pmatrix} 1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \cdot\textbf{s}= \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 \end{pmatrix} \cdot\textbf{u} \end{equation}\]

The random effects variance-covariance matrix is given by \(\textbf{G}=\left(\rho_{pq}\cdot\sigma_{p}\cdot\sigma_{q}\right)_{pq}\) with \({ab+1\choose2}=21\) parameters.

\[\begin{equation} \tag{3} \begin{pmatrix} u_{00k} \\ u_{10k} \\ u_{01k} \\ u_{02k} \\ u_{11k} \\ u_{12k} \end{pmatrix} \sim\boldsymbol{\mathcal{N}}\left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} \sigma_{00}^2 & \rho_{00,10}\cdot\sigma_{00}\cdot\sigma_{10} & \rho_{00,01}\cdot\sigma_{00}\cdot\sigma_{01} & \rho_{00,02}\cdot\sigma_{00}\cdot\sigma_{02} & \rho_{00,11}\cdot\sigma_{00}\cdot\sigma_{11} & \rho_{00,12}\cdot\sigma_{00}\cdot\sigma_{12} \\ \bullet & \sigma_{10}^2 & \rho_{10,01}\cdot\sigma_{10}\cdot\sigma_{01} & \rho_{10,02}\cdot\sigma_{10}\cdot\sigma_{02} & \rho_{10,11}\cdot\sigma_{10}\cdot\sigma_{11} & \rho_{10,12}\cdot\sigma_{10}\cdot\sigma_{12} \\ \bullet & \bullet & \sigma_{01}^2 & \rho_{01,02}\cdot\sigma_{01}\cdot\sigma_{02} & \rho_{01,11}\cdot\sigma_{01}\cdot\sigma_{11} & \rho_{01,12}\cdot\sigma_{01}\cdot\sigma_{12} \\ \bullet & \bullet & \bullet & \sigma_{02}^2 & \rho_{02,11}\cdot\sigma_{02}\cdot\sigma_{11} & \rho_{02,12}\cdot\sigma_{02}\cdot\sigma_{12} \\ \bullet & \bullet & \bullet & \bullet & \sigma_{11}^2 & \rho_{11,12}\cdot\sigma_{11}\cdot\sigma_{12} \\ \bullet & \bullet & \bullet & \bullet & \bullet & \sigma_{12}^2 \end{pmatrix} \right) \end{equation}\]

If the data is aggregated (\(R=1\)), there are \({a+b\choose2}=10\) parameters in Eq. 4.

Data

Stroop: Aggregated response times of 19 participants on a Stroop task (\(R=1\), \(n=19\)).

As in Stroop’s experiment, participants responded to congruent and incongruent words. Additionally, they saw colored nonwords, such as “XXXX”, that made up a neutral condition. Furthermore, there were “break” trials where a black square was shown, indicating that participants did not need to respond. Altogether, this experiment uses a 3 (Congruency: congruent vs. neutral vs. incongruent) by 2 (Preceding trial: Break vs. Stroop task) repeated-measures design (van den Bergh et al., 2022, 2023).

Data Source: The wide-format data in JASP, which demonstrates the use of a repeated-measures analysis of variance.

A [factor] preceding trial (with two levels, \(a=2\))
B [factor] congruency (with three levels, \(b=3\))

colnames(stroop) <- c("a1b1", "a1b2", "a1b3", "a2b1", "a2b2", "a2b3") # wide format
round(cov(stroop[,c(1,4,2,5,3,6)])) # sample covariance

##      a1b1  a2b1 a1b2  a2b2 a1b3  a2b3
## a1b1 5444   106 6688  -103 4904  -363
## a2b1  106  8859 -265 11415 -594  7871
## a1b2 6688  -265 8682  -305 6090  -655
## a2b2 -103 11415 -305 16165 -622 10381
## a1b3 4904  -594 6090  -622 4933  -981
## a2b3 -363  7871 -655 10381 -981  7590

df <- data.frame("DV"=c(as.matrix(stroop)),
                 "ID"=rep(paste0("s",1:19), 2*3),
                 "A"=rep(paste0("a",1:2), each=19*3),
                 "B"=rep(paste0("b",1:3), each=19, time=2),
                 stringsAsFactors=T) # long format

scroll to top

Backward Stepwise Elimination

Bates et al. (2015a) proposed parsimonious mixed models through (1) starting with a maximal model; (2) then, fitting a zero-correlation model; (3) next, removing variance components until the likelihood ratio test showed no further improvement; and (4) finally, adding correlation parameters for the remaining variance components.

Use the “lme4” package (v 1.1-36).

Step 1

fit1 <- lmer(DV ~ A * B + (A + B | ID), df) # maximal mixed model for the aggregated data

## boundary (singular) fit: see help('isSingular')

VarCorr(fit1) # inspect random effects standard deviations and correlations

##  Groups   Name        Std.Dev. Corr                
##  ID       (Intercept)  75.3055                     
##           Aa2         129.4031 -0.662              
##           Bb2          17.8853  0.474  0.346       
##           Bb3           8.6579 -0.547 -0.265 -0.996
##  Residual              23.1706

The warning often arises from zero variance or perfect correlation. One correlation estimate is close to \(-1\).

\[\begin{equation} \tag{4} \begin{pmatrix} u_{00k} \\ u_{10k} \\ u_{01k} \\ u_{02k} \end{pmatrix} \sim\boldsymbol{\mathcal{N}}\left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} \sigma_{00}^2 & \rho_{00,10}\cdot\sigma_{00}\cdot\sigma_{10} & \rho_{00,01}\cdot\sigma_{00}\cdot\sigma_{01} & \rho_{00,02}\cdot\sigma_{00}\cdot\sigma_{02} \\ \bullet & \sigma_{10}^2 & \rho_{10,01}\cdot\sigma_{10}\cdot\sigma_{01} & \rho_{10,02}\cdot\sigma_{10}\cdot\sigma_{02} \\ \bullet & \bullet & \sigma_{01}^2 & \rho_{01,02}\cdot\sigma_{01}\cdot\sigma_{02} \\ \bullet & \bullet & \bullet & \sigma_{02}^2 \end{pmatrix} \right) \end{equation}\]

Boundary Estimation Problem: The (restricted) maximum likelihood estimate of the random-effects covariance matrix in Eq. 4 is close to not being positive definite (i.e., having at least one non-positive eigenvalue).

Note that \(\hat{\textbf{G}}\succ0\) implies that the existence of \(\hat{\textbf{G}}^{-1}\), but not vice versa. Discussion

scroll to top

Step 2

# fit <- lmer(DV ~ A * B + (1 | ID) + (0 + A | ID) + (0 + B | ID), df) # SAME
fit <- lmer(DV ~ A * B + (A + B || ID), df) # || syntax sets correlations to 0

## boundary (singular) fit: see help('isSingular')

VarCorr(fit) # inspect random effects standard deviations and correlations

##  Groups   Name        Std.Dev.  Corr        
##  ID       (Intercept)  0.012261             
##  ID.1     Aa1         59.842058             
##           Aa2         69.538445 -1.000      
##  ID.2     Bb1         57.281467             
##           Bb2         74.938080 1.000       
##           Bb3         48.654535 1.000  1.000
##  Residual             23.248088

(1 | ID): Random intercepts for ID, \(s_k\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_s^2)\)
(0 + A | ID): Uncorrelated random slopes for A across ID, \((\boldsymbol{\alpha s})_{k}\sim\boldsymbol{\mathcal{N}}(\boldsymbol{0},\ \boldsymbol{\Sigma}_{\alpha s})\)
(0 + B | ID): Uncorrelated random slopes for B across ID, \((\boldsymbol{\beta s})_{k}\sim\boldsymbol{\mathcal{N}}(\boldsymbol{0},\ \boldsymbol{\Sigma}_{\beta s})\)

# equivalent to  summary(aov(DV ~ A*B + Error(ID/(A+B)), df))
fit2 <- lmer(DV ~ A * B + (1 | ID) + (1 | A:ID) + (1 | B:ID), df)
VarCorr(fit2) # inspect random effects standard deviations

##  Groups   Name        Std.Dev.  
##  B:ID     (Intercept)  6.3118820
##  A:ID     (Intercept) 88.8421406
##  ID       (Intercept)  0.0077882
##  Residual             26.0911518

(1 | ID): Random intercepts for ID, \(s_k\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_s^2)\)
(1 | A:ID): Random intercepts for each subject’s response to levels of A, \((\alpha s)_{ik}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_{\alpha s}^2)\)
(1 | B:ID): Random intercepts for each subject’s response to levels of B, \((\beta s)_{jk}\overset{\mathrm{i.i.d.}}{\sim}\mathcal{N}(0,\ \sigma_{\beta s}^2)\)

scroll to top

Step 3

Compare nested linear mixed-effects models with the likelihood ratio test (Pinheiro & Bates, 2000, chap. 2.4).

Note the message “refitting model(s) with ML (instead of REML)”.
REML is preferred for estimating variance components, but for model comparison, especially when fixed effects differ, ML is appropriate.

fit3 <- lmer(DV ~ A * B + (1 | ID) + (1 | A:ID) , df) # drop (1 | B:ID)

## boundary (singular) fit: see help('isSingular')

VarCorr(fit3) # inspect random effects standard deviations

##  Groups   Name        Std.Dev.  
##  A:ID     (Intercept) 8.8834e+01
##  ID       (Intercept) 4.8258e-06
##  Residual             2.6845e+01

(LRT_3_2 <- anova(fit3, fit2)) # likelihood ratio test

## refitting model(s) with ML (instead of REML)

## Data: df
## Models:
## fit3: DV ~ A * B + (1 | ID) + (1 | A:ID)
## fit2: DV ~ A * B + (1 | ID) + (1 | A:ID) + (1 | B:ID)
##      npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)
## fit3    9 1219.3 1244.0 -600.66    1201.3                     
## fit2   10 1221.2 1248.6 -600.61    1201.2 0.1164  1      0.733

.5 * LRT_3_2$`Pr(>Chisq)`[2] # adjusted p-value  <==

## [1] 0.3664979

\(\mathcal{H}_0:\ \sigma_{\beta s}^2=0\) versus \(\mathcal{H}_1:\ \sigma_{\beta s}^2>0\)

The likelihood ratio test statistic under \(\mathcal{H}_0\) follows a mixture of \(\chi^2\)-distributions, not the standard \(\chi^2\). Specifically, it asymptotically follows a 50:50 mixture of (degenerate) \(\chi^2_0\)- and \(\chi^2_1\)-distributions due to the variance parameter lying on the boundary of the parameter space (Self & Liang, 1987; Stram & Lee, 1994; see also the Wilks’ theorem).

However, the anova() output may not show the adjusted \(p\)-value for boundary effects (Bates et al., 2015b, p. 25).

scroll to top

More Steps

Compare nested linear mixed-effects models with the likelihood ratio test (Bates et al., 2015b, p. 25; Pinheiro & Bates, 2000, chap. 2.4; Self & Liang, 1987; Stram & Lee, 1994).

“buildmer” R Package

Vignettes by Voeten (2021)

buildmer(DV ~ A * B + (A + B | ID), df) # automatically find and compare mixed models

## Determining predictor order

## Fitting via lm: DV ~ 1

## Currently evaluating LRT for: A, B

## Fitting via lm: DV ~ 1 + A

## Fitting via lm: DV ~ 1 + B

## Updating formula: DV ~ 1 + A

## Currently evaluating LRT for: B

## Fitting via lm: DV ~ 1 + A + B

## Updating formula: DV ~ 1 + A + B

## Currently evaluating LRT for: A:B

## Fitting via lm: DV ~ 1 + A + B + A:B

## Updating formula: DV ~ 1 + A + B + A:B

## Fitting via gam, with REML: DV ~ 1 + A + B + A:B

## Currently evaluating LRT for: 1 | ID

## Fitting via lmer, with REML: DV ~ 1 + A + B + A:B + (1 | ID)

## Updating formula: DV ~ 1 + A + B + A:B + (1 | ID)

## Currently evaluating LRT for: A | ID, B | ID

## Fitting via lmer, with REML: DV ~ 1 + A + B + A:B + (1 + A | ID)

## Fitting via lmer, with REML: DV ~ 1 + A + B + A:B + (1 + B | ID)

## boundary (singular) fit: see help('isSingular')

## Updating formula: DV ~ 1 + A + B + A:B + (1 + A | ID)

## Currently evaluating LRT for: B | ID

## Fitting via lmer, with REML: DV ~ 1 + A + B + A:B + (1 + A + B | ID)

## boundary (singular) fit: see help('isSingular')

## Ending the ordering procedure due to having reached the maximal
##     feasible model - all higher models failed to converge. The types of
##     convergence failure are: Singular fit

## Fitting ML and REML reference models

## Fitting via lmer, with REML: DV ~ 1 + A + B + A:B + (1 + A | ID)

## Fitting via lmer, with ML: DV ~ 1 + A + B + A:B + (1 + A | ID)

## Testing terms

## Fitting via lmer, with ML: DV ~ 1 + A + B + (1 + A | ID)

## Fitting via lmer, with REML: DV ~ 1 + A + B + A:B + (1 | ID)

##   grouping term     block       score Iteration          LRT
## 1     <NA>    1   NA NA 1          NA         1           NA
## 2     <NA>    A   NA NA A  -4.5792432         1           NA
## 3     <NA>    B   NA NA B  -2.0236820         1           NA
## 4     <NA>  A:B NA NA A:B  -0.1911587         1 1.083918e-01
## 5       ID    1   NA ID 1 -34.0003828         1           NA
## 6       ID    A   NA ID A -62.2097359         1 9.608486e-28

## Updating formula: DV ~ 1 + A + B + (1 + A | ID)

## Fitting ML and REML reference models

## Fitting via lmer, with REML: DV ~ 1 + A + B + (1 + A | ID)

## Fitting via lmer, with ML: DV ~ 1 + A + B + (1 + A | ID)

## Testing terms

## Fitting via lmer, with ML: DV ~ 1 + A + (1 + A | ID)

## Fitting via lmer, with REML: DV ~ 1 + A + B + (1 | ID)

##   grouping term   block      score Iteration          LRT
## 1     <NA>    1 NA NA 1         NA         2           NA
## 2     <NA>    A NA NA A  -4.579243         2           NA
## 3     <NA>    B NA NA B  -2.023682         2 1.284788e-08
## 5       ID    1 NA ID 1 -34.000383         2           NA
## 6       ID    A NA ID A -62.209736         2 7.964508e-28

## All terms are significant

## Finalizing by converting the model to lmerTest

## Linear mixed model fit by REML ['lmerModLmerTest']
## Formula: DV ~ 1 + A + B + (1 + A | ID)
##    Data: df
## REML criterion at convergence: 1178.062
## Random effects:
##  Groups   Name        Std.Dev. Corr 
##  ID       (Intercept)  76.15        
##           Aa2         128.87   -0.63
##  Residual              27.27        
## Number of obs: 114, groups:  ID, 19
## Fixed Effects:
## (Intercept)          Aa2          Bb2          Bb3  
##     436.772       44.930       40.211        9.211

scroll to top

Linear Mixed-Effects Models

Achieving Parsimony in Model Selection

Zhengxiao Wei, Farouk S. Nathoo, Michael E. J. Masson

2025-03-17

Getting Started

Backward Stepwise Elimination

Step 1

Step 2

Step 3

More Steps

“buildmer” R Package