Choosing random effects variance structure in mixed models

Author

Emi Tanaka

Published

July 23, 2024

The simulation setup is the same as Ben Bolker’s notes here. I think of this as choosing variance structure of the random effects instead of random effect terms.

The full source code for this document can be found by clicking the “Code” text on the top right of this document or here.

library(lme4)
library(Matrix)
library(asreml)
library(tidyverse)
set.seed(101)
dd <- expand.grid(plot = factor(1:4),
                  ttt = factor(LETTERS[1:5]),
                  rep = 1:6)
## sample a random 5x5 correlation matrix
cchol <- nimble::rlkj_corr_cholesky(n = 1, eta = 1.2, p = 5) *
  ## scale by random SDs
  rlnorm(5, meanlog = 0, sdlog = 0.2)
theta <- t(cchol)[lower.tri(cchol, diag = TRUE)]
dd$y <- suppressMessages(
  simulate( ~ ttt + (ttt|plot),
            family = gaussian,
            newdata = dd,
            newparams = list(beta = seq(1, by = -0.5, length.out = 5),
                             theta = theta,
                             sigma = 1)))[[1]]

1 The maximal model in lme4

Hits a boundary issue. (I’m fitting no intercepts so the design matrix is nicer, but same thing with intercept).

m1 <- lmer(y ~ -1 + ttt + (ttt - 1|plot), data  = dd)
boundary (singular) fit: see help('isSingular')

The above model in mathematical form corresponds to:

\boldsymbol{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{b} + \boldsymbol{e} where \mathbf{X} = \boldsymbol{1}_6 \otimes \mathbf{I}_5\otimes \boldsymbol{1}_4 (I think – need to double check), or as shown below:

model.matrix(~-1 + ttt, data = dd) |> image()

and \mathbf{Z} takes the form as below:

model.matrix(~-1 + ttt:plot, data = dd) |> image()

assuming \boldsymbol{b} \sim N(\boldsymbol{0}, \mathbf{\Sigma}\otimes\mathbf{I}_4) and \boldsymbol{e} \sim N(\boldsymbol{0},\sigma^2\mathbf{I}_{120}) where \mathbf{\Sigma} is a 5 \times 5 unstructured covariance matrix estimated as below:

VarCorr(m1)
 Groups   Name Std.Dev. Corr                   
 plot     tttA 0.93367                         
          tttB 1.66874  0.055                  
          tttC 1.47660  0.881 0.260            
          tttD 1.29841  0.907 0.429 0.802      
          tttE 1.38137  0.455 0.445 0.822 0.429
 Residual      0.98665                         

Note that there are 16 variance parameters to estimate in this maximal model.

So the estimate of \mathbf{\Sigma} is given as:

           tttA       tttB      tttC      tttD      tttE
[1,] 0.87173920 0.08513669 1.2145945 1.0993370 0.5869230
[2,] 0.08513669 2.78467872 0.6400763 0.9290503 1.0250053
[3,] 1.21459449 0.64007629 2.1803559 1.5378092 1.6767729
[4,] 1.09933703 0.92905029 1.5378092 1.6858594 0.7692284
[5,] 0.58692301 1.02500533 1.6767729 0.7692284 1.9081918

and \hat{\sigma}^2 =0.973.

Looking at \hat{\mathbf{\Sigma}}, it doesn’t give confidence that a compound symmetric model would be appropriate.

2 Model fitting using asreml

Instead of assuming a 5\times 5 unstructured matrix for \mathbf{\Sigma}, I’ll progressively build it up. I’m using \sigma_{ij} to denote the i,j-th element of \mathbf{\Sigma}. It’s a similar/same strategy as using a reduced rank or factor analytic model.

2.1 Diagonal model

Assume that \Sigma_{ij} = 0 for all i\neq j or \mathbf{\Sigma} = \begin{bmatrix}\sigma_{11} & 0 & 0 & 0 & 0\\ 0 & \sigma_{22} & 0 & 0 & 0\\ 0 & 0 & \sigma_{33} & 0 & 0\\ 0 & 0 & 0 & \sigma_{44} & 0\\ 0 & 0 & 0 & 0 & \sigma_{55}\\\end{bmatrix}.

ma0 <- asreml(y ~ ttt, random=~ id(plot):diag(ttt), data = dd)
summary(ma0)$varcomp |> 
  mutate(stddev = sqrt(component)) # std dev calculated to compare with lme4 fit
               component std.error   z.ratio bound %ch    stddev
plot:ttt!ttt_A 0.7451592 0.7493761 0.9943728     P 0.0 0.8632261
plot:ttt!ttt_B 2.7638679 2.3691121 1.1666261     P 0.8 1.6624885
plot:ttt!ttt_C 2.0835486 1.8379506 1.1336260     P 0.2 1.4434502
plot:ttt!ttt_D 1.6093789 1.4545710 1.1064286     P 0.0 1.2686130
plot:ttt!ttt_E 1.8571089 1.6556468 1.1216818     P 0.1 1.3627578
units!R        1.0321769 0.1459915 7.0701175     P 0.0 1.0159611

The variance component estimates for this fit are:

\hat{\sigma}_{11} =0.745, \hat{\sigma}_{22} =2.764, \hat{\sigma}_{33} =2.084, \hat{\sigma}_{44} =1.609, \hat{\sigma}_{55} =1.857, \hat{\sigma}^2 = 1.032.

So there are 6 variance parameters to estimate in this model.

2.2 FA1

Assume that \mathbf{\Sigma} = \mathbf{\Lambda}\mathbf{\Lambda}^\top + \mathbf{\Psi} where \mathbf{\Lambda} = (\lambda_{ij}) is a 5 \times 1 matrix and \mathbf{\Psi} = (\psi_{ij}) is a 5 \times 5 diagonal matrix. Basic idea: \mathbf{\Lambda}\mathbf{\Lambda}^\top + \mathbf{\Psi} is a lower-order approximate of the fully unstructured matrix \mathbf{\Sigma}.

mafa1 <- asreml(y ~ ttt, random=~ id(plot):fa(ttt, 1), data = dd)

There are no model convergence issue with the above fit.

summary(mafa1)$varcomp
                      component std.error   z.ratio bound %ch
plot:fa(ttt, 1)!A!var 0.0000000        NA        NA     B  NA
plot:fa(ttt, 1)!B!var 2.5740989 2.2497241 1.1441843     P 0.0
plot:fa(ttt, 1)!C!var 0.0000000        NA        NA     B  NA
plot:fa(ttt, 1)!D!var 0.3281680 0.4686836 0.7001910     P 0.1
plot:fa(ttt, 1)!E!var 0.8414349 0.8719854 0.9649644     P 0.0
plot:fa(ttt, 1)!A!fa1 0.8675657 0.4522498 1.9183329     P 0.0
plot:fa(ttt, 1)!B!fa1 0.4355132 1.0813018 0.4027674     P 0.2
plot:fa(ttt, 1)!C!fa1 1.4482954 0.6458269 2.2425441     P 0.1
plot:fa(ttt, 1)!D!fa1 1.1322973 0.6596411 1.7165352     P 0.0
plot:fa(ttt, 1)!E!fa1 1.0066507 0.8146676 1.2356582     P 0.0
units!R               1.0281760 0.1430686 7.1865932     P 0.0

From above we see that \hat{\psi}_{11} = \hat{\psi}_{33} = 0, which probably is why there is a singularity issue in lme4.

So the estimate of \mathbf{\Sigma} based on FA1 model is:

          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 0.7526702 0.3778363 1.2564914 0.9823423 0.8733357
[2,] 0.3778363 2.7637706 0.6307517 0.4931303 0.4384096
[3,] 1.2564914 0.6307517 2.0975596 1.6399009 1.4579276
[4,] 0.9823423 0.4931303 1.6399009 1.6102651 1.1398279
[5,] 0.8733357 0.4384096 1.4579276 1.1398279 1.8547806

From the FA1 model, \hat{\sigma}^2 = 1.028.

2.3 RR1 + DIAG

We can fit an equivalent model to FA1 using reduced rank + diagonal model as below:

marr1 <- asreml(y ~ ttt, random=~ id(plot):rr(ttt, 1) + id(plot):diag(ttt), data = dd)

The estimates are very similar to FA1 so the variance estimate will be pretty much the same.

summary(marr1)$varcomp
                         component std.error   z.ratio bound %ch
plot:ttt!ttt_A        1.020178e-06        NA        NA     B  NA
plot:ttt!ttt_B        2.574238e+00 2.2496502 1.1442835     P 0.0
plot:ttt!ttt_C        4.602842e-07        NA        NA     B  NA
plot:ttt!ttt_D        3.280683e-01 0.4688595 0.6997155     P 0.0
plot:ttt!ttt_E        8.416915e-01 0.8701676 0.9672751     P 0.2
plot:rr(ttt, 1)!A!var 0.000000e+00        NA        NA     F  NA
plot:rr(ttt, 1)!B!var 0.000000e+00        NA        NA     F  NA
plot:rr(ttt, 1)!C!var 0.000000e+00        NA        NA     F  NA
plot:rr(ttt, 1)!D!var 0.000000e+00        NA        NA     F  NA
plot:rr(ttt, 1)!E!var 0.000000e+00        NA        NA     F  NA
plot:rr(ttt, 1)!A!fa1 8.676867e-01 0.4524620 1.9177007     U 0.1
plot:rr(ttt, 1)!B!fa1 4.350655e-01 1.0811552 0.4024080     U 0.2
plot:rr(ttt, 1)!C!fa1 1.448247e+00 0.6454739 2.2436952     U 0.1
plot:rr(ttt, 1)!D!fa1 1.132690e+00 0.6597443 1.7168624     U 0.0
plot:rr(ttt, 1)!E!fa1 1.005937e+00 0.8137107 1.2362343     U 0.1
units!R               1.028218e+00 0.1430774 7.1864448     P 0.0

2.4 FA2

Now I assume that \mathbf{\Lambda} = (\lambda_{ij}) is a 5 \times 2 matrix instead. I constraint upper triangle of \mathbf{\Lambda} to be zero (in this case \lambda_{12} = 0) so there are 9 + 5 (for \psi_{11}, ..., \psi_{55}) + 1 (for \sigma) = 15 parameters to estimate.

mafa2 <- asreml(y ~ ttt, random=~ id(plot):fa(ttt, 2), data = dd) 
# didn't converge so do more iteration
mafa2 <- update(mafa2)

Took longer but the model eventually converged:

mafa2$converge
[1] TRUE
summary(mafa2)$varcomp
                         component std.error    z.ratio bound %ch
plot:fa(ttt, 2)!A!var  0.000000000        NA         NA     F  NA
plot:fa(ttt, 2)!B!var  2.363713543 2.0766121  1.1382547     P 0.2
plot:fa(ttt, 2)!C!var  0.000000000        NA         NA     F  NA
plot:fa(ttt, 2)!D!var  0.000000000        NA         NA     B  NA
plot:fa(ttt, 2)!E!var  0.009629968        NA         NA     B  NA
plot:fa(ttt, 2)!A!fa1  0.909446485 0.4472515  2.0334120     P 0.4
plot:fa(ttt, 2)!B!fa1  0.452365006 1.0950019  0.4131180     P 0.5
plot:fa(ttt, 2)!C!fa1  1.296933926 0.6890338  1.8822501     P 0.1
plot:fa(ttt, 2)!D!fa1  1.267340034 0.5348338  2.3695958     P 0.3
plot:fa(ttt, 2)!E!fa1  0.674115713 0.8094547  0.8328023     P 0.6
plot:fa(ttt, 2)!A!fa2  0.000000000        NA         NA     F  NA
plot:fa(ttt, 2)!B!fa2  0.452166330 1.0749079  0.4206559     P 1.1
plot:fa(ttt, 2)!C!fa2  0.685035475 0.6117287  1.1198355     P 0.7
plot:fa(ttt, 2)!D!fa2 -0.094438671 0.4955271 -0.1905822     P 0.3
plot:fa(ttt, 2)!E!fa2  1.195240880 0.5634649  2.1212339     P 0.1
units!R                1.001327706 0.1371023  7.3035061     P 0.0

So the estimate of \mathbf{\Sigma} based on FA2 model is:

          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 0.8270929 0.4114018 1.1794920 1.1525779 0.6130722
[2,] 0.4114018 2.7728020 0.8964375 0.5305983 0.8453940
[3,] 1.1794920 0.8964375 2.1513112 1.5789624 1.6930659
[4,] 1.1525779 0.5305983 1.5789624 1.6150694 0.7414569
[5,] 0.6130722 0.8453940 1.6930659 0.7414569 1.8926627

From the FA2 model, \hat{\sigma}^2 = 1.001.

The difference between FA2 and maximal model in lme4:

(i.e. \hat{\mathbf{\Sigma}}_{\text{FA2}} - \hat{\mathbf{\Sigma}}_{\text{maximal in lme4}})

            tttA        tttB        tttC        tttD        tttE
[1,] -0.04464629  0.32626507 -0.03510249  0.05324091  0.02614915
[2,]  0.32626507 -0.01187669  0.25636121 -0.39845200 -0.17961129
[3,] -0.03510249  0.25636121 -0.02904472  0.04115321  0.01629304
[4,]  0.05324091 -0.39845200  0.04115321 -0.07079002 -0.02777152
[5,]  0.02614915 -0.17961129  0.01629304 -0.02777152 -0.01552905

The maximal model requires 16 variance parameters to estimate so there’s no point fitting FA3 (which requires 18 parameters to estimate).

2.5 US

Here I’ll try fitting unstructured matrix for \mathbf{\Sigma} but it’s usually unstable – and yup, results in singularity issue in the average information matrix that doesn’t even fit the model. Somehow the ai.sing = TRUE doesn’t allow the fit either.

maus <- asreml(y ~ ttt, random=~ id(plot):us(ttt), data = dd, ai.sing = TRUE) 
Error in asreml(y ~ ttt, random = ~id(plot):us(ttt), data = dd, ai.sing = TRUE): Error   : 3 singularities in the Average Information matrix.
          : To continue the analysis anyway, try running it again with the optional argument ai.sing = TRUE