Under construction!

library(lme4)
packageVersion("lme4")
## [1] '1.1.12'
library(nlme)
library(lattice)  ## for qqmath, dotplot
library("ggplot2"); theme_set(theme_bw())

A topic that has come up several times on the R list and elsewhere: (how) can one allow different variances among random effects within different categories? It turns out that this morphs into a couple of other topics of interest: narrowly, how can one fit a random-effects model with a diagonal variance-covariance structure within a grouping factor? More generally, how can one fit random-effects models with different variance-covariance structures for the random effects?

Easy example: factor-specific residual variances

We don’t even need a mixed model for this one, we can just use generalized least squares (gls() in the nlme package). Suppose \[ \begin{split} Y_{ij} & = \delta_i + \epsilon_{ij} \\ \epsilon_{ij} & \sim \text{Normal}(0,\sigma^2_i) ; \end{split} \] that is, the residual variance differs by group.

set.seed(101)
dsd <- c(1,2,3)
d <- expand.grid(f=factor(letters[1:3]),rep=1:100)
delta <- 4:6
d$y <- rnorm(nrow(d),mean=delta[d$f],sd=dsd[d$f])
ggplot(d,aes(f,y))+geom_boxplot()

library("nlme")
m1 <- gls(y~f-1,data=d,weights=varIdent(form=~1|f))

We could look at the whole summary, but for brevity we restrict ourselves to the fixed-effect parameter estimates and the estimated variance structure (parameterized as ratios of the residual standard deviations in groups \(2 \dots n\) to the standard deviation in the first group):

coef(m1)
##       fa       fb       fc 
## 4.041845 5.015904 5.619241
summary(m1$modelStruct$varStruct)
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | f 
##  Parameter estimates:
##        a        b        c 
## 1.000000 1.905709 3.094312
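
To recover the absolute standard deviations (a sketch, relying only on the residual standard deviation stored in the fit and the coef method for varFunc objects), multiply the reference group's residual SD by the fitted ratios; the results should be close to the true values dsd = c(1,2,3):

## per-group residual SDs: sigma of group 'a' times the fitted ratios
## (allCoef=TRUE includes the reference group's fixed ratio of 1)
m1$sigma * coef(m1$modelStruct$varStruct, unconstrained=FALSE, allCoef=TRUE)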

The linear mixed-effect (lme()) and nonlinear mixed-effect (nlme()) functions in the nlme package also accept a weights argument of this type (e.g. weights=varIdent(form=~1|f)), to specify R-side (residual) heteroscedasticity, as sketched below.
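
A minimal sketch (with freshly simulated data; the objects db, block, and m1B are made up for illustration):

## hedged sketch: the same varIdent() specification carries over to lme();
## 'block' is a made-up grouping factor
set.seed(102)
db <- expand.grid(f=factor(letters[1:3]),block=factor(1:10),rep=1:5)
db$y <- rnorm(nrow(db),mean=4,sd=dsd[db$f])+rnorm(10)[db$block]
m1B <- lme(y~f-1,random=~1|block,weights=varIdent(form=~1|f),data=db)
summary(m1B$modelStruct$varStruct)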

Hard example: factor-specific random-effects variance

Example 1

Now consider the two-level model \[ \begin{split} Y_{ijk} & = \delta_i + b_{ij} + \epsilon_{ijk} \\ b_{ij} & \sim \text{Normal}(0,\sigma^2_{b_i}) \\ \epsilon_{ijk} & \sim \text{Normal}(0,\sigma^2_0) ; \end{split} \] that is, the structured variance is now at the level of the among-group (block) variance, with a different variance \(\sigma^2_{b_i}\) for each level \(i\) of the factor. This question is asked in http://thread.gmane.org/gmane.comp.lang.r.lme4.devel/7107, although it’s not 100% clear whether the question really refers to scenario #1 above.

set.seed(101)
dsd <- c(1,2,3)
nblock <- 25
ntot <- 750
nfac <- 3
d <- expand.grid(f=letters[1:nfac],g=1:nblock,rep=seq(ntot/(nblock*nfac)))
d$fg <- with(d,interaction(g,f))
delta <- 4:6
## have to be a bit careful to get everything in the right order here ...
u <- rnorm(nblock*nfac,mean=0,sd=rep(dsd,each=nblock))
d$y <- rnorm(nrow(d),mean=delta[d$f],sd=0.2)+u[d$fg]
ggplot(d,aes(fg,y,colour=f))+geom_boxplot()+coord_flip()

This can be done in lme(): badly, as follows (fitting a general positive-definite variance-covariance matrix, which estimates spurious correlation parameters):

lme(y~f-1,random=~f|g,data=d)
## Linear mixed-effects model fit by REML
##   Data: d 
##   Log-restricted-likelihood: -67.13305
##   Fixed: y ~ f - 1 
##       fa       fb       fc 
## 3.881319 4.691717 5.973926 
## 
## Random effects:
##  Formula: ~f | g
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr         
## (Intercept) 0.8490982 (Intr) fb    
## fb          2.0454336 -0.212       
## fc          2.9908972 -0.329  0.106
## Residual    0.1898101              
## 
## Number of Observations: 750
## Number of Groups: 25

Or correctly:

lme(y~f-1,random=list(g=pdDiag(~f)),data=d)
## Linear mixed-effects model fit by REML
##   Data: d 
##   Log-restricted-likelihood: -69.06322
##   Fixed: y ~ f - 1 
##       fa       fb       fc 
## 3.881319 4.691717 5.973926 
## 
## Random effects:
##  Formula: ~f | g
##  Structure: Diagonal
##         (Intercept)      fb       fc  Residual
## StdDev:   0.8483291 2.04441 2.989323 0.1898166
## 
## Number of Observations: 750
## Number of Groups: 25

In lme4 it can (at present) only be done the bad way:

lmer(y~f-1+(f|g),data=d)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ f - 1 + (f | g)
##    Data: d
## REML criterion at convergence: 134.2661
## Random effects:
##  Groups   Name        Std.Dev. Corr       
##  g        (Intercept) 0.8491              
##           fb          2.0454   -0.21      
##           fc          2.9909   -0.33  0.11
##  Residual             0.1898              
## Number of obs: 750, groups:  g, 25
## Fixed Effects:
##    fa     fb     fc  
## 3.881  4.692  5.974

We get approximately the right standard deviation values, but with bogus correlations (the true off-diagonal elements are zero in this case, because of the way we defined the model …)
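
One possible workaround in lme4 (a sketch, not pursued further here) is to use lme4’s dummy() helper to specify one independent scalar random term per factor level, which imposes a diagonal structure directly in the level (indicator) basis, i.e. the analogue of pdDiag(~0+f):

## sketch: one independent scalar random term per factor level;
## independence across terms gives a diagonal variance-covariance matrix
lmer(y~f-1+(0+dummy(f,"a")|g)+(0+dummy(f,"b")|g)+(0+dummy(f,"c")|g),
     data=d)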

W. Volterman points out that we can get the right variance components, although with some bogus information thrown in, by specifying an interaction of the nested blocks with the higher-level groups, as follows:

lmer(y~f-1+(fg|g),data=d)

However, for this example the model takes too long to run (see below for an example that does work, and a better way to do this).

Example 2

A much smaller example proposed by WV:

q3 <- structure(list(score = c(2L, 5L, 2L, 7L, 9L, 8L, 4L, 3L, 6L, 
4L, 2L, 6L, 10L, 8L, 9L, 8L, 6L, 2L, 3L, 1L, 1L, 3L, 2L, 8L, 
6L), course = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("C", "E", "G"), class = "factor"), section = structure(c(5L, 
5L, 5L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 1L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L), .Label = c("C1", "C2", "C3", 
"C4", "E1", "E2", "E3", "G1", "G2"), class = "factor")), .Names = c("score", 
"course", "section"), class = "data.frame", row.names = c(NA, 
-25L))
q3.lme4<-lmer(score~0+course+(0+course|section),data=q3,
        control=lmerControl(check.nobs.vs.nRE="ignore"))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues
summary(q3.lme4)
## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ 0 + course + (0 + course | section)
##    Data: q3
## Control: lmerControl(check.nobs.vs.nRE = "ignore")
## 
## REML criterion at convergence: 105.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.2852 -0.6592 -0.1555  0.6728  1.6392 
## 
## Random effects:
##  Groups   Name    Variance Std.Dev. Corr       
##  section  courseC  6.657   2.580               
##           courseE  4.648   2.156    -0.08      
##           courseG 11.133   3.337     0.31  0.20
##  Residual          3.282   1.812               
## Number of obs: 25, groups:  section, 9
## 
## Fixed effects:
##         Estimate Std. Error t value
## courseC    4.769      1.438   3.318
## courseE    5.247      1.384   3.790
## courseG    6.555      2.499   2.623
## 
## Correlation of Fixed Effects:
##         coursC coursE
## courseE 0.000        
## courseG 0.000  0.000 
## convergence code: 0
## unable to evaluate scaled gradient
## Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
(x1<-ranef(q3.lme4))
## $section
##        courseC    courseE     courseG
## C1  2.16396546 -0.1421228  0.87972396
## C2 -1.57514553  0.1034508 -0.64034907
## C3 -2.37843840  0.1562087 -0.96691435
## C4  1.78961847 -0.1175368  0.72753945
## E1  0.17108276 -1.8186815 -0.57620769
## E2 -0.19142031  2.0348781  0.64470466
## E3  0.02033755 -0.2161967 -0.06849697
## G1 -0.54124631 -0.2944998 -2.22653229
## G2  0.54124631  0.2944998  2.22653229

The plots include estimates that should be zero (but aren’t; they could be eliminated with enough hacking around).

dotplot(ranef(q3.lme4,condVar=TRUE))
## $section

qqmath(ranef(q3.lme4,condVar=TRUE))
## $section

We can fit the same model with lme():

q3.nlme <- lme(score~0+course,random=~0+course|section,data=q3)
summary(q3.nlme)
## Linear mixed-effects model fit by REML
##  Data: q3 
##       AIC      BIC    logLik
##   125.387 136.2974 -52.69349
## 
## Random effects:
##  Formula: ~0 + course | section
##  Structure: General positive-definite, Log-Cholesky parametrization
##          StdDev   Corr         
## courseC  2.580088 coursC coursE
## courseE  2.155847 0            
## courseG  3.336581 0      0     
## Residual 1.811521              
## 
## Fixed effects: score ~ 0 + course 
##            Value Std.Error DF  t-value p-value
## courseC 4.769270  1.437556  6 3.317624  0.0161
## courseE 5.246728  1.384453  6 3.789748  0.0091
## courseG 6.554693  2.499413  6 2.622493  0.0395
##  Correlation: 
##         coursC coursE
## courseE 0            
## courseG 0      0     
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.2851952 -0.6591829 -0.1554550  0.6727893  1.6392136 
## 
## Number of Observations: 25
## Number of Groups: 9
x2<-ranef(q3.nlme)

Estimated random effects are roughly the same (at least, the ones with any meaning are), although the differences exceed the strict tolerance tested here:

colGrp <- substr(names(x2),7,7)
rowGrp <- substr(rownames(x2),1,1)
x2mat <- as.matrix(x2)
x2mat[rowGrp[row(x2)]!=colGrp[col(x2)]] <- NA
x2 <- as.data.frame(x2mat)
all(abs(na.omit(unlist(x1[[1]]-x2)))<4e-6)
## [1] FALSE

We can do better with the development version of lme4 and some hacking: we wrap the deviance function so that the relevant \(\theta\) parameters are fixed to zero. In the long run it would be better (more efficient) to change Lind/Lambdat in mkReTrms.

If the lower triangle of the Cholesky factor of a \(2 \times 2\) matrix is \(\{\theta_1,\theta_2,\theta_3\}\) then the matrix is \[ \left( \begin{array}{cc} \theta_1^2 & \theta_1 \theta_2 \\ \theta_1 \theta_2 & \theta_2^2 + \theta_3^2 \end{array} \right) \] So if we set \(\theta_2=0\) we get a diagonal matrix with variances \(\theta_1^2\), \(\theta_3^2\).
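
A quick numerical check of this identity (a small sketch with arbitrary values):

## with theta2=0 the crossproduct is diagonal
th <- c(1.5,0,2)
L <- matrix(c(th[1],th[2],0,th[3]),2,2)  ## lower-triangular factor
L %*% t(L)  ## should be diag(th[1]^2, th[3]^2) = diag(2.25, 4)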

Implementation

See help("modular",package="lme4") for a discussion of the modular steps of lmer-fitting …

The first two modular steps process the formula and construct a deviance function:

lmod <- lFormula(score~0+course+(0+course|section),data=q3,
          control=lmerControl(check.nobs.vs.nRE="ignore"))
devfun <- do.call(mkLmerDevfun,lmod)

Now we need a wrapper function that will take a vector of diagonal elements and place them into the right positions to construct a Cholesky factor with the off-diagonal elements equal to zero. We use the lower component stored in the environment of the function (which is copied from the original deviance function) to figure out how long the full \(\theta\) vector should be and where the diagonal elements fall: they are the elements with lower bounds equal to zero (off-diagonal elements have lower bounds of \(-\infty\)).

devfunw <- function(theta) {
  n <- length(lower)  ## from environment
  th <- numeric(n)
  diag_el <- which(lower==0)
  th[diag_el] <- theta
  devfun(th)
}
environment(devfunw) <- environment(devfun)

Now we test the function with simple inputs (\(\theta_1=\theta_2=\theta_3=1\)); optimize it with Nelder_Mead(); and use mkMerMod to construct the final output. (I would like to use the built-in optimizeLmer() function for the optimization step, but there are a few little glitches there: among other things, it would be nice if optimizeLmer() took a start argument!)

devfunw(c(1,1,1)) ## test
## [1] 106.2482
opt <- Nelder_Mead(devfunw,par=c(1,1,1))
(res <- mkMerMod(environment(devfun), opt, lmod$reTrms, fr = lmod$fr))
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 105.387
## Random effects:
##  Groups   Name    Std.Dev. Corr     
##  section  courseC 2.580             
##           courseE 2.156    0.00     
##           courseG 3.337    0.00 0.00
##  Residual         1.812             
## Number of obs: 25, groups:  section, 9
## Fixed Effects:
## courseC  courseE  courseG  
##   4.769    5.247    6.555

Now we build a more robust/complete function. It assumes there is a single random-effects grouping factor and that all off-diagonal elements of the Cholesky factor (corresponding to all off-diagonal elements of the variance-covariance matrix of the random effects) are zero.

diagLmer <- function(formula,data,
                     control=lmerControl(check.nobs.vs.nRE="ignore")) {
   lmod <- lFormula(formula,data,control=control)
   devfun <- do.call(mkLmerDevfun, lmod)
   lower <- environment(devfun)$lower
   n <- length(lower)
   n0 <- (-1 + sqrt(1+8*n))/2  ## number of diagonal elements
   devfunw <- function(theta) {
     th <- numeric(n)
     diag_el <- which(lower==0)
     th[diag_el] <- theta
     devfun(th)
   }
   opt <- Nelder_Mead(devfunw,par=rep(1,n0),
                      lower=rep(0,n0))
   mkMerMod(environment(devfun), opt, lmod$reTrms, fr = lmod$fr)
}
print(diagLmer(score~0+course+(0+course|section),data=q3),cor=FALSE)
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 105.387
## Random effects:
##  Groups   Name    Std.Dev. Corr     
##  section  courseC 2.580             
##           courseE 2.156    0.00     
##           courseG 3.337    0.00 0.00
##  Residual         1.812             
## Number of obs: 25, groups:  section, 9
## Fixed Effects:
## courseC  courseE  courseG  
##   4.769    5.247    6.555
fit2 <- diagLmer(y~f-1+(f|g),data=d)
print(fit2,cor=FALSE)
## Linear mixed model fit by REML ['lmerMod']
## REML criterion at convergence: 138.1264
## Random effects:
##  Groups   Name        Std.Dev. Corr     
##  g        (Intercept) 0.8483            
##           fb          2.0444   0.00     
##           fc          2.9893   0.00 0.00
##  Residual             0.1898            
## Number of obs: 750, groups:  g, 25
## Fixed Effects:
##    fa     fb     fc  
## 3.881  4.692  5.974

Autoregressive models

This should probably be broken out into a separate document. I’m excited about it because it suggests a natural way to set up a multilevel GLMM with temporal autocorrelation:

The variance-covariance matrix for \(n=3\), i.e. \(LL^\top\) where \(L\) is the lower-triangular Cholesky factor with elements \(\{\theta_1,\dots,\theta_6\}\) filled column-wise, is: \[ \left( \begin{array}{ccc} \theta_1^2 & \theta_1 \theta_2 & \theta_1 \theta_3 \\ \theta_1 \theta_2 & \theta_2^2 + \theta_4^2 & \theta_2 \theta_3 + \theta_4 \theta_5 \\ \theta_1 \theta_3 & \theta_2 \theta_3 + \theta_4 \theta_5 & \theta_3^2 + \theta_5^2 + \theta_6^2 \end{array} \right) \]

For an AR1 model, working this out by hand gives a Cholesky factor of: \[ \left( \begin{array}{ccc} 1 & 0 & 0 \\ \rho & \sqrt{1-\rho^2} & 0 \\ \rho^2 & \rho \sqrt{1-\rho^2 } & \sqrt{1-\rho^2} \end{array} \right) \] Matching this against the general form above gives \(\theta=(1,\rho,\rho^2,\sqrt{1-\rho^2},\rho\sqrt{1-\rho^2},\sqrt{1-\rho^2})\).

For what it’s worth, here’s the C code from the nlme package (src/corStruct.c) that returns the “transpose inverse square root factor” of an AR1 correlation matrix (see ?corFactor.corStruct):

static void
AR1_fact(double *par, longint *n, double *mat, double *logdet)
{
    longint i, np1 = *n + 1;
    double aux = sqrt(1 - *par * (*par)), aux1 = - (*par)/aux;

    *logdet -= (*n - 1) * log(aux);
    aux = 1/aux;
    mat[0] = 1;
    for(i = 1; i < *n; i++) {
        mat[i * np1] = aux;
        mat[i + *n * (i - 1)] = aux1;
    }
}
## R translation of AR1_fact(): note that, like the C code, this returns
## the transpose inverse square root factor rather than the Cholesky
## factor of the AR1 matrix itself
ar1_chol <- function(par,n) {
  mat <- matrix(0,nrow=n,ncol=n)
  mat[1] <- 1
  aux <- sqrt(1-par^2)
  np1 <- n+1
  aux1 <- -par/aux
  aux <- 1/aux
  for (i in 1:(n-1)) {
    mat[i*np1+1] <- aux
    mat[i + n*(i-1)+1] <- aux1
  }
  mat
}
(a1 <- ar1_chol(0.5,3))
##            [,1]       [,2]     [,3]
## [1,]  1.0000000  0.0000000 0.000000
## [2,] -0.5773503  1.1547005 0.000000
## [3,]  0.0000000 -0.5773503 1.154701
a1 %*% t(a1)
##            [,1]       [,2]       [,3]
## [1,]  1.0000000 -0.5773503  0.0000000
## [2,] -0.5773503  1.6666667 -0.6666667
## [3,]  0.0000000 -0.6666667  1.6666667
library("nlme")
n <- 3
ii <- Initialize(corAR1(0.5),data=data.frame(x=1:n))
(mm <- matrix(corFactor(ii),nrow=n)) ## or corMatrix(ii,corr=FALSE)
##            [,1]       [,2]     [,3]
## [1,]  1.0000000  0.0000000 0.000000
## [2,] -0.5773503  1.1547005 0.000000
## [3,]  0.0000000 -0.5773503 1.154701
ss <- solve(mm)
ss %*% t(ss)
##      [,1] [,2] [,3]
## [1,] 1.00  0.5 0.25
## [2,] 0.50  1.0 0.50
## [3,] 0.25  0.5 1.00
chol(ss %*% t(ss))
##      [,1]      [,2]      [,3]
## [1,]    1 0.5000000 0.2500000
## [2,]    0 0.8660254 0.4330127
## [3,]    0 0.0000000 0.8660254
n <- 9
ii2 <- Initialize(corAR1(0.5),data=data.frame(x=1:n))
chol(corMatrix(ii2))
##       [,1]      [,2]      [,3]      [,4]      [,5]       [,6]       [,7]
##  [1,]    1 0.5000000 0.2500000 0.1250000 0.0625000 0.03125000 0.01562500
##  [2,]    0 0.8660254 0.4330127 0.2165064 0.1082532 0.05412659 0.02706329
##  [3,]    0 0.0000000 0.8660254 0.4330127 0.2165064 0.10825318 0.05412659
##  [4,]    0 0.0000000 0.0000000 0.8660254 0.4330127 0.21650635 0.10825318
##  [5,]    0 0.0000000 0.0000000 0.0000000 0.8660254 0.43301270 0.21650635
##  [6,]    0 0.0000000 0.0000000 0.0000000 0.0000000 0.86602540 0.43301270
##  [7,]    0 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.86602540
##  [8,]    0 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000
##  [9,]    0 0.0000000 0.0000000 0.0000000 0.0000000 0.00000000 0.00000000
##             [,8]        [,9]
##  [1,] 0.00781250 0.003906250
##  [2,] 0.01353165 0.006765823
##  [3,] 0.02706329 0.013531647
##  [4,] 0.05412659 0.027063294
##  [5,] 0.10825318 0.054126588
##  [6,] 0.21650635 0.108253175
##  [7,] 0.43301270 0.216506351
##  [8,] 0.86602540 0.433012702
##  [9,] 0.00000000 0.866025404

By inspection, the first row is \(\rho^{i-1}\); the diagonals leading down therefrom are constant, \(\rho^{i-1} \sqrt{1-\rho^2}\).
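
A quick numerical check of this pattern (a sketch; ar1_chol_direct is a made-up name): build the claimed factor directly and confirm that its crossproduct reproduces the AR1 correlation matrix.

ar1_chol_direct <- function(rho,n) {  ## hypothetical helper
  mat <- matrix(0,n,n)
  mat[1,] <- rho^(0:(n-1))   ## first row: rho^(i-1)
  for (k in 1:(n-1))         ## constant descending diagonals below it
    mat[cbind(k+1,(k+1):n)] <- rho^(0:(n-k-1))*sqrt(1-rho^2)
  mat                        ## upper-triangular, as chol() returns
}
R <- ar1_chol_direct(0.5,5)
all.equal(t(R) %*% R, 0.5^abs(outer(1:5,1:5,"-")))  ## should be TRUE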

We want the inverse Cholesky factor for applying to residuals (R-side effects), and the Cholesky factor for applying to \(\theta\) (G-side effects).
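
As a concrete illustration (a sketch reusing the n = 3 objects above): the two factors are inverses of one another, up to transposition.

V <- corMatrix(ii)  ## AR1 correlation matrix, n=3
## corFactor gave mm, the transpose inverse square root factor;
## t(chol(V)) is the lower Cholesky factor, so their product is the identity
max(abs(t(chol(V)) %*% mm - diag(3)))  ## should be ~0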

Thoughts on syntax, design, etc.

(More stuff that should really be separate)

Classes of variance-covariance matrices:

It would be worth looking at http://www.vsni.co.uk/software/asreml/htmlhelp/asreml/vcode.htm for comparison.

When designing this it would be worth following the nlme design to some extent: create objects of some sort that, for specified parameters and dimensions, can return on request (1) correlation (or variance-covariance) matrices [for reporting]; (2) Cholesky factors [for forward computation, i.e. constructing \(L\) matrices for the G-side random effects]; and (3) inverse Cholesky factors [for computing R-side effects]. If possible it would be nice to make the design a little more transparent than nlme’s …
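
For instance, a rough sketch of what such an object might look like (purely illustrative; ar1Struct is a made-up name, and no claim is made about any eventual lme4 API):

ar1Struct <- function(rho,n) {  ## hypothetical constructor
  V <- rho^abs(outer(1:n,1:n,"-"))  ## AR1 correlation matrix
  list(vcov=function() V,                  ## (1) for reporting
       chol=function() chol(V),            ## (2) G-side: apply to theta
       invchol=function() solve(chol(V)))  ## (3) R-side: apply to residuals
}
s3 <- ar1Struct(0.5,3)
s3$vcov()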