lme4 convergence warnings: troubleshooting

For large data sets and large, complex models (lots of random-effects parameters, or for GLMMs also lots of fixed-effect parameters), it is fairly common to get convergence warnings. (A lot of these were removed in version 1.1-7, but they still happen.) Here I’m going to brain-dump/show a variety of examples of tests that rule out various hypotheses about what’s causing the problem.

check for singularity
lower convergence tolerances/run longer
restart from the ending parameters
try other optimizers
recompute gradient/Hessian more precisely
consider absolute vs. scaled gradient

library("lme4")
packageVersion("lme4")

## [1] '1.1.8'

## but should work equivalently with 1.1-7 (latest CRAN version)
library("numDeriv")
library("RCurl") ## to source() from Github
library("ggplot2"); theme_set(theme_bw())
library("reshape2")
library("plyr")
library("RColorBrewer")

Example

Example from Stack Overflow:

source("SO_23478792_dat.R")
df$SUR.ID <- factor(df$SUR.ID)
df$replicate <- factor(df$replicate)
Rdet <- cbind(df$ValidDetections,df$FalseDetections)
Unit <- factor(1:length(df$ValidDetections))

m1 <- glmer(Rdet ~ tm:Area + tm:c.distance +
            c.distance:Area + c.tm.depth:Area +
            c.receiver.depth:Area + c.temp:Area +
            c.wind:Area +
            c.tm.depth + c.receiver.depth +
            c.temp +c.wind + tm + c.distance + Area +
            replicate +
            (1|SUR.ID) + (1|Day) + (1|Unit) ,
            data = df, family = binomial(link=logit))

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 1.52673
## (tol = 0.001, component 17)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
##  - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
##  - Rescale variables?

Data set size:

nrow(df)

## [1] 220

length(getME(m1,"theta"))

## [1] 3

length(fixef(m1))

## [1] 16

Rescale and center continuous parameters

It doesn’t necessarily mess up the fit, but large differences in the scales of parameters often lead to problems (especially with calculating standard deviations of fixed effects); we might as well clean up this problem before we proceed:

numcols <- grep("^c\\.",names(df))
dfs <- df
dfs[,numcols] <- scale(dfs[,numcols])
m1_sc <- update(m1,data=dfs)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 1.28888
## (tol = 0.001, component 19)

Gets rid of scaling warnings, but we still have a large relative gradient.

Check singularity

If the fit is singular or near-singular, there might be a higher chance of a false positive (we’re not necessarily screening out gradient and Hessian checking on singular directions properly); a higher chance that the model has actually misconverged (because the optimization problem is difficult on the boundary); and a reasonable argument that the random effects model should be simplified.

The definition of singularity is that some of the constrained parameters of the random effects theta parameters are on the boundary (equal to zero, or very very close to zero, say \(<10^{-6}\)):

tt <- getME(m1_sc,"theta")
ll <- getME(m1_sc,"lower")
min(tt[ll==0])

## [1] 0.007956743

Not a problem in this case.

Double-checking gradient calculations

Compare absolute and scaled gradient; compare gradient and Hessian with numDeriv equivalents.

We compute the gradient and Hessian in a quick-but-less-accurate way; we can use a more precise (but slower) algorithm implemented by the numDeriv package (Richardson extrapolation).

Extract pre-computed information:

derivs1 <- m1_sc@optinfo$derivs
sc_grad1 <- with(derivs1,solve(Hessian,gradient))
max(abs(sc_grad1))

## [1] 1.288884

One general problem is that large scaled gradients are often associated with small absolute gradients: we might decide that we’re more interested in testing the (parallel) minimum of these two quantities:

max(pmin(abs(sc_grad1),abs(derivs1$gradient)))

## [1] 0.07784106

This is a lot smaller, although still larger than the tolerance we typically set (0.001).

What if we redo the calculations with numDeriv?

dd <- update(m1_sc,devFunOnly=TRUE)
pars <- unlist(getME(m1_sc,c("theta","fixef")))
grad2 <- grad(dd,pars)
hess2 <- hessian(dd,pars)
sc_grad2 <- solve(hess2,grad2)
max(pmin(abs(sc_grad2),abs(grad2)))

## [1] 0.07784101

Not particularly different, in this case.

Restart

Try restarting from previous fit … restart didn’t converge in 10000 evals, so bumped up max number of iterations.

ss <- getME(m1_sc,c("theta","fixef"))
m2 <- update(m1_sc,start=ss,control=glmerControl(optCtrl=list(maxfun=2e4)))

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 1.11384
## (tol = 0.001, component 19)

Still have a problem.

Try a different optimizer

Try bobyqa for both phases – current GLMM default is bobyqa for first phase, Nelder-Mead for second phase. We are thinking about changing the default to nloptwrap, which is generally much faster. As we’ll see below, nloptwrap would not help with the convergence warning in this case …

m3 <- update(m1_sc,start=ss,control=glmerControl(optimizer="bobyqa",
                            optCtrl=list(maxfun=2e5)))

If we set the number of evaluations large enough so the optimization actually finishes, we do avoid the warning.

Try lots of different optimizers

Use this recipe to source the allFit script from Github (it’s also been incorporated into the mixed package):

afurl <- "https://raw.githubusercontent.com/lme4/lme4/master/misc/issues/allFit.R"
eval(parse(text=getURL(afurl)))

## Loading required package: optimx
## Loading required package: nloptr
## Loading required package: dfoptim

aa <- allFit(m2)

is.OK <- sapply(aa,is,"merMod")  ## nlopt NELDERMEAD failed, others succeeded
aa.OK <- aa[is.OK]
lapply(aa.OK,function(x) x@optinfo$conv$lme4$messages)

## $bobyqa.
## NULL
## 
## $Nelder_Mead.
## [1] "Model failed to converge with max|grad| = 1.24692 (tol = 0.001, component 19)"
## 
## $optimx.nlminb
## NULL
## 
## $`optimx.L-BFGS-B`
## [1] "Model failed to converge with max|grad| = 1.01503 (tol = 0.001, component 19)"
## 
## $nloptwrap.NLOPT_LN_NELDERMEAD
## [1] "Model failed to converge with max|grad| = 1.27814 (tol = 0.001, component 19)"
## 
## $nloptwrap.NLOPT_LN_BOBYQA
## [1] "Model failed to converge with max|grad| = 1.27814 (tol = 0.001, component 19)"

All but nlminb and built-in bobyqa give convergence warnings.

So what?

In this case, we can overcome the failure to converge by using a different optimizer. But how bad was the convergence failure, really, in this case? Should we have worried at all?

Log-likelihoods are all approximately the same: the two optimizers that finish withoug convergence warnings have log-likelihoods about 0.07 below the worst fits …

(lliks <- sort(sapply(aa.OK,logLik)))

## nloptwrap.NLOPT_LN_NELDERMEAD     nloptwrap.NLOPT_LN_BOBYQA 
##                     -107.1693                     -107.1693 
##                  Nelder_Mead.               optimx.L-BFGS-B 
##                     -107.1161                     -107.1130 
##                       bobyqa.                 optimx.nlminb 
##                     -107.1098                     -107.1098

aa.fixef <- t(sapply(aa.OK,fixef))
aa.fixef.m <- melt(aa.fixef)
models <- levels(aa.fixef.m$Var1)
ylabs <- substr(models,1,3)
aa.fixef.m <- transform(aa.fixef.m,Var1=factor(Var1,levels=names(lliks)))
(gplot1 <- ggplot(aa.fixef.m,aes(x=value,y=Var1,colour=Var1))+geom_point()+
     facet_wrap(~Var2,scale="free")+
         scale_colour_brewer(palette="Dark2")+
             scale_y_discrete(breaks=models,
                              labels=ylabs)+
                                  labs(x="",y=""))

Coefficients of variation of fixed-effect parameter estimates:

summary(unlist(daply(aa.fixef.m,"Var2",summarise,sd(value)/abs(mean(value)))))

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.003513 0.017890 0.023470 0.021620 0.028250 0.036460

Not as interesting: all optimizers except N-M give very close to zero std.dev. for Day and SUR.ID

aa.stddev <- t(sapply(aa.OK,function(x) sqrt(unlist(VarCorr(x)))))
print(aa.stddev,digits=3)

##                                Unit      Day   SUR.ID
## bobyqa.                       0.687 1.38e-08 3.73e-08
## Nelder_Mead.                  0.685 1.29e-03 2.98e-03
## optimx.nlminb                 0.687 0.00e+00 0.00e+00
## optimx.L-BFGS-B               0.686 0.00e+00 0.00e+00
## nloptwrap.NLOPT_LN_NELDERMEAD 0.592 2.66e-05 5.78e-04
## nloptwrap.NLOPT_LN_BOBYQA     0.592 2.66e-05 5.78e-04

aa.stddev.m <- melt(aa.stddev)
aa.stddev.m <- transform(aa.stddev.m,Var1=factor(Var1,levels=names(lliks)))
gplot1 %+% aa.stddev.m

I previously tried the lme4.0 package for comparison, but I can’t quite get it to work for various reasons (apparent knitr/caching problems and/or downdated X'X is not positive definite errors): presented here without evaluation.

library("lme4.0")
m5 <- glmer(Rdet ~ tm:Area + tm:c.distance +
            c.distance:Area + c.tm.depth:Area +
            c.receiver.depth:Area + c.temp:Area +
            c.wind:Area +
            c.tm.depth + c.receiver.depth +
            c.temp +c.wind + tm + c.distance + Area +
            replicate +
            (1|SUR.ID) + (1|Day) + (1|Unit) ,
            data = dfs, family = binomial(link=logit))
detach("package:lme4.0")