For large data sets and large, complex models (lots of random-effects parameters, or for GLMMs also lots of fixed-effect parameters), it is fairly common to get convergence warnings. (A lot of these were removed in version 1.1-7, but they still happen.) Here Iâ€™m going to brain-dump/show a variety of examples of tests that rule out various hypotheses about whatâ€™s causing the problem.

- check for singularity
- lower convergence tolerances/run longer
- restart from the ending parameters
- try other optimizers
- recompute gradient/Hessian more precisely
- consider absolute vs.Â scaled gradient

```
library("lme4")
packageVersion("lme4")
```

`## [1] '1.1.8'`

```
## but should work equivalently with 1.1-7 (latest CRAN version)
library("numDeriv")
library("RCurl") ## to source() from Github
library("ggplot2"); theme_set(theme_bw())
library("reshape2")
library("plyr")
library("RColorBrewer")
```

Example from Stack Overflow:

```
source("SO_23478792_dat.R")
df$SUR.ID <- factor(df$SUR.ID)
df$replicate <- factor(df$replicate)
Rdet <- cbind(df$ValidDetections,df$FalseDetections)
Unit <- factor(1:length(df$ValidDetections))
```

```
m1 <- glmer(Rdet ~ tm:Area + tm:c.distance +
c.distance:Area + c.tm.depth:Area +
c.receiver.depth:Area + c.temp:Area +
c.wind:Area +
c.tm.depth + c.receiver.depth +
c.temp +c.wind + tm + c.distance + Area +
replicate +
(1|SUR.ID) + (1|Day) + (1|Unit) ,
data = df, family = binomial(link=logit))
```

```
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 1.52673
## (tol = 0.001, component 17)
```

```
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
## - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
## - Rescale variables?
```

Data set size:

`nrow(df)`

`## [1] 220`

`length(getME(m1,"theta"))`

`## [1] 3`

`length(fixef(m1))`

`## [1] 16`

It doesnâ€™t *necessarily* mess up the fit, but large differences in the scales of parameters often lead to problems (especially with calculating standard deviations of fixed effects); we might as well clean up this problem before we proceed:

```
numcols <- grep("^c\\.",names(df))
dfs <- df
dfs[,numcols] <- scale(dfs[,numcols])
m1_sc <- update(m1,data=dfs)
```

```
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 1.28888
## (tol = 0.001, component 19)
```

Gets rid of scaling warnings, but we still have a large relative gradient.

If the fit is singular or near-singular, there might be a higher chance of a false positive (weâ€™re not necessarily screening out gradient and Hessian checking on singular directions properly); a higher chance that the model has actually misconverged (because the optimization problem is difficult on the boundary); and a reasonable argument that the random effects model should be simplified.

The definition of singularity is that some of the constrained parameters of the random effects `theta`

parameters are on the boundary (equal to zero, or very very close to zero, say \(<10^{-6}\)):

```
tt <- getME(m1_sc,"theta")
ll <- getME(m1_sc,"lower")
min(tt[ll==0])
```

`## [1] 0.007956743`

Not a problem in this case.

Compare absolute and scaled gradient; compare gradient and Hessian with `numDeriv`

equivalents.

We compute the gradient and Hessian in a quick-but-less-accurate way; we can use a more precise (but slower) algorithm implemented by the `numDeriv`

package (Richardson extrapolation).

Extract pre-computed information:

```
derivs1 <- m1_sc@optinfo$derivs
sc_grad1 <- with(derivs1,solve(Hessian,gradient))
max(abs(sc_grad1))
```

`## [1] 1.288884`

One general problem is that large *scaled* gradients are often associated with small *absolute* gradients: we might decide that weâ€™re more interested in testing the (parallel) minimum of these two quantities:

`max(pmin(abs(sc_grad1),abs(derivs1$gradient)))`

`## [1] 0.07784106`

This is a lot smaller, although still larger than the tolerance we typically set (0.001).

What if we redo the calculations with `numDeriv`

?

```
dd <- update(m1_sc,devFunOnly=TRUE)
pars <- unlist(getME(m1_sc,c("theta","fixef")))
grad2 <- grad(dd,pars)
hess2 <- hessian(dd,pars)
sc_grad2 <- solve(hess2,grad2)
max(pmin(abs(sc_grad2),abs(grad2)))
```

`## [1] 0.07784101`

Not particularly different, in this case.

Try restarting from previous fit â€¦ restart didnâ€™t converge in 10000 evals, so bumped up max number of iterations.

```
ss <- getME(m1_sc,c("theta","fixef"))
m2 <- update(m1_sc,start=ss,control=glmerControl(optCtrl=list(maxfun=2e4)))
```

```
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 1.11384
## (tol = 0.001, component 19)
```

Still have a problem.

Try `bobyqa`

for both phases â€“ current GLMM default is `bobyqa`

for first phase, Nelder-Mead for second phase. We are thinking about changing the default to `nloptwrap`

, which is generally much faster. As weâ€™ll see below, `nloptwrap`

would *not* help with the convergence warning in this case â€¦

```
m3 <- update(m1_sc,start=ss,control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=2e5)))
```

If we set the number of evaluations large enough so the optimization actually finishes, we do avoid the warning.

Use this recipe to source the `allFit`

script from Github (itâ€™s also been incorporated into the `mixed`

package):

```
afurl <- "https://raw.githubusercontent.com/lme4/lme4/master/misc/issues/allFit.R"
eval(parse(text=getURL(afurl)))
```

```
## Loading required package: optimx
## Loading required package: nloptr
## Loading required package: dfoptim
```

`aa <- allFit(m2)`

```
is.OK <- sapply(aa,is,"merMod") ## nlopt NELDERMEAD failed, others succeeded
aa.OK <- aa[is.OK]
lapply(aa.OK,function(x) x@optinfo$conv$lme4$messages)
```

```
## $bobyqa.
## NULL
##
## $Nelder_Mead.
## [1] "Model failed to converge with max|grad| = 1.24692 (tol = 0.001, component 19)"
##
## $optimx.nlminb
## NULL
##
## $`optimx.L-BFGS-B`
## [1] "Model failed to converge with max|grad| = 1.01503 (tol = 0.001, component 19)"
##
## $nloptwrap.NLOPT_LN_NELDERMEAD
## [1] "Model failed to converge with max|grad| = 1.27814 (tol = 0.001, component 19)"
##
## $nloptwrap.NLOPT_LN_BOBYQA
## [1] "Model failed to converge with max|grad| = 1.27814 (tol = 0.001, component 19)"
```

All but `nlminb`

and built-in `bobyqa`

give convergence warnings.

In this case, we can overcome the failure to converge by using a different optimizer. But how bad was the convergence failure, really, in this case? Should we have worried at all?

Log-likelihoods are all *approximately* the same: the two optimizers that finish withoug convergence warnings have log-likelihoods about 0.07 below the worst fits â€¦

`(lliks <- sort(sapply(aa.OK,logLik)))`

```
## nloptwrap.NLOPT_LN_NELDERMEAD nloptwrap.NLOPT_LN_BOBYQA
## -107.1693 -107.1693
## Nelder_Mead. optimx.L-BFGS-B
## -107.1161 -107.1130
## bobyqa. optimx.nlminb
## -107.1098 -107.1098
```

```
aa.fixef <- t(sapply(aa.OK,fixef))
aa.fixef.m <- melt(aa.fixef)
models <- levels(aa.fixef.m$Var1)
ylabs <- substr(models,1,3)
aa.fixef.m <- transform(aa.fixef.m,Var1=factor(Var1,levels=names(lliks)))
(gplot1 <- ggplot(aa.fixef.m,aes(x=value,y=Var1,colour=Var1))+geom_point()+
facet_wrap(~Var2,scale="free")+
scale_colour_brewer(palette="Dark2")+
scale_y_discrete(breaks=models,
labels=ylabs)+
labs(x="",y=""))
```

Coefficients of variation of fixed-effect parameter estimates:

`summary(unlist(daply(aa.fixef.m,"Var2",summarise,sd(value)/abs(mean(value)))))`

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.003513 0.017890 0.023470 0.021620 0.028250 0.036460
```

Not as interesting: all optimizers except N-M give very close to zero std.dev. for `Day`

and `SUR.ID`

```
aa.stddev <- t(sapply(aa.OK,function(x) sqrt(unlist(VarCorr(x)))))
print(aa.stddev,digits=3)
```

```
## Unit Day SUR.ID
## bobyqa. 0.687 1.38e-08 3.73e-08
## Nelder_Mead. 0.685 1.29e-03 2.98e-03
## optimx.nlminb 0.687 0.00e+00 0.00e+00
## optimx.L-BFGS-B 0.686 0.00e+00 0.00e+00
## nloptwrap.NLOPT_LN_NELDERMEAD 0.592 2.66e-05 5.78e-04
## nloptwrap.NLOPT_LN_BOBYQA 0.592 2.66e-05 5.78e-04
```

```
aa.stddev.m <- melt(aa.stddev)
aa.stddev.m <- transform(aa.stddev.m,Var1=factor(Var1,levels=names(lliks)))
gplot1 %+% aa.stddev.m
```

I previously tried the `lme4.0`

package for comparison, but I canâ€™t quite get it to work for various reasons (apparent `knitr`

/caching problems and/or `downdated X'X is not positive definite`

errors): presented here without evaluation.

```
library("lme4.0")
m5 <- glmer(Rdet ~ tm:Area + tm:c.distance +
c.distance:Area + c.tm.depth:Area +
c.receiver.depth:Area + c.temp:Area +
c.wind:Area +
c.tm.depth + c.receiver.depth +
c.temp +c.wind + tm + c.distance + Area +
replicate +
(1|SUR.ID) + (1|Day) + (1|Unit) ,
data = dfs, family = binomial(link=logit))
detach("package:lme4.0")
```