For large data sets and large, complex models (many random-effects parameters, or for GLMMs many fixed-effect parameters as well), it is fairly common to get convergence warnings. (A lot of these were removed in version 1.1-7, but they still happen.) Here I'm going to brain-dump a variety of tests that rule out various hypotheses about what's causing the problem.
library("lme4")
packageVersion("lme4")
## [1] '1.1.8'
## but should work equivalently with 1.1-7 (latest CRAN version)
library("numDeriv")
library("RCurl") ## to source() from Github
library("ggplot2"); theme_set(theme_bw())
library("reshape2")
library("plyr")
library("RColorBrewer")
Example from Stack Overflow:
source("SO_23478792_dat.R")
df$SUR.ID <- factor(df$SUR.ID)
df$replicate <- factor(df$replicate)
Rdet <- cbind(df$ValidDetections,df$FalseDetections) ## two-column (successes, failures) response
Unit <- factor(1:length(df$ValidDetections)) ## observation-level random effect (overdispersion)
m1 <- glmer(Rdet ~ tm:Area + tm:c.distance +
c.distance:Area + c.tm.depth:Area +
c.receiver.depth:Area + c.temp:Area +
c.wind:Area +
c.tm.depth + c.receiver.depth +
c.temp +c.wind + tm + c.distance + Area +
replicate +
(1|SUR.ID) + (1|Day) + (1|Unit) ,
data = df, family = binomial(link=logit))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 1.52673
## (tol = 0.001, component 17)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
## - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
## - Rescale variables?
Data set size:
nrow(df)
## [1] 220
length(getME(m1,"theta"))
## [1] 3
length(fixef(m1))
## [1] 16
Scale differences don’t necessarily mess up the fit, but large differences in the scales of parameters often lead to numerical problems (especially with calculating standard errors of the fixed effects); we might as well clean up this problem before we proceed:
numcols <- grep("^c\\.",names(df))
dfs <- df
dfs[,numcols] <- scale(dfs[,numcols])
m1_sc <- update(m1,data=dfs)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 1.28888
## (tol = 0.001, component 19)
This gets rid of the scaling warnings, but we still have a large scaled gradient.
If the fit is singular or near-singular, there might be a higher chance of a false positive (we’re not necessarily screening singular directions out of the gradient and Hessian checks properly); a higher chance that the model has actually misconverged (because the optimization problem is difficult on the boundary); and a reasonable argument that the random-effects model should be simplified.
The definition of singularity is that some of the constrained random-effects parameters (`theta`) are on the boundary (equal to zero, or very, very close to zero, say \(<10^{-6}\)):
tt <- getME(m1_sc,"theta")
ll <- getME(m1_sc,"lower")
min(tt[ll==0])
## [1] 0.007956743
Not a problem in this case.
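If we wanted to screen for this routinely, we could wrap the check in a small helper function. Here’s a minimal sketch (the name `is_singular` and the \(10^{-6}\) cutoff are our own choices, not part of the lme4 API):

## hypothetical helper: flag a fit as (near-)singular if any
## boundary-constrained theta parameter is within tol of zero
is_singular <- function(fit, tol=1e-6) {
    theta <- getME(fit,"theta")
    lower <- getME(fit,"lower")
    any(theta[lower==0] < tol)
}
is_singular(m1_sc)  ## FALSE for this fit, as shown above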
Compare the absolute and scaled gradients; compare the gradient and Hessian with their `numDeriv` equivalents.
We compute the gradient and Hessian in a quick-but-less-accurate way; we can use a more precise (but slower) algorithm implemented by the `numDeriv` package (Richardson extrapolation).
Extract pre-computed information:
derivs1 <- m1_sc@optinfo$derivs
sc_grad1 <- with(derivs1,solve(Hessian,gradient)) ## gradient scaled by the inverse Hessian
max(abs(sc_grad1))
## [1] 1.288884
One general problem is that large scaled gradients are often associated with small absolute gradients: we might decide that we’re more interested in testing the (parallel) minimum of these two quantities:
max(pmin(abs(sc_grad1),abs(derivs1$gradient)))
## [1] 0.07784106
This is a lot smaller, although still larger than the tolerance we typically set (0.001).
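As an aside: if after this kind of checking we decided that a given gradient magnitude was tolerable, we could loosen the convergence-check threshold itself rather than ignoring warnings wholesale. A sketch using lme4’s `.makeCC` helper (the tolerance of 2 is picked purely so that this fit’s max|grad| of about 1.29 passes; it is far looser than anything we’d recommend in general):

## refit with a much looser gradient tolerance; this changes only the
## warning threshold, not the optimization itself
m1_loose <- update(m1_sc,
    control=glmerControl(check.conv.grad=.makeCC("warning",tol=2,relTol=NULL)))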
What if we redo the calculations with `numDeriv`?
dd <- update(m1_sc,devFunOnly=TRUE) ## deviance function for the fitted model
pars <- unlist(getME(m1_sc,c("theta","fixef"))) ## parameters at the fitted values
grad2 <- grad(dd,pars)
hess2 <- hessian(dd,pars)
sc_grad2 <- solve(hess2,grad2)
max(pmin(abs(sc_grad2),abs(grad2)))
## [1] 0.07784101
Not particularly different, in this case.
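While we’re at it, we can quantify directly how much the quick internal derivatives differ from the `numDeriv` versions, since both were computed at the same parameter values (a quick sketch):

## largest absolute disagreement between lme4's internal
## finite-difference derivatives and the numDeriv versions
max(abs(derivs1$gradient - grad2))
max(abs(derivs1$Hessian - hess2))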
Try restarting from the previous fit. (The restart didn’t converge within the default 10000 function evaluations, so I bumped up the maximum number of evaluations.)
ss <- getME(m1_sc,c("theta","fixef"))
m2 <- update(m1_sc,start=ss,control=glmerControl(optCtrl=list(maxfun=2e4)))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 1.11384
## (tol = 0.001, component 19)
Still have a problem.
Try `bobyqa` for both phases: the current GLMM default is `bobyqa` for the first phase and Nelder-Mead for the second phase. We are thinking about changing the default to `nloptwrap`, which is generally much faster. As we’ll see below, `nloptwrap` would not help with the convergence warning in this case …
m3 <- update(m1_sc,start=ss,control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=2e5)))
If we set the maximum number of evaluations large enough that the optimization actually finishes, we do avoid the warning.
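We can verify this programmatically by inspecting the convergence messages stored in the fitted object (the same slot that the `allFit` comparison below extracts); for `m3` this should be `NULL`:

## NULL means no convergence warnings were recorded for this fit
m3@optinfo$conv$lme4$messages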
Use this recipe to source the `allFit` script from Github (it’s also been incorporated into the `mixed` package):
afurl <- "https://raw.githubusercontent.com/lme4/lme4/master/misc/issues/allFit.R"
eval(parse(text=getURL(afurl)))
## Loading required package: optimx
## Loading required package: nloptr
## Loading required package: dfoptim
aa <- allFit(m2)
is.OK <- sapply(aa,is,"merMod") ## nlopt NELDERMEAD failed, others succeeded
aa.OK <- aa[is.OK]
lapply(aa.OK,function(x) x@optinfo$conv$lme4$messages)
## $bobyqa.
## NULL
##
## $Nelder_Mead.
## [1] "Model failed to converge with max|grad| = 1.24692 (tol = 0.001, component 19)"
##
## $optimx.nlminb
## NULL
##
## $`optimx.L-BFGS-B`
## [1] "Model failed to converge with max|grad| = 1.01503 (tol = 0.001, component 19)"
##
## $nloptwrap.NLOPT_LN_NELDERMEAD
## [1] "Model failed to converge with max|grad| = 1.27814 (tol = 0.001, component 19)"
##
## $nloptwrap.NLOPT_LN_BOBYQA
## [1] "Model failed to converge with max|grad| = 1.27814 (tol = 0.001, component 19)"
All but `nlminb` and the built-in `bobyqa` give convergence warnings.
In this case, we can overcome the failure to converge by using a different optimizer. But how bad was the convergence failure, really? Should we have worried at all?
Log-likelihoods are all approximately the same: the two optimizers that finish without convergence warnings have log-likelihoods about 0.06 higher than the worst fits …
(lliks <- sort(sapply(aa.OK,logLik)))
## nloptwrap.NLOPT_LN_NELDERMEAD nloptwrap.NLOPT_LN_BOBYQA
## -107.1693 -107.1693
## Nelder_Mead. optimx.L-BFGS-B
## -107.1161 -107.1130
## bobyqa. optimx.nlminb
## -107.1098 -107.1098
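A quick check of the total spread (from the values above this should come to roughly 0.06 log-likelihood units):

## spread of log-likelihoods across the successful optimizers
diff(range(lliks))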
aa.fixef <- t(sapply(aa.OK,fixef))
aa.fixef.m <- melt(aa.fixef)
models <- levels(aa.fixef.m$Var1)
ylabs <- substr(models,1,3)
aa.fixef.m <- transform(aa.fixef.m,Var1=factor(Var1,levels=names(lliks)))
(gplot1 <- ggplot(aa.fixef.m,aes(x=value,y=Var1,colour=Var1))+geom_point()+
facet_wrap(~Var2,scales="free")+
scale_colour_brewer(palette="Dark2")+
scale_y_discrete(breaks=models,
labels=ylabs)+
labs(x="",y=""))
Coefficients of variation of fixed-effect parameter estimates:
summary(unlist(daply(aa.fixef.m,"Var2",summarise,sd(value)/abs(mean(value)))))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.003513 0.017890 0.023470 0.021620 0.028250 0.036460
Not as interesting: all optimizers except N-M give standard deviations very close to zero for `Day` and `SUR.ID`:
aa.stddev <- t(sapply(aa.OK,function(x) sqrt(unlist(VarCorr(x)))))
print(aa.stddev,digits=3)
## Unit Day SUR.ID
## bobyqa. 0.687 1.38e-08 3.73e-08
## Nelder_Mead. 0.685 1.29e-03 2.98e-03
## optimx.nlminb 0.687 0.00e+00 0.00e+00
## optimx.L-BFGS-B 0.686 0.00e+00 0.00e+00
## nloptwrap.NLOPT_LN_NELDERMEAD 0.592 2.66e-05 5.78e-04
## nloptwrap.NLOPT_LN_BOBYQA 0.592 2.66e-05 5.78e-04
aa.stddev.m <- melt(aa.stddev)
aa.stddev.m <- transform(aa.stddev.m,Var1=factor(Var1,levels=names(lliks)))
gplot1 %+% aa.stddev.m ## %+% redraws the existing plot with a new data set
I previously tried the `lme4.0` package for comparison, but I can’t quite get it to work for various reasons (apparent `knitr`/caching problems and/or `downdated X'X is not positive definite` errors): presented here without evaluation.
library("lme4.0")
m5 <- glmer(Rdet ~ tm:Area + tm:c.distance +
c.distance:Area + c.tm.depth:Area +
c.receiver.depth:Area + c.temp:Area +
c.wind:Area +
c.tm.depth + c.receiver.depth +
c.temp +c.wind + tm + c.distance + Area +
replicate +
(1|SUR.ID) + (1|Day) + (1|Unit) ,
data = dfs, family = binomial(link=logit))
detach("package:lme4.0")