One of the major issues in recent releases of lme4 (from 1.1-6 onward) has been an increased rate of model fits that are reported not to converge. Possibly contrary to popular belief, this increased rate of reported problems is not due to greater instability or other problems in the fitting algorithm itself; the algorithms and their default settings have changed only slightly in recent releases (see the lme4 NEWS file). Rather, we introduced new convergence checks, so many models that seemed to fit with previous versions now report convergence issues (but are in fact fitting just as well, or as poorly, as before).

Along with the convergence tests, however, have come some false-positive warnings. These were particularly bad in 1.1-6; version 1.1-7 improved things considerably by testing the scaled gradient (i.e., the solution \(\boldsymbol{s}\) of \(\boldsymbol{H}\boldsymbol{s} = \boldsymbol{g}\), where \(\boldsymbol{H}\) is the estimated Hessian at the MLE, \(\boldsymbol{g}\) is the gradient, and \(\boldsymbol{s}\) is the scaled gradient). To a first approximation, one can think of the scaled gradient as measuring the expected change in the log-likelihood over a scale of one standard error of parameter change, rather than on the scale of the original parameter. A further change (22 October 2014, in versions since 1.1-8) changed the test to use the maximum of the elementwise minimum of the absolute values of the raw and scaled gradients, to handle situations where the estimated likelihood surface is very flat (poorly determined parameters/large standard errors), so that the scaled gradient can be very large even near the optimum. However, the problem persists to some extent.
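
Concretely, the quantity being tested can be computed by hand; the sketch below is adapted from the recipe in lme4's `?convergence` help page, using the built-in `sleepstudy` data to make it self-contained. (`fit@optinfo$derivs` holds the finite-difference gradient and Hessian saved at the end of the fit; the default tolerance for the gradient check is 2e-3.)

```r
library("lme4")
fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
derivs <- fit@optinfo$derivs                       ## gradient and Hessian at the optimum
sc_grad <- with(derivs, solve(Hessian, gradient))  ## scaled gradient: solve H s = g
## the quantity tested since 1.1-8: max over parameters of
## min(|raw gradient|, |scaled gradient|)
max(pmin(abs(derivs$gradient), abs(sc_grad)))
```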

The examples below explore how the scaled gradient and the computed Hessian depend on (simulated) data set size, for relatively well-behaved data sets. This does not tell us exactly where we should set the threshold for convergence warnings, but it does indicate that

  1. our thresholds are probably too strict for large data sets [for the well-behaved examples below, where we think we are getting the correct answers, our current rules would report convergence failures].
  2. the thresholds might be scaled to sample size, but as explored below this is difficult. If we deprecated the use of nlminb for large problems, an overall scaled tolerance of 0.01 might be appropriate …
  3. we should switch from our current fast-but-sloppy finite-difference calculation of the Hessian to the more accurate but slower Richardson-extrapolation calculation implemented in the numDeriv package for large sample sizes (see the sketch after this list), or turn off Hessian calculation entirely for large problems.
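
As a sketch of point 3, the Richardson-extrapolation derivatives can be computed by hand with numDeriv, again following the recipe in lme4's `?convergence` help page; here `fit` stands for a fitted LMM (for a GLMM, the parameter vector would also include the fixed effects, `unlist(getME(fit, c("theta", "fixef")))`).

```r
library("numDeriv")
devfun <- update(fit, devFunOnly = TRUE)  ## deviance as a function of theta (LMM case)
pars <- unlist(getME(fit, "theta"))
hess <- numDeriv::hessian(devfun, pars)   ## Richardson-extrapolated Hessian ...
grad <- numDeriv::grad(devfun, pars)      ## ... and gradient
max(abs(solve(hess, grad)))               ## scaled gradient from the slower,
                                          ## more accurate derivatives
```
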
library("lme4")
library("ggplot2")
## preferred (cosmetic) settings
theme_set(theme_bw()+theme(panel.margin=grid::unit(0,"lines")))
library("gridExtra")
library("scales") ## for squish()
## with apologies for Hadleyverse 2 ...
library("tidyr")
library("dplyr")

Get data (previously computed):

load("penicillin_conv.RData")
id_vars <- c("data","optimizer","log10size","rep")
data_vars <- setdiff(names(res),id_vars)
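
These variable sets suggest a reshape to long format before plotting; a minimal sketch with tidyr's `gather()` (the `variable`/`value` names here are arbitrary, not from the original analysis):

```r
## one row per (identifier combination, diagnostic) pair
res_long <- gather(res, key = "variable", value = "value", one_of(data_vars))
```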