One of the most expensive parts of fitting lme4 models is setting up the random-effects structure. The built-in refit function is designed to update a model with a new response vector, for uses like parametric bootstrapping. What about the case where the predictor variable differs, e.g. in a bioinformatics context where one wants to test the effects of many different genes? (This leaves aside the question of whether one should be fitting the whole model simultaneously with gene as a random effect, to implement a shrinkage estimator …)
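
For context, the built-in response-update case looks something like this (a minimal sketch using lme4's sleepstudy data, purely for illustration; fm, newy, and fm_boot are made-up names):

library("lme4")
fm <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
## simulate one parametric-bootstrap response and refit with it;
## the existing random-effects structure is reused rather than rebuilt
newy <- unlist(simulate(fm))
fm_boot <- refit(fm, newresp=newy)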

The new-predictor use case is not built into lme4, but it can be handled fairly easily using the steps laid out in ?modular.

library("lme4")
packageVersion("lme4")
## [1] '1.1.7'
library("rbenchmark")
set.seed(101)
genotypes <- c("A","B","C")
n <- 1000
ngenes <- 100
## sample cluster sizes, then truncate to a total of n observations
clustsize <- 1+rpois(n,lambda=1)
clust <- rep(1:n,clustsize)[1:n]
dd <- data.frame(fgluc=rnorm(n),
                 clust=factor(clust))
## ngenes columns of random three-level genotypes, named gene1, gene2, ...
genes <- setNames(do.call(data.frame,
                 replicate(ngenes,sample(
                     x=genotypes,size=n,replace=TRUE),
                           simplify=FALSE)),
                  paste0("gene",1:ngenes))
dd2 <- cbind(dd,genes)

A vanilla function that refits from scratch (one could also use update(), as sketched after the function below, but it wouldn’t actually save any time):

fitGene <- function(g) {
    f <- reformulate(c(g,"(1|clust)"),response="fgluc")
    lmer(f,data=dd2)
}
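
For reference, an update()-based version might look like this sketch (fit0 and updateGene are hypothetical names, not part of the original code); update() re-does all of the model-setup work, so it is no faster than fitGene:

fit0 <- lmer(fgluc ~ 1 + (1|clust), data=dd2)  ## baseline fit to update
updateGene <- function(g) {
    ## swap in the new fixed + random formula; data come from fit0's call
    update(fit0, reformulate(c(g,"(1|clust)"),response="fgluc"))
}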

Now we will instead take the pre-computed information about the model structure (in particular the reTrms component, which describes the random-effects structure and involves an expensive permutation computation) and fill in the new formula and fixed-effect model matrix for each new predictor variable. I thought we might save some more time by returning just the optimized parameters rather than the whole model structure, but that apparently doesn’t make much difference … It might be possible to copy the existing structures at an even lower level (and hence gain a little more speed), but the example here is very low-hanging fruit: it doesn’t require us to (1) mess around with the lower-level structures or (2) think hard about which components of the model need to be updated when the predictor variable changes.

lmod0 <- lFormula(fgluc ~ 1 + (1|clust), dd2) ## basic structure
refitGene <- function(g,retmod=TRUE) {
    ## set up new fixed and full formulas
    f0 <- reformulate(g,response="fgluc")
    f <- reformulate(c(g,"(1|clust)"),response="fgluc")
    ## copy baseline structure and replace relevant pieces
    lmod <- lmod0
    lmod$formula <- f
    lmod$X <- model.matrix(f0,data=dd2)
    ## now finish the fit (construct dev fun, optimize,
    ##  optionally return the full model)
    devfun <- do.call(mkLmerDevfun, lmod)
    opt <- optimizeLmer(devfun)
    if (!retmod) opt else {
        mkMerMod(environment(devfun), opt, lmod$reTrms, fr = lmod$fr)
    }
}

fitAll <- function(...) {
    lapply(grep("^gene",names(dd2),value=TRUE),...)
}
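
For example, one might collect the per-gene fits and pull out the genotype coefficients afterwards (a sketch; allfits and gene_coefs are not part of the original code):

allfits <- setNames(fitAll(refitGene),
                    grep("^gene",names(dd2),value=TRUE))
## one row per gene; columns are the genotype-B and genotype-C contrasts
gene_coefs <- t(sapply(allfits, function(m) unname(fixef(m)[-1])))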

Test: the two approaches come out with cosmetic differences, but the fits are really the same.

sumfun <- function(x) {
    c(logLik(x),fixef(x),getME(x,"theta"))
}
all.equal(sumfun(fitGene("gene1")),sumfun(refitGene("gene1")))
## [1] TRUE
all.equal(sumfun(fitGene("gene85")),sumfun(refitGene("gene85")))
## [1] TRUE

How much faster is it?

benchmark(fitAll(fitGene),
          fitAll(refitGene),
          fitAll(refitGene,retmod=FALSE),
          replications=10,
          columns = c("test", "replications", "elapsed", 
                         "relative"))
##                                test replications elapsed relative
## 1                   fitAll(fitGene)           10  39.143    2.425
## 3 fitAll(refitGene, retmod = FALSE)           10  16.140    1.000
## 2                 fitAll(refitGene)           10  19.290    1.195

In the run shown above, refitting is roughly twice as fast as fitting from scratch (about 19 vs. 39 seconds elapsed). When I ran this on Linux/r-devel/lme4 1.1-8 I got about a 70% speedup, and not much additional benefit from skipping the merMod-construction step. I guess the benefits will vary by platform?