When fitting models to data sets, I sometimes follow a more data-driven (some might say: chaotic) approach. This often involves fitting multiple models with differing parameterizations so the same data set, or using different variables and/or codings of variables.

As long as the data set is small or the models are simple, this seldomly poses a problem (besides me getting confused). More recently, I started to fit more complex mixed effect models, often involving crossed random factors, to medium to large data sets. Although the great *lme4* package handles those kinds of models efficiently, I often (for example right now) find myself sitting in front of my computer waiting for results. This can become somewhat tedious. To speed up this process, I wrote a simple convenience function that uses *parallel::mclapply()* to estimate multiple modells at the same time. As the function requires *parallel::mclapply()*, it does not work on Windows [1].

```
f_lmer_mc = function(data, calls, mc.cores) {
require(parallel)
if (is.data.frame(data))
data = replicate(length(calls), data, simplify = F)
for (i in 1:length(data)) attr(data[[i]], "cll") = calls[i]
m.list = mclapply(data, function(i) eval(parse(text = attr(i, "cll"))),
mc.cores = mc.cores)
return(m.list)
}
```

The function is quite simple. I will demonstrate its use with the *sleepstudy* data from the *lme4* package. In principle, the function will work for all kind of models which have a similar call as lmer. Therefore, the package has to be loaded separately.

```
library(lme4)
```

```
## Loading required package: Matrix Loading required package: Rcpp
```

The function needs three inputs: data, calls, and mc.cores.

**data**: If you want to fit all models to the same data set, data is just the name of the data.frame. In or example: *sleepstudy*. The data.frame is replicated as many times as there are model calls. If you want to fit models to different data sets, the function accepts a list of data.frame objects.

**calls**: The function calls (in this case to *lmer()*) are stored as a character vector. If you use different data sets, the order of data sets in the data list has to be the same as the order of function calls. **Each call has to include “data = i”!** This could probably be easily made more convenient, but I am to lazy to figure out exactly how.

```
models = c("lmer(Reaction ~ 1 + (1 | Subject), data = i)", "lmer(Reaction ~ Days + (Days | Subject), data = i)",
"lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), data = i)")
```

In this example, we fit different models to the sleepstudy data. The first model is an intercept-only model. The second call fits a latent growth curve model. The third model omits the correlation between intercept and slope in the growth curve model.

**mc.cores** gives mclapply() the number of cores to use for parallelization. It makes sense to use as many cores as you have models to fit. Consequently, it does not make to fit more models as you have cores. If you do, you have to wait longer before getting the first results.

We call the convenience function and save the results (objects of class lmerMod) in a list.

```
m.list = f_lmer_mc(sleepstudy, models, 3)
```

```
## Loading required package: parallel
```

Ofcourse only a neglectable amount of time is saved in this simple example.

The results of single models can be viewed like all list elements. For example the second model:

```
summary(m.list[[2]])
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (Days | Subject)
## Data: i
##
## REML criterion at convergence: 1744
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.954 -0.463 0.023 0.463 5.179
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 612.1 24.74
## Days 35.1 5.92 0.07
## Residual 654.9 25.59
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 251.41 6.82 36.8
## Days 10.47 1.55 6.8
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.138
```

If you want to see the results of all models, use:

```
lapply(m.list, summary)
```

```
## [[1]]
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ 1 + (1 | Subject)
## Data: i
##
## REML criterion at convergence: 1904
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.498 -0.550 -0.148 0.512 3.345
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 1278 35.8
## Residual 1959 44.3
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 298.51 9.05 33
##
## [[2]]
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (Days | Subject)
## Data: i
##
## REML criterion at convergence: 1744
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.954 -0.463 0.023 0.463 5.179
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 612.1 24.74
## Days 35.1 5.92 0.07
## Residual 654.9 25.59
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 251.41 6.82 36.8
## Days 10.47 1.55 6.8
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.138
##
## [[3]]
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
## Data: i
##
## REML criterion at convergence: 1744
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.963 -0.463 0.020 0.465 5.186
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 627.6 25.05
## Subject.1 Days 35.9 5.99
## Residual 653.6 25.57
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 251.41 6.89 36.5
## Days 10.47 1.56 6.7
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.184
```

And to compare the fits (make sure that a model comparison makes sense - the function runs no checks!):

```
do.call(anova, m.list)
```

```
## Warning: failed to find unique model names, assigning generic names
```

```
## refitting model(s) with ML (instead of REML)
```

```
## Data: i
## Models:
## MODEL1: Reaction ~ 1 + (1 | Subject)
## MODEL3: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
## MODEL2: Reaction ~ Days + (Days | Subject)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## MODEL1 3 1917 1926 -955 1911
## MODEL3 5 1762 1778 -876 1752 158.54 2 <2e-16 ***
## MODEL2 6 1764 1783 -876 1752 0.06 1 0.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Most importantly, the function mostly makes sense if all estimations require an equal amount of time. You have to wait for the slowest estimation to finish before you get any results.

[1] That is, it will work on Windows machines, but without parallel model estimation. All models will be estimated one at a time, hence making the convenience function more or less useless. See http://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf