Using JuliaCall from within R

Several authors have found that fitting complex models to large observational data sets can take a very long time using the lme4 package and often produces warnings about failure to converge to the optimum.

I have often responded that it is comparatively straightforward to fit such models using the MixedModels package for Julia. The cost of doing things in this way is installing and learning to use Julia.

Recently the R package JuliaCall has been added to CRAN. This package allows an R user to call Julia functions from within R.

Packages to install

Before installing the JuliaCall package one must install Julia itself. The easiest way to do that is from the download site https://julialang.org/downloads/. An alternative is to install JuliaPro from https://juliacomputing.com

You may wish to start Julia once just to install some packages. This can be done from R through JuliaCall but I’m not sure how easy it is to try to install the RCall package through JuliaCall. I think it is easiest to start Julia and run

Pkg.add("RCall")
Pkg.add("MixedModels")

Then install JuliaCall in R and load it and lme4

library(JuliaCall)
library(lme4)
## Loading required package: Matrix

Example data sets

The MixedModels package for Julia contains several example data sets in .RData format. These can be attached to the R session

attach("~/.julia/v0.6/MixedModels/test/dat.rda")
## The following objects are masked from package:lme4:
## 
##     Arabidopsis, cake, cbpp, Dyestuff, Dyestuff2, grouseticks,
##     InstEval, Pastes, Penicillin, sleepstudy, VerbAgg

One of these example data sets is kb07 from a 2007 paper by Kronmueller and Barr (thanks to Dale Barr for providing these data).

str(kb07)
## 'data.frame':    1790 obs. of  10 variables:
##  $ G: Factor w/ 56 levels "30","31","34",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ H: Factor w/ 32 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Y: num  2267 3856 1567 1732 2660 ...
##  $ S: num  1 -1 -1 1 1 -1 -1 1 1 -1 ...
##  $ T: num  -1 1 -1 1 -1 1 -1 1 -1 1 ...
##  $ U: num  1 -1 -1 -1 -1 1 1 1 1 -1 ...
##  $ V: num  -1 -1 1 1 -1 -1 1 1 -1 -1 ...
##  $ W: num  1 1 1 -1 -1 -1 -1 1 1 1 ...
##  $ X: num  -1 -1 1 -1 1 1 -1 1 -1 -1 ...
##  $ Z: num  -1 1 -1 -1 1 -1 1 1 -1 1 ...

The names of the columns have been shrunk to single letters to cut down on typing. The two grouping factors, G and H, are “Subject” and “Item”, respectively. The covariates S, T, and U are two-level factors in a \(2^3\) design coded in the -1, +1 encoding favored by statisticians. The other covariates, 'V, W, X and Z are twp- and three-factor interactions.

Fitting the maximal model using lmer

Fitting the “maximal” model, as recommended by Barr et al. (2013), with lmer does take a relatively long time.

system.time(fm1 <- lmer(Y ~ 1+S+T+U+V+W+X+Z + (1+S+T+U+V+W+X+Z|G) + (1+S+T+U+V+W+X+Z|H),
            kb07, REML=FALSE))
## Warning in commonArgs(par, fn, control, environment()): maxfun < 10 *
## length(par)^2 is not recommended.
## Warning in optwrap(optimizer, devfun, getStart(start, rho$lower, rho$pp), :
## convergence code 1 from bobyqa: bobyqa -- maximum number of function
## evaluations exceeded
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge with max|grad| = 0.00277275 (tol =
## 0.002, component 1)
##    user  system elapsed 
## 696.208   0.061 696.283
print(summary(fm1), corr=FALSE)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: Y ~ 1 + S + T + U + V + W + X + Z + (1 + S + T + U + V + W +  
##     X + Z | G) + (1 + S + T + U + V + W + X + Z | H)
##    Data: kb07
## 
##      AIC      BIC   logLik deviance df.resid 
##  28748.3  29193.0 -14293.2  28586.3     1709 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0332 -0.5773 -0.1437  0.3964  4.7091 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr                         
##  G        (Intercept)  90764   301.27                                
##           S             5182    71.99   -0.43                        
##           T             5543    74.45   -0.47  0.07                  
##           U             7587    87.10    0.21 -0.20  0.41            
##           V             8829    93.96    0.20 -0.76 -0.54 -0.20      
##           W             1822    42.68    0.47 -0.53 -0.11 -0.44  0.28
##           X             7422    86.15   -0.10  0.13 -0.05 -0.86 -0.06
##           Z             3802    61.66   -0.48  0.41 -0.39 -0.09  0.18
##  H        (Intercept) 129829   360.32                                
##           S             1855    43.07   -0.34                        
##           T            62395   249.79   -0.68 -0.45                  
##           U             2948    54.30    0.20 -0.03 -0.18            
##           V             1043    32.30    0.57 -0.76  0.02  0.01      
##           W             1622    40.27    0.28 -0.02 -0.27  0.44 -0.21
##           X             4700    68.56    0.08 -0.23  0.21 -0.13 -0.26
##           Z             4820    69.43    0.04 -0.48  0.32 -0.69  0.65
##  Residual             399613   632.15                                
##             
##             
##             
##             
##             
##             
##             
##   0.70      
##  -0.78 -0.39
##             
##             
##             
##             
##             
##             
##   0.02      
##  -0.68 -0.10
##             
## Number of obs: 1790, groups:  G, 56; H, 32
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 2180.627     76.820  28.386
## S            -66.990     19.333  -3.465
## T           -333.881     47.667  -7.005
## U             78.987     21.234   3.720
## V             22.152     20.336   1.089
## W            -18.924     17.507  -1.081
## X              5.262     22.421   0.235
## Z            -23.951     21.019  -1.139
## convergence code: 1
## Model failed to converge with max|grad| = 0.00277275 (tol = 0.002, component 1)
## maxfun < 10 * length(par)^2 is not recommended.

The reason this takes a long time is because there are 72 covariance parameters to optimize, and each evaluation of the objective function requires solving a large penalized least squares problem for the fixed-effects parameters and the random effects. This is a huge number of parameters for a model like this. Almost inevitably convergence will be to singular covariance matrices for the random effects, as it is in this case, complicating matters further.

Let me be clear that I do not advocate fitting such models - I am simply showing how it can be done if you or the referees on your paper think you should.

Using JuliaCall

Begin with julia_setup. This can take a while the first time you do this because it must initialize the Julia system and load some packages.

system.time(j <- julia_setup())
## Julia version 0.6.2 at location /usr/local/src/julia-d386e40c17/bin will be used.
## Julia initiation...
## Finish Julia initiation.
## Loading setup script for JuliaCall...
## Finish loading setup script for JuliaCall.
##    user  system elapsed 
##  27.525   2.033  27.373

Load the MixedModels package.

j$library("MixedModels")

and copy the formula and data set from R to Julia

j$assign("form", formula(fm1))
j$assign("kb07", kb07)

Then fit the model

j$eval("@elapsed fm1 = fit(LinearMixedModel, form, kb07)")
## [1] 22.90206
j$eval("fm1")
## Julia Object of type MixedModels.LinearMixedModel{Float64}.
## Linear mixed model fit by maximum likelihood
##  Formula: Y ~ 1 + S + T + U + V + W + X + Z + ((1 + S + T + U + V + W + X + Z) | G) + ((1 + S + T + U + V + W + X + Z) | H)
##      logLik       -2 logLik         AIC            BIC      
##  -1.4293159×10⁴  2.8586318×10⁴  2.8748318×10⁴ 2.91930056×10⁴
## 
## Variance components:
##               Column     Variance   Std.Dev.    Corr.
##  G        (Intercept)   90715.0184 301.189340
##           S              5180.3901  71.974927 -0.43
##           T              5543.1348  74.452232 -0.47  0.08
##           U              7584.8816  87.091226  0.21 -0.20  0.41
##           V              8832.9843  93.983958  0.20 -0.76 -0.54 -0.20
##           W              1821.9809  42.684668  0.47 -0.53 -0.11 -0.44  0.28
##           X              7417.1453  86.122850 -0.10  0.13 -0.05 -0.86 -0.06  0.70
##           Z              3801.0318  61.652509 -0.47  0.41 -0.39 -0.09  0.18 -0.78 -0.39
##  H        (Intercept)  129690.6871 360.125932
##           S              1856.9765  43.092650 -0.34
##           T             62370.7020 249.741270 -0.68 -0.45
##           U              2950.1553  54.315332  0.20 -0.03 -0.18
##           V              1042.0598  32.280950  0.57 -0.76  0.02  0.02
##           W              1620.5108  40.255569  0.28 -0.03 -0.27  0.44 -0.21
##           X              4703.6870  68.583431  0.08 -0.24  0.21 -0.13 -0.26  0.02
##           Z              4821.2335  69.435103  0.04 -0.47  0.32 -0.68  0.65 -0.69 -0.10
##  Residual              399627.0136 632.160592
##  Number of obs: 1790; levels of grouping factors: 56, 32
## 
##   Fixed-effects parameters:
##              Estimate Std.Error  z value P(>|z|)
## (Intercept)   2180.63   76.7856  28.3989  <1e-99
## S              -66.99   19.3346 -3.46478  0.0005
## T            -333.881   47.6587 -7.00566  <1e-11
## U              78.987    21.235  3.71967  0.0002
## V             22.1518   20.3368  1.08925  0.2760
## W            -18.9243   17.5061 -1.08101  0.2797
## X             5.26182   22.4216 0.234677  0.8145
## Z             -23.951   21.0197 -1.13946  0.2545

Fitting the model in Julia using RCall

I think of fitting a model in Julia using JuliaCall as “pushing” the data to Julia. An alternative, if you are willing to work in the Julia REPL (Read-Eval-Print-Loop) is to “pull” the data and formula using RCall. Here’s the same example in the Julia REPL.

julia> using MixedModels, RCall

R> attach("~/.julia/v0.6/MixedModels/test/dat.rda")

R> form <- Y ~ 1+S+T+U+V+W+X+Z + (1+S+T+U+V+W+X+Z|G) + (1+S+T+U+V+W+X+Z|H)

julia> fm1 = fit(LinearMixedModel, rcopy(R"form"), rcopy(R"kb07"))
Linear mixed model fit by maximum likelihood
 Formula: Y ~ 1 + S + T + U + V + W + X + Z + ((1 + S + T + U + V + W + X + Z) | G) + ((1 + S + T + U + V + W + X + Z) | H)
     logLik        -2 logLik          AIC             BIC       
 -1.42931611×10⁴  2.85863221×10⁴  2.87483221×10⁴  2.91930097×10⁴

Variance components:
              Column     Variance   Std.Dev.    Corr.
 G        (Intercept)   90797.8433 301.326805
          S              5186.2898  72.015900 -0.43
          T              5545.5545  74.468480 -0.47  0.08
          U              7590.2433  87.122003  0.21 -0.20  0.41
          V              8839.5027  94.018630  0.20 -0.76 -0.54 -0.20
          W              1822.8053  42.694324  0.47 -0.53 -0.11 -0.44  0.29
          X              7417.7267  86.126226 -0.10  0.13 -0.05 -0.86 -0.06  0.70
          Z              3800.9670  61.651983 -0.48  0.41 -0.39 -0.09  0.18 -0.78 -0.39
 H        (Intercept)  129801.4428 360.279673
          S              1855.0633  43.070446 -0.34
          T             62410.5894 249.821115 -0.68 -0.45
          U              2957.5947  54.383773  0.20 -0.03 -0.18
          V              1038.0660  32.219031  0.57 -0.75  0.02  0.01
          W              1608.1394  40.101613  0.28 -0.05 -0.27  0.44 -0.20
          X              4698.6232  68.546504  0.08 -0.25  0.21 -0.13 -0.26  0.01
          Z              4836.0681  69.541844  0.04 -0.46  0.32 -0.68  0.65 -0.69 -0.10
 Residual              399601.5053 632.140416
 Number of obs: 1790; levels of grouping factors: 56, 32

  Fixed-effects parameters:
             Estimate Std.Error  z value P(>|z|)
(Intercept)   2180.63   76.8177   28.387  <1e-99
S            -66.9899   19.3354 -3.46463  0.0005
T            -333.881   47.6721  -7.0037  <1e-11
U             78.9869   21.2424  3.71837  0.0002
V             22.1517   20.3362  1.08927  0.2760
W            -18.9244   17.4951  -1.0817  0.2794
X             5.26191    22.418 0.234719  0.8144
Z            -23.9509   21.0303 -1.13888  0.2548


julia> @time fit(LinearMixedModel, rcopy(R"form"), rcopy(R"kb07"));
## Error: Error happens in Julia.
## REvalError:
  3.571324 seconds (1.22 M allocations: 74.571 MiB, 3.83% gc time)

julia> fm1.optsum
Initial parameter vector: [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0  …  1.0, 0.0, 0.0, 0.0, 1.0, 
0.0, 0.0, 1.0, 0.0, 1.0]
Initial objective value:  30014.369768606295

Optimizer (from NLopt):   LN_BOBYQA
Lower bounds:             [0.0, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, 0.0, -Inf  …  0.0, -Inf, -Inf, 
-Inf, 0.0, -Inf, -Inf, 0.0, -Inf, 0.0]
ftol_rel:                 1.0e-12
ftol_abs:                 1.0e-8
xtol_rel:                 0.0
xtol_abs:                 [1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10  …  1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10, 1.0e-10]
initial_step:             [0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0  …  0.75, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.75, 1.0, 0.75]
maxfeval:                 -1

Function evaluations:     2018
Final parameter vector:   [0.476677, -0.0494935, -0.0558944, 0.0295759, 0.0292159, 0.0319406, -0.0139343,
 -0.0463468, 0.102611, -0.0170349  …  0.0265041, -0.0488983, -0.0548502, 0.0604581, 0.00382623, -0.0032048, -0.00157491, 0.0, 8.21222e-5, 0.0]
Final objective value:    28586.322101083984
Return code:              FTOL_REACHED

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libmkl_rt
  LAPACK: libmkl_rt
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)

One thing to notice in this transcript is that the prompt changes from julia> to R> and back. Once the RCall package is loaded you can switch to an R REPL (i.e. the R> prompt) by typing $ at the beginning of a line. Backspace at the beginning of a line takes you back to the julia> prompt.

This fit is faster than the previous Julia fit because this version of Julia uses BLAS from Intel’s Math Kernel Library (MKL). Also, the first time that you fit a model of this form several Julia functions (well, methods actually) need to be compiled. The second and subsequent fits are faster.