Introduction

I use Stan (brms) to fit many models to many datasets, and each model runs many chains.

It is easy to parallelize the chains of a single brms model by setting brm(..., cores = 4). However, I would also like to run many models in parallel.
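
For reference, chain-level parallelism alone looks like this (a minimal sketch of the cores argument, not code from the original analysis):

library(brms)
# One model; its 4 chains are distributed over 4 cores
fit <- brm(
  Reaction ~ 1 + Days,
  data = lme4::sleepstudy,
  chains = 4,
  cores = 4
)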

There were some good suggestions on Twitter.

I took Ruben’s suggestion (thanks Ruben and everyone else who replied!) to use the future package. Here, I’ll show my approach to running many models and chains in a nested parallelization scheme using futures.

Method

I used these packages:

library(future)
library(brms)
library(tidyverse)

In this example I have 2 datasets, 4 models to fit to each dataset, and 2 chains per model. In my real problem, most of the time is spent on a few individual models that take hours or days to run (and there are many more datasets).

# 2 example datasets
data1 <- lme4::sleepstudy
data2 <- rbind(data1, data1)

There are multiple strategies for solving this problem; the one I chose here is nested futures. Think of the datasets, models, and chains as a three-level hierarchy. (In my real application the models differ between datasets, so the datasets have to form the top level; I cannot simply apply the same model to every dataset.) We can parallelize one or several of these levels. The default strategy, without futures, is to set brm(..., cores = 4), which parallelizes only the lowest level (the chains). With futures, we can nest that chain-level parallelism inside another layer of parallelization. I chose to parallelize the datasets and the chains: in this example the two datasets run in parallel, the models within each dataset run sequentially, and each model uses 2 cores, one per chain.

Instructing future to plan a two-level hierarchy (datasets, chains) with 2 workers at each level looks like this (the future manual says this strategy is not recommended, but helpfully provides no alternatives):

plan(
  list(
    tweak(multisession, workers = 2),  # outer level: datasets in parallel
    tweak(multisession, workers = 2)   # inner level: chains within a model
  )
)
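
As a sanity check (my addition, not part of the original workflow), you can ask future how many workers the outer level of the plan provides:

nbrOfWorkers()    # workers at the current (outer) level; should report 2
availableCores()  # total cores detected on this machine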

I then wrote a function that fits the four models sequentially to a single dataset. Note brm(..., future = TRUE), which tells brms to use futures to parallelize the chains. The function returns a tibble with the dataset name and one fitted model per row.

run_my_models <- function(data, name) {
  # Intercept-only model
  f1 <- brm(
    Reaction ~ 1,
    data = data,
    chains = 2,
    future = TRUE,
    file = paste0(name, "f1")
  )
  # Fixed effect of Days
  f2 <- brm(
    Reaction ~ 1 + Days,
    data = data,
    chains = 2,
    future = TRUE,
    file = paste0(name, "f2")
  )
  # Random intercepts by Subject
  f3 <- brm(
    Reaction ~ 1 + Days + (1 | Subject),
    data = data,
    chains = 2,
    future = TRUE,
    file = paste0(name, "f3")
  )
  # Random intercepts and slopes by Subject
  f4 <- brm(
    Reaction ~ 1 + Days + (1 + Days | Subject),
    data = data,
    chains = 2,
    future = TRUE,
    file = paste0(name, "f4")
  )
  # Return the fits in a tibble, one model per row
  tibble(
    dataset = name,
    model = 1:4,
    m = list(f1, f2, f3, f4)
  )
}

The key is then to use the future assignment operator %<-% when calling that function. It appears to do some dark magic and throws a few errors, but it successfully runs the function calls in the background (in parallel!), allowing you to entertain yourself with fortunes::fortune()1 while you wait.

fits1 %<-% run_my_models(data=data1, name="data1")
fits2 %<-% run_my_models(data=data2, name="data2")

When those lines are run, they (I think) start background R processes, which work through the models sequentially but run each model's chains in parallel. I looked at my Mac's Activity Monitor and confirmed that an appropriate number of clang compilers and R sessions were running in the background.
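
If you want to peek at the background work without blocking, the future package lets you grab the future behind a %<-% variable and ask whether it has finished. This check is my addition, assuming futureOf() and resolved() behave as documented:

# Non-blocking status check of the background fits
f1_future <- futureOf(fits1)
resolved(f1_future)        # TRUE once all models for data1 are fitted
resolved(futureOf(fits2))  # same check for data2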

If you access fits1 or fits2 before the background processes have completed, the current R session will block until they are done. Until then, you can keep doing whatever else you like without blocking the session. The reason for putting the models in a tibble was to make post-processing many models easier:

all_models <- bind_rows(fits1, fits2)
all_models
## # A tibble: 8 x 3
##   dataset model m        
##   <chr>   <int> <list>   
## 1 data1       1 <brmsfit>
## 2 data1       2 <brmsfit>
## 3 data1       3 <brmsfit>
## 4 data1       4 <brmsfit>
## 5 data2       1 <brmsfit>
## 6 data2       2 <brmsfit>
## 7 data2       3 <brmsfit>
## 8 data2       4 <brmsfit>
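
As an illustration of that post-processing (my sketch, using brms's loo() method rather than anything from the original post), you can map over the list column of fits:

# Hypothetical example: approximate leave-one-out results for every model
all_models %>%
  mutate(
    loo  = map(m, loo),
    elpd = map_dbl(loo, ~ .x$estimates["elpd_loo", "Estimate"])
  )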

Conclusion

Nested parallelization with futures is brilliant: you can keep using the active R session while many functions run in parallel in the background, and those functions can themselves call parallelized functions. This strategy also works within R Markdown scripts, both interactively and when knitted. That has not always been my experience with parallel processing, and working inside R Markdown is an absolute must for reproducibility and easy collaboration.

I did a primitive benchmark by running this R Markdown script with the nested parallelization scheme, which resulted in

   user  system elapsed 
  7.306   1.429 235.906

Compared with the timing of a completely sequential setup (below), the speedup was pretty good.

   user  system elapsed 
426.486  12.587 445.009 

In this example, I would have expected roughly a two-fold speedup for the parallelized version, because the two datasets' models ran in parallel. Parallelizing the chains made little difference here because the chains run very quickly anyway.
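
For completeness, timings like the ones above can be produced by recording proc.time() at the top of the script and subtracting at the end (an assumption about the method; the original timing code is not shown):

t0 <- proc.time()
# ... fit and post-process all the models ...
proc.time() - t0  # user, system, and elapsed seconds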

benchmarkme::get_cpu()
## $vendor_id
## [1] "GenuineIntel"
## 
## $model_name
## [1] "Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz"
## 
## $no_of_cores
## [1] 8
sessionInfo()
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] forcats_0.5.0   stringr_1.4.0   dplyr_1.0.0     purrr_0.3.4    
##  [5] readr_1.3.1     tidyr_1.1.0     tibble_3.0.1    ggplot2_3.3.2  
##  [9] tidyverse_1.3.0 brms_2.13.2     Rcpp_1.0.4.6    future_1.17.0  
## 
## loaded via a namespace (and not attached):
##   [1] minqa_1.2.4           TH.data_1.0-10        colorspace_1.4-1     
##   [4] ellipsis_0.3.1        ggridges_0.5.2        rsconnect_0.8.16     
##   [7] benchmarkme_1.0.4     estimability_1.3      markdown_1.1         
##  [10] fs_1.4.1              base64enc_0.1-3       rstudioapi_0.11      
##  [13] listenv_0.8.0         rstan_2.19.3          DT_0.13              
##  [16] fansi_0.4.1           mvtnorm_1.1-1         lubridate_1.7.9      
##  [19] xml2_1.3.2            bridgesampling_1.0-0  codetools_0.2-16     
##  [22] splines_4.0.1         doParallel_1.0.15     knitr_1.28           
##  [25] shinythemes_1.1.2     bayesplot_1.7.2       jsonlite_1.6.1       
##  [28] nloptr_1.2.2.1        broom_0.7.0.9000      dbplyr_1.4.4         
##  [31] shiny_1.4.0.2         compiler_4.0.1        httr_1.4.1           
##  [34] emmeans_1.4.7         backports_1.1.8       assertthat_0.2.1     
##  [37] Matrix_1.2-18         fastmap_1.0.1         cli_2.0.2            
##  [40] later_1.1.0.1         htmltools_0.5.0       prettyunits_1.1.1    
##  [43] tools_4.0.1           igraph_1.2.5          coda_0.19-3          
##  [46] gtable_0.3.0          glue_1.4.1            reshape2_1.4.4       
##  [49] cellranger_1.1.0      vctrs_0.3.1           nlme_3.1-148         
##  [52] iterators_1.0.12      crosstalk_1.1.0.1     xfun_0.15            
##  [55] globals_0.12.5        ps_1.3.3              lme4_1.1-23          
##  [58] rvest_0.3.5           mime_0.9              miniUI_0.1.1.1       
##  [61] lifecycle_0.2.0       gtools_3.8.2          statmod_1.4.34       
##  [64] MASS_7.3-51.6         zoo_1.8-8             scales_1.1.1         
##  [67] colourpicker_1.0      hms_0.5.3             promises_1.1.1       
##  [70] Brobdingnag_1.2-6     parallel_4.0.1        sandwich_2.5-1       
##  [73] inline_0.3.15         shinystan_2.5.0       yaml_2.2.1           
##  [76] gridExtra_2.3         loo_2.2.0             StanHeaders_2.21.0-5 
##  [79] stringi_1.4.6         dygraphs_1.1.1.6      foreach_1.5.0        
##  [82] boot_1.3-25           pkgbuild_1.0.8        benchmarkmeData_1.0.4
##  [85] rlang_0.4.6           pkgconfig_2.0.3       matrixStats_0.56.0   
##  [88] evaluate_0.14         lattice_0.20-41       rstantools_2.1.0.9000
##  [91] htmlwidgets_1.5.1     processx_3.4.2        tidyselect_1.1.0     
##  [94] plyr_1.8.6            magrittr_1.5          R6_2.4.1             
##  [97] generics_0.0.2        multcomp_1.4-13       DBI_1.1.0            
## [100] withr_2.2.0           pillar_1.4.4          haven_2.3.1          
## [103] xts_0.12-0            survival_3.2-3        abind_1.4-5          
## [106] modelr_0.1.8          crayon_1.3.4          utf8_1.1.4           
## [109] rmarkdown_2.3         readxl_1.3.1          grid_4.0.1           
## [112] blob_1.2.1            callr_3.4.3           threejs_0.3.3        
## [115] reprex_0.3.0          digest_0.6.25         xtable_1.8-4         
## [118] httpuv_1.5.4          RcppParallel_5.0.1    stats4_4.0.1         
## [121] munsell_0.5.0         fortunes_1.5-4        shinyjs_1.1

  1. fortunes::fortune(): "The computational ease with which an abundance of parameters can be estimated should not be allowed to obscure the probable unwisdom of such estimation from limited data." Arthur P. Dempster, in "Covariance selection", Biometrics 28 (1), 157-175, March 1972.