In this code snippet I show how to speed up ordinal brms models (here, an ordered logit model) by fitting them to aggregated counts with weights. I use the movie ratings data set from Liddell and Kruschke (2018) (https://www.sciencedirect.com/science/article/pii/S0022103117307746?via%3Dihub).

First, load these two packages and read in the data:

library(tidyverse)
library(brms)
movies <- read_csv("MoviesData.csv")

This is the data; I subset it to the first five movies for this example:

movies <- movies[1:5,]
movies
## # A tibble: 5 x 7
##      ID Descrip            n1    n2    n3    n4    n5
##   <dbl> <chr>           <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     1 The Whole Truth    49    70   119   217   245
## 2     2 Priceless          67    22    22    60   574
## 3     3 Allied             59    76   102   203   406
## 4     4 The Infiltrator   173   216   518  1339  2073
## 5     5 Miss Sloane       180    60    48   120   793

Pivot it to long format

movies <- pivot_longer(movies, n1:n5, names_to = "rating", values_to = "count")
movies
## # A tibble: 25 x 4
##       ID Descrip         rating count
##    <dbl> <chr>           <chr>  <dbl>
##  1     1 The Whole Truth n1        49
##  2     1 The Whole Truth n2        70
##  3     1 The Whole Truth n3       119
##  4     1 The Whole Truth n4       217
##  5     1 The Whole Truth n5       245
##  6     2 Priceless       n1        67
##  7     2 Priceless       n2        22
##  8     2 Priceless       n3        22
##  9     2 Priceless       n4        60
## 10     2 Priceless       n5       574
## # … with 15 more rows

Make the ratings numeric (I think an ordered factor would work too):

movies$rating <- as.numeric(str_extract(movies$rating, "[0-9]"))
movies
## # A tibble: 25 x 4
##       ID Descrip         rating count
##    <dbl> <chr>            <dbl> <dbl>
##  1     1 The Whole Truth      1    49
##  2     1 The Whole Truth      2    70
##  3     1 The Whole Truth      3   119
##  4     1 The Whole Truth      4   217
##  5     1 The Whole Truth      5   245
##  6     2 Priceless            1    67
##  7     2 Priceless            2    22
##  8     2 Priceless            3    22
##  9     2 Priceless            4    60
## 10     2 Priceless            5   574
## # … with 15 more rows
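
The ordered-factor alternative mentioned above could look like the following. This is a minimal base R sketch on a stand-in vector (`rating_chr` is a hypothetical name, not a column from the data), and it assumes, without testing it here, that brms's cumulative family accepts an ordered factor as the response.

```r
# Stand-in for the rating column after pivoting (hypothetical vector)
rating_chr <- c("n1", "n2", "n3", "n4", "n5")

# Numeric version, equivalent in spirit to the str_extract() call above
rating_num <- as.numeric(sub("n", "", rating_chr))

# Ordered factor version; spelling out the levels keeps categories
# with zero counts from silently disappearing
rating_ord <- factor(rating_num, levels = 1:5, ordered = TRUE)

is.ordered(rating_ord)
as.integer(rating_ord)
```

Either coding carries the same ordering information; the factor just makes the set of possible categories explicit.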

The typical way to model these would be in the extremely long format, where each row is an observation, instead of a count of observations:

movies2 <- uncount(movies, count)
movies2
## # A tibble: 7,811 x 3
##       ID Descrip         rating
##    <dbl> <chr>            <dbl>
##  1     1 The Whole Truth      1
##  2     1 The Whole Truth      1
##  3     1 The Whole Truth      1
##  4     1 The Whole Truth      1
##  5     1 The Whole Truth      1
##  6     1 The Whole Truth      1
##  7     1 The Whole Truth      1
##  8     1 The Whole Truth      1
##  9     1 The Whole Truth      1
## 10     1 The Whole Truth      1
## # … with 7,801 more rows
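
A quick sanity check on the expansion: the expanded data should have exactly as many rows as there are ratings in total. The sketch below rebuilds the aggregated data by hand from the table printed earlier (the names `movies_long` and `movies_expanded` are mine, for illustration only) so that it runs without the CSV file.

```r
library(tidyr)

# Aggregated counts for the five movies, copied from the table above
movies_long <- data.frame(
  Descrip = rep(c("The Whole Truth", "Priceless", "Allied",
                  "The Infiltrator", "Miss Sloane"), each = 5),
  rating = rep(1:5, times = 5),
  count = c(49, 70, 119, 217, 245,
            67, 22, 22, 60, 574,
            59, 76, 102, 203, 406,
            173, 216, 518, 1339, 2073,
            180, 60, 48, 120, 793)
)

# One row per individual rating; the count column is dropped
movies_expanded <- uncount(movies_long, count)

# The number of expanded rows must equal the total number of ratings
nrow(movies_expanded) == sum(movies_long$count)

# Tabulating the expanded data recovers the per-movie totals
table(movies_expanded$Descrip)
```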

I’ll capture the evaluation time here with system.time()

fit1_time <- system.time(
  fit1 <- brm(
    rating ~ Descrip,
    family = cumulative("logit"),
    data = movies2,
    cores = 4
  )
)

We can make this a lot faster using weights with the aggregated data frame. Here’s what the brms manual says about weights:

For all families, weighted regression may be performed using weights in the aterms part. Internally, this is implemented by multiplying the log-posterior values of each observation by their corresponding weights. Suppose that variable wei contains the weights and that yi is the response variable. Then, formula yi | weights(wei) ~ predictors implements a weighted regression.
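
Why this works for count data can be checked by hand: multiplying a row's log-likelihood by its count gives the same total as repeating that row count times. The sketch below uses the counts for one movie and made-up thresholds (`thresholds` and `eta` are illustrative values, not estimates from any model here). It only mimics the likelihood part of what brms does, not the priors or the sampler.

```r
# Counts for "The Whole Truth", copied from the table above
counts <- c(49, 70, 119, 217, 245)

# Hypothetical thresholds and linear predictor, for illustration only
thresholds <- c(-2.5, -1.8, -1.0, 0.1)
eta <- 0

# Category probabilities under a cumulative logit model
cum_p <- plogis(thresholds - eta)
p <- diff(c(0, cum_p, 1))

# Weighted log-likelihood on the aggregated data (5 rows)
ll_weighted <- sum(counts * log(p))

# Unweighted log-likelihood on the expanded data (700 rows)
ratings <- rep(1:5, times = counts)
ll_expanded <- sum(log(p[ratings]))

all.equal(ll_weighted, ll_expanded)
```

Because the two log-likelihoods agree for any parameter values, the posterior is the same, while the weighted version only has to evaluate one row per movie and rating.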

So here’s how we do it:

fit2_time <- system.time(
  fit2 <- brm(
    rating | weights(count) ~ Descrip,
    family = cumulative("logit"),
    data = movies,
    cores = 4
  )
)

And it’s faster. To make the comparison fair, I also account for the compilation time by timing a model without samples (fit0, not shown):

rbind(fit0_time, fit1_time, fit2_time)
##           user.self sys.self elapsed user.child sys.child
## fit0_time     4.775    0.215  31.551     26.097     5.656
## fit1_time     4.541    0.495  84.916    225.465     6.653
## fit2_time     5.371    0.488  33.336     28.720     7.506
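
To separate sampling time from compilation time, the elapsed values above can be differenced against fit0. This is a rough sketch using the numbers copied from the table; it assumes fit0 is a fair stand-in for the compilation overhead of both models, and these are single runs, so the figures are indicative only.

```r
# Elapsed times in seconds, copied from the table above
elapsed <- c(fit0 = 31.551, fit1 = 84.916, fit2 = 33.336)

# Treat fit0 as the compilation-only baseline and subtract it
sampling <- elapsed[c("fit1", "fit2")] - elapsed[["fit0"]]
sampling

# Ratio of net sampling times (expanded vs aggregated data)
sampling[["fit1"]] / sampling[["fit2"]]
```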

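The speedup is substantial: net of compilation, sampling takes about 2 seconds with the aggregated data versus about 53 seconds with the expanded data, because the aggregated data has 25 rows instead of 7,811. And the results are the same, up to Monte Carlo error: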

options(digits = 2)
fixef(fit1)
##                       Estimate Est.Error    Q2.5 Q97.5
## Intercept[1]             -2.45     0.074 -2.5965 -2.31
## Intercept[2]             -1.78     0.068 -1.9100 -1.64
## Intercept[3]             -1.04     0.065 -1.1636 -0.91
## Intercept[4]              0.12     0.064  0.0011  0.25
## DescripMissSloane         0.56     0.088  0.3812  0.73
## DescripPriceless          1.22     0.109  1.0080  1.43
## DescripTheInfiltrator     0.14     0.070  0.0104  0.28
## DescripTheWholeTruth     -0.39     0.091 -0.5706 -0.22
fixef(fit2)
##                       Estimate Est.Error    Q2.5 Q97.5
## Intercept[1]             -2.45     0.076 -2.5988 -2.30
## Intercept[2]             -1.78     0.070 -1.9156 -1.64
## Intercept[3]             -1.04     0.067 -1.1694 -0.91
## Intercept[4]              0.13     0.066 -0.0044  0.25
## DescripMissSloane         0.56     0.091  0.3827  0.74
## DescripPriceless          1.22     0.109  1.0088  1.44
## DescripTheInfiltrator     0.15     0.071  0.0041  0.28
## DescripTheWholeTruth     -0.39     0.092 -0.5614 -0.20
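
To put a number on how close the two fits are, the estimates can be compared against their posterior standard deviations. The sketch below hard-codes the rounded values printed above so that it runs on its own; with the fitted objects at hand, something like fixef(fit1)[, "Estimate"] would supply the same columns.

```r
# Posterior means from fit1 (expanded data) and fit2 (weighted data),
# copied from the output above in the same row order
est_expanded <- c(-2.45, -1.78, -1.04, 0.12, 0.56, 1.22, 0.14, -0.39)
est_weighted <- c(-2.45, -1.78, -1.04, 0.13, 0.56, 1.22, 0.15, -0.39)

# Posterior standard deviations (Est.Error) from fit1
se_expanded <- c(0.074, 0.068, 0.065, 0.064, 0.088, 0.109, 0.070, 0.091)

# Largest absolute difference between the two sets of estimates
abs_diff <- abs(est_expanded - est_weighted)
max(abs_diff)

# The same difference relative to the posterior standard deviation
max(abs_diff / se_expanded)
```

Differences that are a small fraction of the posterior standard deviation are what I would expect from two independent MCMC runs targeting the same posterior.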

This leads to the follow-up question of whether weights also work in models with auxiliary formulas, such as ones predicting the latent variance (through the discrimination parameter disc). The answer is yes:

fit3 <- brm(
  bf(rating ~ Descrip) +
    lf(disc ~ 0 + Descrip, cmc = FALSE),
  family = cumulative("logit"),
  data = movies2,
  cores = 4
)
fit4 <- brm(
  bf(rating | weights(count) ~ Descrip) +
    lf(disc ~ 0 + Descrip, cmc = FALSE),
  family = cumulative("logit"),
  data = movies,
  cores = 4
)

Once again, checking that the results agree up to Monte Carlo error:

options(digits = 2)
fixef(fit3)
##                            Estimate Est.Error   Q2.5 Q97.5
## Intercept[1]                 -2.481     0.113 -2.708 -2.26
## Intercept[2]                 -1.725     0.087 -1.897 -1.56
## Intercept[3]                 -0.986     0.069 -1.125 -0.85
## Intercept[4]                  0.103     0.067 -0.028  0.23
## DescripMissSloane             1.727     0.223  1.307  2.18
## DescripPriceless              2.934     0.376  2.261  3.73
## DescripTheInfiltrator         0.043     0.070 -0.097  0.18
## DescripTheWholeTruth         -0.418     0.086 -0.584 -0.25
## disc_DescripMissSloane       -0.885     0.075 -1.028 -0.74
## disc_DescripPriceless        -0.840     0.101 -1.040 -0.65
## disc_DescripTheInfiltrator    0.252     0.048  0.158  0.35
## disc_DescripTheWholeTruth     0.196     0.060  0.076  0.31
fixef(fit4)
##                            Estimate Est.Error   Q2.5 Q97.5
## Intercept[1]                 -2.476     0.115 -2.706 -2.26
## Intercept[2]                 -1.722     0.088 -1.896 -1.55
## Intercept[3]                 -0.985     0.070 -1.126 -0.85
## Intercept[4]                  0.102     0.069 -0.033  0.23
## DescripMissSloane             1.712     0.228  1.290  2.18
## DescripPriceless              2.907     0.393  2.204  3.75
## DescripTheInfiltrator         0.041     0.072 -0.105  0.18
## DescripTheWholeTruth         -0.418     0.085 -0.586 -0.26
## disc_DescripMissSloane       -0.880     0.077 -1.032 -0.73
## disc_DescripPriceless        -0.834     0.104 -1.041 -0.64
## disc_DescripTheInfiltrator    0.255     0.049  0.160  0.35
## disc_DescripTheWholeTruth     0.199     0.062  0.078  0.32
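
The disc coefficients are easier to read as latent standard deviations. The sketch below rests on two assumptions that I have not verified against this particular brms version: that disc is modelled with a log link, and that disc is the inverse of the latent standard deviation, with the reference movie (Allied) fixed at disc = 1. Transforming posterior means like this is only a rough point summary; transforming the posterior draws would be the proper route.

```r
# Posterior means of the disc coefficients from fit4, copied from the output above
log_disc <- c(
  MissSloane = -0.880,
  Priceless = -0.834,
  TheInfiltrator = 0.255,
  TheWholeTruth = 0.199
)

# Assumption: disc uses a log link, so exponentiate to get disc itself
disc <- exp(log_disc)

# Assumption: latent SD is 1 / disc, relative to the reference movie at 1
latent_sd <- 1 / disc
round(latent_sd, 2)
```

Under these assumptions, negative disc coefficients (Miss Sloane, Priceless) correspond to more spread-out latent ratings than the reference movie, and positive ones (The Infiltrator, The Whole Truth) to less spread.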