In this code snippet I show how to speed up ordinal brms models (here, an ordered logit model) by using weights on aggregated data. I use the movie ratings data set from Liddell and Kruschke (2018) (https://www.sciencedirect.com/science/article/pii/S0022103117307746?via%3Dihub).
We need these two packages:
library(tidyverse)
library(brms)
movies <- read_csv("MoviesData.csv")
Here is the data. I subset it to the first 5 movies for this example:
movies <- movies[1:5,]
movies
## # A tibble: 5 x 7
##      ID Descrip            n1    n2    n3    n4    n5
##   <dbl> <chr>           <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     1 The Whole Truth    49    70   119   217   245
## 2     2 Priceless          67    22    22    60   574
## 3     3 Allied             59    76   102   203   406
## 4     4 The Infiltrator   173   216   518  1339  2073
## 5     5 Miss Sloane       180    60    48   120   793
Pivot it to long format, so that each row is one movie-by-rating cell with its count:
movies <- pivot_longer(movies, n1:n5, names_to = "rating", values_to = "count")
movies
## # A tibble: 25 x 4
##       ID Descrip         rating count
##    <dbl> <chr>           <chr>  <dbl>
##  1     1 The Whole Truth n1        49
##  2     1 The Whole Truth n2        70
##  3     1 The Whole Truth n3       119
##  4     1 The Whole Truth n4       217
##  5     1 The Whole Truth n5       245
##  6     2 Priceless       n1        67
##  7     2 Priceless       n2        22
##  8     2 Priceless       n3        22
##  9     2 Priceless       n4        60
## 10     2 Priceless       n5       574
## # … with 15 more rows
Make the ratings numeric (an ordered factor should work too, I think):
movies$rating <- as.numeric(str_extract(movies$rating, "[0-9]"))
movies
## # A tibble: 25 x 4
##       ID Descrip         rating count
##    <dbl> <chr>            <dbl> <dbl>
##  1     1 The Whole Truth      1    49
##  2     1 The Whole Truth      2    70
##  3     1 The Whole Truth      3   119
##  4     1 The Whole Truth      4   217
##  5     1 The Whole Truth      5   245
##  6     2 Priceless            1    67
##  7     2 Priceless            2    22
##  8     2 Priceless            3    22
##  9     2 Priceless            4    60
## 10     2 Priceless            5   574
## # … with 15 more rows
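The remark about ordered factors can be sketched as follows. This is a minimal base R illustration on a hypothetical vector `rating_chr` that mimics the `rating` column right after pivoting; it is not part of the original analysis.

```r
# Hypothetical stand-in for the `rating` column after pivot_longer()
rating_chr <- c("n1", "n2", "n3", "n4", "n5")

# Numeric coding, as in the post (base R equivalent of the str_extract() call)
rating_num <- as.numeric(sub("n", "", rating_chr))

# Ordered-factor coding: same information, with the level order made explicit
rating_ord <- factor(rating_num, levels = 1:5, ordered = TRUE)

rating_num
rating_ord
```

As far as I know, the cumulative() family in brms accepts either coding, so the choice is mostly a matter of taste.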
The typical way to model these data would be in the fully expanded long format, where each row is a single observation rather than a count of observations:
movies2 <- uncount(movies, count)
movies2
## # A tibble: 7,811 x 3
##       ID Descrip         rating
##    <dbl> <chr>            <dbl>
##  1     1 The Whole Truth      1
##  2     1 The Whole Truth      1
##  3     1 The Whole Truth      1
##  4     1 The Whole Truth      1
##  5     1 The Whole Truth      1
##  6     1 The Whole Truth      1
##  7     1 The Whole Truth      1
##  8     1 The Whole Truth      1
##  9     1 The Whole Truth      1
## 10     1 The Whole Truth      1
## # … with 7,801 more rows
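As a quick sanity check, the number of rows after expansion should equal the total of the counts. The sketch below re-enters the counts from the first table and uses a base R equivalent of `uncount()` (repeating row indices); the object names `counts` and `long` are only for illustration.

```r
# Counts per movie (ID 1-5) and rating (1-5), copied from the table of 5 movies
counts <- data.frame(
  ID     = rep(1:5, each = 5),
  rating = rep(1:5, times = 5),
  count  = c( 49,  70, 119,  217,  245,
              67,  22,  22,   60,  574,
              59,  76, 102,  203,  406,
             173, 216, 518, 1339, 2073,
             180,  60,  48,  120,  793)
)

# Base R equivalent of tidyr::uncount(): repeat each row `count` times
long <- counts[rep(seq_len(nrow(counts)), times = counts$count), c("ID", "rating")]

nrow(long)          # one row per individual rating
sum(counts$count)   # total number of ratings; the two should match
```

Both numbers come out at 7,811, which agrees with the tibble dimensions printed above.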
I’ll capture the run time of the fit with system.time():
fit1_time <- system.time(
  fit1 <- brm(
    rating ~ Descrip,
    family = cumulative("logit"),
    data = movies2,  # one row per individual rating
    cores = 4
  )
)
We can make this a lot faster using weights with the aggregated data frame. Here’s what the brms manual says about weights:
For all families, weighted regression may be performed using weights in the aterms part. Internally, this is implemented by multiplying the log-posterior values of each observation by their corresponding weights. Suppose that variable wei contains the weights and that yi is the response variable. Then, formula yi | weights(wei) ~ predictors implements a weighted regression.
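Why this works for count data can be sketched with a hand-written likelihood. For integer weights, multiplying a row's log-likelihood by its count is the same as adding up that many identical rows. The example below uses base R only; the thresholds `tau` and linear predictor `eta` are made-up values, and `loglik_cumulative()` is a hypothetical helper, not a brms function.

```r
# Log-probability of each rating under a cumulative logit model:
# P(Y <= k) = plogis(tau[k] - eta)
loglik_cumulative <- function(y, tau, eta) {
  cum <- c(0, plogis(tau - eta), 1)   # P(Y <= 0), ..., P(Y <= K)
  log(cum[y + 1] - cum[y])            # log P(Y = y)
}

tau <- c(-2.4, -1.8, -1.0, 0.1)  # illustrative thresholds (assumed values)
eta <- -0.4                      # illustrative linear predictor (assumed value)

# Aggregated data for one movie (the "The Whole Truth" counts from the data above)
rating <- 1:5
count  <- c(49, 70, 119, 217, 245)

# One row per observation: sum the log-likelihood over all expanded rows
ll_long <- sum(loglik_cumulative(rep(rating, times = count), tau, eta))

# One row per rating: weight each row's log-likelihood by its count
ll_weighted <- sum(count * loglik_cumulative(rating, tau, eta))

all.equal(ll_long, ll_weighted)
```

The two log-likelihoods agree up to floating-point error for any `tau` and `eta`, which is why the posterior should be the same while the sampler only has to evaluate 25 rows.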
So here’s how we do it:
fit2_time <- system.time(
  fit2 <- brm(
    rating | weights(count) ~ Descrip,
    family = cumulative("logit"),
    data = movies,  # one row per movie-by-rating cell
    cores = 4
  )
)
And it’s faster. To make the comparison fair, I also account for compilation time by timing a model that is compiled but draws no samples (fit0, not shown):
rbind(fit0_time, fit1_time, fit2_time)
##           user.self sys.self elapsed user.child sys.child
## fit0_time     4.775    0.215  31.551     26.097     5.656
## fit1_time     4.541    0.495  84.916    225.465     6.653
## fit2_time     5.371    0.488  33.336     28.720     7.506
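The net sampling times can be read off this table by subtracting the compile-only run. A small sketch, with the elapsed times typed in from the output above (the vector names `elapsed` and `sampling` are just for illustration):

```r
# Elapsed wall-clock seconds, copied from the timing table
elapsed <- c(fit0 = 31.551, fit1 = 84.916, fit2 = 33.336)

# Subtract the compile-only run to approximate pure sampling time
sampling <- elapsed[c("fit1", "fit2")] - elapsed["fit0"]
sampling

# Ratio of sampling times: expanded data vs weighted aggregated data
unname(sampling["fit1"] / sampling["fit2"])
```

This treats compilation time as constant across the three runs, which is only approximately true, so the ratio is best read as a rough figure.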
The speedup is enormous: net of the roughly 32 seconds of compilation, sampling takes about 2 seconds instead of about 53 seconds, because the aggregated data has 25 rows rather than 7,811. And the results are the same, up to Monte Carlo error:
options(digits = 2)
fixef(fit1)
##                       Estimate Est.Error    Q2.5 Q97.5
## Intercept[1]             -2.45     0.074 -2.5965 -2.31
## Intercept[2]             -1.78     0.068 -1.9100 -1.64
## Intercept[3]             -1.04     0.065 -1.1636 -0.91
## Intercept[4]              0.12     0.064  0.0011  0.25
## DescripMissSloane         0.56     0.088  0.3812  0.73
## DescripPriceless          1.22     0.109  1.0080  1.43
## DescripTheInfiltrator     0.14     0.070  0.0104  0.28
## DescripTheWholeTruth     -0.39     0.091 -0.5706 -0.22
fixef(fit2)
##                       Estimate Est.Error    Q2.5 Q97.5
## Intercept[1]             -2.45     0.076 -2.5988 -2.30
## Intercept[2]             -1.78     0.070 -1.9156 -1.64
## Intercept[3]             -1.04     0.067 -1.1694 -0.91
## Intercept[4]              0.13     0.066 -0.0044  0.25
## DescripMissSloane         0.56     0.091  0.3827  0.74
## DescripPriceless          1.22     0.109  1.0088  1.44
## DescripTheInfiltrator     0.15     0.071  0.0041  0.28
## DescripTheWholeTruth     -0.39     0.092 -0.5614 -0.20
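One way to make the comparison concrete is to set the gap between the two sets of estimates against their posterior standard errors. The sketch below re-enters the rounded values printed above; with the fitted objects at hand, `fixef(fit1)` and `fixef(fit2)` could be used directly instead. The vector names are only for illustration.

```r
# Posterior means and standard errors, copied (rounded) from the fixef() output
est_long     <- c(-2.45, -1.78, -1.04, 0.12, 0.56, 1.22, 0.14, -0.39)
est_weighted <- c(-2.45, -1.78, -1.04, 0.13, 0.56, 1.22, 0.15, -0.39)
se_long      <- c(0.074, 0.068, 0.065, 0.064, 0.088, 0.109, 0.070, 0.091)

# Largest absolute difference between the two fits
max(abs(est_long - est_weighted))

# The same differences expressed in units of posterior standard error
max(abs(est_long - est_weighted) / se_long)
```

Because the inputs are rounded to two decimals, this is a coarse check, but it suggests the two fits differ by only a small fraction of a posterior standard error, as expected from sampling noise alone.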
This leads to the follow-up question of whether weights can also be used together with auxiliary formulas, such as one predicting the latent variance (through disc). The answer is yes:
fit3 <- brm(
  bf(rating ~ Descrip) +
    lf(disc ~ 0 + Descrip, cmc = FALSE),
  family = cumulative("logit"),
  data = movies2,
  cores = 4
)
fit4 <- brm(
  bf(rating | weights(count) ~ Descrip) +
    lf(disc ~ 0 + Descrip, cmc = FALSE),
  family = cumulative("logit"),
  data = movies,
  cores = 4
)
Once again, checking that the results agree up to Monte Carlo error:
options(digits = 2)
fixef(fit3)
##                            Estimate Est.Error   Q2.5 Q97.5
## Intercept[1]                 -2.481     0.113 -2.708 -2.26
## Intercept[2]                 -1.725     0.087 -1.897 -1.56
## Intercept[3]                 -0.986     0.069 -1.125 -0.85
## Intercept[4]                  0.103     0.067 -0.028  0.23
## DescripMissSloane             1.727     0.223  1.307  2.18
## DescripPriceless              2.934     0.376  2.261  3.73
## DescripTheInfiltrator         0.043     0.070 -0.097  0.18
## DescripTheWholeTruth         -0.418     0.086 -0.584 -0.25
## disc_DescripMissSloane       -0.885     0.075 -1.028 -0.74
## disc_DescripPriceless        -0.840     0.101 -1.040 -0.65
## disc_DescripTheInfiltrator    0.252     0.048  0.158  0.35
## disc_DescripTheWholeTruth     0.196     0.060  0.076  0.31
fixef(fit4)
##                            Estimate Est.Error   Q2.5 Q97.5
## Intercept[1]                 -2.476     0.115 -2.706 -2.26
## Intercept[2]                 -1.722     0.088 -1.896 -1.55
## Intercept[3]                 -0.985     0.070 -1.126 -0.85
## Intercept[4]                  0.102     0.069 -0.033  0.23
## DescripMissSloane             1.712     0.228  1.290  2.18
## DescripPriceless              2.907     0.393  2.204  3.75
## DescripTheInfiltrator         0.041     0.072 -0.105  0.18
## DescripTheWholeTruth         -0.418     0.085 -0.586 -0.26
## disc_DescripMissSloane       -0.880     0.077 -1.032 -0.73
## disc_DescripPriceless        -0.834     0.104 -1.041 -0.64
## disc_DescripTheInfiltrator    0.255     0.049  0.160  0.35
## disc_DescripTheWholeTruth     0.199     0.062  0.078  0.32
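For readers unfamiliar with `disc`: in a cumulative model with a discrimination parameter, the threshold-minus-predictor term is multiplied by `disc` before the link is applied, and `disc` is modeled on the log scale. The sketch below illustrates this in base R, according to my understanding of the brms parameterization. `category_probs()` is a hypothetical helper, and the inputs are rounded from the estimates above rather than an exact reproduction of the fitted model.

```r
# Category probabilities for a cumulative logit model with discrimination:
# P(Y <= k) = plogis(disc * (tau[k] - eta)), where disc = exp(log_disc)
category_probs <- function(tau, eta, log_disc = 0) {
  cum <- c(0, plogis(exp(log_disc) * (tau - eta)), 1)
  diff(cum)  # P(Y = 1), ..., P(Y = K)
}

tau <- c(-2.48, -1.72, -0.99, 0.10)  # thresholds, rounded from the output above

# Reference movie: eta = 0 and log_disc = 0, so disc = 1
p_ref <- category_probs(tau, eta = 0)

# A movie with a positive shift and a negative log-discrimination (disc < 1),
# roughly the "Miss Sloane" estimates
p_alt <- category_probs(tau, eta = 1.7, log_disc = -0.88)

round(rbind(p_ref, p_alt), 3)
```

A `disc` below 1 flattens the cumulative curve, which I read as a larger latent standard deviation, so such a movie gets more mass in the extreme categories than a pure location shift would give.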