Bayesian Model Selection with brms

Author

Trevor Caughlin

1 Overview

In this example, we compare three Bayesian count models for sagebrush flower stalk production, predicted from drone imagery using negative binomial regression in brms.

We will fit:

  1. an intercept-only model
  2. a model with canopy height only
  3. a full model with many predictors

Then we will compare models with two common predictive criteria:

  • LOOIC: approximate leave-one-out cross-validation
  • 4-fold cross-validation: repeated fitting to held-out subsets of the data

We will also summarize out-of-sample mean absolute error (MAE) using posterior draws from kfold_predict().

Our model selection procedure will addresss the following questions:

  • Which model predicts new data best?
  • Does added complexity improve predictive performance enough to justify it?
  • Are simpler models nearly as good?
library(brms)
library(ggplot2)

dat <- read.csv("flower_power.csv")

1.1 Quick look at the data

str(dat)
'data.frame':   561 obs. of  9 variables:
 $ flower_stalk_count: int  35 44 9 30 8 22 34 165 41 46 ...
 $ Area              : num  0.0568 0.7987 1.0388 1.0388 -0.0773 ...
 $ Canopy_height     : num  0.658 0.44 2.091 2.091 0.361 ...
 $ Red               : num  -0.665 -1.372 -1.25 -1.25 -1.239 ...
 $ Green             : num  -0.633 -1.334 -1.217 -1.217 -1.172 ...
 $ Blue              : num  -0.732 -1.088 -1.29 -1.29 -0.929 ...
 $ Elevation         : num  0.525 0.515 0.52 0.52 0.513 ...
 $ Year              : int  22 22 22 22 22 22 22 22 22 22 ...
 $ site              : chr  "site1" "site1" "site1" "site1" ...
summary(dat)
 flower_stalk_count      Area           Canopy_height      
 Min.   :  0.00     Min.   :-0.731163   Min.   :-0.813657  
 1st Qu.:  4.00     1st Qu.:-0.361909   1st Qu.:-0.365651  
 Median : 24.00     Median :-0.120697   Median :-0.077639  
 Mean   : 76.47     Mean   :-0.003575   Mean   :-0.002608  
 3rd Qu.: 92.00     3rd Qu.: 0.286283   3rd Qu.: 0.296175  
 Max.   :709.00     Max.   : 1.754956   Max.   : 2.091122  
      Red                 Green                 Blue          
 Min.   :-2.2096266   Min.   :-2.0402513   Min.   :-1.427864  
 1st Qu.:-0.2299413   1st Qu.:-0.2643900   1st Qu.:-0.318854  
 Median : 0.0864125   Median : 0.0548392   Median :-0.001843  
 Mean   : 0.0006889   Mean   : 0.0008777   Mean   : 0.001234  
 3rd Qu.: 0.3228717   3rd Qu.: 0.3538943   3rd Qu.: 0.305844  
 Max.   : 0.9232058   Max.   : 1.0299014   Max.   : 1.367469  
   Elevation              Year           site          
 Min.   :-0.797500   Min.   :21.00   Length:561        
 1st Qu.:-0.026065   1st Qu.:21.00   Class :character  
 Median : 0.150659   Median :22.00   Mode  :character  
 Mean   : 0.002797   Mean   :21.96                     
 3rd Qu.: 0.506924   3rd Qu.:23.00                     
 Max.   : 0.637205   Max.   :23.00                     
head(dat)
  flower_stalk_count        Area Canopy_height        Red      Green       Blue
1                 35  0.05683302    0.65810944 -0.6649011 -0.6334925 -0.7322918
2                 44  0.79869890    0.44023654 -1.3718880 -1.3335028 -1.0880790
3                  9  1.03884363    2.09112179 -1.2504797 -1.2173260 -1.2901425
4                 30  1.03884363    2.09112179 -1.2504797 -1.2173260 -1.2901425
5                  8 -0.07725237    0.36095791 -1.2386661 -1.1720240 -0.9287089
6                 22  0.30274782    0.04107762 -1.1538512 -1.0705588 -0.8046447
  Elevation Year  site
1 0.5247955   22 site1
2 0.5145983   22 site1
3 0.5198842   22 site1
4 0.5198842   22 site1
5 0.5131348   22 site1
6 0.5072225   22 site1

2 Exploratory figures

Before fitting models, it is good practice to look at the response and at least one key predictor.

2.1 Distribution of flower stalk counts

This histogram shows the response variable. Because flower_stalk_count is a count, a negative binomial model is a sensible choice when the data may be overdispersed relative to a Poisson model.

ggplot(dat, aes(x = flower_stalk_count)) +
  geom_histogram(binwidth = 1, boundary = 0, closed = "left") +
  labs(
    title = "Distribution of flower stalk counts",
    x = "Flower stalk count",
    y = "Frequency"
  )

2.2 Relationship between canopy height and flower stalk count

This scatterplot helps us see whether Canopy_height might be associated with flower stalk production.

ggplot(dat, aes(x = Canopy_height, y = flower_stalk_count)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Flower stalk count vs. canopy height",
    x = "Canopy height",
    y = "Flower stalk count"
  )

3 Fit three candidate models

We fit all three models with the same likelihood family and MCMC settings so that the comparisons are fair.

3.1 Model 1: intercept-only

This model assumes that all observations are exchangeable with respect to the predictors. It provides a useful baseline.

fit_null <- brm(
  flower_stalk_count ~ 1,
  data = dat,
  family = negbinomial(),
  chains = 4, cores = 4, iter = 2000, warmup = 1000,
  seed = 123,
  save_pars = save_pars(all = TRUE)
)
Running "C:/PROGRA~1/R/R-45~1.2/bin/x64/Rcmd.exe" SHLIB foo.c
using C compiler: 'gcc.exe (GCC) 14.3.0'
gcc  -I"C:/PROGRA~1/R/R-45~1.2/include" -DNDEBUG   -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/Rcpp/include/"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/unsupported"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/BH/include" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/src/"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppParallel/include/" -DRCPP_PARALLEL_USE_TBB=1 -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include "C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp"  -std=c++1y    -I"C:/rtools45/x86_64-w64-mingw32.static.posix/include"      -O2 -Wall -std=gnu2x  -mfpmath=sse -msse2 -mstackrealign   -c foo.c -o foo.o
cc1.exe: warning: command-line option '-std=c++14' is valid for C++/ObjC++ but not for C
In file included from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Core:19,
                 from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Dense:1,
                 from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22,
                 from <command-line>:
C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: cmath: No such file or directory
  679 | #include <cmath>
      |          ^~~~~~~
compilation terminated.
make: *** [C:/PROGRA~1/R/R-45~1.2/etc/x64/Makeconf:289: foo.o] Error 1

3.2 Model 2: canopy height only

This model asks whether a single biologically meaningful predictor improves predictions over the baseline.

fit_canopy <- brm(
  flower_stalk_count ~ Canopy_height,
  data = dat,
  family = negbinomial(),
  chains = 4, cores = 4, iter = 2000, warmup = 1000,
  seed = 123,
  save_pars = save_pars(all = TRUE)
)
Running "C:/PROGRA~1/R/R-45~1.2/bin/x64/Rcmd.exe" SHLIB foo.c
using C compiler: 'gcc.exe (GCC) 14.3.0'
gcc  -I"C:/PROGRA~1/R/R-45~1.2/include" -DNDEBUG   -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/Rcpp/include/"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/unsupported"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/BH/include" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/src/"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppParallel/include/" -DRCPP_PARALLEL_USE_TBB=1 -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include "C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp"  -std=c++1y    -I"C:/rtools45/x86_64-w64-mingw32.static.posix/include"      -O2 -Wall -std=gnu2x  -mfpmath=sse -msse2 -mstackrealign   -c foo.c -o foo.o
cc1.exe: warning: command-line option '-std=c++14' is valid for C++/ObjC++ but not for C
In file included from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Core:19,
                 from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Dense:1,
                 from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22,
                 from <command-line>:
C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: cmath: No such file or directory
  679 | #include <cmath>
      |          ^~~~~~~
compilation terminated.
make: *** [C:/PROGRA~1/R/R-45~1.2/etc/x64/Makeconf:289: foo.o] Error 1

3.3 Model 3: full model

This model includes most of the continuous predictors in the example. Green, Red, and Blue where highly correlated (>90%) so we dropped the Green variable.

fit_full <- brm(
  flower_stalk_count ~ Canopy_height + Area + Red +  Blue + Elevation,
  data = dat,
  family = negbinomial(),
  chains = 4, cores = 4, iter = 2000, warmup = 1000,
  seed = 123,
  save_pars = save_pars(all = TRUE)
)
Running "C:/PROGRA~1/R/R-45~1.2/bin/x64/Rcmd.exe" SHLIB foo.c
using C compiler: 'gcc.exe (GCC) 14.3.0'
gcc  -I"C:/PROGRA~1/R/R-45~1.2/include" -DNDEBUG   -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/Rcpp/include/"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/unsupported"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/BH/include" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/src/"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/"  -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppParallel/include/" -DRCPP_PARALLEL_USE_TBB=1 -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include "C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp"  -std=c++1y    -I"C:/rtools45/x86_64-w64-mingw32.static.posix/include"      -O2 -Wall -std=gnu2x  -mfpmath=sse -msse2 -mstackrealign   -c foo.c -o foo.o
cc1.exe: warning: command-line option '-std=c++14' is valid for C++/ObjC++ but not for C
In file included from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Core:19,
                 from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Dense:1,
                 from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22,
                 from <command-line>:
C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: cmath: No such file or directory
  679 | #include <cmath>
      |          ^~~~~~~
compilation terminated.
make: *** [C:/PROGRA~1/R/R-45~1.2/etc/x64/Makeconf:289: foo.o] Error 1

4 Check model summaries

Before comparing models, it is always worth confirming that the fits look reasonable.

summary(fit_null)
 Family: negbinomial 
  Links: mu = log 
Formula: flower_stalk_count ~ 1 
   Data: dat (Number of observations: 561) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     4.34      0.08     4.19     4.49 1.00     3283     2350

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.33      0.02     0.29     0.37 1.00     3337     2352

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
summary(fit_canopy)
 Family: negbinomial 
  Links: mu = log 
Formula: flower_stalk_count ~ Canopy_height 
   Data: dat (Number of observations: 561) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         4.20      0.07     4.06     4.34 1.00     3977     2963
Canopy_height     1.09      0.15     0.79     1.38 1.00     3845     2894

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.36      0.02     0.32     0.40 1.00     4538     3131

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
summary(fit_full)
 Family: negbinomial 
  Links: mu = log 
Formula: flower_stalk_count ~ Canopy_height + Area + Red + Blue + Elevation 
   Data: dat (Number of observations: 561) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         4.03      0.07     3.90     4.16 1.00     3805     2966
Canopy_height     1.50      0.20     1.12     1.87 1.00     3070     2977
Area              0.21      0.18    -0.13     0.56 1.00     2820     2697
Red               0.07      0.27    -0.45     0.61 1.00     2528     2188
Blue              0.45      0.29    -0.12     1.01 1.00     2377     2384
Elevation        -1.29      0.21    -1.70    -0.88 1.00     2650     2546

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.41      0.02     0.36     0.45 1.00     3682     2403

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

5 Bayes R-squared

We can first look at Bayes R-squared as an in-sample fit of model performance. Remember, Bayes R-squared weighs how how much variation out of the total is predicted by the mean response.

bayes_R2(fit_null)
   Estimate Est.Error Q2.5 Q97.5
R2        0         0    0     0
bayes_R2(fit_canopy)
    Estimate  Est.Error       Q2.5     Q97.5
R2 0.2151041 0.07298535 0.08592518 0.3621288
bayes_R2(fit_full)
    Estimate  Est.Error      Q2.5    Q97.5
R2 0.3909146 0.05889579 0.2638577 0.489807

6

7 Compare models with LOOIC

7.1 What is LOOIC?

LOOIC is the leave-one-out information criterion. It estimates how well a model is expected to predict new observations by approximating a process where each observation is left out once and predicted from a model fit to the remaining data.

Interpretation:

  • Lower LOOIC is better
  • Small differences suggest similar predictive performance
  • Large improvements in LOOIC suggest better expected out-of-sample prediction
loo_null   <- loo(fit_null)
loo_canopy <- loo(fit_canopy)
loo_full   <- loo(fit_full)

loo_null

Computed from 4000 by 561 log-likelihood matrix.

         Estimate   SE
elpd_loo  -2745.6 48.3
p_loo         2.2  0.1
looic      5491.2 96.7
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.7, 0.9]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
loo_canopy

Computed from 4000 by 561 log-likelihood matrix.

         Estimate   SE
elpd_loo  -2719.6 49.4
p_loo         4.3  1.2
looic      5439.2 98.8
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.8, 1.1]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
loo_full

Computed from 4000 by 561 log-likelihood matrix.

         Estimate   SE
elpd_loo  -2682.6 46.5
p_loo         5.7  0.8
looic      5365.2 93.1
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.6, 1.4]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
loo_comp <- loo_compare(loo_null, loo_canopy, loo_full)
loo_comp
           elpd_diff se_diff
fit_full     0.0       0.0  
fit_canopy -37.0       7.7  
fit_null   -63.0       8.9  

7.2 Plot LOOIC values

loo_tbl <- data.frame(
  model = c("Intercept only", "Canopy height", "Full model"),
  looic = c(
    loo_null$estimates["looic", "Estimate"],
    loo_canopy$estimates["looic", "Estimate"],
    loo_full$estimates["looic", "Estimate"]
  )
)

ggplot(loo_tbl, aes(x = reorder(model, looic), y = looic)) +
  geom_point(size = 3) +
  coord_flip() +
  labs(
    title = "LOOIC comparison",
    x = "Model",
    y = "LOOIC (lower is better)"
  )

8 Compare models with 4-fold cross-validation

8.1 What is k-fold cross-validation?

In k-fold cross-validation, the data are divided into K groups. Each group is held out once as a test set, while the model is fit to the other K - 1 groups. Here we use K = 4. This is equivalent to a 75/25% test-training data split. How do we choose a number for our splits? Finer splits (down to leave-one-out) enable better use of the data (bigger dataset for training) but are more computationally intensive. I would go with the finest split you can, while still having models run in a reasonable period of time. If your data is spatially, temporally, or taxanomically blocked, I would strongly consider a leave-one-group out design (see: https://nsojournals.onlinelibrary.wiley.com/doi/abs/10.1111/ecog.02881)

K <- 4
N <- nrow(dat)

set.seed(123)
folds <- sample(rep(1:K, length.out = N))
folds
  [1] 3 3 3 2 3 2 3 1 4 2 2 3 4 3 2 3 2 3 1 3 2 3 1 3 4 2 1 2 4 1 1 2 1 4 1 3 2
 [38] 4 3 4 4 3 3 1 3 2 1 1 1 4 2 1 4 2 1 3 3 4 4 2 4 2 1 2 2 4 3 2 1 4 3 3 3 2
 [75] 3 1 1 4 3 2 2 2 4 2 2 2 3 4 4 2 1 2 2 2 3 4 2 3 2 3 4 1 3 3 1 2 2 3 4 2 3
[112] 1 1 2 4 1 4 1 3 3 4 2 4 3 4 4 1 2 4 1 3 4 3 1 4 1 4 2 2 3 4 4 4 4 1 1 4 4
[149] 4 2 1 2 3 4 3 3 4 1 2 1 2 3 4 2 1 4 1 2 4 4 1 1 4 2 1 1 1 3 2 1 3 4 3 2 3
[186] 3 1 1 1 3 4 3 4 1 3 3 3 2 2 3 2 3 1 1 1 3 2 2 3 1 2 2 3 1 1 1 3 3 1 1 3 1
[223] 2 4 1 3 4 3 4 1 2 4 1 1 2 4 3 3 3 1 2 2 2 3 4 3 2 4 1 2 2 1 2 3 4 4 2 1 3
[260] 4 1 4 4 2 4 4 4 1 3 1 3 4 2 3 3 1 2 3 4 1 2 4 1 4 4 2 2 2 4 3 2 4 3 4 4 4
[297] 1 1 2 2 2 1 2 4 4 3 3 2 4 1 3 3 3 3 1 4 4 2 1 1 1 1 1 4 4 4 4 4 1 3 2 2 4
[334] 1 3 4 3 2 4 2 2 3 3 3 1 4 1 3 4 1 2 3 2 2 4 1 3 3 1 3 4 4 2 1 1 4 2 2 4 4
[371] 2 4 2 2 1 3 2 4 3 1 2 1 1 3 1 3 4 1 4 3 3 4 1 2 3 4 4 1 1 3 1 1 2 4 3 1 1
[408] 3 1 2 1 3 2 4 4 2 2 4 3 2 1 4 4 4 1 3 1 2 4 2 1 3 2 4 4 1 3 3 4 3 1 2 2 2
[445] 3 2 2 4 1 1 3 3 2 1 2 1 3 4 4 4 1 2 1 2 4 1 4 3 3 3 3 4 2 3 1 1 2 4 2 3 3
[482] 2 4 4 3 1 4 3 4 2 2 4 3 2 1 1 4 2 4 1 4 4 3 3 3 1 2 2 2 2 3 4 4 1 2 2 1 1
[519] 4 1 3 1 2 2 4 2 4 2 2 2 1 3 4 2 1 3 1 4 2 1 3 1 1 1 3 1 4 2 1 3 2 4 3 1 4
[556] 3 1 3 3 1 3

8.2 Fit k-fold comparisons

kfold_null <- kfold(fit_null, K = K, folds = folds, save_fits = TRUE)
kfold_canopy <- kfold(fit_canopy, K = K, folds = folds, save_fits = TRUE)
kfold_full <- kfold(fit_full, K = K, folds = folds, save_fits = TRUE)

kfold_null

Based on 4-fold cross-validation.

           Estimate   SE
elpd_kfold  -2745.1 48.3
p_kfold         1.7  0.7
kfoldic      5490.2 96.5
kfold_canopy

Based on 4-fold cross-validation.

           Estimate    SE
elpd_kfold  -2737.1  51.0
p_kfold        21.8   5.2
kfoldic      5474.2 102.0
kfold_full

Based on 4-fold cross-validation.

           Estimate   SE
elpd_kfold  -2681.7 46.4
p_kfold         4.7  1.2
kfoldic      5363.3 92.7

8.3 Extract k-fold expected log predictive density

For kfold(), a higher elpd_kfold indicates better predictive performance. ELPD is the Expected Log Predictive Density, which evaluates the log posterior predictive density, summed over observations. Statisticians would argue that ELPD and looic present the “best” estimate of model performance, relative to other models in the data.

kfold_tbl <- data.frame(
  model = c("Intercept only", "Canopy height", "Full model"),
  elpd_kfold = c(
    kfold_null$estimates["elpd_kfold", "Estimate"],
    kfold_canopy$estimates["elpd_kfold", "Estimate"],
    kfold_full$estimates["elpd_kfold", "Estimate"]
  )
)

kfold_tbl
           model elpd_kfold
1 Intercept only  -2745.094
2  Canopy height  -2737.109
3     Full model  -2681.653

8.4 Plot k-fold predictive performance

ggplot(kfold_tbl, aes(x = reorder(model, elpd_kfold), y = elpd_kfold)) +
  geom_point(size = 3) +
  coord_flip() +
  labs(
    title = "4-fold cross-validation comparison",
    x = "Model",
    y = "elpd_kfold (higher is better)"
  )

9 Out-of-sample MAE from posterior draws

9.1 Why MAE?

LOOIC and ELPD are based on the predictive log density, which is a principled Bayesian scoring rule. However, neither of these metrics have absolute interpretations. If you are presenting to a land manager, they want to know how well the model predicts sagebrush flower stalk counts, not some abstract mathematical concept!

Here we use mean absolute error (MAE):

\[ \text{MAE} = \frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i| \]

Smaller MAE means better prediction on average.

9.2 Function to compute posterior draws of MAE

The function below uses kfold_predict() to return posterior predictive draws for the held-out observations, then computes one MAE value per posterior draw.

get_kfold_mae_draws <- function(fit, y, K, folds) {
  yrep <- kfold_predict(fit, K = K, folds = folds)
  mae_draws <- apply(yrep$yrep, 1, function(draw) mean(abs(yrep$y - draw)))
  return(mae_draws)
}

9.3 Compute MAE posterior draws for each model

mae_null_draws   <- get_kfold_mae_draws(kfold_null, dat$flower_stalk_count, K, folds)
mae_canopy_draws <- get_kfold_mae_draws(kfold_canopy, dat$flower_stalk_count, K, folds)
mae_full_draws   <- get_kfold_mae_draws(kfold_full, dat$flower_stalk_count, K, folds)

9.4 Summarize posterior MAE

We summarize each posterior distribution of MAE with:

  • the median
  • the 2.5th percentile
  • the 97.5th percentile

These form a simple 95% credible interval for the model’s predictive MAE.

mae_summary <- function(draws){
  MAE_median = median(draws)
  MAE_lower_95 = quantile(draws,probs = 0.025)
  MAE_upper_95 = quantile(draws, probs = 0.975)

return(data.frame(MAE_median,MAE_lower_95,MAE_upper_95))
}

model_MAE=rbind(mae_summary(mae_full_draws),mae_summary(mae_null_draws),mae_summary(mae_canopy_draws))
rownames(model_MAE)<-c("full","null","canopy")
colnames(model_MAE)<-c("median","2.5%","97.5%")
print(model_MAE)
         median     2.5%    97.5%
full   108.2219 92.90709 130.2332
null   108.5089 98.59947 120.4406
canopy 105.0098 91.99599 123.7266

9.5 Plot the full posterior distributions of MAE

This figure shows uncertainty in predictive performance, not just a single point estimate.

mae_long <- data.frame(
  MAE = c(mae_null_draws, mae_canopy_draws, mae_full_draws),
  model = rep(
    c("Intercept only", "Canopy height", "Full model"),
    times = c(length(mae_null_draws), length(mae_canopy_draws), length(mae_full_draws))
  )
)

ggplot(mae_long, aes(x = MAE, fill = model)) +
  geom_density(alpha = 0.4) +
  labs(
    title = "Posterior distributions of out-of-sample MAE",
    x = "MAE",
    y = "Density"
  )

How does this figure contrast with the estimates from looic and elpd? How does displaying uncertainty help us to understand the relative fit of the different models?