library(brms)
library(ggplot2)
dat <- read.csv("flower_power.csv")Bayesian Model Selection with brms
1 Overview
In this example, we compare three Bayesian count models for sagebrush flower stalk production, predicted from drone imagery using negative binomial regression in brms.
We will fit:
- an intercept-only model
- a model with canopy height only
- a full model with many predictors
Then we will compare models with two common predictive criteria:
- LOOIC: approximate leave-one-out cross-validation
- 4-fold cross-validation: repeated fitting to held-out subsets of the data
We will also summarize out-of-sample mean absolute error (MAE) using posterior draws from kfold_predict().
Our model selection procedure will addresss the following questions:
- Which model predicts new data best?
- Does added complexity improve predictive performance enough to justify it?
- Are simpler models nearly as good?
1.1 Quick look at the data
str(dat)'data.frame': 561 obs. of 9 variables:
$ flower_stalk_count: int 35 44 9 30 8 22 34 165 41 46 ...
$ Area : num 0.0568 0.7987 1.0388 1.0388 -0.0773 ...
$ Canopy_height : num 0.658 0.44 2.091 2.091 0.361 ...
$ Red : num -0.665 -1.372 -1.25 -1.25 -1.239 ...
$ Green : num -0.633 -1.334 -1.217 -1.217 -1.172 ...
$ Blue : num -0.732 -1.088 -1.29 -1.29 -0.929 ...
$ Elevation : num 0.525 0.515 0.52 0.52 0.513 ...
$ Year : int 22 22 22 22 22 22 22 22 22 22 ...
$ site : chr "site1" "site1" "site1" "site1" ...
summary(dat) flower_stalk_count Area Canopy_height
Min. : 0.00 Min. :-0.731163 Min. :-0.813657
1st Qu.: 4.00 1st Qu.:-0.361909 1st Qu.:-0.365651
Median : 24.00 Median :-0.120697 Median :-0.077639
Mean : 76.47 Mean :-0.003575 Mean :-0.002608
3rd Qu.: 92.00 3rd Qu.: 0.286283 3rd Qu.: 0.296175
Max. :709.00 Max. : 1.754956 Max. : 2.091122
Red Green Blue
Min. :-2.2096266 Min. :-2.0402513 Min. :-1.427864
1st Qu.:-0.2299413 1st Qu.:-0.2643900 1st Qu.:-0.318854
Median : 0.0864125 Median : 0.0548392 Median :-0.001843
Mean : 0.0006889 Mean : 0.0008777 Mean : 0.001234
3rd Qu.: 0.3228717 3rd Qu.: 0.3538943 3rd Qu.: 0.305844
Max. : 0.9232058 Max. : 1.0299014 Max. : 1.367469
Elevation Year site
Min. :-0.797500 Min. :21.00 Length:561
1st Qu.:-0.026065 1st Qu.:21.00 Class :character
Median : 0.150659 Median :22.00 Mode :character
Mean : 0.002797 Mean :21.96
3rd Qu.: 0.506924 3rd Qu.:23.00
Max. : 0.637205 Max. :23.00
head(dat) flower_stalk_count Area Canopy_height Red Green Blue
1 35 0.05683302 0.65810944 -0.6649011 -0.6334925 -0.7322918
2 44 0.79869890 0.44023654 -1.3718880 -1.3335028 -1.0880790
3 9 1.03884363 2.09112179 -1.2504797 -1.2173260 -1.2901425
4 30 1.03884363 2.09112179 -1.2504797 -1.2173260 -1.2901425
5 8 -0.07725237 0.36095791 -1.2386661 -1.1720240 -0.9287089
6 22 0.30274782 0.04107762 -1.1538512 -1.0705588 -0.8046447
Elevation Year site
1 0.5247955 22 site1
2 0.5145983 22 site1
3 0.5198842 22 site1
4 0.5198842 22 site1
5 0.5131348 22 site1
6 0.5072225 22 site1
2 Exploratory figures
Before fitting models, it is good practice to look at the response and at least one key predictor.
2.1 Distribution of flower stalk counts
This histogram shows the response variable. Because flower_stalk_count is a count, a negative binomial model is a sensible choice when the data may be overdispersed relative to a Poisson model.
ggplot(dat, aes(x = flower_stalk_count)) +
geom_histogram(binwidth = 1, boundary = 0, closed = "left") +
labs(
title = "Distribution of flower stalk counts",
x = "Flower stalk count",
y = "Frequency"
)2.2 Relationship between canopy height and flower stalk count
This scatterplot helps us see whether Canopy_height might be associated with flower stalk production.
ggplot(dat, aes(x = Canopy_height, y = flower_stalk_count)) +
geom_point(alpha = 0.7) +
labs(
title = "Flower stalk count vs. canopy height",
x = "Canopy height",
y = "Flower stalk count"
)3 Fit three candidate models
We fit all three models with the same likelihood family and MCMC settings so that the comparisons are fair.
3.1 Model 1: intercept-only
This model assumes that all observations are exchangeable with respect to the predictors. It provides a useful baseline.
fit_null <- brm(
flower_stalk_count ~ 1,
data = dat,
family = negbinomial(),
chains = 4, cores = 4, iter = 2000, warmup = 1000,
seed = 123,
save_pars = save_pars(all = TRUE)
)Running "C:/PROGRA~1/R/R-45~1.2/bin/x64/Rcmd.exe" SHLIB foo.c
using C compiler: 'gcc.exe (GCC) 14.3.0'
gcc -I"C:/PROGRA~1/R/R-45~1.2/include" -DNDEBUG -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/Rcpp/include/" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/unsupported" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/BH/include" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/src/" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppParallel/include/" -DRCPP_PARALLEL_USE_TBB=1 -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -D_HAS_AUTO_PTR_ETC=0 -include "C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp" -std=c++1y -I"C:/rtools45/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu2x -mfpmath=sse -msse2 -mstackrealign -c foo.c -o foo.o
cc1.exe: warning: command-line option '-std=c++14' is valid for C++/ObjC++ but not for C
In file included from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Core:19,
from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Dense:1,
from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22,
from <command-line>:
C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: cmath: No such file or directory
679 | #include <cmath>
| ^~~~~~~
compilation terminated.
make: *** [C:/PROGRA~1/R/R-45~1.2/etc/x64/Makeconf:289: foo.o] Error 1
3.2 Model 2: canopy height only
This model asks whether a single biologically meaningful predictor improves predictions over the baseline.
fit_canopy <- brm(
flower_stalk_count ~ Canopy_height,
data = dat,
family = negbinomial(),
chains = 4, cores = 4, iter = 2000, warmup = 1000,
seed = 123,
save_pars = save_pars(all = TRUE)
)Running "C:/PROGRA~1/R/R-45~1.2/bin/x64/Rcmd.exe" SHLIB foo.c
using C compiler: 'gcc.exe (GCC) 14.3.0'
gcc -I"C:/PROGRA~1/R/R-45~1.2/include" -DNDEBUG -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/Rcpp/include/" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/unsupported" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/BH/include" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/src/" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppParallel/include/" -DRCPP_PARALLEL_USE_TBB=1 -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -D_HAS_AUTO_PTR_ETC=0 -include "C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp" -std=c++1y -I"C:/rtools45/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu2x -mfpmath=sse -msse2 -mstackrealign -c foo.c -o foo.o
cc1.exe: warning: command-line option '-std=c++14' is valid for C++/ObjC++ but not for C
In file included from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Core:19,
from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Dense:1,
from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22,
from <command-line>:
C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: cmath: No such file or directory
679 | #include <cmath>
| ^~~~~~~
compilation terminated.
make: *** [C:/PROGRA~1/R/R-45~1.2/etc/x64/Makeconf:289: foo.o] Error 1
3.3 Model 3: full model
This model includes most of the continuous predictors in the example. Green, Red, and Blue where highly correlated (>90%) so we dropped the Green variable.
fit_full <- brm(
flower_stalk_count ~ Canopy_height + Area + Red + Blue + Elevation,
data = dat,
family = negbinomial(),
chains = 4, cores = 4, iter = 2000, warmup = 1000,
seed = 123,
save_pars = save_pars(all = TRUE)
)Running "C:/PROGRA~1/R/R-45~1.2/bin/x64/Rcmd.exe" SHLIB foo.c
using C compiler: 'gcc.exe (GCC) 14.3.0'
gcc -I"C:/PROGRA~1/R/R-45~1.2/include" -DNDEBUG -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/Rcpp/include/" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/unsupported" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/BH/include" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/src/" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/" -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppParallel/include/" -DRCPP_PARALLEL_USE_TBB=1 -I"C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -D_HAS_AUTO_PTR_ETC=0 -include "C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp" -std=c++1y -I"C:/rtools45/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu2x -mfpmath=sse -msse2 -mstackrealign -c foo.c -o foo.o
cc1.exe: warning: command-line option '-std=c++14' is valid for C++/ObjC++ but not for C
In file included from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Core:19,
from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Dense:1,
from C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22,
from <command-line>:
C:/Users/trevorcaughlin/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: cmath: No such file or directory
679 | #include <cmath>
| ^~~~~~~
compilation terminated.
make: *** [C:/PROGRA~1/R/R-45~1.2/etc/x64/Makeconf:289: foo.o] Error 1
4 Check model summaries
Before comparing models, it is always worth confirming that the fits look reasonable.
summary(fit_null) Family: negbinomial
Links: mu = log
Formula: flower_stalk_count ~ 1
Data: dat (Number of observations: 561)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 4.34 0.08 4.19 4.49 1.00 3283 2350
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 0.33 0.02 0.29 0.37 1.00 3337 2352
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
summary(fit_canopy) Family: negbinomial
Links: mu = log
Formula: flower_stalk_count ~ Canopy_height
Data: dat (Number of observations: 561)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 4.20 0.07 4.06 4.34 1.00 3977 2963
Canopy_height 1.09 0.15 0.79 1.38 1.00 3845 2894
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 0.36 0.02 0.32 0.40 1.00 4538 3131
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
summary(fit_full) Family: negbinomial
Links: mu = log
Formula: flower_stalk_count ~ Canopy_height + Area + Red + Blue + Elevation
Data: dat (Number of observations: 561)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 4.03 0.07 3.90 4.16 1.00 3805 2966
Canopy_height 1.50 0.20 1.12 1.87 1.00 3070 2977
Area 0.21 0.18 -0.13 0.56 1.00 2820 2697
Red 0.07 0.27 -0.45 0.61 1.00 2528 2188
Blue 0.45 0.29 -0.12 1.01 1.00 2377 2384
Elevation -1.29 0.21 -1.70 -0.88 1.00 2650 2546
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 0.41 0.02 0.36 0.45 1.00 3682 2403
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
5 Bayes R-squared
We can first look at Bayes R-squared as an in-sample fit of model performance. Remember, Bayes R-squared weighs how how much variation out of the total is predicted by the mean response.
bayes_R2(fit_null) Estimate Est.Error Q2.5 Q97.5
R2 0 0 0 0
bayes_R2(fit_canopy) Estimate Est.Error Q2.5 Q97.5
R2 0.2151041 0.07298535 0.08592518 0.3621288
bayes_R2(fit_full) Estimate Est.Error Q2.5 Q97.5
R2 0.3909146 0.05889579 0.2638577 0.489807
6
7 Compare models with LOOIC
7.1 What is LOOIC?
LOOIC is the leave-one-out information criterion. It estimates how well a model is expected to predict new observations by approximating a process where each observation is left out once and predicted from a model fit to the remaining data.
Interpretation:
- Lower LOOIC is better
- Small differences suggest similar predictive performance
- Large improvements in LOOIC suggest better expected out-of-sample prediction
loo_null <- loo(fit_null)
loo_canopy <- loo(fit_canopy)
loo_full <- loo(fit_full)
loo_null
Computed from 4000 by 561 log-likelihood matrix.
Estimate SE
elpd_loo -2745.6 48.3
p_loo 2.2 0.1
looic 5491.2 96.7
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.7, 0.9]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
loo_canopy
Computed from 4000 by 561 log-likelihood matrix.
Estimate SE
elpd_loo -2719.6 49.4
p_loo 4.3 1.2
looic 5439.2 98.8
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.8, 1.1]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
loo_full
Computed from 4000 by 561 log-likelihood matrix.
Estimate SE
elpd_loo -2682.6 46.5
p_loo 5.7 0.8
looic 5365.2 93.1
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.6, 1.4]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
loo_comp <- loo_compare(loo_null, loo_canopy, loo_full)
loo_comp elpd_diff se_diff
fit_full 0.0 0.0
fit_canopy -37.0 7.7
fit_null -63.0 8.9
7.2 Plot LOOIC values
loo_tbl <- data.frame(
model = c("Intercept only", "Canopy height", "Full model"),
looic = c(
loo_null$estimates["looic", "Estimate"],
loo_canopy$estimates["looic", "Estimate"],
loo_full$estimates["looic", "Estimate"]
)
)
ggplot(loo_tbl, aes(x = reorder(model, looic), y = looic)) +
geom_point(size = 3) +
coord_flip() +
labs(
title = "LOOIC comparison",
x = "Model",
y = "LOOIC (lower is better)"
)8 Compare models with 4-fold cross-validation
8.1 What is k-fold cross-validation?
In k-fold cross-validation, the data are divided into K groups. Each group is held out once as a test set, while the model is fit to the other K - 1 groups. Here we use K = 4. This is equivalent to a 75/25% test-training data split. How do we choose a number for our splits? Finer splits (down to leave-one-out) enable better use of the data (bigger dataset for training) but are more computationally intensive. I would go with the finest split you can, while still having models run in a reasonable period of time. If your data is spatially, temporally, or taxanomically blocked, I would strongly consider a leave-one-group out design (see: https://nsojournals.onlinelibrary.wiley.com/doi/abs/10.1111/ecog.02881)
K <- 4
N <- nrow(dat)
set.seed(123)
folds <- sample(rep(1:K, length.out = N))
folds [1] 3 3 3 2 3 2 3 1 4 2 2 3 4 3 2 3 2 3 1 3 2 3 1 3 4 2 1 2 4 1 1 2 1 4 1 3 2
[38] 4 3 4 4 3 3 1 3 2 1 1 1 4 2 1 4 2 1 3 3 4 4 2 4 2 1 2 2 4 3 2 1 4 3 3 3 2
[75] 3 1 1 4 3 2 2 2 4 2 2 2 3 4 4 2 1 2 2 2 3 4 2 3 2 3 4 1 3 3 1 2 2 3 4 2 3
[112] 1 1 2 4 1 4 1 3 3 4 2 4 3 4 4 1 2 4 1 3 4 3 1 4 1 4 2 2 3 4 4 4 4 1 1 4 4
[149] 4 2 1 2 3 4 3 3 4 1 2 1 2 3 4 2 1 4 1 2 4 4 1 1 4 2 1 1 1 3 2 1 3 4 3 2 3
[186] 3 1 1 1 3 4 3 4 1 3 3 3 2 2 3 2 3 1 1 1 3 2 2 3 1 2 2 3 1 1 1 3 3 1 1 3 1
[223] 2 4 1 3 4 3 4 1 2 4 1 1 2 4 3 3 3 1 2 2 2 3 4 3 2 4 1 2 2 1 2 3 4 4 2 1 3
[260] 4 1 4 4 2 4 4 4 1 3 1 3 4 2 3 3 1 2 3 4 1 2 4 1 4 4 2 2 2 4 3 2 4 3 4 4 4
[297] 1 1 2 2 2 1 2 4 4 3 3 2 4 1 3 3 3 3 1 4 4 2 1 1 1 1 1 4 4 4 4 4 1 3 2 2 4
[334] 1 3 4 3 2 4 2 2 3 3 3 1 4 1 3 4 1 2 3 2 2 4 1 3 3 1 3 4 4 2 1 1 4 2 2 4 4
[371] 2 4 2 2 1 3 2 4 3 1 2 1 1 3 1 3 4 1 4 3 3 4 1 2 3 4 4 1 1 3 1 1 2 4 3 1 1
[408] 3 1 2 1 3 2 4 4 2 2 4 3 2 1 4 4 4 1 3 1 2 4 2 1 3 2 4 4 1 3 3 4 3 1 2 2 2
[445] 3 2 2 4 1 1 3 3 2 1 2 1 3 4 4 4 1 2 1 2 4 1 4 3 3 3 3 4 2 3 1 1 2 4 2 3 3
[482] 2 4 4 3 1 4 3 4 2 2 4 3 2 1 1 4 2 4 1 4 4 3 3 3 1 2 2 2 2 3 4 4 1 2 2 1 1
[519] 4 1 3 1 2 2 4 2 4 2 2 2 1 3 4 2 1 3 1 4 2 1 3 1 1 1 3 1 4 2 1 3 2 4 3 1 4
[556] 3 1 3 3 1 3
8.2 Fit k-fold comparisons
kfold_null <- kfold(fit_null, K = K, folds = folds, save_fits = TRUE)
kfold_canopy <- kfold(fit_canopy, K = K, folds = folds, save_fits = TRUE)
kfold_full <- kfold(fit_full, K = K, folds = folds, save_fits = TRUE)
kfold_null
Based on 4-fold cross-validation.
Estimate SE
elpd_kfold -2745.1 48.3
p_kfold 1.7 0.7
kfoldic 5490.2 96.5
kfold_canopy
Based on 4-fold cross-validation.
Estimate SE
elpd_kfold -2737.1 51.0
p_kfold 21.8 5.2
kfoldic 5474.2 102.0
kfold_full
Based on 4-fold cross-validation.
Estimate SE
elpd_kfold -2681.7 46.4
p_kfold 4.7 1.2
kfoldic 5363.3 92.7
8.3 Extract k-fold expected log predictive density
For kfold(), a higher elpd_kfold indicates better predictive performance. ELPD is the Expected Log Predictive Density, which evaluates the log posterior predictive density, summed over observations. Statisticians would argue that ELPD and looic present the “best” estimate of model performance, relative to other models in the data.
kfold_tbl <- data.frame(
model = c("Intercept only", "Canopy height", "Full model"),
elpd_kfold = c(
kfold_null$estimates["elpd_kfold", "Estimate"],
kfold_canopy$estimates["elpd_kfold", "Estimate"],
kfold_full$estimates["elpd_kfold", "Estimate"]
)
)
kfold_tbl model elpd_kfold
1 Intercept only -2745.094
2 Canopy height -2737.109
3 Full model -2681.653
8.4 Plot k-fold predictive performance
ggplot(kfold_tbl, aes(x = reorder(model, elpd_kfold), y = elpd_kfold)) +
geom_point(size = 3) +
coord_flip() +
labs(
title = "4-fold cross-validation comparison",
x = "Model",
y = "elpd_kfold (higher is better)"
)9 Out-of-sample MAE from posterior draws
9.1 Why MAE?
LOOIC and ELPD are based on the predictive log density, which is a principled Bayesian scoring rule. However, neither of these metrics have absolute interpretations. If you are presenting to a land manager, they want to know how well the model predicts sagebrush flower stalk counts, not some abstract mathematical concept!
Here we use mean absolute error (MAE):
\[ \text{MAE} = \frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i| \]
Smaller MAE means better prediction on average.
9.2 Function to compute posterior draws of MAE
The function below uses kfold_predict() to return posterior predictive draws for the held-out observations, then computes one MAE value per posterior draw.
get_kfold_mae_draws <- function(fit, y, K, folds) {
yrep <- kfold_predict(fit, K = K, folds = folds)
mae_draws <- apply(yrep$yrep, 1, function(draw) mean(abs(yrep$y - draw)))
return(mae_draws)
}9.3 Compute MAE posterior draws for each model
mae_null_draws <- get_kfold_mae_draws(kfold_null, dat$flower_stalk_count, K, folds)
mae_canopy_draws <- get_kfold_mae_draws(kfold_canopy, dat$flower_stalk_count, K, folds)
mae_full_draws <- get_kfold_mae_draws(kfold_full, dat$flower_stalk_count, K, folds)9.4 Summarize posterior MAE
We summarize each posterior distribution of MAE with:
- the median
- the 2.5th percentile
- the 97.5th percentile
These form a simple 95% credible interval for the model’s predictive MAE.
mae_summary <- function(draws){
MAE_median = median(draws)
MAE_lower_95 = quantile(draws,probs = 0.025)
MAE_upper_95 = quantile(draws, probs = 0.975)
return(data.frame(MAE_median,MAE_lower_95,MAE_upper_95))
}
model_MAE=rbind(mae_summary(mae_full_draws),mae_summary(mae_null_draws),mae_summary(mae_canopy_draws))
rownames(model_MAE)<-c("full","null","canopy")
colnames(model_MAE)<-c("median","2.5%","97.5%")
print(model_MAE) median 2.5% 97.5%
full 108.2219 92.90709 130.2332
null 108.5089 98.59947 120.4406
canopy 105.0098 91.99599 123.7266
9.5 Plot the full posterior distributions of MAE
This figure shows uncertainty in predictive performance, not just a single point estimate.
mae_long <- data.frame(
MAE = c(mae_null_draws, mae_canopy_draws, mae_full_draws),
model = rep(
c("Intercept only", "Canopy height", "Full model"),
times = c(length(mae_null_draws), length(mae_canopy_draws), length(mae_full_draws))
)
)
ggplot(mae_long, aes(x = MAE, fill = model)) +
geom_density(alpha = 0.4) +
labs(
title = "Posterior distributions of out-of-sample MAE",
x = "MAE",
y = "Density"
)How does this figure contrast with the estimates from looic and elpd? How does displaying uncertainty help us to understand the relative fit of the different models?