Bayesian Credible Intervals

Author

Dr Andrew Dalby

Ever since I read the book Statistics with Confidence by Doug Altman and colleagues(Altman et al. 2013) I have been a very enthusiastic user of confidence intervals. What I specifically want from my students is for them to think about the process of estimation and inference and to realise that point estimates are rarely valid and an interval is a much better way to present the data.

I am also not part of the statistics wars between the frequentists and Bayesians. I come from a Biochemistry background and in Biology Bayes should be the dominant paradigm. Not because of some philosophical reason but because we know so little and there is so much complexity.

I have been reading Sewall Wright’s Evolution and the Genetics of Populations(Wright 1984) and he makes it very clear when deriving the formulae in the first volume that a Bayesian approach is required.

As a simple example I am going to use STAN to calculate the Bayesian Credible interval based on the height data that I collected from my students over many years. I am using STAN and not BUGS because of the improved sampling in STAN which reduces the computational cost.

I have split the dataset based on gender for this particular example.

library(haven)
library(rstan)
library(ggplot2)

data <- read_sav("Expanded Student Data Female.sav")

options(mc.cores=parallel::detectCores())
Y <- data$Height

fit <- stan("female_heights.stan",iter=1000,chains=4, data=list(Y=Y))

Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
using C compiler: ‘Apple clang version 17.0.0 (clang-1700.4.4.1)’
using SDK: ‘MacOSX26.1.sdk’
clang -arch x86_64 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include '/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/opt/R/x86_64/include    -fPIC  -falign-functions=64 -Wall -g -O2  -c foo.c -o foo.o
In file included from <built-in>:1:
In file included from /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
In file included from /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/Eigen/Dense:1:
In file included from /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/Eigen/Core:19:
/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: 'cmath' file not found
  679 | #include <cmath>
      |          ^~~~~~~
1 error generated.
make: *** [foo.o] Error 1

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.000122 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.22 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 
SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2: 
Chain 2: Gradient evaluation took 0.000125 seconds
3).
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.25 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 3: 
Chain 3: Gradient evaluation took 0.000147 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 1.47 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 0.000125 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 1.25 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 3: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 3: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 2: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 4: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 2: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 3: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 4: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 2: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 3: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 4: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 2: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 3: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 4: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 2: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 2: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 3: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 3: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 4: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 4: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 3: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 2: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 4: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 3: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 2: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 4: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 3: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 2: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 4: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 3: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 2: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.119 seconds (Warm-up)
Chain 1:                0.085 seconds (Sampling)
Chain 1:                0.204 seconds (Total)
Chain 1: 
Chain 4: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 3: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.118 seconds (Warm-up)
Chain 3:                0.074 seconds (Sampling)
Chain 3:                0.192 seconds (Total)
Chain 3: 
Chain 2: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.128 seconds (Warm-up)
Chain 2:                0.081 seconds (Sampling)
Chain 2:                0.209 seconds (Total)
Chain 2: 
Chain 4: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.133 seconds (Warm-up)
Chain 4:                0.08 seconds (Sampling)
Chain 4:                0.213 seconds (Total)
Chain 4:

print(fit,probs=c(0.025, 0.5, 0.975))

Inference for Stan model: anon_model.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

                  mean se_mean   sd    2.5%     50%   97.5% n_eff Rhat
mu                1.63    0.00 0.00    1.62    1.63    1.64  2192 1.00
sigma             0.07    0.00 0.00    0.07    0.07    0.08   691 1.00
aMax_indicator    0.63    0.01 0.48    0.00    1.00    1.00  1863 1.00
aMin_indicator    0.35    0.01 0.48    0.00    0.00    1.00  1941 1.00
lp__           1076.05    0.04 1.09 1073.02 1076.38 1077.11   775 1.01

Samples were drawn using NUTS(diag_e) at Mon Dec 15 13:30:27 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

mu <- data.frame(extract(fit,"mu")[[1]])
colnames(mu) <- c("Mean")
sigma <- extract(fit,"sigma")[[1]]
ggplot(mu, aes(x=Mean))+
  geom_histogram()+
  labs(title="Histogram of the STAN Simulated Mean Heights for Female Students", x="Mean Height (m)")

From the summary at the end of the Stan simulations the 95% credible interval is from 1.62 to 1.64 m.

This compares to the 95% confidence interval which is 1.62 and 1.64 m in SPSS (which also gave a Bayesian credible interval of 1.62 to 1.64 m).

This is a lot of computational work to get very little benefit which is why I am a pragmatic Bayesian. In this case it was a simple question and we had a large sample which did not contain any messy data. In this case the Bayesian Credible Interval and Confidence Interval are the same. The difference is in the validity of the interpretation. We cannot interpret the confidence interval in the same way as the Bayesian Credible Interval, but as they converge to the same result I consider that it is rather pedantic than practical.

data <- read_sav("Expanded Student Data Male.sav")

options(mc.cores=parallel::detectCores())
Y <- data$Height

fit <- stan("male_heights.stan",iter=1000,chains=4, data=list(Y=Y))

Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
using C compiler: ‘Apple clang version 17.0.0 (clang-1700.4.4.1)’
using SDK: ‘MacOSX26.1.sdk’
clang -arch x86_64 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include '/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/opt/R/x86_64/include    -fPIC  -falign-functions=64 -Wall -g -O2  -c foo.c -o foo.o
In file included from <built-in>:1:
In file included from /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
In file included from /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/Eigen/Dense:1:
In file included from /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/Eigen/Core:19:
/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: 'cmath' file not found
  679 | #include <cmath>
      |          ^~~~~~~
1 error generated.
make: *** [foo.o] Error 1

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 5.9e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.59 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 6.4e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.64 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:   1 / 1000 [  0%]  (Warmup)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 5.1e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.51 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 4.9e-05 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.49 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 2: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 3: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 2: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 4: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 2: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 2: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 3: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 2: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 2: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 3: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 4: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 2: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 3: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 2: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 4: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 3: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 2: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 4: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 3: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 3: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 2: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 4: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 3: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 2: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 0.034 seconds (Warm-up)
Chain 2:                0.022 seconds (Sampling)
Chain 2:                0.056 seconds (Total)
Chain 2: 
Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 3: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 4: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 4: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 3: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 4: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 3: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 4: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 4: Chain 3: Iteration: 1000 / 1000 [100%]  (Sampling)
Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 0.032 seconds (Warm-up)
Chain 3:                0.022 seconds (Sampling)
Chain 3:                0.054 seconds (Total)
Chain 3: 
Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.038 seconds (Warm-up)
Chain 1:                0.024 seconds (Sampling)
Chain 1:                0.062 seconds (Total)Chain 4: Iteration: 900 / 1000 [ 90%]  (Sampling)

Chain 1: 
Chain 4: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.038 seconds (Warm-up)
Chain 4:                0.022 seconds (Sampling)
Chain 4:                0.06 seconds (Total)
Chain 4:

print(fit,probs=c(0.025, 0.5, 0.975))

Inference for Stan model: anon_model.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

                 mean se_mean   sd   2.5%    50%  97.5% n_eff Rhat
mu               1.77    0.00 0.01   1.75   1.77   1.78  2038    1
sigma            0.09    0.00 0.01   0.08   0.09   0.11  1454    1
aMax_indicator   0.07    0.01 0.26   0.00   0.00   1.00  1899    1
aMin_indicator   0.30    0.01 0.46   0.00   0.00   1.00  1692    1
lp__           250.23    0.03 1.02 247.44 250.57 251.23   918    1

Samples were drawn using NUTS(diag_e) at Mon Dec 15 13:31:07 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

mu <- as.data.frame(extract(fit,"mu")[[1]])
colnames(mu) <- c("Mean")
sigma <- extract(fit,"sigma")[[1]]
ggplot(mu, aes(x=Mean))+
  geom_histogram()+
  labs(title="Histogram of the STAN Simulated Mean Heights for Male Students", x="Mean Height (m)")

From the summary at the end of the Stan simulations the 95% credible interval is from 1.75 to 1.78 m.

This compares to the 95% confidence interval which is 1.75 and 1.79 m in SPSS (which also gave a Bayesian credible interval of 1.75 to 1.79 m).

The mean was 1.77 in SPSS and it is 1.76 from the Bayesian calculations.

Small differences are starting to appear once you get noisier data and smaller sample sizes (137 compared to 509).

References

Altman, Douglas, David Machin, Trevor Bryant, and Martin Gardner. 2013. Statistics with Confidence: Confidence Intervals and Statistical Guidelines. John Wiley & Sons. https://books.google.com/books?hl=en&lr=&id=HmnIBAAAQBAJ&oi=fnd&pg=PT12&dq=statistics+with+confidence+altman&ots=_5x0Sv8V3D&sig=Dzt-WESxiP_KFXdMAbdRpbbzsok.

Wright, Sewall. 1984. Evolution and the Genetics of Populations, Volume 1: Genetic and Biometric Foundations. Vol. 1. University of Chicago press. https://books.google.com/books?hl=en&lr=&id=4pTdTWi83ecC&oi=fnd&pg=PP7&dq=evolution+and+the+genetics+of+populations&ots=WV7VZEpbox&sig=LwrRUwUQZjP6HvtyLaQw7ac7QZ4.