Lecture10_Batsandfigs

Models

brms, categorical predictor variables, and the poisson distribution

This guide was created by Trevor Caughlin for EEB622 in Spring 2026.

Seeds

How does canopy temperature relate to plant height?

thermal <- read.csv("thermal_data.csv")

1| Plot it

plot(thermal$plant_height, thermal$temperature, xlab="Plant Height (m)", ylab=expression("Temperature ("*degree*C*")"), col="#9EA000")

Figure 1a: Data showing temperature of plants predicted by plant height. You can see a general trend that as plants increase in height, temperature of the plant decreases.

hist(thermal$temperature, col="pink", border="white", xlab=expression("Continous Temperature Data  ("*degree*C*")"), main="Hist. response, assume Gaussian")

Figure 1b: Distribution of response variable to determine the model family. Here we are assuming normality so we will use Gaussian (the default distribution for the brm function.

We are assuming the histogram of the above response variable (Temperature) is normal.

temp_model <- brm(temperature~plant_height, data=thermal, silent=T, refresh=0)

## Running "C:/PROGRA~1/R/R-45~1.2/bin/x64/Rcmd.exe" SHLIB foo.c
## using C compiler: 'gcc.exe (GCC) 14.3.0'
## gcc  -I"C:/PROGRA~1/R/R-45~1.2/include" -DNDEBUG   -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/Rcpp/include/"  -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/RcppEigen/include/"  -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/RcppEigen/include/unsupported"  -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/BH/include" -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/StanHeaders/include/src/"  -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/StanHeaders/include/"  -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/RcppParallel/include/" -DRCPP_PARALLEL_USE_TBB=1 -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include "C:/Users/jessi/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp"  -std=c++1y    -I"C:/rtools45/x86_64-w64-mingw32.static.posix/include"      -O2 -Wall -std=gnu2x  -mfpmath=sse -msse2 -mstackrealign   -c foo.c -o foo.o
## cc1.exe: warning: command-line option '-std=c++14' is valid for C++/ObjC++ but not for C
## In file included from C:/Users/jessi/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Core:19,
##                  from C:/Users/jessi/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Dense:1,
##                  from C:/Users/jessi/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22,
##                  from <command-line>:
## C:/Users/jessi/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: cmath: No such file or directory
##   679 | #include <cmath>
##       |          ^~~~~~~
## compilation terminated.
## make: *** [C:/PROGRA~1/R/R-45~1.2/etc/x64/Makeconf:289: foo.o] Error 1

2| Effects & Uncertainty

Summary gives you estimates of the effect sizes as well as confidence intervals for uncertainty. fixef will do exactly the same thing!

fixef(temp_model)

##               Estimate Est.Error      Q2.5     Q97.5
## Intercept    44.184472 0.3915403 43.420134 44.937630
## plant_height -8.169058 0.8138395 -9.745344 -6.588254

summary(temp_model)

##  Family: gaussian 
##   Links: mu = identity 
## Formula: temperature ~ plant_height 
##    Data: thermal (Number of observations: 207) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept       44.18      0.39    43.42    44.94 1.00     4271     3144
## plant_height    -8.17      0.81    -9.75    -6.59 1.00     4337     3112
## 
## Further Distributional Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     2.16      0.11     1.96     2.38 1.00     3609     2974
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

#put the model into a data frame so you can use it
post_thermal <- data.frame(temp_model)

Baseline temperature is 44 $^\circ$C, for every 1 unit plant growth, temperature decreases by 8 $^\circ$C.

3| Thinking about plants

There are a lot of questions we can ask/answer using the posterior distribution below:

plot(density(post_thermal$b_plant_height))
polygon(density(post_thermal$b_plant_height), col="#00B9BE", main="Plausible values for slope parameter")

Figure 2: Histogram of posterior slope estimates from the Bayesian regression of temperature on plant height. The posterior mass is entirely negative, showing near-certain evidence that taller plants are associated with lower canopy temperatures. After collecting all values of posterior slope, pulling out only those that are less than or equal to -10 and finding the mean, gives the following information: The probability that canopy temperature decreases by at least 10 °C per meter of plant height is approximately 0.9%

Here you can see that our slope value is almost CERTAINLY between -11 and -5 and does not cross 0. So we would want to ask a question about negative direction (see 2 below).

a. Direction

Is the effect positive or negative? How certain? $P (\beta<0)$ or $P (\beta>0)$

We went with negative $P (\beta>0)$ because of the posterior distribution above, all x values are in the negative.

mean(post_thermal$b_plant_height < 0)

## [1] 1

## 1

Answer: There was near-certainty (>99.9% probability) that canopy temperature decreases as plant height increases.

b. Magnitude

How likely is the effect at least [x] big?

What is the probability canopy temperature drops at least 10°C per meter of plant height?

$P (\beta < -10)$

prob_drop10 <- mean(post_thermal$b_plant_height <= -10)

prob_drop10

## [1] 0.0135

Answer: There was only about a 1.3% probability that canopy temperature decreases by at least 10 °C per meter of plant height, indicating the cooling effect is unlikely to be that strong.

c. Credible ranges

What range of parameter values is plausible? quantile(post_thermal$b_plant_height, c(0.25, 0.5, 0.975).

This is basically what fixef() summarizes.

d. Different scenarios

What is the temperature change between two plant heights? The thermal data has heights ranging from 0.897332 to 0.01694893.

We can use these to answer a question like,

What is the temperature change between plants that are 0.1 m tall and plants that are 0.8 m tall?

max(thermal$plant_height)

## [1] 0.897332

min(thermal$plant_height)

## [1] 0.01694893

h1 <- 0.10
h2 <- 0.80

changeinTemp <- post_thermal$b_plant_height * (h2-h1) #the change in heights are multiplied by each posterior slope here

quantile(changeinTemp) #let's change the % ranges

##        0%       25%       50%       75%      100% 
## -7.815580 -6.091488 -5.721048 -5.335805 -3.843027

quantile(changeinTemp, c(0.25, 0.50, 0.975))

##       25%       50%     97.5% 
## -6.091488 -5.721048 -4.611778

mean(changeinTemp < 0)

## [1] 1

Answer: Plants 0.8 m tall were predicted to be about 5.7 °C cooler than plants 0.1 m tall, with a plausible range of roughly 4.6 to 6.1 °C cooler and essentially 100% probability of cooling.

e. Predictions

You can ask questions about predictions too. Like,

What temperature do we expect at a given height (with uncertainty [x])?

h <- 0.50
T_hat <- post_thermal$b_Intercept + post_thermal$b_plant_height * h

quantile(T_hat, c(.025,.5,.975))

##     2.5%      50%    97.5% 
## 39.78876 40.09958 40.40063

Answer: A plant 0.5 m tall is predicted to have a canopy temperature of approximately 40.1 °C, with a 95% credible interval of about 39.8–40.4 °C.

There are so many useful ecological questions we can answer using these posterior samples. Posterior samples are just a huge (4000 row) data frame of plausible worlds. And once you have them, there are so many questions you can ask over and over in ANY model.

4| Final Plot

ce <- conditional_effects(temp_model,method ="posterior_predict",prob = 0.90) #here the predictive interval of 90 was requested


p <- plot(ce, points = TRUE)[[1]] #how to pull out the desired plot from the conditional_effects function

Figure 3a & b: Conditional_effects function to show the posterior mean relationship between plant height and temperature and a credible interval band (Default to the conditional_effects function). Doesn’t show the shape of uncertainty the way Fig. 5 does because it is an interval for the mean and not the full predictive interval unless specifically requested. Unfortunately this function is clunky and annoying and printed out two versions of itself with no real reason found as to why.

p + geom_point(color = "#E068EB", size = 2, alpha = 0.8)+
  labs(title="Predicted temperature with 90% CI", x="Plant Height (m)", y=expression("Temperature ("*degree*C*")")) #make it pretty

plot(temperature~plant_height, xlab="Plant Height (m)", ylab=expression("Temperature (" *degree*C*")"), pch=19, col="#00B9BE", data=thermal)

median_slope <- median(post_thermal$b_plant_height)

median_intercept<-median(post_thermal$b_Intercept)

for(i in 1:length(post_thermal$b_plant_height)){
  curve((post_thermal$b_Intercept[i]+post_thermal$b_plant_height[i]*x),add=T,lwd=0.5,col=rgb(0.45, 0.47, 0.55, 0.05))
}
curve((median_intercept+median_slope*x),add=T,col="black",lwd=2)

Figure 4: Relationship between plant height and canopy temperature. Points show observed data. Thin grey lines represent posterior regression draws from the Bayesian model, illustrating uncertainty in the estimated relationship. The black line shows the posterior mean prediction, indicating that taller plants are associated with lower temperatures. This plot is showing you where the posterior density is, which can be more useful sometimes than Figure 3. Effect size Every meter that plants grow in height, there is a decrease in temperature by 8 °C. Uncertainty The true cooling rate is very likely between roughly 6.6 and 9.7 °C per meter (95% credible interval).

Figs

figs <- read.csv("fig_visitors.csv")

plot(Bat~FIG.DBH,xlab="Diameter at Breast Height of sacred fig trees (cm)",
     ylab="Counts of bats that visited trees",pch=19,col="purple4",data=figs)

hist(figs$Bat,xlab="Counts of bats")

Model Building

# note this model will give errors or warnings because the exp of some of these values is HUGE

#Bat_model <- brm(Bat~FIG.DBH, family = "poisson", data=figs, silent=T, refresh=0)

#SO, lets add column of log DBH
figs$log_DBH <- log(figs$FIG.DBH)

Bat_model <- brm(Bat~log_DBH, family = "poisson", data=figs, silent=T, refresh=0)

## Running "C:/PROGRA~1/R/R-45~1.2/bin/x64/Rcmd.exe" SHLIB foo.c
## using C compiler: 'gcc.exe (GCC) 14.3.0'
## gcc  -I"C:/PROGRA~1/R/R-45~1.2/include" -DNDEBUG   -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/Rcpp/include/"  -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/RcppEigen/include/"  -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/RcppEigen/include/unsupported"  -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/BH/include" -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/StanHeaders/include/src/"  -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/StanHeaders/include/"  -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/RcppParallel/include/" -DRCPP_PARALLEL_USE_TBB=1 -I"C:/Users/jessi/AppData/Local/R/win-library/4.5/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include "C:/Users/jessi/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp"  -std=c++1y    -I"C:/rtools45/x86_64-w64-mingw32.static.posix/include"      -O2 -Wall -std=gnu2x  -mfpmath=sse -msse2 -mstackrealign   -c foo.c -o foo.o
## cc1.exe: warning: command-line option '-std=c++14' is valid for C++/ObjC++ but not for C
## In file included from C:/Users/jessi/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Core:19,
##                  from C:/Users/jessi/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/Dense:1,
##                  from C:/Users/jessi/AppData/Local/R/win-library/4.5/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22,
##                  from <command-line>:
## C:/Users/jessi/AppData/Local/R/win-library/4.5/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: cmath: No such file or directory
##   679 | #include <cmath>
##       |          ^~~~~~~
## compilation terminated.
## make: *** [C:/PROGRA~1/R/R-45~1.2/etc/x64/Makeconf:289: foo.o] Error 1

Summarize & `fixef`

#check out the summary

summary(Bat_model)

##  Family: poisson 
##   Links: mu = log 
## Formula: Bat ~ log_DBH 
##    Data: figs (Number of observations: 22) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -3.28      0.34    -3.94    -2.62 1.00     1482     1860
## log_DBH       1.27      0.06     1.14     1.39 1.00     1580     1986
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Notice in the summary(Bat_model) table contains the median estimates for slope and intercept. It also has the CREDIBLE INTERVALS (CI) of l-95% CI and u-95% CI. Those are going to come in handy.

Now lefts extract the effects directly:

fixef(Bat_model)

##            Estimate  Est.Error      Q2.5     Q97.5
## Intercept -3.277950 0.34269780 -3.936452 -2.615018
## log_DBH    1.265819 0.06373246  1.144000  1.388716

But these don’t contain “the real action”! The posterior samples are where it’s at.

Posterior sample

posterior_samples <- data.frame(Bat_model)

head(posterior_samples)

##   b_Intercept b_log_DBH Intercept    lprior      lp__
## 1   -3.284170  1.273675  2.820080 -1.957778 -306.8077
## 2   -3.436360  1.305167  2.818817 -1.957614 -307.3043
## 3   -3.283822  1.267373  2.790223 -1.953997 -306.5366
## 4   -3.021571  1.217529  2.813588 -1.956941 -306.8171
## 5   -3.097356  1.233123  2.812540 -1.956806 -306.6679
## 6   -3.478449  1.308692  2.793620 -1.954418 -306.9219

In the data above, each row represents one sample from the posterior distribution of the model parameters. b_Intercept and b_log_DBH are posterior samples of the intercept and the slope for log_DBH, respectively (on the model’s link scale). These are the important parts that I want you to focus on!

Less important: lprior is the log prior density for that draw. lp__ is the total log posterior density (log likelihood + log prior) for that sample, used internally by Stan for sampling diagnostics.

Note that each row here represents one plausible set of parameter values given your data and priors, and together these samples characterize the full posterior distribution of your bat model. we can start exploring these values by plotting.

Here are two different ways of doing the same thing. Note that because there are 4000 values of the posterior draws, I am making the histogram breaks larger (100 breaks, rather than the default of 10).

hist(posterior_samples$b_log_DBH,breaks=100,col="pink",xlab="plausible values of effect of log(DBH) on bat visits")

plot(density(posterior_samples$b_log_DBH))
polygon(density(posterior_samples$b_log_DBH),col="pink",main="Plausible values for slope parameter")

Choose whichever plot you want above for posterior distribution. The most important thing about these plots is that our slope value is almost CERTAINLY between 1.1 and 1.5. This conclusion is based on the histogram (or density plot) above.

Slope Values

Now, let’s explore the slope values a bit more here.

What is the probability that increasing tree size has a positive impact on bat visits?

That is, more bats visit bigger trees. Remember: Bayesian statistics is COUNTING!

Let’s count the posterior samples that are positive. The R code below is asking, how many samples are greater than 0? The length function measures how many elements are in a vector. The which statement is subsetting the vector to only those values that are >0.

posterior_slopes <- posterior_samples$b_log_DBH

mean(posterior_slopes>0)

## [1] 1

This gives us the answer to the following question:

What is the probability that the effect of tree size on bat visits is positive?

100%

There are no posterior samples for the slope that are less than 0 here. This makes biological sense too, bats like visiting bigger fruiting trees. How would we write this in a paper?

We would never want to say that we are 100% certain. That’s nutty in science. Here is a better way to phrase it:

There was near-certainty that bats were more likely to visit larger trees (P > 99.99%).

Credibility Intervals (CI)

First, quickly get the slope means for the posterior values effects. You could present these in a paper as a summary of the effect. It is up to you whether you think this is easier to understand than quantiles.

mean(posterior_samples$b_log_DBH)

## [1] 1.265819

sd(posterior_samples$b_log_DBH)

## [1] 0.06373246

And if you want to get the 80% quantiles instead of the model output values of 95%? Use this code:

quantile(posterior_samples$b_log_DBH, c(0.2, 0.8))

##      20%      80% 
## 1.209796 1.320589

In my view, it is good to think critically about what you want the quantiles to be here. If you are building a nuclear power plant, you probably want a probability of failure <0.01% (or some very small number). However, for messy ecology datasets, like counts of bats, a value of 95% seems a bit high.

Effect as a graph

We can curve out the median effect. But a single line is kind of boring, because there’s no uncertainty. This is ignoring the rich information the posterior samples give us. You can always show all the samples all at once. Here’s the code I usually use, a big clunky for loop.

library(tidybayes)
library(tibble)
library(ggplot2)
library(dplyr)

library(ggplot2)

# Create a sequence of x values for the regression line
x_seq <- seq(min(figs$log_DBH), max(figs$log_DBH), length.out = 100)

# 1. Calculate Median Line Data
median_slope <- median(posterior_samples$b_log_DBH)

median_intercept <- median(posterior_samples$b_Intercept)

median_line <- data.frame(
  log_DBH = x_seq,
  Bat = exp(median_intercept + median_slope * x_seq)
)

# 2. Calculate Individual Posterior Samples (Spaghetti Plot Data)
# Note: For large samples, subsetting (e.g., 1:100) is recommended for speed

sample_lines <- expand.grid(log_DBH = x_seq, i = 1:length(posterior_samples$b_log_DBH))


sample_lines$Bat <- exp(posterior_samples$b_Intercept[sample_lines$i] + 
                        posterior_samples$b_log_DBH[sample_lines$i] * sample_lines$log_DBH)


#create ggplot

ggplot(figs, aes(x = log_DBH, y = Bat)) +
  # Add individual posterior sample lines (spaghetti plot)
  geom_line(data = sample_lines, aes(group = i), 
            color = "#F0726A", alpha = 0.05, size = 0.1) +
  # Add raw data points
  geom_point(color = "#00B9BE", size = 2, alpha = 0.7) +
  # Add median regression line
  geom_line(data = median_line, color = "#F0726A", size = 1.5) +
  # Labels and themes
  labs(x = "log(Diameter at Breast Height of sacred fig trees (cm))",
       y = "Counts of bats that visited trees") +
  theme_minimal()