in progress: updates still being made
N O T E
This markdown (will be) a map for generative modeling [currently, completed tabs are denoted with an emoji]. The goal is to use the same 6 steps listed under F L O W on all distributions for fluidity’s sake. I used data provided in Spring 2026 EEB622 by Dr. T Caughlin under the Binomial & Poisson tabs. I provided data for the Negative Binomial flow. All other data was taken from Analysis of Biological Data site, and links to data used are provided under each tab.
F L O W
Load Data & define biological question
Define Variables
plot it, decide x & y, determine what the intercept represents
Set Priors
plausible worlds, plausible parameters
Fit Bayesian Model GLM: brm()
Posterior Predictions/Draws:
posterior_epred()
Ecological Inference translate it
A decision about likelihood can be made by asking: WHAT TYPE OF VARIABLE IS Y?
🧭 C H A R T
Gaussian → Y is continuous, ranges from −∞ to +∞ | e.g. temperature, height, measurement error | link: identity
Bernoulli → Y is binary (0 or 1) | e.g. presence/absence, alive/dead, detected/not detected | link: logit
Binomial → Y is discrete proportion (can't have half an egg) | e.g. hatched/laid, survival/total | link: logit
Poisson → Y is discrete counts of events (0, 1, 2, 3...) | e.g. fish counts, insects trapped | link: log
Negative Binomial → Y is discrete counts where variance > mean, many 0's, big values | e.g. phytoplankton counts | link: log
Gamma → Y is continuous but ONLY positive (> 0), right-skewed | e.g. biomass, productivity, time | link: log
L I B R A R I E S
brms, ggplot2, dplyr,
tidyr, gt, tidybayes
D A T A
bluetits.csv.
These data can be downloaded by clicking the link and is called
Figure 17.5-4 left Cap color on the Analysis of
Biological Data site. The scientific paper for these data can be
found here.
C H E C K L I S T
Load Data
Define Variables
Set Priors
Fit Bayesian Model: brm()
Posterior Predictions: posterior_epred()
Ecological Inference
Q U E S T I O N
??
bluetits <- read.csv("bluetits.csv")
D A T A
guppy.csv.
These data can be downloaded by clicking the link and is called
Figure 17.9-1 Guppy temperature mortality on the
Analysis of Biological Data site. The scientific paper for
these data can be found here.
C H E C K L I S T
Load Data
Define Variables
Set Priors
Fit Bayesian Model: brm()
Posterior Predictions: posterior_epred()
Ecological Inference
Q U E S T I O N
??
guppy <- read.csv("guppy.csv")
D A T A
seeds.csv. These data can be downloaded on our Slack
Channel and belongs to Dr. T Caughlin, whose work on
these data can be found here.
C H E C K L I S T
Load Data
Define Variables
Set Priors
Fit Bayesian Model: brm()
Posterior Predictions: posterior_epred()
Ecological Inference
Q U E S T I O N
??
seeds <- read.csv("seeds.csv")
D A T A
Anableps_anableps.csv. These data can be downloaded on
our Slack Channel and belongs to Dr. T Caughlin, whose work on these
data can be found here.
C H E C K L I S T
Load Data
Define Variables
Set Priors
Fit Bayesian Model: brm()
Posterior Predictions: posterior_epred()
Ecological Inference
Q U E S T I O N
How does restoration treatment impact four-eyed fish
abundance?
anableps <- read.csv("anableps_anableps.csv")
#give the data a look
anableps_dt <- gt(anableps)|> #everything below this point is just for aesthetics. gt(dataframe) is enough to create a table of your data
opt_table_font(google_font("Caveat"))|>
opt_table_outline(style="solid")|>
tab_header(
title = "Anableps Data")|>
tab_options(table.align='left')
opt_interactive(anableps_dt) #makes your datatable interactive
Check out data types:
str(anableps) #this shows that our x Treatment data are "chr: characters" and we want them to be "factors with 3 levels" so...this will be delt with in step 2.
## 'data.frame': 27 obs. of 4 variables:
## $ Treatment : chr "Degraded" "Degraded" "Degraded" "Degraded" ...
## $ Site.ID : chr "Degraded1" "Degraded2" "Degraded3" "Degraded4" ...
## $ Species : chr "Anableps anableps" "Anableps anableps" "Anableps anableps" "Anableps anableps" ...
## $ TotalCatch: int 28 0 0 0 0 0 8 7 2 6 ...
Variables were already defined by Dr. Caughlin’s question. In this
step, I have cleaned up the data and set the reference condition to
Intact (instead of Active Restoration, the
default).
Response: TotalCatch (count data)
Predictor: Treatment (multiple categories)
# ensure categorical predictor
anableps$Treatment <- as.factor(anableps$Treatment) #here, as.factor forces the chr data into factors which was previously a character
# set ecological reference condition
anableps$Treatment <- relevel(anableps$Treatment, ref = "Intact")
Purpose
Define the biological comparison:
Intercept = expected abundance in intact
habitat
Slopes = differences relative to intact
habitat
Plot
ggplot(anableps, aes(Treatment, TotalCatch, fill=Treatment))+
geom_boxplot()+
labs(title="Fish catch between treatments", x="", y="Total Fish Catch")+
theme(legend.position="none")+ #I take off the redundant legend
scale_fill_manual(values=c("#00B9BE", "#E665E6", "#78A900"))
#intercept mean
mean(anableps$TotalCatch)
## [1] 27.55556
priors <- c(prior(normal(log(27), 1), class = "Intercept"), prior(normal(0, 1), class = "b")) #log 27 was chosen by looking at the mean of catch using the above code
Interpretation
This represents knowledge before seeing data.
anableps_model <- brm(
TotalCatch ~ Treatment,
family = poisson(), #you can also write "poisson" here instead of poisson()
data = anableps,
prior = priors, #priors were explicitly created in last step
chains = 2, #rest of code is just to get the model to run faster for homework quickness
iter = 1000,
warmup = 500,
backend = "cmdstanr",
refresh = 0)
## Start sampling
## Running MCMC with 2 sequential chains...
##
## Chain 1 finished in 0.0 seconds.
## Chain 2 finished in 0.0 seconds.
##
## Both chains finished successfully.
## Mean chain execution time: 0.0 seconds.
## Total execution time: 0.3 seconds.
## Loading required namespace: rstan
This step performs Bayesian updating
\[ \color{#00B9BE}{Posterior} \propto \color{#CA75FD}{Likelihood} * \color{#78A900}{Prior} \]
Output = distribution of plausible parameter values.
epred)posterior_draws <- as.data.frame(anableps_model)
## Loading required namespace: rstan
head(posterior_draws)
## b_Intercept b_TreatmentDegraded b_TreatmentIntact Intercept lprior
## 1 4.235709 -2.639014 -2.031395 2.678906 -8.492599
## 2 4.197850 -2.553858 -1.912751 2.708980 -8.019421
## 3 4.167447 -2.270784 -1.941889 2.763223 -7.362352
## 4 4.175955 -2.581140 -2.094368 2.617452 -8.511248
## 5 4.278789 -2.762184 -2.020183 2.684667 -8.798981
## 6 4.273027 -2.427569 -1.970966 2.806848 -7.765270
## lp__
## 1 -566.2531
## 2 -566.5310
## 3 -568.4309
## 4 -567.5790
## 5 -567.6665
## 6 -567.9873
Each row represents one plausible ecological reality consistent with observed data.
newdata <- data.frame(Treatment = c("Intact", "Degraded", "Active Restoration"))
#the above code just sets up an empty dataframe with better column names than v1, v2, v3
ep <- posterior_epred(anableps_model, newdata = newdata)
#posterior_epred draws parameters from the Expected Value of the Posterior Predictive Distribution
head(ep)
## [,1] [,2] [,3]
## [1,] 9.064032 4.936691 69.11069
## [2,] 9.826652 5.175787 66.54308
## [3,] 9.258650 6.663620 64.55046
## [4,] 8.017182 4.927417 65.10196
## [5,] 9.569740 4.556728 72.15302
## [6,] 9.994754 6.330997 71.73844
posterior_epred() converts parameter uncertainty into
predicted fish abundance for each treatment.
Rows = posterior draws
Columns = treatments
PLOTS
ep_df <- as.data.frame(ep)
colnames(ep_df) <- c("Intact", "Degraded", "Active Restoration")
ep_df |>
pivot_longer(everything(), names_to = "Treatment", values_to = "lambda") |>
ggplot(aes(lambda, fill = Treatment)) +
geom_density() +
labs(title = "Posterior Predicted Fish Abundance",x = "Expected Fish Count",y = "Density") +
scale_fill_manual(values=c("#E665E6", "#78A900", "#00B9BE"))+
theme_minimal()
# Posterior expected predictions on response scale
# (epred = expected mean, not noisy simulated observations)
ep_draws <- add_epred_draws(anableps_model, newdata = newdata, ndraws = 1000)
ggplot() +
geom_jitter(data = anableps, aes(Treatment, TotalCatch, color=Treatment),width = 0.1) +
stat_pointinterval(data= ep_draws, aes(Treatment, .epred), point_interval = mean_qi, .width = 0.89) +
labs(title = "Fish abundance by treatment", subtitle = "Observed data and posterior mean of the expected abundance (black dot)",x = "", y = "Expected fish abundance") +
scale_color_manual(values=c("#00B9BE", "#E665E6", "#78A900"))+
theme_minimal()
Difference Between Treatments
rest_vs_deg <- ep[,3] - ep[,2] #take column 3 from ep data and subtract column 2 from ep data from each row
mean(rest_vs_deg)
## [1] 63.03343
PLOT
diff <- data.frame(rest_vs_deg)
data.frame(diff = rest_vs_deg) |>
ggplot(aes(diff)) +
geom_density(fill="#F0726A") +
geom_vline(xintercept=0, linetype="dashed") +
labs(title="Posterior distribution of difference",x="Restored − Degraded (fish)", y="Density") +
theme_minimal()
Across the posterior distribution, restored sites are expected to have about 63 more fish than degraded sites, on average.
quantile(rest_vs_deg, c(.055, .945))
## 5.5% 94.5%
## 58.53613 67.48714
With 89% (McElreath’s fav number) of the credible interval laying between 58-67 fish.
mean(rest_vs_deg > 0)
## [1] 1
#what if you want to know the difference between degraded and control (aka Intact)?
#deg_vs_cont <- ep[,2]- ep[,1] these columns follow the setup made earlier: newdata <- data.frame(Treatment = c("Intact"[,1], "Degraded[,2]", "Active Restoration[,3]"))
Probability that restoration increases fish abundance \(\approx\) 100%. The direction is positive.
Interpretation
Multiplicative Effect (Ratio)
rest_vs_deg_ratio <- ep[,3] / ep[,2]
mean(rest_vs_deg_ratio)
## [1] 13.06362
quantile(rest_vs_deg_ratio, c(.055, .945))
## 5.5% 94.5%
## 10.18799 16.60354
mean(rest_vs_deg_ratio > 1)
## [1] 1
Interpretation
This gives the expected proportional change in fish abundance between restoration and degraded habitats.
D A T A
halosphaera.csv.
These data can be downloaded by clicking the following link and were
collected via phytoplankton tows of Bellingham Bay, WA during the winter
months of 2024. More on this unfinished project can be found here.
C H E C K L I S T
Load Data
Define Variables
Set Priors
Fit Bayesian Model: brm()
Posterior Predictions: posterior_epred()
Ecological Inference
Q U E S T I O N
??
halosphaera <- read.csv("halosphaera.csv")
D A T A
pondplants.csv.
These data can be downloaded by clicking the link and is called
Figure 17.8-2 Plant species and productivity on the
Analysis of Biological Data site. The scientific paper for
these data can be found here.
C H E C K L I S T
Load Data
Define Variables
Set Priors
Fit Bayesian Model: brm()
Posterior Predictions: posterior_epred()
Ecological Inference
Q U E S T I O N
??
pondplants <- read.csv("pondplants.csv")