Anableps

The whole bit


California Academy of Sciences
California Academy of Sciences


Question: How does restoration treatment impact four-eyed fish abundance?

Treatment: Predictor (categorical, nominal)

TotalCatch: Response (counts of fishies)

Tutorial from ajkurz. Chatgpt used to help setup priors and get the brm() to run quicker using if statement.

look

anableps <- read.csv("anableps_anableps.csv")

#peer in


ggplot(anableps, aes(Treatment, TotalCatch, colour = Treatment))+
  geom_boxplot()+
  theme(legend.position = "none")+
  scale_color_manual(values=c("#00BBA4", "#F0726A", "#E665E6"))

#lookk at how many factors there are

anableps %>%
  distinct(Treatment)
##            Treatment
## 1           Degraded
## 2             Intact
## 3 Active Restoration
#make treatment a factor

anableps$Treatment <- as.factor(anableps$Treatment)

#Decide the reference

#here it picks degraded automatically so you can leave it, or you can set the control to the intercept. 

anableps$Treatment <- relevel(anableps$Treatment, ref = "Intact")

model fit

  • Intercept = log mean of Intact

  • b_Degraded = log difference between Degraded and Intact

  • b_Restoration = log difference between Restoration and Intact

The reference category (indicator variable) captured by the intercept is Intact.

How to choose priors? look at the summary of your response variable (TotalCatch in this example)!

summary(anableps$TotalCatch)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    1.50    7.00   27.56   22.00  285.00

The mean catch is the \(log(30)\). So, \(log(30)\approx 3\), so the intercept prior should be centered near 3 (not .6 like the example in the book).

Now ask yourself:

How much could treatments realistically change the total catch?

Would you expect a 10x difference? naw. probs not. Maybe a 2-3x difference. That sounds reasonable. So the prior normal(0, 1) is reasonable. If differences were larger, maybe use normal(0, 2). Priors are “encoded domain knowledge about plausible effect sizes. You decide them by asking yourself the following questions:

  1. What is a realistic baseline?

  2. How big could treatment effects reasonable be?

  3. On what scale? (log for Poisson)

Below is an if, then statement code to get your model to save and rerun to go quicker with knitting.

if (!file.exists("analbleps_m_aj.rds")) {
  analbleps_m_aj <- brm(
    TotalCatch ~ Treatment,
    family = poisson(),
    data = anableps,
    prior = c(
      prior(normal(log(27), 1), class = Intercept),
      prior(normal(0, 1), class = b)
    ),
    chains = 4, iter = 2000, warmup = 500, cores = 4,
    seed = 5,
    backend = "cmdstanr"
  )
  saveRDS(analbleps_m_aj, "analbleps_m_aj.rds")
} else {
  analbleps_m_aj <- readRDS("analbleps_m_aj.rds")
}

summary

summary(analbleps_m_aj)
## Loading required namespace: rstan
##  Family: poisson 
##   Links: mu = log 
## Formula: TotalCatch ~ Treatment 
##    Data: anableps (Number of observations: 27) 
##   Draws: 4 chains, each with iter = 2000; warmup = 500; thin = 1;
##          total post-warmup draws = 6000
## 
## Regression Coefficients:
##                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                      2.20      0.11     1.97     2.42 1.00     2601
## TreatmentActiveRestoration     2.03      0.12     1.80     2.27 1.00     2684
## TreatmentDegraded             -0.58      0.18    -0.95    -0.23 1.00     2805
##                            Tail_ESS
## Intercept                      2662
## TreatmentActiveRestoration     2784
## TreatmentDegraded              2713
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

posterior samples

postsamp <- as.data.frame(analbleps_m_aj)

interpretable values

To quickly get the group means, use fitted(). The fitted function is giving us the Estimate (posterior mean), estimated error, lower CI and upper CI. SO SIMPLE!

#create new dataframe to get a smaller return for the fitted() function. Otherwise we would get more output like so: 
head(fitted(analbleps_m_aj))
##      Estimate Est.Error     Q2.5    Q97.5
## [1,]  5.10441 0.7184843 3.784368 6.581666
## [2,]  5.10441 0.7184843 3.784368 6.581666
## [3,]  5.10441 0.7184843 3.784368 6.581666
## [4,]  5.10441 0.7184843 3.784368 6.581666
## [5,]  5.10441 0.7184843 3.784368 6.581666
## [6,]  5.10441 0.7184843 3.784368 6.581666
newdata <- data.frame(
  Treatment = c("Intact", "Degraded", "Active Restoration")
)

fitted(analbleps_m_aj, newdata = newdata)
##       Estimate Est.Error      Q2.5     Q97.5
## [1,]  9.060297 1.0119679  7.194351 11.196481
## [2,]  5.104410 0.7184843  3.784368  6.581666
## [3,] 68.605206 2.7918967 63.330492 74.214744
#notice how easier this is to use! 

An even better way to do this is to use the function posterior_epred().

It returns all draws so you can compute:

  • Differences

  • Ratios

  • % Change

  • Custom Intervals

ep <- posterior_epred(analbleps_m_aj, newdata = newdata)

head(ep)
##           [,1]     [,2]     [,3]
## [1,]  9.177026 5.873200 68.85227
## [2,]  8.065949 5.298744 69.05887
## [3,]  9.528251 4.183947 68.96141
## [4,]  9.081947 4.750226 64.12892
## [5,] 10.709795 4.347704 72.11748
## [6,] 10.542013 4.424712 68.43265
#convert to df for easy plots

ep <- as.data.frame(ep)

plot posteriors!

#tidy up!

ep %>%
  pivot_longer(everything(), names_to = "Treatment", values_to = "lambda") %>%
  ggplot(aes(lambda, fill=Treatment))+
  labs(title="Full Posterior Distributions")+
  geom_density(alpha=0.4)#density plots with some transparency

summarize means

Use the following code to get the mean catch, credible intervals, and directly interpretative on count scale, WOW.

ep %>%
  pivot_longer(everything(), names_to = "Treatment", values_to = "lambda")%>%
  group_by(Treatment) %>% #necessary to put this for mean_qi to work
  mean_qi(.width = .89) #.89 is McElreaths and is arbitrary as far as I can tell
## # A tibble: 3 × 7
##   Treatment lambda .lower .upper .width .point .interval
##   <chr>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 V1          9.06   7.52  10.7    0.89 mean   qi       
## 2 V2          5.10   4.01   6.31   0.89 mean   qi       
## 3 V3         68.6   64.3   73.2    0.89 mean   qi

make it ecological

Does restoration make a big difference? or naw?

Aka: Does restoration improve damaged habitat?

Step 1: compute the difference using ep column names (V1: Intact, V2: Degraded, V3: Active Restoration).

The units are fish. The mean difference calculated below rest_vs_deg is 63.5!

This means that Active Restoration Sites yield about 64 more fish on average than Degraded Sites. That’s huge!

#compute ecologically meaningful quantities

#restoration vs degraded differece

rest_vs_deg <- ep[, "V3"] - ep[, "V2"]

mean(rest_vs_deg)
## [1] 63.5008
quantile(rest_vs_deg, c(.055, .945))
##     5.5%    94.5% 
## 59.05380 68.23694
mean(rest_vs_deg > 0)
## [1] 1
Or get the ratio

The rest_vs_deg_ratio below is about 14.

This means that Restoration yields about 14x more fish than degraded sites.

rest_vs_deg_ratio <- ep[, "V3"] / ep[, "V2"]

mean(rest_vs_deg_ratio)
## [1] 13.71455
quantile(rest_vs_deg_ratio, c(.055, .945))
##     5.5%    94.5% 
## 10.78993 17.24672
mean(rest_vs_deg_ratio > 1)
## [1] 1
#to find percent change: 

(mean(rest_vs_deg_ratio) - 1)*100
## [1] 1271.455
colMeans(ep)
##        V1        V2        V3 
##  9.060297  5.104410 68.605206

That’s about a 1300% increase in restoration sites from degraded sites. WHOA. HUGE. This is likely because of the following:

With only 27 observations, a Poisson model can produce large multiplicative effects when:

  • one group has very high counts

  • variance is large

  • counts are skewed

  • one treatment has a few big values

Homework 5

1. Why were raw posterior draws used without exponentiating?

Because probability of direction depends only on whether the coefficient is greater or less than zero. The exponential transformation does not change the sign of the effect, so using posterior draws on the log scale is sufficient to determine whether the treatment increases or decreases fish counts.

2. Why does the restored treatment calculation only include the intercept?

Because the restored treatment is the reference category in the model. In treatment coding, the intercept represents the expected log mean count for the reference group, so no additional treatment coefficient is required.

3. What are the units of the “effect” variable?

After exponentiation, both values are on the response scale, so the effect represents the difference in expected fish abundance. The units are number of fish (counts).

4. How would you calculate proportional (multiplicative) change from b_TreatmentDegraded alone?

By exponentiating the coefficient:

#proportionalchange <- exp(postsamp$b_TreatmentDegraded)

#head(proportionalchange)

This gives the multiplicative change in fish counts relative to the reference treatment.

5. Why was a beta prior applied to the intercept in the intercept-only model?

Because in an intercept-only binomial model (y ~ 1), the intercept directly represents the population proportion being estimated. A beta distribution is appropriate since proportions are bounded between 0 and 1, allowing prior knowledge about plausible values to be incorporated.