anableps.knit

Anableps

The whole bit

California Academy of Sciences

Question: How does restoration treatment impact four-eyed fish abundance?

Treatment: Predictor (categorical, nominal)

TotalCatch: Response (counts of fishies)

Tutorial from ajkurz. Chatgpt used to help setup priors and get the brm() to run quicker using if statement.

links

Discussed in Statistical Rethinking on pages: 104, 107, 313, 316; and categorical variables on pages: 124, 153

look

anableps <- read.csv("anableps_anableps.csv")

#peer in


ggplot(anableps, aes(Treatment, TotalCatch, colour = Treatment))+
  geom_boxplot()+
  theme(legend.position = "none")+
  scale_color_manual(values=c("#00BBA4", "#F0726A", "#E665E6"))

#lookk at how many factors there are

anableps %>%
  distinct(Treatment)

##            Treatment
## 1           Degraded
## 2             Intact
## 3 Active Restoration

#make treatment a factor

anableps$Treatment <- as.factor(anableps$Treatment)

#Decide the reference

#here it picks degraded automatically so you can leave it, or you can set the control to the intercept. 

anableps$Treatment <- relevel(anableps$Treatment, ref = "Intact")

model fit

Intercept = log mean of Intact
b_Degraded = log difference between Degraded and Intact
b_Restoration = log difference between Restoration and Intact

The reference category (indicator variable) captured by the intercept is Intact.

How to choose priors? look at the summary of your response variable (TotalCatch in this example)!

summary(anableps$TotalCatch)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    1.50    7.00   27.56   22.00  285.00

The mean catch is the \(log(30)\). So, \(log(30)\approx 3\), so the intercept prior should be centered near 3 (not .6 like the example in the book).

Now ask yourself:

How much could treatments realistically change the total catch?

Would you expect a 10x difference? naw. probs not. Maybe a 2-3x difference. That sounds reasonable. So the prior normal(0, 1) is reasonable. If differences were larger, maybe use normal(0, 2). Priors are “encoded domain knowledge about plausible effect sizes. You decide them by asking yourself the following questions:

What is a realistic baseline?
How big could treatment effects reasonable be?
On what scale? (log for Poisson)

Below is an if, then statement code to get your model to save and rerun to go quicker with knitting.

if (!file.exists("analbleps_m_aj.rds")) {
  analbleps_m_aj <- brm(
    TotalCatch ~ Treatment,
    family = poisson(),
    data = anableps,
    prior = c(
      prior(normal(log(27), 1), class = Intercept),
      prior(normal(0, 1), class = b)
    ),
    chains = 4, iter = 2000, warmup = 500, cores = 4,
    seed = 5,
    backend = "cmdstanr"
  )
  saveRDS(analbleps_m_aj, "analbleps_m_aj.rds")
} else {
  analbleps_m_aj <- readRDS("analbleps_m_aj.rds")
}

summary

summary(analbleps_m_aj)

## Loading required namespace: rstan

##  Family: poisson 
##   Links: mu = log 
## Formula: TotalCatch ~ Treatment 
##    Data: anableps (Number of observations: 27) 
##   Draws: 4 chains, each with iter = 2000; warmup = 500; thin = 1;
##          total post-warmup draws = 6000
## 
## Regression Coefficients:
##                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                      2.20      0.11     1.97     2.42 1.00     2601
## TreatmentActiveRestoration     2.03      0.12     1.80     2.27 1.00     2684
## TreatmentDegraded             -0.58      0.18    -0.95    -0.23 1.00     2805
##                            Tail_ESS
## Intercept                      2662
## TreatmentActiveRestoration     2784
## TreatmentDegraded              2713
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

posterior samples

postsamp <- as.data.frame(analbleps_m_aj)

interpretable values

To quickly get the group means, use fitted(). The fitted function is giving us the Estimate (posterior mean), estimated error, lower CI and upper CI. SO SIMPLE!

#create new dataframe to get a smaller return for the fitted() function. Otherwise we would get more output like so: 
head(fitted(analbleps_m_aj))

##      Estimate Est.Error     Q2.5    Q97.5
## [1,]  5.10441 0.7184843 3.784368 6.581666
## [2,]  5.10441 0.7184843 3.784368 6.581666
## [3,]  5.10441 0.7184843 3.784368 6.581666
## [4,]  5.10441 0.7184843 3.784368 6.581666
## [5,]  5.10441 0.7184843 3.784368 6.581666
## [6,]  5.10441 0.7184843 3.784368 6.581666

newdata <- data.frame(
  Treatment = c("Intact", "Degraded", "Active Restoration")
)

fitted(analbleps_m_aj, newdata = newdata)

##       Estimate Est.Error      Q2.5     Q97.5
## [1,]  9.060297 1.0119679  7.194351 11.196481
## [2,]  5.104410 0.7184843  3.784368  6.581666
## [3,] 68.605206 2.7918967 63.330492 74.214744

#notice how easier this is to use!

An even better way to do this is to use the function posterior_epred().

It returns all draws so you can compute:

Differences
Ratios
% Change
Custom Intervals

ep <- posterior_epred(analbleps_m_aj, newdata = newdata)

head(ep)

##           [,1]     [,2]     [,3]
## [1,]  9.177026 5.873200 68.85227
## [2,]  8.065949 5.298744 69.05887
## [3,]  9.528251 4.183947 68.96141
## [4,]  9.081947 4.750226 64.12892
## [5,] 10.709795 4.347704 72.11748
## [6,] 10.542013 4.424712 68.43265

#convert to df for easy plots

ep <- as.data.frame(ep)

plot posteriors!

#tidy up!

ep %>%
  pivot_longer(everything(), names_to = "Treatment", values_to = "lambda") %>%
  ggplot(aes(lambda, fill=Treatment))+
  labs(title="Full Posterior Distributions")+
  geom_density(alpha=0.4)#density plots with some transparency

summarize means

Use the following code to get the mean catch, credible intervals, and directly interpretative on count scale, WOW.

ep %>%
  pivot_longer(everything(), names_to = "Treatment", values_to = "lambda")%>%
  group_by(Treatment) %>% #necessary to put this for mean_qi to work
  mean_qi(.width = .89) #.89 is McElreaths and is arbitrary as far as I can tell

## # A tibble: 3 × 7
##   Treatment lambda .lower .upper .width .point .interval
##   <chr>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 V1          9.06   7.52  10.7    0.89 mean   qi       
## 2 V2          5.10   4.01   6.31   0.89 mean   qi       
## 3 V3         68.6   64.3   73.2    0.89 mean   qi

make it ecological

Does restoration make a big difference? or naw?

Aka: Does restoration improve damaged habitat?

Step 1: compute the difference using `ep` column names (V1: Intact, V2: Degraded, V3: Active Restoration).

The units are fish. The mean difference calculated below rest_vs_deg is 63.5!

This means that Active Restoration Sites yield about 64 more fish on average than Degraded Sites. That’s huge!

#compute ecologically meaningful quantities

#restoration vs degraded differece

rest_vs_deg <- ep[, "V3"] - ep[, "V2"]

mean(rest_vs_deg)

## [1] 63.5008

quantile(rest_vs_deg, c(.055, .945))

##     5.5%    94.5% 
## 59.05380 68.23694

mean(rest_vs_deg > 0)

## [1] 1

Or get the ratio

The rest_vs_deg_ratio below is about 14.

This means that Restoration yields about 14x more fish than degraded sites.

rest_vs_deg_ratio <- ep[, "V3"] / ep[, "V2"]

mean(rest_vs_deg_ratio)

## [1] 13.71455

quantile(rest_vs_deg_ratio, c(.055, .945))

##     5.5%    94.5% 
## 10.78993 17.24672

mean(rest_vs_deg_ratio > 1)

## [1] 1

#to find percent change: 

(mean(rest_vs_deg_ratio) - 1)*100

## [1] 1271.455

colMeans(ep)

##        V1        V2        V3 
##  9.060297  5.104410 68.605206

That’s about a 1300% increase in restoration sites from degraded sites. WHOA. HUGE. This is likely because of the following:

With only 27 observations, a Poisson model can produce large multiplicative effects when:

one group has very high counts
variance is large
counts are skewed
one treatment has a few big values

Homework 5

1. Why were raw posterior draws used without exponentiating?

Because probability of direction depends only on whether the coefficient is greater or less than zero. The exponential transformation does not change the sign of the effect, so using posterior draws on the log scale is sufficient to determine whether the treatment increases or decreases fish counts.

2. Why does the restored treatment calculation only include the intercept?

Because the restored treatment is the reference category in the model. In treatment coding, the intercept represents the expected log mean count for the reference group, so no additional treatment coefficient is required.

3. What are the units of the “effect” variable?

After exponentiation, both values are on the response scale, so the effect represents the difference in expected fish abundance. The units are number of fish (counts).

4. How would you calculate proportional (multiplicative) change from b_TreatmentDegraded alone?

By exponentiating the coefficient:

#proportionalchange <- exp(postsamp$b_TreatmentDegraded)

#head(proportionalchange)

This gives the multiplicative change in fish counts relative to the reference treatment.

5. Why was a beta prior applied to the intercept in the intercept-only model?

Because in an intercept-only binomial model (y ~ 1), the intercept directly represents the population proportion being estimated. A beta distribution is appropriate since proportions are bounded between 0 and 1, allowing prior knowledge about plausible values to be incorporated.