Homework 4

1 Data Visualized

Choose a dataset (it could be the binomial dataset you used for grid search or your own data; up to you) and use brms to fit a Bayesian GLM to it. For (b) and (c), I suggest experimenting with the tidybayes package.

Dataset

I am using data I collected at Taylor Dock in Bellingham Bay, WA, starting in December 2023. Each sample is a single phytoplankton tow that was analyzed under a microscope, and the number of Halosphaera sp. cells was recorded. At this life stage the cells are called phycomates and are about 150-300 μm in size. In the images below you can see collected phycomates and even make out the rosettes inside each cell. These rosettes eventually burst out of the phycomate as many tiny, flagellated, motile cells, so the phycomate acts like a womb for the next generation. I captured the images using confocal microscopy and a basic EVOS Digital Inverted XL Core AMEX1200 microscope; the cells shown were collected on 2/10/24. The second photo shows the sampling site at Taylor Dock in Bellingham.

Image 1: Halosphaera sp. phycomates collected on 2/10/24. Image 2: Taylor Dock sampling site, Bellingham.

library(brms)     # Bayesian GLMs via Stan
library(ggplot2)  # posterior histogram
library(gt)       # formatted summary table

halo <- read.csv("halo.season2.csv")

# Keep only the columns shown in Table 1: date, cell counts, water temperature,
# salinity, wind, and picking notes. This copy is used only to build the table.
# (data.frame() with named columns keeps each column's type; cbind() would
# coerce everything to character.)
halodata <- data.frame(
  Date      = halo$date,
  Counts    = halo$cell_count_A,
  WaterTemp = halo$water_temp_c,
  Salinity  = halo$salinity_PSU,
  Wind      = halo$windspeedanddirection.mph,
  Notes     = halo$picking_notes
)

head(halodata) |>
  gt() |>
  opt_stylize(style = 5, color = "cyan") |>
  tab_caption(md("**Table 1:** Cell count data for Halosphaera sp. collected at Taylor Dock in Bellingham Bay, WA. Only the first six rows are shown; the dataset contains 121 samples.")) |>
  tab_options(table.font.size = 12)
Table 1: Cell count data for Halosphaera sp. collected at Taylor Dock in Bellingham Bay, WA. Only the first six rows are shown; the dataset contains 121 samples.
| Date | Counts | WaterTemp | Salinity | Wind | Notes |
|---|---|---|---|---|---|
| 12/1/2023 | 0 | NA | NA | S6 | lots of critters! No halo, took a few images on the inverted scope |
| 12/2/2023 | 0 | 8.4969 | 27.21819 | S5 | lots of critters! No halo |
| 12/3/2023 | 0 | 8.70334 | 30.53366 | S20 | critters, no halo |
| 12/4/2023 | 0 | 7.95347 | 28.13971 | N7 | critters, diatoms, coscinodiscus, and, peridinium (see video). No halo |
| 12/5/2023 | 0 | 9.16134 | 31.07493 | S3 | critters, diatoms, copepods, No halo |
| 12/6/2023 | 0 | 8.48911 | 17.34768 | Calm | Not a whole lot in the water, one interesting unknown "phytoplankton" erupted on video. |
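# Convert the raw columns to numbers for modeling (trimws() strips stray whitespace)
halo$Counts <- as.integer(trimws(halo$cell_count_A))
halo$WTemp  <- as.numeric(trimws(halo$water_temp_c))

# Drop rows missing either the cell count or the water temperature
halo <- subset(halo, !is.na(Counts) & !is.na(WTemp))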
Plot it
# Plot the cleaned numeric columns rather than the raw CSV columns
plot(Counts ~ WTemp, data = halo, col = "purple",
     xlab = "Water Temperature (°C)", ylab = "Phycomate Counts")

Fit the Poisson GLM

A Poisson GLM in brms uses a log link by default, so the coefficients are on the log scale of the expected count.
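Written out, the model fitted below is the following, where α corresponds to b_Intercept and β to b_WTemp in brms's naming (priors are left at the brms defaults and are not shown). The last line shows why exponentiating the slope gives the multiplicative change in the expected count per 1 °C.

```latex
\begin{aligned}
\text{Counts}_i &\sim \operatorname{Poisson}(\lambda_i) \\
\log \lambda_i &= \alpha + \beta \, \text{WTemp}_i \\
\frac{\lambda(T+1)}{\lambda(T)} &= \frac{e^{\alpha + \beta (T+1)}}{e^{\alpha + \beta T}} = e^{\beta}
\end{aligned}
```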

# silent = 2 and refresh = 0 suppress compiler and sampler progress messages
halo.model <- brm(Counts ~ WTemp, family = poisson, data = halo,
                  silent = 2, refresh = 0)
Coefficient summaries (fixef)
fixef(halo.model)
##             Estimate  Est.Error      Q2.5       Q97.5
## Intercept  2.4243897 0.33238702  1.778362  3.06225411
## WTemp     -0.1354473 0.04131707 -0.214774 -0.05383266

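Intercept = log of the expected cell count when WTemp = 0 °C

Slope = change in the log of the expected count per 1 °C increase in water temperature; exponentiating it gives the multiplicative change (rate ratio):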

exp(fixef(halo.model)["WTemp","Estimate"])
## [1] 0.8733252
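As a sketch, both posterior means reported by fixef() above can be back-transformed to the count scale by hand. The numbers are copied from that output, and plugging in posterior means gives a point summary only, not the posterior mean of the expected count. The temperatures 8 and 9 °C are illustrative values chosen from the range in Table 1.

```r
# Posterior means copied from the fixef() output above
intercept <- 2.4243897
slope     <- -0.1354473

# Expected count at WTemp = 0 (an extrapolation on the count scale)
baseline_count <- exp(intercept)

# Rate ratio: multiplicative change in expected count per 1 degree C
rate_ratio <- exp(slope)

# Plug-in expected counts at two illustrative temperatures
expected_count <- function(temp) exp(intercept + slope * temp)
counts_8_9 <- expected_count(c(8, 9))

round(baseline_count, 2)
round(rate_ratio, 4)
counts_8_9[2] / counts_8_9[1]  # a 1 degree C step reproduces the rate ratio
```

The last line illustrates the log-link property: the ratio of expected counts one degree apart is the same at any temperature.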
Extract Posterior Draws
post <- as.data.frame(halo.model)

a <- post$b_Intercept
b <- post$b_WTemp
Probability of Direction

What’s P(slope > 0)?

mean(b > 0)
## [1] 0.00025

Only 0.025% of posterior draws for the slope are positive.
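A minimal sketch of the probability of direction as a reusable function: it takes whichever side of zero holds more draws. The toy draws are made up for illustration and are not from the Halosphaera model. Assuming brms's default sampler settings (4 chains with 1000 post-warmup iterations each, so 4000 draws), a proportion of 0.00025 corresponds to a single positive draw.

```r
# Probability of direction: share of posterior draws on the dominant side of zero
prob_direction <- function(draws) {
  max(mean(draws > 0), mean(draws < 0))
}

# Toy draws for illustration only (NOT from the Halosphaera model)
toy_draws <- c(-0.20, -0.15, -0.10, -0.05, 0.02)
pd_toy <- prob_direction(toy_draws)   # 4 of 5 draws are negative

# Converting a proportion of draws to a percentage
p_positive <- 0.00025                 # value of mean(b > 0) reported above
pct_positive <- p_positive * 100      # 0.025 percent
n_positive <- p_positive * 4000       # assumes 4000 post-warmup draws
```

Applied to the slope draws, prob_direction(post$b_WTemp) equals mean(b < 0) here, since most draws are negative.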

quantile <- quantile(b, c(0.025, 0.975))
Plot the Posterior Slope Distribution
hist(b, breaks = 80,
     main = "Posterior for temperature effect (b_WTemp)",
     xlab = "Slope on log scale")
abline(v = 0, lty = 2, col = "red", lwd = 2)  # reference line at zero effect

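# Wrap the slope draws in a data frame for ggplot2 (b itself stays a numeric vector)
b.df <- data.frame(b = b)

posterior.plot <- ggplot(b.df, aes(x = b)) +
  geom_histogram(bins = 80, fill = "#C178F7", color = "black") +
  labs(
    title = "Posterior for temperature effect (b_WTemp)",
    x = "Slope on log scale") +
  theme_minimal()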
Conditional Effects
a <- conditional_effects(halo.model)
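To make explicit what the conditional-effects curve summarizes, here is a sketch that computes it by hand: for each temperature, exp(intercept + slope × temperature) is evaluated for every posterior draw, then summarized with the median and a 95% interval. epred_band() is a hypothetical helper, not part of brms, and its output should only roughly match what conditional_effects() plots.

```r
# Hypothetical helper: expected count (with interval) across a temperature grid
epred_band <- function(intercept_draws, slope_draws, temps, prob = 0.95) {
  alpha <- (1 - prob) / 2
  t(vapply(temps, function(temp) {
    mu <- exp(intercept_draws + slope_draws * temp)  # expected count per draw
    c(temp     = temp,
      estimate = median(mu),
      lower    = unname(quantile(mu, alpha)),
      upper    = unname(quantile(mu, 1 - alpha)))
  }, numeric(4)))
}

# Toy check: intercept log(10) and a flat slope give an expected count of 10
band <- epred_band(rep(log(10), 4), rep(0, 4), temps = c(7, 8, 9))
band
```

With the fitted model, the draws would come from post$b_Intercept and post$b_WTemp, for example epred_band(post$b_Intercept, post$b_WTemp, temps = seq(7, 10, by = 0.5)), where the grid is an illustrative choice.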

For every 1 °C increase in water temperature, expected Halosphaera cell counts are multiplied by about 0.873, the exponentiated slope from fixef() above.

1-0.873
## [1] 0.127

Expected cell counts therefore decrease by about 12.7% per 1 °C increase in water temperature.
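The same conversion can be applied to the credible interval. Because exp() is monotonic, transforming the interval endpoints gives the interval on the percent-change scale; computing quantile(100 * (exp(b) - 1), c(0.025, 0.975)) on the draws gives approximately the same result. The slope values below are copied from the output in this document.

```r
# Percent change in expected count per 1 degree C implied by a log-scale slope
pct_change <- function(slope) (exp(slope) - 1) * 100

# Posterior mean and 95% interval bounds copied from the output in this document
slope_mean <- -0.1354473
slope_ci   <- c(lower = -0.21477397, upper = -0.05383266)

round(pct_change(slope_mean), 1)   # about -12.7
round(pct_change(slope_ci), 1)     # about -19.3 and -5.2
```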

a distribution

Justify your choice of distribution.

I modeled Halosphaera cell counts with a Poisson distribution because the response variable consists of non-negative integer counts. As we discussed in class, the Poisson likelihood is appropriate for count data, and its log link relates the expected count to the predictors while keeping predicted counts positive. One assumption worth noting is that the Poisson distribution has a variance equal to its mean.
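A quick way to probe the variance-equals-mean assumption is the ratio of the sample variance to the sample mean of the counts. This is a rough screen only: it ignores water temperature, so a model-based check such as brms's pp_check(halo.model, type = "bars") is more informative. If the counts turn out to be strongly overdispersed, family = negbinomial() in brms is one common alternative. The toy vectors are for illustration only.

```r
# Rough overdispersion screen: a Poisson variable has variance equal to its
# mean, so values far above 1 suggest overdispersion
dispersion_ratio <- function(y) {
  y <- y[!is.na(y)]
  var(y) / mean(y)
}

# Toy vectors for illustration only (NOT the Halosphaera counts)
dispersion_ratio(c(2, 3, 4, 3, 3))    # about 0.17: less spread than Poisson
dispersion_ratio(c(0, 0, 0, 0, 20))   # 20: much more spread than Poisson
```

Applied to the data, this would be dispersion_ratio(halo$Counts).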

b posterior plot

Produce a plot that shows draws from the posterior. Could be a histogram, could be a density plot.

posterior.plot

c conditional_effects

Produce a plot that shows the relationship between the response and predictor variable.

a

d interpret

Write a sentence for each of the following: effect size, probability of direction, and 95% (or an interval of your choosing) CI.

Effect size: The estimated multiplicative effect of water temperature on Halosphaera cell counts was 0.873, meaning that each 1 °C increase in water temperature is associated with an approximately 12.7% decrease in expected cell counts.

Probability of direction: Based on the posterior draws, there is a 99.975% probability that the temperature effect is negative.

mean(b < 0)
## [1] 0.99975

95% CI: The 95% credible interval for the temperature slope ranged from −0.21 to −0.05 on the log scale, corresponding to a multiplicative effect of about 0.81 to 0.95 per 1 °C.

quantile
##        2.5%       97.5% 
## -0.21477397 -0.05383266

2 Final Project

Please fill out this sheet.

Okay, will do!