STA6349 - Applied Bayesian Analysis

Project 1
July 17, 2025
Tuesday

Sridevi Autoor, Alexander Hatley, Alexander Malekan
Advisor: Dr. Samantha Seals

Introduction

Accurately gauging racers' ability to participate enables street racing crews to schedule events while reducing risks such as car troubles, law enforcement heat, or conflicts with rival crew activity.

Tracking racer tendencies allows crews to optimize resource allocation, whether for car repairs or other necessary expenditures.

Given a prior belief and the observed data, the goals of the project are

  • to model the number of street races that a given racer, Roman, participates in per week

  • to compare Roman with other racers.

Modeling Approach

The Gamma-Poisson model suits this scenario, where the random variable is Roman’s race entries per week.

  1. Roman entered 18 races over 4 weeks (an average of 4.5 races/week). The weekly counts can be modeled with a Poisson likelihood whose rate parameter \lambda is the expected number of races per week.

  2. Since \lambda varies across racers, we use a Gamma prior, which is conjugate to the Poisson, to capture different activity levels (low/moderate/high).

Prior Distribution

To avoid strong prior assumptions, we selected a weakly informative Gamma(2,1) prior for our Gamma-Poisson model.

  • A Gamma(1,1) prior with a mean of 1 race/week: its density is highest at \lambda=0, which implies a strong prior assumption of low activity, and its mean of only 1 race per week (variance 1) is very low. It is also an exponential distribution, so it puts the most weight on rates near 0, where Roman would enter almost no races.

  • A Gamma(2,1) prior with a mean of 2 races/week:

    • does not imply as strong an assumption of low activity.
    • carries more uncertainty (variance 2) than the Gamma(1,1) (variance 1).
    • has its highest probability density at \lambda=1, so the most plausible prior rate is 1 race/week.
  • Thus, we start with a conservative guess of 2 races/week and are open to updating.
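
The comparison of the two candidate priors can be checked numerically. The following is a minimal sketch in base R; gamma_summary is a hypothetical helper written for this illustration, not a function from the packages used elsewhere in the slides.

```r
# Hypothetical helper: mean, variance and mode of a Gamma(shape, rate) prior.
gamma_summary <- function(shape, rate) {
  c(mean     = shape / rate,
    variance = shape / rate^2,
    # (shape - 1) / rate is the mode when shape >= 1
    mode     = max(shape - 1, 0) / rate)
}

priors <- rbind("Gamma(1,1)" = gamma_summary(1, 1),
                "Gamma(2,1)" = gamma_summary(2, 1))
print(priors)

# Prior density at lambda = 0 and lambda = 1 under each candidate prior
dens <- rbind("Gamma(1,1)" = dgamma(c(0, 1), shape = 1, rate = 1),
              "Gamma(2,1)" = dgamma(c(0, 1), shape = 2, rate = 1))
colnames(dens) <- c("lambda=0", "lambda=1")
print(dens)
```

The Gamma(1,1) density peaks at \lambda=0, while the Gamma(2,1) density is zero there and peaks at \lambda=1.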

Prior Graph

Slide 4: Data Distribution

For the data distribution we chose a Poisson(\lambda) likelihood with the following assumptions:

  • We are modeling a count (races per week) with rate \lambda

  • Races are independent of each other

  • Roman’s racing rate is constant over time
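
Under these assumptions, combining the Poisson likelihood with a Gamma(\alpha, \beta) prior gives a Gamma posterior. The sketch below writes out that standard conjugacy argument; y_1, \dots, y_n is assumed notation for the weekly race counts (the slides report only their total, 18, over n = 4 weeks).

\begin{align*}
f(\lambda \mid y_1, \dots, y_n) &\propto \lambda^{\alpha - 1} e^{-\beta \lambda} \prod_{i=1}^{n} \frac{\lambda^{y_i} e^{-\lambda}}{y_i!} \\
&\propto \lambda^{\alpha + \sum_{i} y_i - 1} \, e^{-(\beta + n)\lambda}
\end{align*}

This is the kernel of a Gamma(\alpha + \sum_i y_i, \beta + n) distribution, so only the total count and the number of weeks are needed for the update.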

Slide 5: Posterior Distribution

  • Posterior: Gamma(\alpha + \sum y_i, \beta + n) = Gamma(2 + 18, 1 + 4) = Gamma(20,5)
summarize_gamma_poisson(shape=2, rate=1, sum_y=18, n=4)
  • Mean (\mu = s/r) = 20/5 = 4. This implies an average of 4 races/week, which is higher than the prior mean (2 races/week) and closer to the sample mean (4.5 races/week).

  • Variance (\sigma^2 = s/r^2) = 20/25 = 0.8, lower than the prior variance (\sigma^2 = 2, i.e. \sigma \approx 1.414), which implies more certainty.

  • The posterior distribution is shifted toward the likelihood.
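
The posterior summaries can also be reproduced by hand in base R. This is a minimal sketch using only the prior and data values stated in the slides; the variable names are chosen for illustration. It also shows why the posterior mean sits closer to the data: it is a weighted average of the prior mean and the sample mean.

```r
# Prior hyperparameters and data as stated in the slides
alpha <- 2; beta <- 1      # Gamma(2, 1) prior
sum_y <- 18; n <- 4        # 18 races over 4 weeks

# Conjugate Gamma-Poisson update
post_shape <- alpha + sum_y   # 20
post_rate  <- beta + n        # 5

post_mean <- post_shape / post_rate     # posterior mean
post_var  <- post_shape / post_rate^2   # posterior variance

# Posterior mean as a weighted average of prior mean and sample mean
w_prior  <- beta / (beta + n)           # weight on the prior mean
weighted <- w_prior * (alpha / beta) + (1 - w_prior) * (sum_y / n)

print(c(shape = post_shape, rate = post_rate,
        mean = post_mean, variance = post_var,
        weighted_mean = weighted))
```

With n = 4 weeks of data, the prior receives a weight of 1/5 and the sample mean a weight of 4/5.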

Posterior Graph

plot_gamma_poisson(shape=2, rate=1, sum_y=18, n=4)+ ggtitle("Prior, Likelihood and Posterior Distributions")

Slide 6: Credible Interval

# 95% credible interval
ci <- qgamma(c(0.025, 0.975), 20, 5)
ci
[1] 2.443304 5.934171
  • There is a 95% posterior probability that Roman’s true average rate lies between 2.44 and 5.93 races/week.

  • The posterior mean is 4 races/week.

  • Roman’s racing ranges from moderate to high frequency.

Slide 7: Prior vs. Posterior Probabilities

Hypotheses:

H_0: \lambda \le 4 (Avg. number of races is less than or equal to 4)

H_1: \lambda \gt 4 (Avg. number of races is greater than 4)

Hypotheses            Prior Probability    Posterior Probability
H_0: \lambda \le 4    P[H_0] = 0.9084      P[H_0 \mid Y=18] = 0.5297
H_1: \lambda > 4      P[H_1] = 0.0916      P[H_1 \mid Y=18] = 0.4703
  • The posterior probability that Roman’s true race rate exceeds 4 races/week is about 47%.

  • This is much higher than our prior probability, which was about 9.2%.

  • However, the posterior probability is below 50%, which is weak evidence for classifying Roman as a high-frequency racer. Tej might classify him as a moderate racer, since the probability is close to 50%.
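
The prior and posterior probabilities of the two hypotheses are tail areas of the Gamma(2,1) and Gamma(20,5) distributions. A minimal base R sketch follows; prob_exceeds is a hypothetical helper name used only for this illustration.

```r
# Hypothetical helper: P(lambda > cutoff) under a Gamma(shape, rate) distribution
prob_exceeds <- function(cutoff, shape, rate) {
  pgamma(cutoff, shape = shape, rate = rate, lower.tail = FALSE)
}

prior_h1 <- prob_exceeds(4, shape = 2,  rate = 1)   # P[H_1] under the prior
post_h1  <- prob_exceeds(4, shape = 20, rate = 5)   # P[H_1 | Y] under the posterior

probs <- data.frame(
  hypothesis = c("H_0: lambda <= 4", "H_1: lambda > 4"),
  prior      = c(1 - prior_h1, prior_h1),
  posterior  = c(1 - post_h1, post_h1)
)
print(probs, digits = 4)
```

Each column sums to 1 because H_0 and H_1 partition the possible values of \lambda.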

Slide 8: Bayes Factor

  • Bayes Factor
    • When we are comparing two competing hypotheses, H_0 vs. H_1, the Bayes Factor is an odds ratio for H_1:

\text{Bayes Factor} = \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]}

\begin{equation*}
\begin{aligned}
\text{prior odds} &= \frac{P\left[ H_1 \right]}{P\left[ H_0 \right]} \\
&= \frac{0.0916}{0.9084} \\
&\approx {0.1008}
\end{aligned}
\qquad
\begin{aligned}
\text{posterior odds} &= \frac{P\left[ H_1 |\ Y \right]}{P\left[ H_0 |\ Y \right]} \\
&= \frac{0.4703}{0.5297} \\
&\approx {0.8877}
\end{aligned}
\end{equation*}

Slide 8b : Bayes Factor

Hypotheses            Prior Probability    Posterior Probability
H_0: \lambda \le 4    P[H_0] = 0.9084      P[H_0 \mid Y=18] = 0.5297
H_1: \lambda > 4      P[H_1] = 0.0916      P[H_1 \mid Y=18] = 0.4703

\begin{align*} \text{Bayes Factor} &= \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]} \\ &= \frac{0.8877}{0.1008} \\ &\approx 8.8057 \end{align*}

Slide 8c : Bayes Factor

prior_odds <- pgamma(4, shape = 2, rate = 1, lower.tail = FALSE)/ pgamma(4, shape = 2, rate = 1)
post_odds <-  pgamma(4, shape = 20, rate = 5, lower.tail = FALSE)/ pgamma(4, shape = 20, rate = 5)
(BF <- post_odds/prior_odds)
[1] 8.805742
  • The Bayes Factor is greater than 1, which shows that the odds that Roman’s true race rate exceeds 4 races/week increased in light of the observed data.

  • The greater the Bayes Factor, the more convincing the evidence for the alternative hypothesis, H_1.

  • The Bayes Factor is approximately 8.81: after observing the data, the odds of H_1 relative to H_0 are about 8.8 times the prior odds.

Slide 9: Sensitivity Analysis

ggarrange(plot_gamma(2, 1) + theme_bw() + ggtitle("Gamma(2, 1)"),
          plot_gamma(4, 1) + theme_bw() + ggtitle("Gamma(4, 1)"),
          plot_gamma(7, 1) + theme_bw() + ggtitle("Gamma(7, 1)"),
          ncol = 3)
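
To complement the prior plots, the sketch below shows one way to quantify how much the conclusions move under each of the three priors. It reuses the data from the slides (18 races over 4 weeks) and base R only; the column names are chosen for illustration, and the slides themselves report numbers only for the Gamma(2, 1) prior.

```r
# Data as stated in the slides
sum_y <- 18; n <- 4

# The three priors plotted above: Gamma(2, 1), Gamma(4, 1), Gamma(7, 1)
sens <- data.frame(prior_shape = c(2, 4, 7), prior_rate = 1)

# Conjugate update for each prior
sens$post_shape <- sens$prior_shape + sum_y
sens$post_rate  <- sens$prior_rate + n
sens$post_mean  <- sens$post_shape / sens$post_rate

# 95% equal-tailed credible interval for each posterior
sens$ci_lower <- qgamma(0.025, shape = sens$post_shape, rate = sens$post_rate)
sens$ci_upper <- qgamma(0.975, shape = sens$post_shape, rate = sens$post_rate)

# Posterior probability of H_1: lambda > 4
sens$p_h1 <- pgamma(4, shape = sens$post_shape, rate = sens$post_rate,
                    lower.tail = FALSE)

print(sens, digits = 3)
```

A prior with a larger mean pulls the posterior mean upward and raises the posterior probability of H_1, so the classification of Roman depends on which prior is chosen.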

Slide 10: Conclusion

  • Given that Roman participated in 18 races over 4 weeks, we estimated with a Gamma-Poisson model that his average rate is about 4 races per week (16 over 4 weeks).

  • The 95\% credible interval for Roman’s true race rate is (2.44, 5.93) races/week, which shows that Roman can be considered a moderate- to high-frequency racer.

  • The hypothesis test shows only a 47\% posterior probability that Roman’s true race rate exceeds 4 races/week.

  • The Bayes Factor is 8.8, which means the data raised the odds of H_1 relative to H_0 by a factor of 8.8. This doesn’t necessarily mean H_1 is “likely”, just that the alternative hypothesis is more plausible than it was under the prior.

  • We have fair evidence, but it is not strong enough to classify Roman as a high-frequency racer.

Slide 11: Limitations

  • Despite the data favoring H_1, the posterior probability of H_1 is still below 50\%. This reflects prior odds \left(\frac{P[H_1]}{P[H_0]} \approx \frac{1}{10}\right) that strongly favored H_0.

  • Without much prior knowledge to inform our assumptions, it is hard to classify racers as low- or high-frequency, so Tej may not want to assign Roman to high-stakes missions.

Slide 12: Takeaway Message

  • With more prior knowledge, the model would yield more precise results and a clearer answer to the question of interest.

  • However, the Gamma-Poisson is still an important model for count data in a Bayesian setting. It is especially useful in situations where little prior knowledge about the topic of interest is available.