Project 1
July 17, 2025
Tuesday
Accurately gauging a racer’s ability to participate lets street racing crews schedule events while reducing risks such as car trouble, law enforcement heat, or conflicts with rival crew activity.
Tracking racer tendencies allows crews to optimize resource allocation, whether for car repairs or other necessary expenditures.
Given a prior belief and the observed data, the goals of the project are:
to model the number of street races a given racer, Roman, participates in per week
to compare Roman with other racers.
The Gamma-Poisson model suits this scenario, where the random variable is Roman’s race entries per week.
Roman entered 18 races over 4 weeks. We model his weekly race counts with a Poisson likelihood with rate parameter \lambda (races per week), so the data contribute a total count of 18 over n = 4 weeks, an observed rate of 4.5 races/week.
Since \lambda varies across racers, we use a Gamma prior, which is conjugate to the Poisson likelihood, to capture different activity levels (low/moderate/high).
To avoid strong prior assumptions, we selected a weakly informative Gamma(2,1) prior for our Gamma-Poisson model.
A Gamma(1,1) prior with a mean of 1 race/week: this is an exponential distribution, so its probability density function (PDF) is highest at \lambda = 0. It therefore encodes a strong prior assumption of low activity, treating a racer who never shows up as the most plausible case, and a prior mean of only 1 race per week is very low.
A Gamma(2,1) prior with a mean of 2 races/week: its density is 0 at \lambda = 0 and peaks at \lambda = 1, so it does not favor inactivity, while its variance of 2 keeps a wide range of rates plausible.
Thus, we start with a conservative guess of 2 races/week and are open to updating.
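The comparison between the two candidate priors can be checked numerically. The following is a minimal sketch in base R; the data frame `priors` and its column names are illustrative choices, not part of the original analysis.

```r
# Compare the two candidate priors discussed above (shape s, rate r).
# `priors` and its column names are hypothetical, chosen for illustration.
priors <- data.frame(
  prior = c("Gamma(1,1)", "Gamma(2,1)"),
  shape = c(1, 2),
  rate  = c(1, 1)
)

priors$mean     <- priors$shape / priors$rate     # E[lambda]   = s / r
priors$variance <- priors$shape / priors$rate^2   # Var[lambda] = s / r^2

# Density at lambda = 0: positive for Gamma(1,1), zero for Gamma(2,1)
priors$density_at_0 <- dgamma(0, shape = priors$shape, rate = priors$rate)

# Prior probability of a nearly inactive racer (fewer than 0.5 races/week)
priors$p_below_half <- pgamma(0.5, shape = priors$shape, rate = priors$rate)

print(priors)
```

The `density_at_0` column makes the main difference explicit: only Gamma(1,1) places its highest density on a racer who never shows up.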
For the data distribution we chose a Poisson(\lambda) likelihood, with the assumptions:
We are modeling a count of events that occur at rate \lambda
Races are independent of each other
Roman’s racing rate is constant over time
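Combining the Gamma(2,1) prior with this Poisson likelihood gives the posterior in closed form. A brief sketch of the conjugate update follows, where y_1, \dots, y_n denote the weekly counts (notation introduced here for illustration; only their total, 18 over n = 4 weeks, is reported) and s, r are the prior shape and rate:

\begin{align*}
f(\lambda \mid y_1, \dots, y_n) &\propto \underbrace{\lambda^{s-1} e^{-r\lambda}}_{\text{Gamma}(s,\, r)\text{ prior}} \times \underbrace{\prod_{i=1}^{n} \frac{\lambda^{y_i} e^{-\lambda}}{y_i!}}_{\text{Poisson likelihood}} \\
&\propto \lambda^{\,s + \sum_i y_i - 1}\, e^{-(r+n)\lambda} \\
\Rightarrow\quad \lambda \mid y_1, \dots, y_n &\sim \text{Gamma}\left(s + \sum_i y_i,\ r + n\right) = \text{Gamma}(2 + 18,\ 1 + 4) = \text{Gamma}(20, 5)
\end{align*}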
Mean (\mu = s/r) = 20/5 = 4: the Gamma(20, 5) posterior implies an average of 4 races/week, which is higher than the prior mean (2 races/week) and closer to the observed rate (4.5 races/week).
Variance (\sigma^2 = s/r^2) = 20/25 = 0.8: lower than the prior variance (\sigma^2 = 2, i.e. \sigma \approx 1.414), which implies more certainty.
The posterior distribution is shifted away from the prior and toward the data.
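The posterior summaries above can be reproduced with a few lines of base R. This is a minimal sketch assuming the Gamma(2,1) prior and the reported data of 18 races over 4 weeks; all variable names are illustrative.

```r
# Prior hyperparameters: Gamma(shape = 2, rate = 1)
prior_shape <- 2
prior_rate  <- 1

# Observed data: 18 races in total over 4 weeks
total_races <- 18
n_weeks     <- 4

# Conjugate Gamma-Poisson update: add the count to the shape, the weeks to the rate
post_shape <- prior_shape + total_races   # 20
post_rate  <- prior_rate + n_weeks        # 5

post_mean <- post_shape / post_rate       # s / r
post_var  <- post_shape / post_rate^2     # s / r^2

# The posterior mean is a weighted average of the observed rate and the prior mean
w <- n_weeks / (prior_rate + n_weeks)     # weight on the data
weighted_mean <- w * (total_races / n_weeks) + (1 - w) * (prior_shape / prior_rate)

c(post_mean = post_mean, post_var = post_var, weighted_mean = weighted_mean)
```

With four weeks of data against a prior rate of 1, the weight on the data is 0.8, which is why the posterior mean sits much closer to 4.5 than to 2.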
There is a 95% posterior probability that Roman’s true average rate lies between 2.44 and 5.93 races/week.
The posterior mean is 4 races/week.
Roman’s racing frequency ranges from moderate to high.
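The credible interval is the pair of 2.5% and 97.5% quantiles of the Gamma(20, 5) posterior. A minimal sketch of that calculation, assuming an equal-tailed interval (the variable names are illustrative):

```r
# Equal-tailed 95% credible interval from the Gamma(20, 5) posterior
ci <- qgamma(c(0.025, 0.975), shape = 20, rate = 5)
round(ci, 2)

# Sanity check: the posterior mass between the two endpoints should be 0.95
coverage <- pgamma(ci[2], shape = 20, rate = 5) - pgamma(ci[1], shape = 20, rate = 5)
coverage
```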
Hypotheses:
H_0: \lambda \le 4 (Avg. number of races is less than or equal to 4)
H_1: \lambda \gt 4 (Avg. number of races is greater than 4)
| Hypotheses | Prior Probability | Posterior Probability |
|---|---|---|
| H_0: \lambda \le 4 | P[H_0] = 0.9084 | P[H_0 \mid Y=18] = 0.5297 |
| H_1: \lambda > 4 | P[H_1] = 0.0916 | P[H_1 \mid Y=18] = 0.4703 |
The posterior probability that Roman’s true race rate exceeds 4 races/week is about 47%.
This is much higher than our prior probability of about 9.2%.
However, the posterior probability is below 50%, so there is only weak evidence for classifying Roman as a high-frequency racer. Tej might classify him as a moderate racer, as the probability is close to 50%.
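The prior and posterior probabilities of each hypothesis are tail areas of the corresponding Gamma distributions. A minimal sketch using `pgamma`, assuming the Gamma(2,1) prior and Gamma(20,5) posterior (variable names are illustrative):

```r
threshold <- 4  # races/week separating H_0 (lambda <= 4) from H_1 (lambda > 4)

# Prior probabilities under Gamma(2, 1)
prior_h0 <- pgamma(threshold, shape = 2, rate = 1)
prior_h1 <- pgamma(threshold, shape = 2, rate = 1, lower.tail = FALSE)

# Posterior probabilities under Gamma(20, 5)
post_h0 <- pgamma(threshold, shape = 20, rate = 5)
post_h1 <- pgamma(threshold, shape = 20, rate = 5, lower.tail = FALSE)

round(c(prior_h0 = prior_h0, prior_h1 = prior_h1,
        post_h0 = post_h0, post_h1 = post_h1), 4)
```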
\text{Bayes Factor} = \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]}
\begin{equation*}
\begin{aligned}
\text{prior odds} &= \frac{P\left[ H_1 \right]}{P\left[ H_0 \right]} \\
&= \frac{0.0916}{0.9084} \\
&\approx 0.1008
\end{aligned}
\qquad
\begin{aligned}
\text{posterior odds} &= \frac{P\left[ H_1 \mid Y \right]}{P\left[ H_0 \mid Y \right]} \\
&= \frac{0.4703}{0.5297} \\
&\approx 0.8877
\end{aligned}
\end{equation*}
\begin{align*} \text{Bayes Factor} &= \frac{\text{posterior odds}}{\text{prior odds}} = \frac{P\left[H_1 | Y\right] / P\left[H_0 | Y\right]}{P\left[H_1\right] / P\left[H_0\right]} \\ &= \frac{0.8877}{0.1008} \\ &\approx 8.8057 \end{align*}
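# Odds of H_1 (lambda > 4) against H_0 (lambda <= 4) under the Gamma(2, 1) prior
prior_odds <- pgamma(4, shape = 2, rate = 1, lower.tail = FALSE) / pgamma(4, shape = 2, rate = 1)
# The same odds under the Gamma(20, 5) posterior
post_odds <- pgamma(4, shape = 20, rate = 5, lower.tail = FALSE) / pgamma(4, shape = 20, rate = 5)
(BF <- post_odds / prior_odds)
[1] 8.805742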
The Bayes Factor is greater than 1, which shows that the plausibility of Roman’s true race rate exceeding 4 races/week increased in light of the observed data.
The greater the Bayes Factor, the more convincing the evidence for the alternative hypothesis, H_1.
The Bayes Factor is approximately 8.81: after seeing the data, the odds of H_1 against H_0 are about 8.8 times their prior value.
Given that Roman participated in 18 races over 4 weeks, we estimated with a Gamma-Poisson model that his average is about 4 races per week (16 over 4 weeks).
The 95\% credible interval for Roman’s true race rate is (2.44, 5.93) races/week, which shows that Roman can be considered a moderate to high-frequency racer.
The hypothesis test shows only a 47\% posterior probability that Roman’s true race rate exceeds 4 races/week.
The Bayes Factor is 8.8, which means the data shifted the odds toward H_1 by a factor of 8.8. This does not mean H_1 is “likely”: its posterior probability is still below that of H_0, only much higher than it was under the prior.
We have fair evidence, but it is not strong enough to classify Roman as a high-frequency racer.
Despite the data favoring H_1, the posterior probability of H_1 is still below 50\%. This is because the prior odds \left(\frac{P[H_1]}{P[H_0]} \approx \frac{1}{10}\right) strongly favor H_0, so multiplying them by a Bayes Factor of 8.8 still leaves the posterior odds below 1.
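One way to see how much the prior drives this result is to repeat the calculation under a few different priors. The sketch below is illustrative only: Gamma(1,1) and Gamma(2,1) come from the discussion above, while Gamma(4,1) is a hypothetical alternative with a prior mean of 4 races/week that was not part of the original analysis.

```r
# Data reported in the text: 18 races over 4 weeks
total_races <- 18
n_weeks     <- 4

# Candidate priors (shape, rate). Gamma(4, 1) is a hypothetical addition.
sens <- data.frame(shape = c(1, 2, 4), rate = c(1, 1, 1))

# Prior and posterior probability of H_1: lambda > 4 under each prior
sens$prior_p_h1 <- pgamma(4, shape = sens$shape, rate = sens$rate,
                          lower.tail = FALSE)
sens$post_p_h1  <- pgamma(4, shape = sens$shape + total_races,
                          rate = sens$rate + n_weeks, lower.tail = FALSE)

print(round(sens, 4))
```

Comparing the `post_p_h1` column across rows shows how far the conclusion about H_1 depends on the prior rather than on the four weeks of data.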
Without much prior knowledge to inform our assumptions, it is hard to classify racers as low- or high-frequency, so Tej may not want to assign Roman to high-stakes missions.
More prior knowledge would yield more accurate results and a clearer answer to the question of interest.
Even so, the Gamma-Poisson model remains an important tool for Bayesian modeling of count data, and it is especially useful when little prior knowledge about the topic of interest is available.
STA6349 - Applied Bayesian Analysis Project 1 - Summer 2025