Lesson 2.3: Bayesian Inference for the Poisson Mean
Just like the Binomial distribution, the Poisson distribution is used to model count data
It is used to model the number of occurrences of rare events that occur randomly through time (or space) at a constant rate
For example, the Poisson distribution could be used to model the number of accidents on a highway over a month, or the number of COVID-19 patients arriving at an ER every 1-hour interval
If a random variable Y has a Poisson distribution with mean or rate parameter \lambda, then the pmf of Y is P(Y=y)=\frac{\lambda^y e^{-\lambda}}{y!}, \quad y=0, 1, 2, \ldots
If Y \sim Poisson(\lambda) then E(Y)=V(Y)=\lambda
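To see these properties numerically, here is a minimal R sketch; the value lambda = 2 is purely illustrative and not part of the lesson:

lambda <- 2
dpois(0:5, lambda)           # P(Y = 0), ..., P(Y = 5)
y <- rpois(100000, lambda)   # large simulated Poisson sample
mean(y); var(y)              # both should be close to lambda = 2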
We take an approach similar to Bayesian inference for the Binomial proportion p with a discrete prior
Suppose a single count of y = 2 is observed; we construct the Bayes box over a discrete set of values of \lambda
l <- c(1.0, 1.5, 2.0, 2.5)           # possible values of lambda
pr <- c(1/6, 1/3, 1/3, 1/6)          # discrete prior probabilities
lik <- l^2*exp(-l)/factorial(2)      # likelihood of y = 2 at each lambda
pl <- pr*lik                         # prior times likelihood
post <- pl/sum(pl)                   # normalized posterior probabilities
bbox <- as.data.frame(cbind(l, pr, lik, pl, post))
knitr::kable(bbox)
l | pr | lik | pl | post |
---|---|---|---|---|
1.0 | 0.1666667 | 0.1839397 | 0.0306566 | 0.1239620 |
1.5 | 0.3333333 | 0.2510214 | 0.0836738 | 0.3383404 |
2.0 | 0.3333333 | 0.2706706 | 0.0902235 | 0.3648246 |
2.5 | 0.1666667 | 0.2565156 | 0.0427526 | 0.1728729 |
library(ggplot2)  # for ggplot()
library(dplyr)    # for %>%, filter(), select()
bbox %>%
  ggplot(aes(x = l, y = post)) +
  geom_bar(stat = "identity", width = .004) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 0.5)) +
  labs(x = "Lambda",
       y = "Posterior Probability") +
  theme_classic()
Mean <- sum(l*post)
Variance <- sum(l^2*post) - (sum(l*post))^2
print(cbind(Mean, Variance))
Mean Variance
[1,] 1.793304 0.2090422
# posterior probability that lambda is less than 2
bbox %>%
  filter(l < 2) %>%
  select(post) %>%
  sum()
[1] 0.4623025
One constructs a prior expressing an opinion about the location of the rate \lambda before any data is collected
One observes a sample of intervals and records the number of arrivals in each interval. From these data, one forms the likelihood, the probability of the observations expressed as a function of \lambda
One uses Bayes’ rule to compute the posterior – this distribution updates the prior opinion about \lambda given the information from the data
In addition, one computes the predictive distribution to learn about the number of arrivals in future intervals. The posterior predictive distribution is also useful in checking the appropriateness of our model
Before we identify prior distributions for making Bayesian inference on \lambda, first, let us look at the likelihood function of \lambda:
L(\lambda)=\prod_{i = 1}^{n} \frac{\lambda^{y_i} e^{-\lambda}}{y_i!} \propto \lambda^{\sum_{i=1}^n y_i} e^{-n\lambda}
This likelihood resembles the kernel of a Gamma (\alpha,\beta) density with \alpha=\sum_{i=1}^n y_i+1 and \beta=n
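As a quick visual check of this claim, the sketch below (the counts in y are hypothetical) rescales the likelihood and the Gamma(\sum y_i + 1, n) density to a common maximum; the two curves coincide:

y <- c(3, 1, 2, 4)                                 # hypothetical counts
n <- length(y)
lam <- seq(0.01, 8, by = 0.01)
lik <- sapply(lam, function(l) prod(dpois(y, l)))  # likelihood on a grid
gam <- dgamma(lam, shape = sum(y) + 1, rate = n)   # matching Gamma density
plot(lam, lik/max(lik), type = "l", xlab = "lambda", ylab = "rescaled curve")
lines(lam, gam/max(gam), lty = 2)                  # dashed overlay coincides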
One begins by constructing a prior density to express one’s opinion about the rate parameter \lambda
Since the rate is a positive continuous parameter, one needs to construct a prior density that places its support only on positive values
If we have no idea what the value of \lambda is prior to looking at the data, then we can consider the positive uniform prior density given by
f(\lambda)= \begin{cases} 1, & \text{if } \lambda > 0 \\ 0, & \text{elsewhere} \end{cases}
Note that this prior density is improper since its integral over all possible values is infinite
Using this prior density, the resulting posterior density is proportional to the likelihood, i.e., a Gamma(\sum_{i=1}^n y_i + 1, n) density
Another possible prior for \lambda is the Jeffreys prior, given by
f(\lambda)= \begin{cases} \frac{1}{\sqrt{\lambda}}, & \text{if } \lambda > 0 \\ 0, & \text{elsewhere} \end{cases}
This is also an improper prior, but it is informative in that it gives more weight to small values of \lambda
Using this prior density, the posterior density becomes
f(\lambda|y) \propto \lambda^{\sum_{i=1}^n y_i - \frac{1}{2}} e^{-n\lambda}
This posterior resembles the kernel of a Gamma(\alpha,\beta) density with \alpha=\sum_{i=1}^n y_i+\frac{1}{2} and \beta=n
The convenient (conjugate) choice of prior distribution for Poisson sampling is the Gamma distribution with pdf
f(\lambda|\alpha,\beta) = \begin{cases} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda}, & \text{if } \lambda>0 \text{ and } \alpha,\beta>0 \\ 0, & \text{elsewhere} \end{cases}
The Gamma density is a continuous density where the support is on positive values. It depends on two parameters, a positive shape parameter \alpha and a positive rate parameter \beta
The Gamma density is a flexible family of distributions that can reflect many different types of prior beliefs about the location of the parameter \lambda
One chooses values of the shape \alpha and the rate \beta so that the Gamma density matches one’s prior information about the location of \lambda
In R, the function dgamma() gives the density, pgamma() gives the distribution function, and qgamma() gives the quantile function of the Gamma distribution
These functions are helpful in graphing the prior and choosing values of the shape and rate parameters that match prior statements about Gamma percentiles and probabilities
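For instance, one might check candidate values of the shape and rate like this; the Gamma(6.25, 2.5) used here is the prior Student C adopts in the example below:

qgamma(c(0.25, 0.50, 0.75), shape = 6.25, rate = 2.5)  # prior quartiles
pgamma(4, shape = 6.25, rate = 2.5)                    # prior P(lambda <= 4)
curve(dgamma(x, shape = 6.25, rate = 2.5), from = 0, to = 8,
      xlab = "lambda", ylab = "prior density")         # graph the prior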
After substituting the prior density and the likelihood into Bayes' formula, it can be shown (verify!) that the posterior distribution for \lambda is
f(\lambda|y) \propto \lambda^{\sum_{i=1}^n y_i + \alpha - 1} e^{-\lambda (\beta + n)}
which is the kernel of a Gamma(\alpha', \beta') density with
\alpha'=\alpha + \sum_{i=1}^n y_i
and
\beta'=\beta + n
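A small helper makes this conjugate update concrete; update_gamma is our own illustrative name, and the counts passed to it are the accident data from the example below:

update_gamma <- function(alpha, beta, y) {
  c(alpha_post = alpha + sum(y),    # alpha' = alpha + sum of the y_i
    beta_post  = beta + length(y))  # beta'  = beta + n
}
update_gamma(6.25, 2.5, c(3, 2, 0, 8, 2, 4, 6, 1))  # Student C: Gamma(32.25, 10.5)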
What form should the Gamma(\alpha, \beta) prior take? That is, what should \alpha and \beta be?
Graph your prior to check whether it looks reasonably close to your prior beliefs.
Summarize your prior knowledge in terms of the mean m and standard deviation s (or variance s^2)
Recall that the mean and variance of Gamma(\alpha, \beta ) are \frac{\alpha}{\beta} and \frac{\alpha}{\beta^2}, respectively
Solve the following system of equations for \alpha and \beta (a short R check follows the equations):
m=\frac{\alpha}{\beta}
and
s^2=\frac{\alpha}{\beta^2}
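Substituting \alpha = m\beta from the first equation into the second gives \beta = m/s^2 and hence \alpha = m^2/s^2. A quick check with the values Student C uses in the example below:

m <- 2.5; s <- 1      # prior mean and standard deviation
beta <- m/s^2         # 2.5
alpha <- m^2/s^2      # 6.25
c(alpha = alpha, beta = beta)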
The weekly number of traffic accidents on a highway has the Poisson(\lambda) distribution. Three students are going to count the number of traffic accidents for each of the next eight weeks. They are going to analyze the data in a Bayesian manner, so they each need a prior distribution
The number of accidents on the highway over the next 8 weeks are: 3, 2, 0, 8, 2, 4, 6, 1
Students A and B chose the positive uniform prior and the Jeffreys prior, respectively
Student C believes that, on average, \lambda will be close to 2.5 with standard deviation of 1
The prior distributions of the 3 students are summarized below
Student | Prior | Prior density |
---|---|---|
A | Positive uniform | 1 |
B | Jeffreys | \frac{1}{\sqrt{\lambda}} |
C | m = 2.5, s = 1 | Gamma(6.25, 2.5) |
Based on the data, we have \sum_{i=1}^8 y_i = 26; thus, the likelihood is L(\lambda|y) \propto \lambda^{26} e^{-8\lambda}
Thus, the posterior distributions are:
Student | Prior | Prior density | Posterior |
---|---|---|---|
A | Positive uniform | 1 | Gamma(27, 8) |
B | Jeffreys | \frac{1}{\sqrt{\lambda}} | Gamma(26.5, 8) |
C | m = 2.5, s = 1 | Gamma(6.25, 2.5) | Gamma(32.25, 10.5) |
Posterior | Mean | Median | SD |
---|---|---|---|
Gamma(27, 8) | 3.375 | 3.333 | 0.6495 |
Gamma(26.5, 8) | 3.313 | 3.271 | 0.6435 |
Gamma(32.25, 10.5) | 3.071 | 3.040 | 0.5408 |
The means and standard deviations were computed using the formulas \frac{\alpha'}{\beta'} and \sqrt{\frac{\alpha'}{\beta'^2}}, respectively
The medians were computed using the qgamma() function
mdA <- qgamma(0.5, 27, 8)
mdB <- qgamma(0.5, 26.5, 8)
mdC <- qgamma(0.5, 32.25, 10.5)
print(cbind(mdA, mdB, mdC))
mdA mdB mdC
[1,] 3.333426 3.270928 3.039742
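As a check, the closed-form posterior means and standard deviations in the table above can be computed directly from \alpha' and \beta':

a <- c(27, 26.5, 32.25)              # posterior shapes
b <- c(8, 8, 10.5)                   # posterior rates
rbind(mean = a/b, sd = sqrt(a/b^2))  # matches the table values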
g1 <- rgamma(10000, 27, 8)        # draws from Student A's posterior
g2 <- rgamma(10000, 26.5, 8)      # draws from Student B's posterior
g3 <- rgamma(10000, 32.25, 10.5)  # draws from Student C's posterior
sim.gamma <- as.data.frame(cbind(g1, g2, g3))
sim.gamma %>%
  summarise_all(list(m = mean, sd = sd))
g1_m g2_m g3_m g1_sd g2_sd g3_sd
1 3.386137 3.301789 3.07363 0.6498671 0.6469492 0.5352202
Using the qgamma() function we can obtain credible intervals for \lambda based on its posterior distribution
For example, the 95% credible interval for \lambda based on Gamma(27,8) is given by
ci1L <- qgamma(0.025, 27, 8)
ci1U <- qgamma(0.975, 27, 8)
print(cbind(ci1L, ci1U))
ci1L ci1U
[1,] 2.224146 4.762003
Posterior | 95% Credible Interval |
---|---|
Gamma(27, 8) | (2.224, 4.762) |
Gamma(26.5, 8) | (2.174, 4.688) |
Gamma(32.25, 10.5) | (2.104, 4.219) |
In testing a one-sided hypothesis, we compute the posterior probability of the event specified in the null hypothesis
In testing a two-sided hypothesis, we construct a 100(1-\alpha)\% credible interval
Suppose we want to test H_0: \lambda \leq 3 versus H_1: \lambda > 3
p1 <- pgamma(3, 27, 8)
p2 <- pgamma(3, 26.5, 8)
p3 <- pgamma(3, 32.25, 10.5)
print(cbind(p1, p2, p3))
p1 p2 p3
[1,] 0.2961806 0.3312435 0.4704052
Posterior | P(\lambda \leq 3) |
---|---|
Gamma(27, 8) | 0.2962 |
Gamma(26.5, 8) | 0.3312 |
Gamma(32.25, 10.5) | 0.4704 |
Suppose we want to predict the number of car accidents in a future week
Simulate values of \lambda from the posterior distribution, use them as inputs to the Poisson model, and simulate Poisson-distributed observations (the posterior predictive distribution)
Compute the mean of the predictive distribution
Suppose we predict an observation from the posterior distribution of Student C
lambda <- rgamma(10000, 32.25, 10.5)  # draws from Student C's posterior
pois <- rpois(10000, lambda)          # posterior predictive draws
print(mean(pois))
[1] 3.0587
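The same simulated predictive draws can also give an approximate 95% prediction interval for a future weekly count; a minimal follow-up reusing the pois vector above:

quantile(pois, c(0.025, 0.975))  # approximate 95% prediction interval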