Stat 136 (Bayesian Statistics)

Lesson 2.3: Bayesian Inference for the Poisson Mean

Author

Norberto E. Milla, Jr.

Published

February 11, 2024

Review: The Poisson Distribution

  • Just like the Binomial distribution, the Poisson distribution is used to model count data

  • It is used to model the number of occurrences of rare events which are occurring randomly through time (or space) at a constant rate

  • For example, the Poisson distribution could be used to model the number of accidents on a highway over a month, or the number of COVID-19 patients arriving at an ER every 1-hour interval

  • If a random variable Y has a Poisson distribution with mean or rate parameter \lambda, then the pmf of Y is P(Y=y)=\frac{\lambda^y e^{-\lambda}}{y!}, \quad y=0, 1, 2, \ldots

  • If Y \sim Poisson(\lambda) then E(Y)=V(Y)=\lambda
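
  • As a quick check in R, we can simulate Poisson counts and verify that the sample mean and variance are both close to \lambda. This is a minimal sketch; the rate and seed below are illustrative:

Code
set.seed(136)                   # illustrative seed for reproducibility
y <- rpois(10000, lambda = 2)   # simulated counts with an illustrative rate of 2
c(mean = mean(y), var = var(y)) # both should be close to lambda = 2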

Bayesian Inference for the Poisson Mean Using Discrete Prior

  • The approach is similar to that of Bayesian inference for the proportion p using a discrete prior

  • We construct the Bayes box

Code
knitr::include_graphics("pic1.png")

  • For example, let Y be distributed as Poisson(\lambda).
    • Suppose that we believe there are only four possible values for \lambda: 1.0, 1.5, 2.0, and 2.5.
    • Further, suppose that the two middle values, 1.5 and 2.0, are twice as likely as the two end values, 1.0 and 2.5.
    • The observed count is y=2.
Code
l <- c(1.0, 1.5, 2.0, 2.5)            # possible values of lambda
pr <- c(1/6, 1/3, 1/3, 1/6)           # prior probabilities
lik <- l^2 * exp(-l) / factorial(2)   # Poisson likelihood at y = 2
pl <- pr * lik                        # prior x likelihood
post <- pl / sum(pl)                  # normalize to obtain the posterior
bbox <- data.frame(l, pr, lik, pl, post)
knitr::kable(bbox)
  l        pr       lik        pl      post
1.0 0.1666667 0.1839397 0.0306566 0.1239620
1.5 0.3333333 0.2510214 0.0836738 0.3383404
2.0 0.3333333 0.2706706 0.0902235 0.3648246
2.5 0.1666667 0.2565156 0.0427526 0.1728729
Code
bbox %>% 
  ggplot(aes(x = l, y = post)) +
  geom_bar(stat = "identity", width = 0.004) +   # very thin bars give a spike-plot look
  scale_y_continuous(expand = c(0, 0), limits = c(0, 0.5)) +
  labs(x = "Lambda",
       y = "Posterior Probability") +
  theme_classic()

  • What are the mean and variance of the posterior distribution of \lambda?
Code
Mean <- sum(l*post)                           # E(lambda | y)
Variance <- sum(l^2*post) - (sum(l*post))^2   # E(lambda^2 | y) - [E(lambda | y)]^2
print(cbind(Mean, Variance))
         Mean  Variance
[1,] 1.793304 0.2090422
  • What is P(\lambda < 2)?
Code
bbox %>% 
  filter(l < 2) %>%    # keep the rows with lambda < 2
  select(post) %>%
  sum()                # posterior P(lambda < 2)
[1] 0.4623025

Bayesian Inference for the Poisson Mean Using Continuous Prior

  1. One constructs a prior expressing an opinion about the location of the rate \lambda before any data is collected

  2. One takes a sample of intervals and records the number of arrivals in each interval. From these data, one forms the likelihood, the probability of these observations expressed as a function of \lambda

  3. One uses Bayes’ rule to compute the posterior – this distribution updates the prior opinion about \lambda given the information from the data

  • In addition, one computes the predictive distribution to learn about the number of arrivals in future intervals. The posterior predictive distribution is also useful in checking the appropriateness of our model

  • Before we identify prior distributions for Bayesian inference on \lambda, let us first look at the likelihood function of \lambda:

L(\lambda)=\prod_{i = 1}^{n} \frac{\lambda^{y_i} e^{-\lambda}}{y_i!} = \frac{\lambda^{\sum_{i=1}^n y_i}\, e^{-n\lambda}}{\prod_{i=1}^n y_i!} \propto \lambda^{\sum_{i=1}^n y_i} e^{-n\lambda}

  • This likelihood resembles the kernel of a Gamma(\alpha, \beta) density with \alpha=\sum_{i=1}^n y_i+1 and \beta=n

  • One begins by constructing a prior density to express one’s opinion about the rate parameter \lambda

  • Since the rate is a positive continuous parameter, one needs to construct a prior density that places its support only on positive values

  • If we have no idea what the value of \lambda is prior to looking at the data, then we can consider the positive uniform prior density given by

f(\lambda)= \begin{cases} 1, & \text{if } \lambda > 0 \\ 0, & \text{elsewhere} \end{cases}

  • Note that this prior density is improper since its integral over all possible values is infinite

  • Using this prior density, the resulting posterior density is proportional to the likelihood, i.e., a Gamma\left(\sum_{i=1}^n y_i+1,\; n\right) density

  • Another possible prior for \lambda is the Jeffreys prior, given by

f(\lambda)= \begin{cases} \frac{1}{\sqrt{\lambda}}, & \text{if } \lambda > 0 \\ 0, & \text{elsewhere} \end{cases}

  • This is also an improper prior, but it is informative in that it gives more weight to small values of \lambda

  • Using this prior density, the posterior density becomes

f(\lambda|y) \propto \lambda^{\sum_{i=1}^n y_i - \frac{1}{2}} e^{-n\lambda}

  • This posterior resembles the kernel of a Gamma(\alpha, \beta) density with \alpha=\sum_{i=1}^n y_i+\frac{1}{2} and \beta=n

  • The convenient (conjugate) choice of prior distribution for Poisson sampling is the Gamma distribution, with pdf

f(\lambda|\alpha,\beta) = \begin{cases} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda}, & \text{if } \lambda>0 \text{ and } \alpha,\beta>0 \\ 0, & \text{elsewhere} \end{cases}

  • The Gamma density is a continuous density where the support is on positive values. It depends on two parameters, a positive shape parameter \alpha and a positive rate parameter \beta

  • The Gamma density is a flexible family of distributions that can reflect many different types of prior beliefs about the location of the parameter \lambda

  • One chooses values of the shape \alpha and the rate \beta so that the Gamma density matches one’s prior information about the location of \lambda

  • In R, the function dgamma() gives the density, pgamma() gives the distribution function, and qgamma() gives the quantile function of the Gamma distribution

  • These functions are helpful in graphing the prior and choosing values of the shape and rate parameters that match prior statements about Gamma percentiles and probabilities
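
  • For instance, to check whether a candidate Gamma prior matches one's beliefs, one can inspect its quantiles and probabilities. A small illustrative sketch (the Gamma(6.25, 2.5) values anticipate the example later in this lesson):

Code
qgamma(c(0.25, 0.50, 0.75), shape = 6.25, rate = 2.5) # prior quartiles of lambda
pgamma(4, shape = 6.25, rate = 2.5)                   # prior P(lambda < 4)
curve(dgamma(x, shape = 6.25, rate = 2.5), from = 0, to = 8,
      xlab = "Lambda", ylab = "Prior density")        # graph the candidate prior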

  • After plugging the prior density and the likelihood into Bayes’ formula, it can be shown (verify!) that the posterior distribution for \lambda is

f(\lambda|y) \propto \lambda^{\alpha-1} e^{-\beta\lambda} \cdot \lambda^{\sum_{i=1}^n y_i} e^{-n\lambda} = \lambda^{\sum_{i=1}^n y_i + \alpha - 1} e^{-\lambda (\beta + n)}

  • Once again the posterior resembles the kernel of the Gamma(\alpha', \beta') density, where:

\alpha'=\alpha + \sum_{i=1}^n y_i

and

\beta'=\beta + n
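
  • This conjugate updating rule is easy to code directly; the helper below is a minimal sketch (the function name update_gamma is ours, not from any package):

Code
# Conjugate Gamma update for Poisson-distributed counts y
update_gamma <- function(alpha, beta, y) {
  c(alpha_post = alpha + sum(y),    # alpha' = alpha + sum of the counts
    beta_post = beta + length(y))   # beta' = beta + n
}
update_gamma(1, 1, c(2, 0, 3))      # illustrative Gamma(1, 1) prior and toy counts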

  • What should the form of the Gamma(\alpha, \beta) prior be? That is, what values should \alpha and \beta take?

  • Graph your prior to check whether it looks reasonably close to your prior belief.

  • Summarize your prior knowledge in terms of the mean m and standard deviation s (or variance s^2)

  • Recall that the mean and variance of Gamma(\alpha, \beta) are \frac{\alpha}{\beta} and \frac{\alpha}{\beta^2}, respectively

  • Solve the following system of equations for \alpha and \beta:

m=\frac{\alpha}{\beta}

and

s^2=\frac{\alpha}{\beta^2}
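
  • Solving this system gives \beta = m/s^2 and \alpha = m\beta = m^2/s^2. A quick R check, using the illustrative values m = 2.5 and s = 1 that appear in the example below:

Code
m <- 2.5; s <- 1               # illustrative prior mean and SD
beta <- m / s^2                # from s^2 = alpha / beta^2 and m = alpha / beta
alpha <- m^2 / s^2             # alpha = m * beta
c(alpha = alpha, beta = beta)  # Gamma(6.25, 2.5)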

An Example

  • The weekly number of traffic accidents on a highway has the Poisson(\lambda) distribution. Three students are going to count the number of traffic accidents for each of the next eight weeks. They are going to analyze the data in a Bayesian manner, so they each need a prior distribution

  • The numbers of accidents on the highway over the next 8 weeks were: 3, 2, 0, 8, 2, 4, 6, 1

  • Students A and B chose the positive uniform prior and the Jeffreys prior, respectively

  • Student C believes that, on average, \lambda will be close to 2.5, with a standard deviation of 1

    • Solving the system of equations outlined earlier, Student C comes up with a Gamma(6.25, 2.5) prior.
  • The prior distributions of the 3 students are summarized below

Student  Prior                        Prior density
A        Positive uniform             1
B        Jeffreys                     \frac{1}{\sqrt{\lambda}}
C        \overline{Y} = 2.5;\; s = 1  Gamma(6.25, 2.5)
  • Based on the data we have \sum_{i=1}^8 y_i = 26; thus, the likelihood is L(\lambda|y) \propto \lambda^{26} e^{-8\lambda}

  • Thus, the posterior distributions are:

Student  Prior                        Prior density             Posterior
A        Positive uniform             1                         Gamma(27, 8)
B        Jeffreys                     \frac{1}{\sqrt{\lambda}}  Gamma(26.5, 8)
C        \overline{Y} = 2.5;\; s = 1  Gamma(6.25, 2.5)          Gamma(32.25, 10.5)
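
  • As a quick check (a minimal sketch using the observed counts), the three posteriors follow directly from the updating rule \alpha' = \alpha + \sum y_i and \beta' = \beta + n:

Code
y <- c(3, 2, 0, 8, 2, 4, 6, 1)                 # observed weekly counts
n <- length(y); sy <- sum(y)                   # n = 8, sum of counts = 26
rbind(A = c(alpha = 1 + sy, beta = 0 + n),     # positive uniform: Gamma(1, 0) kernel
      B = c(alpha = 0.5 + sy, beta = 0 + n),   # Jeffreys: Gamma(0.5, 0) kernel
      C = c(alpha = 6.25 + sy, beta = 2.5 + n))# Gamma(6.25, 2.5) prior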
  1. Summary statistics

Posterior           Mean   Median  SD
Gamma(27, 8)        3.375  3.333   0.6495
Gamma(26.5, 8)      3.313  3.271   0.6435
Gamma(32.25, 10.5)  3.071  3.040   0.5408
  • The means and standard deviations were computed using the formulas \frac{\alpha'}{\beta'} and \sqrt{\frac{\alpha'}{\beta'^2}}=\frac{\sqrt{\alpha'}}{\beta'}, respectively (a quick check appears after the median computation below)

  • The medians, meanwhile, were computed using the qgamma() function

Code
mdA <- qgamma(0.5, 27, 8)        # posterior median for Student A
mdB <- qgamma(0.5, 26.5, 8)      # posterior median for Student B
mdC <- qgamma(0.5, 32.25, 10.5)  # posterior median for Student C
print(cbind(mdA, mdB, mdC))
          mdA      mdB      mdC
[1,] 3.333426 3.270928 3.039742
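  • The formula-based means and standard deviations can be checked the same way (a short sketch using the posterior parameters):

Code
a <- c(27, 26.5, 32.25)                # alpha' for Students A, B, C
b <- c(8, 8, 10.5)                     # beta' for Students A, B, C
rbind(mean = a / b, sd = sqrt(a) / b)  # mean = alpha'/beta', sd = sqrt(alpha')/beta'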
  • We can also simulate observations from each posterior distribution using the rgamma() function and use these simulated observations to compute summary statistics
Code
g1 <- rgamma(10000, 27, 8)        # draws from Student A's posterior
g2 <- rgamma(10000, 26.5, 8)      # draws from Student B's posterior
g3 <- rgamma(10000, 32.25, 10.5)  # draws from Student C's posterior
sim.gamma <- data.frame(g1, g2, g3)
sim.gamma %>% 
  summarise_all(list(m = mean, sd = sd))  # simulation-based means and SDs
      g1_m     g2_m    g3_m     g1_sd     g2_sd     g3_sd
1 3.386137 3.301789 3.07363 0.6498671 0.6469492 0.5352202
  2. Credible intervals
  • Using the qgamma() function, we can obtain credible intervals for \lambda based on its posterior distribution

  • For example, the 95% credible interval for \lambda based on Gamma(27,8) is given by

Code
ci1L <- qgamma(0.025, 27, 8)  # lower endpoint (2.5th percentile)
ci1U <- qgamma(0.975, 27, 8)  # upper endpoint (97.5th percentile)
print(cbind(ci1L, ci1U))
         ci1L     ci1U
[1,] 2.224146 4.762003
  • The 95% credible intervals for \lambda for each of the posterior distributions are given in the following table:
Posterior           95% Credible Interval
Gamma(27, 8)        (2.224, 4.762)
Gamma(26.5, 8)      (2.174, 4.688)
Gamma(32.25, 10.5)  (2.104, 4.219)
  3. Test of hypothesis
  • In testing a one-sided hypothesis, we compute the posterior probability of the event specified in the null hypothesis

    • If the posterior probability of the null hypothesis is less than \alpha , then we reject the null hypothesis at the \alpha level of significance
  • In testing a two-sided hypothesis, we construct the 100(1-\alpha)\% credible interval

    • If the value of \lambda under the null hypothesis lies outside the credible interval, we reject H_0; otherwise, we do not reject the null hypothesis and conclude that \lambda remains a credible value (see the sketch following the one-sided example below)
  • Suppose we want to test H_0: \lambda \leq 3 versus H_1: \lambda > 3

Code
p1 <- pgamma(3, 27, 8)        # P(lambda <= 3) under Student A's posterior
p2 <- pgamma(3, 26.5, 8)      # under Student B's posterior
p3 <- pgamma(3, 32.25, 10.5)  # under Student C's posterior
print(cbind(p1, p2, p3))
            p1        p2        p3
[1,] 0.2961806 0.3312435 0.4704052
Posterior           P(\lambda \leq 3)
Gamma(27, 8)        0.2962
Gamma(26.5, 8)      0.3312
Gamma(32.25, 10.5)  0.4704
  • In all cases, the posterior probability of H_0 is well above any conventional \alpha, so H_0 is not rejected.
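  • For a two-sided test, say H_0: \lambda = 3 versus H_1: \lambda \neq 3 (an illustrative null value), we check whether 3 lies inside each 95% credible interval, as in the sketch below:

Code
lambda0 <- 3                             # hypothetical null value
ci <- rbind(A = qgamma(c(0.025, 0.975), 27, 8),
            B = qgamma(c(0.025, 0.975), 26.5, 8),
            C = qgamma(c(0.025, 0.975), 32.25, 10.5))
ci[, 1] <= lambda0 & lambda0 <= ci[, 2]  # TRUE: lambda0 is credible, do not reject H_0

  • Since 3 lies inside all three intervals, the two-sided H_0 would not be rejected in any case either.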
  4. Prediction
  • Suppose we want to predict the number of traffic accidents in a future week

  • Simulate \lambda’s from the posterior distribution and use these as inputs to the Poisson model to simulate Poisson-distributed observations (the posterior predictive distribution)

  • Compute the mean of the predictive distribution

  • Suppose we predict a future observation using the posterior distribution of Student C

Code
lambda <- rgamma(10000, 32.25, 10.5)  # draws from Student C's posterior
pois <- rpois(10000, lambda)          # posterior predictive draws
print(mean(pois))                     # mean of the predictive distribution
[1] 3.0587
  • We expect about 3 accidents per week
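
  • The same simulation idea also yields a predictive interval; a minimal self-contained sketch:

Code
set.seed(136)                           # illustrative seed
lambda <- rgamma(10000, 32.25, 10.5)    # draws from Student C's posterior
pois <- rpois(10000, lambda)            # posterior predictive draws
quantile(pois, probs = c(0.025, 0.975)) # 95% posterior predictive interval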