Poisson Likelihood

Suppose that a r.v. \(X\) obeys a Poisson distribution \(\mathcal{P}(\lambda)\), characterized by the following Probability Mass Function (PMF)

\[ f_{\lambda}(x) = \frac{\lambda^{x}e^{- \lambda}}{x!} \]

If we observe a sample \(x_{1},...,x_{n}\), then the likelihood is just the product of the individual PMFs

\[ \begin{align*} \pi( x_{1},...,x_{n} \mid \lambda ) & =\frac{\lambda^{\sum_{i=1}^{n} x_{i}}e^{- n \lambda}}{\prod_{i=1}^{n} ( x_{i}!)} \\ & =\frac{\lambda^{n \bar{x}}e^{- n \lambda}}{\prod_{i=1}^{n} ( x_{i}!)} \\ \end{align*} \]

where the model parameter \(\lambda\) is the expected count (rate) of the event of interest and \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}\) is the sample mean.

Some examples of Poisson samples
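The code that produced the original figure is not shown; a minimal sketch for simulating and plotting a few Poisson samples (the values of \(\lambda\), the number of draws and the seed are illustrative assumptions) could be:

library(ggplot2)

# Draw 1000 Poisson observations for a few illustrative rates
set.seed(123)
lambdas <- c(1, 4, 10)
samples <- do.call(rbind, lapply(lambdas, function(l) {
  data.frame(lambda = l, x = rpois(1000, lambda = l))
}))

# Relative frequencies of the counts, one panel per value of lambda
ggplot(samples, aes(x = x)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  facet_wrap(~ lambda, labeller = label_both) +
  labs(x = "x", y = "relative frequency")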


Gamma prior

A convenient choice to model the uncertainty about \(\lambda\) is a Gamma distribution as prior, since the support of such a distribution is the interval \((0, \infty)\). A Gamma prior with shape parameter \(\alpha\) and rate parameter \(\beta\) has the following form

\[ \pi( \lambda) = \frac{ \beta^{\alpha} }{ \Gamma(\alpha) } \lambda^{\alpha - 1} e^{- \lambda \beta } \]

where \(\Gamma(\alpha)\) is the Gamma function, defined as \(\Gamma(\alpha) = \int_{0}^{\infty}x^{\alpha - 1} e^{-x} dx\), which reduces to \((\alpha - 1)!\) when \(\alpha\) is a positive integer.

Let \(X \sim \mathcal{G}a(\alpha, \beta)\). Then \(E[X]= \alpha/ \beta\) and \(var(X)=\alpha /\beta^2\).

It is a convenient and flexible choice since the Gamma distribution can take a wide variety of shapes.

Some examples of the Gamma family
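The original figure is not reproduced here; a minimal sketch for plotting a few Gamma densities (the shape and rate values are illustrative assumptions) could be:

library(ggplot2)

# Evaluate the Gamma density on a grid for several (shape, rate) pairs
params <- expand.grid(shape = c(0.5, 1, 2, 5), rate = c(0.5, 1, 2))
curves <- do.call(rbind, lapply(seq_len(nrow(params)), function(i) {
  lambda <- seq(0.01, 10, length.out = 400)
  data.frame(shape = params$shape[i], rate = params$rate[i], lambda = lambda,
             density = dgamma(lambda, shape = params$shape[i], rate = params$rate[i]))
}))

# One panel per rate, one curve per shape
ggplot(curves, aes(lambda, density, colour = factor(shape))) +
  geom_line() +
  facet_wrap(~ rate, labeller = label_both) +
  labs(colour = "shape")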

Gamma posterior

If the data \(x_{1},...,x_{n}\) are modeled with a Poisson likelihood and a Gamma prior is chosen for \(\lambda\), the posterior distribution for \(\lambda\) also has the functional form of a Gamma r.v. Using the Bayes theorem, we have that

\[ \begin{align*} \pi(\lambda \mid x_{1},...,x_{n}) & = \frac{\pi( x_{1},...,x_{n} \mid \lambda) \pi( \lambda) }{\pi(x_{1},...,x_{n})} \\ & \propto \frac{\lambda^{n \bar{x}}e^{- n \lambda}}{\prod_{i=1}^{n} ( x_{i}!)} \frac{ \beta^{\alpha} }{ \Gamma(\alpha) } \lambda^{\alpha - 1} e^{- \lambda \beta } \\ & = \underbrace{\frac{1}{\prod_{i=1}^{n} ( x_{i}!)} \frac{ \beta^{\alpha} }{ \Gamma(\alpha) }}_{\text{does NOT depend on $\lambda$}} \ \lambda^{n \bar{x}} e^{-n \lambda} \lambda^{\alpha - 1} e^{-\lambda \beta } \\ & \propto \lambda^{\alpha +n \bar{x} - 1} e^{-(n+\beta) \lambda} \\ \end{align*} \]

So we find that \(\pi(\lambda \mid x_{1},...,x_{n})\) is a \(\mathcal{G}a(\alpha + n \bar{x}, \ n + \beta)\) distribution. The posterior mean and variance are given by

\[ E[\lambda] = \frac{\alpha + n \bar{x}}{n + \beta} \ \ \ \ \ \ \ var(\lambda) = \frac{\alpha + n \bar{x}}{(n + \beta)^{2}} \]

where \(n \bar{x} = \sum_{i=1}^{n} x_{i}\) is the sum of the counts and \(n\) is the sample size. Let’s take a few examples and plot the likelihood, a possible prior and the posterior, all at once in R.
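The plotting code itself is not shown in this section; a minimal sketch, on hypothetical counts and with a \(\mathcal{G}a(2, 2)\) prior chosen purely for illustration, could look like this (the likelihood is rescaled so that the three curves fit on one plot):

library(ggplot2)

x  <- c(2, 1, 0, 3, 1, 2, 0, 1)         # hypothetical Poisson counts
n  <- length(x); sx <- sum(x)
alpha <- 2; beta <- 2                    # illustrative Gamma(alpha, beta) prior (rate parameterization)

lambda     <- seq(0.001, 4, length.out = 500)
prior      <- dgamma(lambda, shape = alpha, rate = beta)
posterior  <- dgamma(lambda, shape = alpha + sx, rate = beta + n)
likelihood <- sapply(lambda, function(l) prod(dpois(x, l)))
likelihood <- likelihood / max(likelihood) * max(posterior)   # rescaled for display only

curves <- rbind(data.frame(lambda, density = prior,      curve = "prior"),
                data.frame(lambda, density = likelihood, curve = "likelihood (rescaled)"),
                data.frame(lambda, density = posterior,  curve = "posterior"))

ggplot(curves, aes(lambda, density, colour = curve)) + geom_line()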

General remark on Bayesian inference

Unlike in the traditional Frequentist framework, the Bayesian approach views parameters as random variables rather than fixed, unknown quantities. Given a Poisson sample \(x_{1},...,x_{n}\) and a Poisson parameter \(\lambda\), from the Bayes theorem, we can write

\[ \pi(\lambda \mid x_{1},...,x_{n}) = \frac{\pi( x_{1},...,x_{n} \mid \lambda ) \ \pi( \lambda)}{ \pi(x_{1},...,x_{n}) } \]

Adopting the ‘proportional’ notation, the constant term in the denominator is dropped so that the above expression is rewritten as \(\pi(\lambda \mid x_{1},...,x_{n}) \propto \pi( x_{1},...,x_{n} \mid \lambda ) \ \pi( \lambda)\)

When conjugate models are used (as in the case of a Poisson-Gamma model), the posterior distribution can be identified and closed-form quantities of interest like a mean, a variance or quantiles can be computed. Most of the time in practice, the posterior distribution is intractable so that it is necessary to resort to MCMC techniques.

Working example

Suppose that we record the number of bacteria of a specific type present in 20 water samples taken in the Mekong Delta (Vietnam), so that we have the following data at hand:

\[ x_{i} = 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 2, 0, 0, 5, 2, 0, 0, 2, 0, 1 \]

So, assuming a Poisson \(\mathcal{P}(\lambda)\) likelihood for the data and using a \(\mathcal{G}a(2,2)\) prior for \(\lambda\), with prior mean \(2/2=1\) and prior variance \(2/2^2 = 0.5\), what are the posterior mean and the \(95 \%\) credible interval for the model parameter?

Working example: posterior

Posterior quantities obtained from direct sampling
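The code behind the output below is not shown; a minimal sketch of the direct-sampling computation, assuming the \(\mathcal{G}a(2 + 17, 2 + 20) = \mathcal{G}a(19, 22)\) posterior derived above (the number of draws and the seed are assumptions; the text later mentions \(10^{8}\) simulations), could be:

x      <- c(1,0,0,1,0,0,1,1,0,1,2,0,0,5,2,0,0,2,0,1)
alpha  <- 2; beta <- 2                      # Gamma(2, 2) prior
a_post <- alpha + sum(x)                    # 19
b_post <- beta + length(x)                  # 22

a_post / b_post                             # exact posterior mean
a_post / b_post^2                           # exact posterior variance

set.seed(1)
draws <- rgamma(1e6, shape = a_post, rate = b_post)   # direct sampling from the posterior
quantile(draws, c(0.025, 0.5, 0.975))       # median and 95% credible interval
mean(draws)                                 # Monte Carlo estimate of the posterior mean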

## [1] 0.8636364
## [1] 0.0392562
##   2.5%    50%  97.5% 
## 0.5200 0.8486 1.2931
## [1] 0.8636725

Jeffreys’ prior: definition

Let us first give the general definition of the Jeffreys' prior \(p_{J}(\theta)\):

\[ p_{J}(\theta) \propto \sqrt{\mathcal{I}_{n}(\theta)} \]

where \(\mathcal{I}_{n}(\theta)\) is the Fisher information of the sample. The Fisher information is defined as follows:

\[ \mathcal{I}_{n}(\theta) = -E_{\theta} \left[ \frac{ \partial^{2} ln \mathcal{L}(\theta \mid \textbf{x} )} {\partial \theta^{2}} \right] \]

or equivalently

\[ \mathcal{I}_{n}(\theta) = var_{\theta} \bigg( \frac{ \partial ln\mathcal{L}(\theta \mid \textbf{x})} {\partial \theta} \bigg) = E _{\theta}\left[ \bigg(\frac{ \partial ln\mathcal{L}(\theta \mid \textbf{x})} {\partial \theta}\bigg)^{2} \right] \]

The first derivative of the log-likelihood function with respect to the model parameter \(\frac{ \partial ln\mathcal{L}(\theta \mid \textbf{x})} {\partial \theta}\) is sometimes referred to as the score function.

Likelihood and derived functions for a Poisson model

Likelihood and log-likelihood, score function and second derivative of the log-likelihood function for a Poisson model:

\[ \begin{align*} & \mathcal{L}(\lambda \mid \textbf{x} ) = \prod_{i=1}^{n} f_{\lambda}(x_{i}) = \prod_{i=1}^{n} \frac{\lambda^{x_{i}} e^{- \lambda}}{x_{i} !} = \frac{\lambda^{ \sum_{i=1}^{n} x_{i}} e^{- n \lambda}}{ \prod_{i=1}^{n} x_{i} !}\\ & l(\lambda \mid \textbf{x} ) = ln \big( \mathcal{L}(\lambda \mid \textbf{x} ) \big) = \sum_{i=1}^{n} x_{i} \ ln(\lambda) - n \lambda - \sum_{i=1}^{n} ln(x_{i}!) \\ & \frac{ \partial ln\mathcal{L}(\lambda \mid \textbf{x})} {\partial \lambda} = \frac{ \sum_{i=1}^{n} x_{i}}{\lambda} - n \\ & \frac{ \partial^{2} ln \mathcal{L}(\lambda \mid \textbf{x} )} {\partial \lambda^{2}}= - \frac{\sum_{i=1}^{n} x_{i}}{ \lambda^{2}} \\ \end{align*} \]

Fisher information and Jeffreys' prior for a Poisson model

So we want

\[ \mathcal{I}_{n}(\lambda) = -E\left[ \frac{ \partial^{2} ln \mathcal{L}(\lambda \mid \textbf{x} )} {\partial \lambda^{2}} \right] \]

and we note that \[ E \bigg[ \sum_{i=1}^{n}x_{i} \bigg] = E[n \overline{x}] = n E [\overline{x}] = n \lambda \]

Thus we have that

\[ \begin{align*} \mathcal{I}_{n}(\lambda) & = \frac{E \bigg[ \sum_{i=1}^{n}x_{i} \bigg] }{ \lambda^{2}} \\ & = \frac{n \lambda}{ \lambda^{2}} \\ & = \frac{n }{\lambda } \\ & \propto \lambda^{-1} \\ \end{align*} \]

And we conclude that

\[ p_{J}(\lambda) = \sqrt{\mathcal{I}(\lambda )} \propto \lambda^{-1/2} \]

which is the Gamma distribution \(\mathcal{G}a(1/2, 0)\), an improper prior.

Gamma Jeffreys prior
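The original plot is not reproduced (note that dgamma cannot evaluate the improper \(\mathcal{G}a(1/2, 0)\) density directly); a minimal sketch that plots the unnormalised Jeffreys prior \(\lambda^{-1/2}\) together with a proper \(\mathcal{G}a(1/2, 0.01)\) approximation (the rate 0.01 is an arbitrary small value chosen for display) could be:

library(ggplot2)

lambda <- seq(0.01, 5, length.out = 500)
curves <- rbind(
  data.frame(lambda, value = lambda^(-1/2),
             curve = "Jeffreys prior ~ lambda^(-1/2), unnormalised"),
  data.frame(lambda, value = dgamma(lambda, shape = 1/2, rate = 0.01),
             curve = "Gamma(1/2, 0.01) approximation")
)

ggplot(curves, aes(lambda, value, colour = curve)) +
  geom_line() +
  labs(y = NULL)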


Back to the working example

Getting back to our water samples data

\[ x_{i} = 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 2, 0, 0, 5, 2, 0, 0, 2, 0, 1 \]

Assuming a Poisson likelihood for the data and using a weakly informative prior close to the Jeffreys prior, namely a \(\mathcal{G}a(1/2, 1/2)\), what are the posterior mean and the \(95 \%\) credible interval for the model parameter?

From the Bayes Theorem, we know that

\[ p( \lambda \mid x ) = \frac{p(x \mid \lambda) \ p(\lambda)}{ p(x) } \propto p( x \mid \lambda) \ p(\lambda) \]

In our case, we have that

\[ \begin{align*} p( \lambda \mid x ) & \propto \underbrace{ \lambda^{\sum_{i=1}^{n}x_{i}} e^{-20\lambda}}_{\text{Poisson likelihood}} \ \underbrace{ p(\lambda)}_{\text{Prior}} \\ & \propto \lambda^{17} e^{-20\lambda} \ \lambda^{ -1/2 } e^{- \lambda / 2 } \\ & \propto \lambda^{16.5} \ e^{- 20.5 \lambda } \\ \end{align*} \]

and we recognize the functional form of a Gamma density, that is \(\mathcal{G}a(\alpha = 17.5, \beta = 20.5)\). The posterior mean is given by \(\alpha / \beta \approx 0.8537\).
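A quick closed-form check of these quantities in R (a minimal sketch using the conjugate update derived above):

x      <- c(1,0,0,1,0,0,1,1,0,1,2,0,0,5,2,0,0,2,0,1)
a_post <- 0.5 + sum(x)        # 17.5
b_post <- 0.5 + length(x)     # 20.5

a_post / b_post                                           # posterior mean, approx. 0.8537
qgamma(c(0.025, 0.975), shape = a_post, rate = b_post)    # 95% credible interval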

Working example: in conclusion

So, under the \(\mathcal{G}a(2,2)\) prior, the theoretical posterior mean is given by

\[ E[\lambda] = \frac{\alpha + n \bar{x}}{n + \beta} = \frac{2 + 20 *0.85}{20 + 2} = 19/22 = 0.8636364 \]

By direct sampling, using \(10^{8}\) simulations, the posterior sample mean is \(0.8636725\).

By direct sampling, a \(95\%\) credible interval is given by \[[0.5200, 1.2931]\]

So, combining modeling and simulation, we are now able to take these values, obtained from a sample of size 20, and generalize them to the whole population of bacteria in the Mekong Delta.

Poisson-Gamma Gibbs sampling: example

Example of a multi-stage Gibbs sampler for reliability analysis. We want to set up a Bayesian analysis in order to draw from the posterior distribution of the failure rates of nuclear plant pumps, based on some initial data giving the number of failures observed for each pump over a given time of observation.

Keywords: Reliability, Poisson process, Gibbs sampling, Poisson-Gamma model, Empirical Bayes

Pump data

The data: number of failures and times of observation of \(10\) pumps in a nuclear plant water system (Source: Gaver and O’Muircheartaigh, 1987)

library(knitr)
library(kableExtra)
dataset = data.frame('Pump' = 1:10,
                     'Failures' = c(5,1,5,14,3,19,1,1,4,22),
                     'Time' = c(94.32, 15.72, 62.88, 125.76, 5.24, 31.44, 1.05, 1.05, 2.10, 10.48))
kable(dataset, col.names = c("Pump", "Failures", "Time"), escape = F, caption = "") %>%
  kable_styling(latex_options = "hold_position")
Pump Failures Time
1 5 94.32
2 1 15.72
3 5 62.88
4 14 125.76
5 3 5.24
6 19 31.44
7 1 1.05
8 1 1.05
9 4 2.10
10 22 10.48

Model and assumptions

The failures of the \(i\)th pump follow a Poisson process with parameter \(\lambda_{i}\), for \(i = 1, 2,... ,10\). For an observed time \(t_{i}\), the number of failures \(p_{i}\) is thus a Poisson \(\mathcal{P}(\lambda_{i}t_{i})\) r.v.

Likelihood of failure:

\[ p_{i} \sim \mathcal{P}(\lambda_{i}t_{i}) \]

Prior on \(\lambda_{i}\) and prior on \(\beta\):

\[ \lambda_{i} \sim \mathcal{G}a(\alpha, \beta) \]

\[ \beta \sim \mathcal{G}a(\gamma, \delta) \]

with \(\alpha = 1.8\), \(\gamma = 0.01\) and \(\delta = 1\) (see Gaver and O’Muircheartaigh, 1987, for a motivation of these numerical values).

Joint and conditional distributions

Joint posterior distribution:

\[ \begin{align*} & \pi(\lambda_{1}, ..., \lambda_{10}, \beta \mid t_{1}, ..., t_{10}, p_{1}, ..., p_{10}) \\ & \propto \prod_{i = 1}^{10} \bigg( (\lambda_{i}t_{i})^{p_{i}} e^{-\lambda_{i}t_{i}} \lambda_{i}^{\alpha - 1} e^{-\beta \lambda_{i}}\bigg) \beta^{10 \alpha} \beta^{ \gamma - 1}e^{-\delta \beta} \\ & \propto \prod_{i = 1}^{10} \bigg( \lambda_{i}^{p_{i} + \alpha - 1} e^{- (t_{i} + \beta) \lambda_{i}} \bigg) \beta^{10 \alpha + \gamma - 1}e^{-\delta \beta} \\ \end{align*} \]

and a natural decomposition of \(\pi\) in conditional distributions is

\[ \lambda_{i} \mid \beta, t_{i}, p_{i} \sim \mathcal{G}a(p_{i} + \alpha , t_{i} + \beta) \]

\[ \beta \mid \lambda_{1}, ..., \lambda_{10} \sim \mathcal{G}a \bigg(\gamma + 10\alpha, \ \delta + \sum_{i=1}^{10} \lambda_{i}\bigg) \]

Multistage Gibbs sampler for Poisson-Gamma model
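The sampler code is not listed in the original; a minimal sketch that alternates between the two conditional distributions above (the number of iterations, the burn-in and the seed are assumptions) could be:

failures <- c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22)
time     <- c(94.32, 15.72, 62.88, 125.76, 5.24, 31.44, 1.05, 1.05, 2.10, 10.48)
alpha <- 1.8; gamma <- 0.01; delta <- 1
n_pumps <- length(failures)

n_iter <- 10000
lambda <- matrix(NA, n_iter, n_pumps)
beta   <- numeric(n_iter)

set.seed(42)
beta_cur <- 1                                # arbitrary starting value
for (m in 1:n_iter) {
  # lambda_i | beta, data  ~  Ga(p_i + alpha, t_i + beta)
  lambda_cur <- rgamma(n_pumps, shape = failures + alpha, rate = time + beta_cur)
  # beta | lambda_1, ..., lambda_10  ~  Ga(gamma + 10 * alpha, delta + sum(lambda_i))
  beta_cur <- rgamma(1, shape = gamma + n_pumps * alpha, rate = delta + sum(lambda_cur))
  lambda[m, ] <- lambda_cur
  beta[m]     <- beta_cur
}

burn <- 1:2000
colMeans(lambda[-burn, ])                    # posterior means of the 10 failure rates
c(mean(beta[-burn]), sd(beta[-burn]))        # posterior mean and sd of beta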

Posterior quantities obtained from Gibbs sampling

library(knitr)
library(kableExtra)
mat1 <- matrix(c(0.6510, 2.4598, 0.6534, 0.7101), byrow = TRUE, ncol =2)
colnames(mat1) = c('lambda', 'beta')
rownames(mat1) = c('post mean', 'post sd')
df1 <- data.frame(mat1)
kable(df1, col.names = c("lambda", "beta"), escape = F, caption = "Posterior summaries from Gibbs sampling") %>%
  kable_styling(latex_options = "hold_position")
Posterior summaries from Gibbs sampling
lambda beta
post mean 0.6510 2.4598
post sd 0.6534 0.7101

We see that the distribution of the number of failures given time is bimodal and right-skewed. We can now draw conclusions not only from point estimates but from the whole posterior distribution.

Posterior distributions
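The original histograms are not reproduced; continuing the Gibbs sampler sketch above (it reuses the lambda matrix, the beta vector and the burn-in defined there), the posterior draws could be plotted as:

library(ggplot2)

post_lambda <- data.frame(lambda = as.vector(lambda[-burn, ]))   # draws pooled over the 10 pumps
post_beta   <- data.frame(beta = beta[-burn])

ggplot(post_lambda, aes(lambda)) +
  geom_histogram(aes(y = after_stat(density)), bins = 60)
ggplot(post_beta, aes(beta)) +
  geom_histogram(aes(y = after_stat(density)), bins = 60)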


References and code

Donald P. Gaver & I. G. O’Muircheartaigh (1987). Robust Empirical Bayes Analyses of Event Rates, Technometrics, 29:1, 1-15, DOI: 10.1080/00401706.1987.10488178

Robert, C. P., Casella, G. (1999). Monte Carlo statistical methods (Vol. 2). New York: Springer.

The R Project for Statistical Computing: https://www.r-project.org/