Suppose that a random variable (r.v.) \(X\) follows a Poisson distribution \(\mathcal{P}(\lambda)\), characterized by the following Probability Mass Function (PMF):
\[ f_{\lambda}(x) = \frac{\lambda^{x}e^{- \lambda}}{x!} \]
If we observe a sample \(x_{1},...,x_{n}\), then the likelihood is just the product of the individual PMFs:
\[ \begin{align*} \pi( x_{1},...,x_{n} \mid \lambda ) & =\frac{\lambda^{\sum_{i=1}^{n} x_{i}}e^{- n \lambda}}{\prod_{i=1}^{n} ( x_{i}!)} \\ & =\frac{\lambda^{n \bar{x}}e^{- n \lambda}}{\prod_{i=1}^{n} ( x_{i}!)} \\ \end{align*} \]
where the model parameter \(\lambda\) is the expected count (rate) of some event of interest and \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}\).
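As a quick illustration, the (log-)likelihood can be evaluated in R over a grid of candidate values of \(\lambda\); the sample below is made up for this sketch:

```r
# Illustrative sample (an assumption for this sketch) and a grid of lambda values
x <- c(1, 0, 2, 1, 3)
lambda_grid <- seq(0.1, 4, by = 0.1)

# Log-likelihood: sum of log Poisson PMFs, more stable than the raw product
loglik <- sapply(lambda_grid, function(l) sum(dpois(x, lambda = l, log = TRUE)))

# The likelihood is maximized near the sample mean, the MLE of lambda
lambda_grid[which.max(loglik)]   # close to mean(x) = 1.4
```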
A convenient choice to model the uncertainty about \(\lambda\) is a Gamma distribution as prior, since the support of such a distribution is the interval \([0, \infty)\). A Gamma prior with shape parameter \(\alpha\) and rate parameter \(\beta\) has the following form
\[ \pi( \lambda) = \frac{ \beta^{\alpha} }{ \Gamma(\alpha) } \lambda^{\alpha - 1} e^{- \lambda \beta } \]
where \(\Gamma(\alpha)\) is the Gamma function, defined as \(\Gamma(\alpha) = \int_{0}^{\infty}x^{\alpha - 1} e^{-x} dx\), which equals \((\alpha - 1)!\) when \(\alpha\) is a positive integer.
Let \(X \sim \mathcal{G}a(\alpha, \beta)\). Then \(E[X]= \alpha/ \beta\) and \(var(X)=\alpha /\beta^2\).
It is a convenient and flexible choice since the Gamma distribution can take a wide variety of shapes.
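For instance, a few shape/rate combinations (arbitrary choices for this sketch) already cover decreasing, skewed and bell-shaped densities:

```r
# A few shape/rate pairs illustrating the flexibility of the Gamma family
lambda <- seq(0.01, 10, length.out = 500)
shapes <- c(0.5, 1, 2, 5)
rates  <- c(0.5, 1, 1, 2)

plot(lambda, dgamma(lambda, shape = shapes[1], rate = rates[1]), type = "l",
     ylim = c(0, 1.2), xlab = expression(lambda), ylab = "density")
for (k in 2:4) {
  lines(lambda, dgamma(lambda, shape = shapes[k], rate = rates[k]), lty = k)
}
legend("topright", lty = 1:4,
       legend = paste0("Ga(", shapes, ", ", rates, ")"))
```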
If the data \(x_{1},...,x_{n}\) are modeled with a Poisson likelihood and a Gamma prior is chosen for \(\lambda\), the posterior distribution of \(\lambda\) also has the functional form of a Gamma r.v. Using Bayes' theorem, we have that
\[ \begin{align*} \pi(\lambda \mid x_{1},...,x_{n}) & = \frac{\pi( x_{1},...,x_{n} \mid \lambda) \ \pi( \lambda) }{\pi(x_{1},...,x_{n})} \\ & \propto \frac{\lambda^{n \bar{x}}e^{- n \lambda}}{\prod_{i=1}^{n} ( x_{i}!)} \ \frac{ \beta^{\alpha} }{ \Gamma(\alpha) } \lambda^{\alpha - 1} e^{- \lambda \beta } \\ & = \underbrace{\frac{1}{\prod_{i=1}^{n} ( x_{i}!)} \frac{ \beta^{\alpha} }{ \Gamma(\alpha) }}_{\text{does not depend on } \lambda} \ \lambda^{n \bar{x}} e^{-n \lambda} \lambda^{\alpha - 1} e^{-\lambda \beta } \\ & \propto \lambda^{\alpha +n \bar{x} - 1} e^{-(n+\beta) \lambda} \\ \end{align*} \]
So we find that \(\pi(\lambda \mid x_{1},...,x_{n}) \sim \mathcal{G}a(\alpha + n \bar{x}, \ \beta + n)\). The posterior mean and variance are given by
\[ E[\lambda] = \frac{\alpha + n \bar{x}}{n + \beta} \ \ \ \ \ \ \ var(\lambda) = \frac{\alpha + n \bar{x}}{(n + \beta)^{2}} \]
where \(n \bar{x} = \sum_{i=1}^{n} x_{i}\) is the sum of the counts and \(n\) is the sample size. Let’s take a few examples and plot the likelihood, a possible prior and the posterior, all at once in R.
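For instance, with an illustrative sample and a \(\mathcal{G}a(2, 2)\) prior (both assumptions of this sketch), the three curves can be drawn with ggplot2; the likelihood is rescaled to integrate to one so it is comparable to the two densities:

```r
library(ggplot2)

x <- c(2, 1, 3, 0, 2)            # illustrative sample (an assumption)
n <- length(x); s <- sum(x)
a <- 2; b <- 2                   # Ga(a, b) prior (rate parameterization)

grid <- seq(0.01, 5, length.out = 400)
df <- data.frame(
  lambda     = grid,
  # likelihood proportional to lambda^s * exp(-n*lambda); normalized, it is Ga(s + 1, n)
  likelihood = dgamma(grid, shape = s + 1, rate = n),
  prior      = dgamma(grid, shape = a, rate = b),
  posterior  = dgamma(grid, shape = a + s, rate = b + n)
)

ggplot(df, aes(lambda)) +
  geom_line(aes(y = likelihood, linetype = "likelihood")) +
  geom_line(aes(y = prior,      linetype = "prior")) +
  geom_line(aes(y = posterior,  linetype = "posterior")) +
  labs(x = expression(lambda), y = "density", linetype = "")
```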
Unlike the traditional Frequentist framework, the Bayesian approach views parameters as random variables rather than fixed, unknown quantities. Given a Poisson sample \(x_{1},...,x_{n}\) with parameter \(\lambda\), Bayes' theorem lets us write
\[ \pi(\lambda \mid x_{1},...,x_{n}) = \frac{\pi( x_{1},...,x_{n} \mid \lambda ) \ \pi( \lambda)}{ \pi(x_{1},...,x_{n}) } \]
Adopting the ‘proportional’ notation, the constant term in the denominator is dropped, so that the above expression is rewritten as \(\pi(\lambda \mid x_{1},...,x_{n}) \propto \pi( x_{1},...,x_{n} \mid \lambda ) \ \pi( \lambda)\).
When conjugate models are used (as in the case of a Poisson-Gamma model), the posterior distribution can be identified and closed-form quantities of interest like a mean, a variance or quantiles can be computed. Most of the time in practice, the posterior distribution is intractable so that it is necessary to resort to MCMC techniques.
Suppose that we record the number of bacteria of a specific species present in 20 water samples taken in the Mekong Delta (Vietnam), so that we have the following data at hand:
\[ x_{i} = 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 2, 0, 0, 5, 2, 0, 0, 2, 0, 1 \]
So, assuming a Poisson \(\mathcal{P}(\lambda)\) likelihood for the data and using a \(\mathcal{G}a(2,2)\) prior, with mean \(2/2=1\) and variance \(2/2^2 = 0.5\), what are the posterior mean and the \(95 \%\) credible interval for the model parameter?
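The output below can be reproduced along the following lines (a sketch; the seed and the number of draws are assumptions, so the sampling-based figures will differ slightly):

```r
x <- c(1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 2, 0, 0, 5, 2, 0, 0, 2, 0, 1)
n <- length(x); s <- sum(x)              # n = 20, s = 17
a <- 2; b <- 2                           # Ga(2, 2) prior

# Closed-form posterior: Ga(a + s, b + n) = Ga(19, 22)
(a + s) / (b + n)                        # posterior mean
(a + s) / (b + n)^2                      # posterior variance

# Direct sampling from the posterior
set.seed(42)                             # seed is an assumption
draws <- rgamma(1e5, shape = a + s, rate = b + n)
quantile(draws, c(0.025, 0.5, 0.975))    # 95% credible interval and median
mean(draws)                              # posterior sample mean
```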
## [1] 0.8636364
## [1] 0.0392562
## 2.5% 50% 97.5%
## 0.5200 0.8486 1.2931
## [1] 0.8636725
Let us first give the general definition of the Jeffreys prior \(p_{J}(\theta)\):
\[ p_{J}(\theta) \propto \sqrt{\mathcal{I}_{n}(\theta)} \]
where \(\mathcal{I}_{n}(\theta)\) is the Fisher information of the sample. The Fisher information is defined as follows:
\[ \mathcal{I}_{n}(\theta) = -E_{\theta} \left[ \frac{ \partial^{2} ln \mathcal{L}(\theta \mid \textbf{x} )} {\partial \theta^{2}} \right] \]
or equivalently
\[ \mathcal{I}_{n}(\theta) = var_{\theta} \bigg( \frac{ \partial ln\mathcal{L}(\theta \mid \textbf{x})} {\partial \theta} \bigg) = E _{\theta}\left[ \bigg(\frac{ \partial ln\mathcal{L}(\theta \mid \textbf{x})} {\partial \theta}\bigg)^{2} \right] \]
The first derivative of the log-likelihood function with respect to the model parameter \(\frac{ \partial ln\mathcal{L}(\theta \mid \textbf{x})} {\partial \theta}\) is sometimes referred to as the score function.
Likelihood and log-likelihood, score function and second derivative of the log-likelihood function for a Poisson model:
\[ \begin{align*} & \mathcal{L}(\lambda \mid \textbf{x} ) = \prod_{i=1}^{n} f_{\lambda}(x_{i}) = \prod_{i=1}^{n} \frac{\lambda^{x_{i}} e^{- \lambda}}{x_{i} !} = \frac{\lambda^{ \sum_{i=1}^{n} x_{i}} e^{- n \lambda}}{ \prod_{i=1}^{n} x_{i} !}\\ & l(\lambda \mid \textbf{x} ) = ln \big( \mathcal{L}(\lambda \mid \textbf{x} ) \big) = \sum_{i=1}^{n} x_{i} \ ln(\lambda) - n \lambda - \sum_{i=1}^{n} ln(x_{i}!) \\ & \frac{ \partial ln\mathcal{L}(\lambda \mid \textbf{x})} {\partial \lambda} = \frac{ \sum_{i=1}^{n} x_{i}}{\lambda} - n \\ & \frac{ \partial^{2} ln \mathcal{L}(\lambda \mid \textbf{x} )} {\partial \lambda^{2}}= - \frac{\sum_{i=1}^{n} x_{i}}{ \lambda^{2}} \\ \end{align*} \]
So we want
\[ \mathcal{I}_{n}(\lambda) = -E\left[ \frac{ \partial^{2} ln \mathcal{L}(\lambda \mid \textbf{x} )} {\partial \lambda^{2}} \right] \]
and we note that \[ E \bigg[ \sum_{i=1}^{n}x_{i} \bigg] = E[n \overline{x}] = n E [\overline{x}] = n \lambda \]
Thus we have that
\[ \begin{align*} \mathcal{I}_{n}(\lambda) & = \frac{E \big[ \sum_{i=1}^{n}x_{i} \big] }{ \lambda^{2}} \\ & = \frac{n \lambda}{ \lambda^{2}} \\ & = \frac{n }{\lambda } \\ & \propto \lambda^{-1} \\ \end{align*} \]
And we conclude that
\[ p_{J}(\lambda) = \sqrt{\mathcal{I}(\lambda )} \propto \lambda^{-1/2} \]
which is the Gamma distribution \(\mathcal{G}a(1/2, 0)\), an improper prior.
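As a quick numerical sanity check (a simulation sketch with arbitrary \(n\) and \(\lambda\)), the variance of the score evaluated at the true \(\lambda\) should be close to the Fisher information \(n/\lambda\):

```r
# Variance of the score at the true lambda vs. Fisher information n / lambda
# (the values of n and lambda below are arbitrary assumptions)
set.seed(1)
n <- 50; lambda <- 2
score <- replicate(1e4, {
  x <- rpois(n, lambda)
  sum(x) / lambda - n          # score function evaluated at the true lambda
})
var(score)                     # should be close to n / lambda = 25
```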
Getting back to our water samples data
\[ x_{i} = 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 2, 0, 0, 5, 2, 0, 0, 2, 0, 1 \]
Assuming a Poisson likelihood for the data and using a weakly informative prior close to the Jeffreys prior, namely a \(\mathcal{G}a(1/2, 1/2)\), what are the posterior mean and the \(95 \%\) credible interval for the model parameter?
From Bayes' theorem, we know that
\[ p( \lambda \mid x ) = \frac{p(x \mid \lambda) \ p(\lambda)}{ p(x) } \propto p( x \mid \lambda) \ p(\lambda) \]
In our case, we have that
\[ \begin{align*} p( \lambda \mid x ) & \propto \underbrace{ \lambda^{\sum_{i=1}^{n}x_{i}} e^{-20\lambda}}_{\text{Poisson Likelihood}} \ \underbrace{ p(\lambda)}_{\text{ Prior}} \\ & \propto \lambda^{17} e^{-20\lambda} \ \lambda^{ -1/2 } e^{- \lambda / 2 } \\ & \propto \lambda^{16.5} \ e^{- 20.5 \lambda } \\ \end{align*} \]
and we recognize the functional form of a Gamma density, that is \(\mathcal{G}a(\alpha = 17.5, \beta = 20.5)\). The posterior mean is given by \(\alpha / \beta = 17.5/20.5 \approx 0.8537\).
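A minimal check of this closed form in R:

```r
x <- c(1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 2, 0, 0, 5, 2, 0, 0, 2, 0, 1)
a_post <- 1/2 + sum(x)      # posterior shape: 17.5
b_post <- 1/2 + length(x)   # posterior rate: 20.5

a_post / b_post                                          # posterior mean
qgamma(c(0.025, 0.975), shape = a_post, rate = b_post)   # 95% credible interval
```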
Under the \(\mathcal{G}a(2,2)\) prior of the first example, the theoretical posterior mean is given by
\[ E[\lambda] = \frac{\alpha + n \bar{x}}{n + \beta} = \frac{2 + 20 *0.85}{20 + 2} = 19/22 = 0.8636364 \]
By direct sampling, using \(10^{8}\) simulations, the posterior sample mean is \(0.8636725\).
By direct sampling, a \(95\%\) Credible Interval is given by \[[0.5200, 1.2931]\]
So, combining modeling and simulation, we are now able to generalize from a sample of size 20 and infer these values for the whole population of bacteria in the Mekong Delta.
Example of a multi-stage Gibbs sampler for reliability analysis. We want to set up a Bayesian analysis to eventually draw from the posterior distribution of the number of failures for a given time of observation of nuclear plant pumps based on some initial data.
Keywords: Reliability, Poisson process, Gibbs sampling, Poisson-Gamma model, Empirical Bayes
The data: number of failures and times of observation of \(10\) pumps in a nuclear plant water system (Source: Gaver and O’Muircheartaigh, 1987)
```r
library(knitr)
library(kableExtra)

dataset <- data.frame('Pump' = 1:10,
                      'Failures' = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22),
                      'Time' = c(94.32, 15.72, 62.88, 125.76, 5.24, 31.44, 1.05, 1.05, 2.10, 10.48))

kable(dataset, col.names = c("Pump", "Failures", "Time"), escape = FALSE) %>%
  kable_styling(latex_options = "hold_position")
```
| Pump | Failures | Time |
|---|---|---|
| 1 | 5 | 94.32 |
| 2 | 1 | 15.72 |
| 3 | 5 | 62.88 |
| 4 | 14 | 125.76 |
| 5 | 3 | 5.24 |
| 6 | 19 | 31.44 |
| 7 | 1 | 1.05 |
| 8 | 1 | 1.05 |
| 9 | 4 | 2.10 |
| 10 | 22 | 10.48 |
Failures of the \(i\)-th pump follow a Poisson process with rate \(\lambda_{i}\), for \(i = 1, 2,... ,10\). For an observed time \(t_{i}\), the number of failures \(p_{i}\) is thus a Poisson \(\mathcal{P}(\lambda_{i}t_{i})\) r.v.
Likelihood of failure:
\[ p_{i} \sim \mathcal{P}(\lambda_{i}t_{i}) \]
Prior on \(\lambda_{i}\) and prior on \(\beta\):
\[ \lambda_{i} \sim \mathcal{G}a(\alpha, \beta) \]
\[ \beta \sim \mathcal{G}a(\gamma, \delta) \]
with \(\alpha = 1.8\), \(\gamma = 0.01\) and \(\delta = 1\) (see Gaver and O’Muircheartaigh, 1987, for a motivation of these numerical values).
Joint posterior distribution:
\[ \begin{align*} & \pi(\lambda_{1}, ..., \lambda_{10}, \beta \mid t_{1}, ..., t_{10}, p_{1}, ..., p_{10}) \\ & \propto \prod_{i = 1}^{10} \bigg( (\lambda_{i}t_{i})^{p_{i}} e^{-\lambda_{i}t_{i}} \lambda_{i}^{\alpha - 1} e^{-\beta \lambda_{i}}\bigg) \beta^{10 \alpha} \beta^{ \gamma - 1}e^{-\delta \beta} \\ & \propto \prod_{i = 1}^{10} \bigg( \lambda_{i}^{p_{i} + \alpha - 1} e^{- (t_{i} + \beta) \lambda_{i}} \bigg) \beta^{10 \alpha + \gamma - 1}e^{-\delta \beta} \\ \end{align*} \]
and a natural decomposition of \(\pi\) in conditional distributions is
\[ \lambda_{i} \mid \beta, t_{i}, p_{i} \sim \mathcal{G}a(p_{i} + \alpha , t_{i} + \beta) \]
\[ \beta \mid \lambda_{1}, ..., \lambda_{10} \sim \mathcal{G}a \bigg(\gamma + 10\alpha, \ \delta + \sum_{i=1}^{10} \lambda_{i}\bigg) \]
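This conditional structure translates directly into a Gibbs sampler. Below is a minimal sketch in R (the number of iterations, burn-in length, seed and starting value are assumptions; the summary table that follows comes from the author's own run, not from this sketch):

```r
set.seed(123)
failures <- c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22)
times    <- c(94.32, 15.72, 62.88, 125.76, 5.24, 31.44, 1.05, 1.05, 2.10, 10.48)
alpha <- 1.8; gamma <- 0.01; delta <- 1
n_pumps <- length(failures)

n_iter <- 10000                        # number of iterations (an assumption)
lambda_draws <- matrix(NA, n_iter, n_pumps)
beta_draws   <- numeric(n_iter)
beta_cur     <- 1                      # arbitrary starting value

for (it in 1:n_iter) {
  # lambda_i | beta ~ Ga(p_i + alpha, t_i + beta), drawn for all 10 pumps at once
  lambda_cur <- rgamma(n_pumps, shape = failures + alpha, rate = times + beta_cur)
  # beta | lambda_1, ..., lambda_10 ~ Ga(gamma + 10 * alpha, delta + sum(lambda_i))
  beta_cur <- rgamma(1, shape = gamma + n_pumps * alpha,
                     rate = delta + sum(lambda_cur))
  lambda_draws[it, ] <- lambda_cur
  beta_draws[it]     <- beta_cur
}

# Summaries after discarding a burn-in period
burn <- 1000
colMeans(lambda_draws[-(1:burn), ])   # posterior means of lambda_1, ..., lambda_10
mean(beta_draws[-(1:burn)])           # posterior mean of beta
```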
```r
mat1 <- matrix(c(0.6510, 2.4598, 0.6534, 0.7101), byrow = TRUE, ncol = 2)
colnames(mat1) <- c('lambda', 'beta')
rownames(mat1) <- c('post mean', 'post sd')
df1 <- data.frame(mat1)

kable(df1, col.names = c("lambda", "beta"), escape = FALSE,
      caption = "Posterior summaries") %>%
  kable_styling(latex_options = "hold_position")
```
|  | lambda | beta |
|---|---|---|
| post mean | 0.6510 | 2.4598 |
| post sd | 0.6534 | 0.7101 |
We see that the distribution of the number of failures given time is bimodal and right-skewed. We can now draw our conclusions not from point estimates alone but from the whole posterior distribution.
Gaver, D. P. & O’Muircheartaigh, I. G. (1987). Robust Empirical Bayes Analyses of Event Rates. Technometrics, 29(1), 1-15. DOI: 10.1080/00401706.1987.10488178
Robert, C. P. & Casella, G. (1999). Monte Carlo Statistical Methods (Vol. 2). New York: Springer.
The R Project for Statistical Computing: https://www.r-project.org/