Bayes Estimator for an Exponential Rate Parameter Using a Gamma Prior

Introduction

We derive the Bayes estimator for an exponential rate parameter \(\theta\) under squared error loss, using a conjugate Gamma prior.

Model Setup

Likelihood

Let \(Y_1, \dots, Y_n \mid \theta \sim \text{Exponential}(\theta)\) be conditionally i.i.d. given the rate \(\theta\). The probability density function of a single observation is:

\[ f(y_i \mid \theta) = \theta e^{-\theta y_i}, \quad y_i > 0, \ \theta > 0 \]

For \(n\) independent observations, the joint likelihood is:

\[ f(\mathbf{y} \mid \theta) = \prod_{i=1}^n \theta e^{-\theta y_i} = \theta^n e^{-\theta \sum_{i=1}^n y_i} \]

The likelihood has no multiplicative constants free of \(\theta\), so its kernel is the same expression:

\[ f(\mathbf{y} \mid \theta) \propto \theta^n e^{-\theta \sum y_i} \]
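As a quick numerical check of this factorization, the following R sketch compares the closed-form log-likelihood \(n \log\theta - \theta \sum y_i\) with the sum of log-densities returned by `dexp`. The data vector `y` and the value of `theta` are made-up illustration values, not taken from the text.

```r
# Hypothetical data and rate value, chosen only for illustration
y <- c(0.8, 2.1, 1.5, 3.0, 2.6)
theta <- 0.7
n <- length(y)

# Closed-form log-likelihood: n * log(theta) - theta * sum(y)
loglik_closed <- n * log(theta) - theta * sum(y)

# Same quantity from R's exponential density (rate parameterization)
loglik_dexp <- sum(dexp(y, rate = theta, log = TRUE))

# TRUE if the two agree up to floating-point tolerance
all.equal(loglik_closed, loglik_dexp)
```

The data enter the likelihood only through `n` and `sum(y)`, which is why the later steps need nothing else from the sample.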

Prior Distribution

The conjugate prior for the rate parameter \(\theta\) of an exponential distribution is the Gamma distribution.

Let \(\theta \sim \text{Gamma}(\alpha, \beta)\) with shape \(\alpha > 0\) and rate \(\beta > 0\). The probability density function is:

\[ \pi(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta \theta}, \quad \theta > 0 \]

As a kernel:

\[ \pi(\theta) \propto \theta^{\alpha-1} e^{-\beta \theta} \]
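R's `dgamma` supports both a rate and a scale parameterization, so it is worth confirming which one matches the density above. The sketch below evaluates the formula by hand and compares it with `dgamma(..., shape =, rate =)`; the hyperparameter values and the grid of \(\theta\) values are arbitrary choices for illustration.

```r
# Hypothetical hyperparameters and evaluation points
alpha <- 3   # shape
beta  <- 2   # rate
theta <- c(0.25, 0.5, 1, 2)

# Density written out exactly as in the formula above
dens_formula <- beta^alpha / gamma(alpha) * theta^(alpha - 1) * exp(-beta * theta)

# Built-in Gamma density, using the rate (not scale) argument
dens_builtin <- dgamma(theta, shape = alpha, rate = beta)

# TRUE if the two agree up to floating-point tolerance
all.equal(dens_formula, dens_builtin)
```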

Loss Function

We use squared error loss:

\[ L(\theta, \hat{\theta}) = (\theta - \hat{\theta})^2 \]

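Under squared error loss, the Bayes estimator minimizes the posterior expected loss \(\mathbb{E}[(\theta - \hat{\theta})^2 \mid \mathbf{y}]\), and the minimizer is the posterior mean: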

\[ \hat{\theta}_{\text{Bayes}} = \mathbb{E}[\theta \mid \mathbf{y}] \]
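The claim that the mean minimizes expected squared error can be illustrated by simulation. This is only a sketch: the Gamma distribution used for the draws, the seed, and the search interval are arbitrary assumptions, and any distribution with finite variance would serve.

```r
set.seed(1)

# Draws standing in for a distribution over theta (hypothetical parameters)
draws <- rgamma(1e5, shape = 6, rate = 12)

# Monte Carlo estimate of the expected squared error of a point estimate a
expected_loss <- function(a) mean((draws - a)^2)

# One-dimensional numerical minimization over a
opt <- optimize(expected_loss, interval = c(0, 5))

# The minimizer should agree closely with the sample mean of the draws
c(minimizer = opt$minimum, mean_of_draws = mean(draws))
```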

Step 1: Apply Bayes’ Theorem

Bayes’ theorem states:

\[ \pi(\theta \mid \mathbf{y}) = \frac{f(\mathbf{y} \mid \theta) \pi(\theta)}{m(\mathbf{y})} \propto f(\mathbf{y} \mid \theta) \times \pi(\theta) \]

where \(m(\mathbf{y})\) is the marginal likelihood (normalizing constant).

Step 2: Multiply Likelihood and Prior

Likelihood kernel:

\[ f(\mathbf{y} \mid \theta) \propto \theta^n e^{-\theta \sum y_i} \]

Prior kernel:

\[ \pi(\theta) \propto \theta^{\alpha-1} e^{-\beta \theta} \]

Multiplying:

\[ \pi(\theta \mid \mathbf{y}) \propto \theta^n e^{-\theta \sum y_i} \times \theta^{\alpha-1} e^{-\beta \theta} \]

Combine the powers of \(\theta\) and the exponential terms:

\[ \pi(\theta \mid \mathbf{y}) \propto \theta^{(n + \alpha - 1)} e^{-\theta (\beta + \sum y_i)} \]

Step 3: Recognize the Posterior Distribution

The kernel

\[ \theta^{n + \alpha - 1} e^{-\theta (\beta + \sum y_i)} \]

is exactly the kernel of a Gamma distribution with:

Shape parameter: \(a = n + \alpha\)
Rate parameter: \(b = \beta + \sum_{i=1}^n y_i\)

Therefore:

\[ \theta \mid \mathbf{y} \sim \text{Gamma}\left( n + \alpha,\ \beta + \sum_{i=1}^n y_i \right) \]

The full posterior density is:

\[ \pi(\theta \mid \mathbf{y}) = \frac{(\beta + \sum y_i)^{n+\alpha}}{\Gamma(n+\alpha)} \theta^{n+\alpha-1} e^{-\theta (\beta + \sum y_i)} \]
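One way to check the identification is to confirm that the product of the likelihood and prior kernels is proportional to the claimed Gamma density, that is, that their ratio does not depend on \(\theta\). The sketch below does this on a small grid; the values of `n`, `sum_y`, `alpha`, `beta` and the grid are illustrative assumptions.

```r
# Hypothetical data summary and hyperparameters
n <- 5
sum_y <- 10
alpha <- 1
beta <- 2

# Unnormalized posterior: likelihood kernel times prior kernel
kernel <- function(theta) theta^(n + alpha - 1) * exp(-theta * (beta + sum_y))

theta_grid <- c(0.1, 0.5, 1, 1.5)
post_gamma <- dgamma(theta_grid, shape = n + alpha, rate = beta + sum_y)

# If the kernel matches, this ratio is the same at every grid point
ratio <- kernel(theta_grid) / post_gamma

# The constant should equal Gamma(n + alpha) / (beta + sum_y)^(n + alpha)
expected_const <- gamma(n + alpha) / (beta + sum_y)^(n + alpha)
all.equal(ratio, rep(expected_const, length(theta_grid)))
```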

Step 4: Compute the Posterior Mean

For a \(\text{Gamma}(a, b)\) distribution (shape \(a\), rate \(b\)), the mean is:

\[ \mathbb{E}[\theta] = \frac{a}{b} \]

Applying this to our posterior:

\[ \mathbb{E}[\theta \mid \mathbf{y}] = \frac{n + \alpha}{\beta + \sum_{i=1}^n y_i} \]
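The closed-form mean can be compared with a direct numerical integration of \(\theta\) against the Gamma density. The shape and rate below are hypothetical values for illustration; `integrate` is base R's adaptive quadrature routine.

```r
# Hypothetical posterior shape and rate
a <- 6
b <- 12

# Numerical posterior mean: integral of theta * density over (0, Inf)
num_mean <- integrate(
  function(t) t * dgamma(t, shape = a, rate = b),
  lower = 0, upper = Inf
)$value

# Compare with the closed form a / b
c(numerical = num_mean, closed_form = a / b)
```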

Step 5: State the Bayes Estimator

Under squared error loss, the Bayes estimator is:

\[ \boxed{\hat{\theta}_{\text{Bayes}} = \frac{n + \alpha}{\beta + \sum_{i=1}^n y_i}} \]

where:

\(n\) = sample size
\(\sum y_i\) = sum of observed data
\(\alpha, \beta\) = prior hyperparameters (shape and rate of Gamma prior)
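The estimator is simple enough to wrap in a small helper. The function name `bayes_exp_rate` and the data vector are hypothetical, introduced here only for illustration.

```r
# Bayes estimate of an exponential rate under squared error loss,
# assuming a Gamma(alpha, beta) prior with shape alpha and rate beta
bayes_exp_rate <- function(y, alpha, beta) {
  stopifnot(all(y > 0), alpha > 0, beta > 0)
  (length(y) + alpha) / (beta + sum(y))
}

# Made-up sample with n = 5 and sum(y) = 10
y <- c(0.8, 2.1, 1.5, 3.0, 2.6)
est <- bayes_exp_rate(y, alpha = 1, beta = 2)
est
```

Passing the raw data rather than the pair `n` and `sum_y` keeps the two summaries from drifting out of sync.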

Special Case: Example 4.6

If we choose the prior \(\text{Gamma}(\alpha = 1, \beta = \mu)\), then:

Prior shape = 1, so the prior reduces to an exponential distribution with rate \(\mu\)
Prior rate = \(\mu\)

The Bayes estimator becomes:

\[ \hat{\theta}_{\text{Bayes}} = \frac{n + 1}{\mu + \sum_{i=1}^n y_i} \]

This matches the formula from Example 4.6.

Interpretation

The estimator can be written as:

\[ \hat{\theta} = \frac{n + \alpha}{\beta + n \bar{y}} \]

where \(\bar{y} = \frac{1}{n} \sum y_i\).

The MLE for the exponential rate is \(\hat{\theta}_{\text{MLE}} = \frac{n}{\sum y_i} = \frac{1}{\bar{y}}\).

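The Bayes estimator is a weighted average of the prior mean \(\alpha / \beta\) and the MLE:

\[ \hat{\theta}_{\text{Bayes}} = \frac{\beta}{\beta + \sum y_i} \cdot \frac{\alpha}{\beta} + \frac{\sum y_i}{\beta + \sum y_i} \cdot \frac{n}{\sum y_i} \]

It therefore shrinks the MLE toward the prior mean, and the weight on the MLE grows as \(\sum y_i\) increases relative to \(\beta\).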
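The shrinkage can be checked numerically by writing the estimate as \(w \cdot \alpha/\beta + (1 - w) \cdot \hat{\theta}_{\text{MLE}}\) with \(w = \beta / (\beta + \sum y_i)\). The values below are hypothetical, chosen so that the prior mean (2) and the MLE (0.5) differ.

```r
# Hypothetical data summary and hyperparameters
n <- 5
sum_y <- 10
alpha <- 4
beta <- 2

prior_mean <- alpha / beta
mle <- n / sum_y
bayes <- (n + alpha) / (beta + sum_y)

# Weight placed on the prior mean
w <- beta / (beta + sum_y)

# TRUE if the Bayes estimate equals the weighted average
all.equal(bayes, w * prior_mean + (1 - w) * mle)

# The Bayes estimate lies between the MLE and the prior mean
c(mle = mle, bayes = bayes, prior_mean = prior_mean)
```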

Numerical Examples

Example 1: Weak prior, MLE = 0.5

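Suppose a sample of \(n = 5\) observations has \(\sum y_i = 10\), and we use the prior \(\text{Gamma}(\alpha = 1, \beta = 2)\). The prior mean \(\alpha / \beta = 0.5\) equals the MLE \(n / \sum y_i = 0.5\), so the Bayes estimate coincides with both: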

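# Data summary: sample size and sum of the observations
n <- 5
sum_y <- 10
# Gamma prior hyperparameters (shape and rate)
alpha <- 1
beta <- 2

# Posterior mean, i.e. the Bayes estimate under squared error loss
theta_hat <- (n + alpha) / (beta + sum_y)
theta_hat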

## [1] 0.5

# Maximum likelihood estimate, for comparison
mle <- n / sum_y
mle

## [1] 0.5
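To see how the strength of the prior changes the estimate, the following sketch keeps the same data summary but tries three hypothetical priors that share the mean \(\alpha / \beta = 2\) while carrying increasing weight. These hyperparameter values are assumptions for illustration and do not come from the text.

```r
# Same data summary as in Example 1
n <- 5
sum_y <- 10

# Hypothetical priors with mean alpha / beta = 2 and growing weight
priors <- data.frame(alpha = c(0.2, 2, 20), beta = c(0.1, 1, 10))

# Bayes estimate for each prior
priors$theta_hat <- (n + priors$alpha) / (priors$beta + sum_y)
priors
```

As \(\beta\) grows relative to \(\sum y_i\), the estimate moves away from the MLE of 0.5 and toward the prior mean of 2.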