Suppose we have a random sample of normal data
\[ Y_i \sim N(\mu, \sigma^2), \quad i = 1, \dots, n \]
where \(\sigma^2\) is known but
\(\mu\) is unknown.
The sample mean is
\[ \bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]
so the likelihood for \(\mu\) can be based on \(\bar{Y}\), which is sufficient for \(\mu\) when \(\sigma^2\) is known.
Assume the prior distribution for \(\mu\) is
\[ \mu \sim N\left(\gamma, \frac{\sigma^2}{n_0}\right) \]
where \(\gamma\) and \(n_0\) are known constants.
(i) Show that the posterior distribution for \(\mu\) is normal with
\[ E(\mu \mid \bar{y}) = \frac{n_0 \gamma + n \bar{y}}{n_0 + n}, \quad \mathrm{Var}(\mu \mid \bar{y}) = \frac{\sigma^2}{n_0 + n}. \]
The probability density function for \(\bar{Y}\) given \(\mu\) is:
\[ f(\bar{y} \mid \mu) = \left( \frac{n}{2\pi \sigma^2} \right)^{1/2} \exp\left( -\frac{n(\bar{y} - \mu)^2}{2\sigma^2} \right) \]
Up to a proportionality constant (independent of \(\mu\)), the likelihood kernel is:
\[ f(\bar{y} \mid \mu) \propto \exp\left( -\frac{n(\bar{y} - \mu)^2}{2\sigma^2} \right) \]
The prior density for \(\mu\) is:
\[ \pi(\mu) = \left( \frac{n_0}{2\pi \sigma^2} \right)^{1/2} \exp\left( -\frac{n_0(\mu - \gamma)^2}{2\sigma^2} \right) \]
Up to a constant in \(\mu\), the prior kernel is:
\[ \pi(\mu) \propto \exp\left( -\frac{n_0(\mu - \gamma)^2}{2\sigma^2} \right) \]
By Bayes' theorem, the posterior is proportional to the likelihood times the prior:
\[ \pi(\mu \mid \bar{y}) \propto \exp\left( -\frac{n(\bar{y} - \mu)^2}{2\sigma^2} \right) \times \exp\left( -\frac{n_0(\mu - \gamma)^2}{2\sigma^2} \right) \]
Combining the exponents:
\[ \pi(\mu \mid \bar{y}) \propto \exp\left[ -\frac{1}{2\sigma^2} \left( n(\bar{y} - \mu)^2 + n_0(\mu - \gamma)^2 \right) \right] \]
Let
\[ Q = n(\bar{y} - \mu)^2 + n_0(\mu - \gamma)^2 \]
Expand each term:
\[ Q = n(\bar{y}^2 - 2\bar{y}\mu + \mu^2) + n_0(\mu^2 - 2\gamma\mu + \gamma^2) \]
\[ = (n + n_0)\mu^2 - 2(n\bar{y} + n_0\gamma)\mu + (n\bar{y}^2 + n_0\gamma^2) \]
Write \(A = n + n_0\), \(B = n\bar{y} + n_0\gamma\), \(C = n\bar{y}^2 + n_0\gamma^2\).
Then:
\[ Q = A\mu^2 - 2B\mu + C \]
Complete the square:
\[ A\mu^2 - 2B\mu = A\left( \mu^2 - \frac{2B}{A}\mu \right) = A\left[ \left(\mu - \frac{B}{A}\right)^2 - \left(\frac{B}{A}\right)^2 \right] \]
So:
\[ Q = A(\mu - m)^2 + \left(C - \frac{B^2}{A}\right) \]
where
\[ m = \frac{B}{A} = \frac{n\bar{y} + n_0\gamma}{n_0 + n} \]
The term \(\left(C - \frac{B^2}{A}\right)\) is constant with respect to \(\mu\) (depends only on \(\bar{y}\), \(\gamma\), \(n_0\), \(n\)).
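The completed-square identity can be spot-checked numerically. The following is a minimal sketch, not part of the original solution: the function names `q_expanded` and `q_completed`, the random ranges, and the seed are all illustrative assumptions.

```python
import random

def q_expanded(mu, n, n0, ybar, gamma):
    """Q as first defined: n(ybar - mu)^2 + n0(mu - gamma)^2."""
    return n * (ybar - mu) ** 2 + n0 * (mu - gamma) ** 2

def q_completed(mu, n, n0, ybar, gamma):
    """Q after completing the square: A(mu - m)^2 + (C - B^2 / A)."""
    A = n + n0
    B = n * ybar + n0 * gamma
    C = n * ybar ** 2 + n0 * gamma ** 2
    m = B / A
    return A * (mu - m) ** 2 + (C - B ** 2 / A)

# Compare the two forms at randomly drawn inputs (ranges chosen arbitrarily).
rng = random.Random(0)
max_gap = 0.0
for _ in range(1000):
    mu = rng.uniform(-10, 10)
    ybar = rng.uniform(-10, 10)
    gamma = rng.uniform(-10, 10)
    n = rng.randint(1, 50)
    n0 = rng.uniform(0.1, 50)
    gap = abs(q_expanded(mu, n, n0, ybar, gamma) - q_completed(mu, n, n0, ybar, gamma))
    max_gap = max(max_gap, gap)

# Any remaining difference should be floating-point rounding only.
print("largest absolute difference:", max_gap)
```

A numerical check like this does not replace the algebra, but it catches sign or coefficient slips quickly.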
Substituting back into the posterior:
\[ \pi(\mu \mid \bar{y}) \propto \exp\left( -\frac{A(\mu - m)^2}{2\sigma^2} \right) \times \exp\left( -\frac{C - B^2/A}{2\sigma^2} \right) \]
The second exponential does not depend on \(\mu\), so it is absorbed into the proportionality constant, leaving:
\[ \pi(\mu \mid \bar{y}) \propto \exp\left( -\frac{(n_0 + n)(\mu - m)^2}{2\sigma^2} \right) \]
This is the kernel of a normal distribution with mean \(m\) and variance \(\frac{\sigma^2}{n_0 + n}\).
Therefore, the posterior distribution is:
\[ \mu \mid \bar{y} \sim N\left( \frac{n_0\gamma + n\bar{y}}{n_0 + n} , \frac{\sigma^2}{n_0 + n} \right) \]
That is:
\[ \boxed{E(\mu \mid \bar{y}) = \frac{n_0\gamma + n\bar{y}}{n_0 + n}, \quad \mathrm{Var}(\mu \mid \bar{y}) = \frac{\sigma^2}{n_0 + n}} \]
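As a sanity check on the boxed result, one can normalise the product of the likelihood and prior kernels on a grid and compare its mean and variance with the closed form. This is a sketch under stated assumptions: the helper names, the grid width and resolution, and the input values (\(\gamma = 2\), \(n_0 = 4\), \(\bar{y} = 3.5\), \(n = 12\), \(\sigma^2 = 9\)) are hypothetical and chosen only for illustration.

```python
import math

def closed_form_posterior(gamma, n0, ybar, n, sigma2):
    """Posterior mean and variance from the closed-form result."""
    return (n0 * gamma + n * ybar) / (n0 + n), sigma2 / (n0 + n)

def grid_posterior(gamma, n0, ybar, n, sigma2, points=20001):
    """Normalise likelihood x prior on a grid; return its mean and variance."""
    # Make the grid wide enough to cover both the prior and the likelihood.
    spread = 10.0 * math.sqrt(sigma2 / min(n0, n))
    lo = min(gamma, ybar) - spread
    hi = max(gamma, ybar) + spread
    step = (hi - lo) / (points - 1)
    grid = [lo + i * step for i in range(points)]
    # Work with the log kernel and subtract its maximum for numerical stability.
    log_kernel = [
        -(n * (ybar - mu) ** 2 + n0 * (mu - gamma) ** 2) / (2.0 * sigma2)
        for mu in grid
    ]
    top = max(log_kernel)
    weights = [math.exp(v - top) for v in log_kernel]
    total = sum(weights)
    mean = sum(w * mu for w, mu in zip(weights, grid)) / total
    var = sum(w * (mu - mean) ** 2 for w, mu in zip(weights, grid)) / total
    return mean, var

exact = closed_form_posterior(2.0, 4, 3.5, 12, 9.0)
approx = grid_posterior(2.0, 4, 3.5, 12, 9.0)
print("closed form:", exact)
print("grid       :", approx)
```

The grid version never uses the completed-square algebra, so agreement between the two gives independent support for the derivation.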
The posterior mean is a weighted average of the prior mean \(\gamma\) and the sample mean \(\bar{y}\).
The weights are:
- \(n_0\): prior “sample size” (how much confidence we have in the prior mean)
- \(n\): actual sample size (how much confidence we have in the data)
Interpretation in words:
The posterior mean combines prior information and observed data, giving more weight to the source with greater “sample size” (i.e., higher precision / lower variance).
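To see the weighting at work, the sketch below writes the posterior mean as \(w\gamma + (1 - w)\bar{y}\) with \(w = n_0 / (n_0 + n)\) and lets \(n\) grow. The function names and the input values are illustrative assumptions, not part of the original problem.

```python
def prior_weight(n0, n):
    """Normalised weight on the prior mean: n0 / (n0 + n)."""
    return n0 / (n0 + n)

def posterior_mean(gamma, n0, ybar, n):
    """Posterior mean written as a convex combination of gamma and ybar."""
    w = prior_weight(n0, n)
    return w * gamma + (1 - w) * ybar

# Illustrative values: fixed prior, increasing amounts of data.
gamma, n0, ybar = 0.0, 10, 5.0
rows = [(n, prior_weight(n0, n), posterior_mean(gamma, n0, ybar, n))
        for n in (1, 10, 100, 1000)]

for n, w, m in rows:
    print(f"n = {n:4d}  weight on prior = {w:.3f}  posterior mean = {m:.3f}")
```

As \(n\) increases with \(n_0\) fixed, the weight on the prior shrinks toward 0 and the posterior mean moves toward \(\bar{y}\).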
The posterior variance measures our uncertainty about \(\mu\) after observing the data.
Interpretation in words:
Posterior precision (inverse variance) = prior precision + data precision.
Prior precision:
\[ \frac{1}{\mathrm{Var}_{\text{prior}}} = \frac{n_0}{\sigma^2} \]
Data precision (from \(\bar{y}\)):
\[ \frac{1}{\mathrm{Var}(\bar{Y} \mid \mu)} = \frac{n}{\sigma^2} \]
Thus:
\[ \frac{1}{\mathrm{Var}(\mu \mid \bar{y})} = \frac{n_0}{\sigma^2} + \frac{n}{\sigma^2} = \frac{n_0 + n}{\sigma^2} \]
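The additivity of precisions can be confirmed with exact rational arithmetic. This is a minimal sketch: the function name `precisions` and the values \(n_0 = 3\), \(n = 9\), \(\sigma^2 = 4\) are assumptions made for illustration.

```python
from fractions import Fraction

def precisions(n0, n, sigma2):
    """Return (prior precision, data precision, posterior precision)."""
    prior = Fraction(n0) / Fraction(sigma2)
    data = Fraction(n) / Fraction(sigma2)
    return prior, data, prior + data

# Illustrative values only.
n0, n, sigma2 = 3, 9, 4
prior_prec, data_prec, post_prec = precisions(n0, n, sigma2)

# Inverting the summed precision should reproduce sigma^2 / (n0 + n).
post_var = 1 / post_prec
print("prior precision    :", prior_prec)
print("data precision     :", data_prec)
print("posterior precision:", post_prec)
print("posterior variance :", post_var)
```

Because the posterior precision is a sum of two positive terms, the posterior variance is smaller than both the prior variance \(\sigma^2 / n_0\) and the sampling variance \(\sigma^2 / n\).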
| Quantity | Formula | Interpretation |
|---|---|---|
| Prior mean | \(\gamma\) | Best guess before data |
| Prior “sample size” | \(n_0\) | Confidence in prior mean (higher \(n_0\) = more confidence) |
| Sample mean | \(\bar{y}\) | Information from data |
| Posterior mean | \(\frac{n_0 \gamma + n \bar{y}}{n_0 + n}\) | Weighted average, weights = \(n_0\) and \(n\) |
| Posterior variance | \(\frac{\sigma^2}{n_0 + n}\) | Total uncertainty after combining data and prior |
Suppose \(\sigma^2 = 1\), prior mean \(\gamma = 0\), prior sample size \(n_0 = 10\), data: \(\bar{y} = 5\), \(n = 5\).
Then:
\[ E(\mu \mid \bar{y}) = \frac{10 \times 0 + 5 \times 5}{10 + 5} = \frac{25}{15} \approx 1.67 \]
\[ \mathrm{Var}(\mu \mid \bar{y}) = \frac{1}{15} \approx 0.067 \]
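The arithmetic in this example can be reproduced exactly with Python's `fractions` module. The variable names are illustrative; the numbers are those of the example.

```python
from fractions import Fraction

# Values from the worked example.
sigma2 = Fraction(1)
gamma = Fraction(0)
n0 = 10
ybar = Fraction(5)
n = 5

post_mean = (n0 * gamma + n * ybar) / (n0 + n)  # 25/15, reduced to 5/3
post_var = sigma2 / (n0 + n)                    # 1/15

print(f"posterior mean     = {post_mean} ~ {float(post_mean):.2f}")
print(f"posterior variance = {post_var} ~ {float(post_var):.3f}")
```

Since \(n_0 = 10\) exceeds \(n = 5\), the prior carries two thirds of the weight, which is why the posterior mean of about 1.67 sits closer to \(\gamma = 0\) than to \(\bar{y} = 5\).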
In summary:
- Information from prior and data combine additively in precision (1/variance).
- The effective total “sample size” for the posterior is \(n_0 + n\).
- This is a classic example of conjugate Bayesian updating for the normal mean with known variance.
We have derived and interpreted the posterior distribution for \(\mu\) when the prior is normal and the variance is known. The results show a natural combination of prior and sample information through precision weighting, providing a clear Bayesian learning mechanism.