Posterior Distribution for Normal Mean with Known Variance

Problem Statement

Suppose we have a random sample of normal data

\[ Y_i \sim N(\mu, \sigma^2), \quad i = 1, \dots, n \]

where \(\sigma^2\) is known but \(\mu\) is unknown.
The sample mean is

\[ \bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]

Because \(\bar{Y}\) is a sufficient statistic for \(\mu\) when \(\sigma^2\) is known, the likelihood for \(\mu\) can be based on \(\bar{Y}\) alone.
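As a quick numerical sanity check of the sampling distribution of \(\bar{Y}\), the sketch below simulates many samples with Python's standard-library `random` module. It then compares the empirical mean and variance of the simulated \(\bar{Y}\) values with \(\mu\) and \(\sigma^2/n\). The parameter values (\(\mu = 2\), \(\sigma^2 = 4\), \(n = 25\)) and the helper name `simulate_sample_means` are illustrative assumptions.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2) and return their sample means."""
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

# Illustrative values (assumptions, not from the text): mu = 2, sigma^2 = 4, n = 25.
mu, sigma, n = 2.0, 2.0, 25
means = simulate_sample_means(mu, sigma, n, reps=10000)

print(statistics.fmean(means))     # should be close to mu = 2.0
print(statistics.variance(means))  # should be close to sigma^2 / n = 0.16
```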

Assume the prior distribution for \(\mu\) is

\[ \mu \sim N\left(\gamma, \frac{\sigma^2}{n_0}\right) \]

where \(\gamma\) and \(n_0\) are known constants.

(i) Show that the posterior distribution for \(\mu\) is normal with

\[ E(\mu \mid \bar{y}) = \frac{n_0 \gamma + n \bar{y}}{n_0 + n}, \quad \mathrm{Var}(\mu \mid \bar{y}) = \frac{\sigma^2}{n_0 + n}. \]

Solution

Step 1: Likelihood (in terms of \(\bar{y}\))

The probability density function for \(\bar{Y}\) given \(\mu\) is:

\[ f(\bar{y} \mid \mu) = \left( \frac{n}{2\pi \sigma^2} \right)^{1/2} \exp\left( -\frac{n(\bar{y} - \mu)^2}{2\sigma^2} \right) \]

Up to a proportionality constant (independent of \(\mu\)), the likelihood kernel is:

\[ f(\bar{y} \mid \mu) \propto \exp\left( -\frac{n(\bar{y} - \mu)^2}{2\sigma^2} \right) \]
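The sketch below checks numerically that dropping the factor \(\left( n / (2\pi\sigma^2) \right)^{1/2}\) only shifts the log-likelihood by a constant that does not depend on \(\mu\). The function names and the values \(\bar{y} = 1.3\), \(\sigma^2 = 2\), \(n = 8\) are hypothetical, chosen only for illustration.

```python
import math

def log_density_ybar(ybar, mu, sigma2, n):
    """Full log density of Ybar ~ N(mu, sigma2 / n), evaluated at ybar."""
    return 0.5 * math.log(n / (2 * math.pi * sigma2)) - n * (ybar - mu) ** 2 / (2 * sigma2)

def log_kernel_ybar(ybar, mu, sigma2, n):
    """Log of the likelihood kernel: keeps only the part that depends on mu."""
    return -n * (ybar - mu) ** 2 / (2 * sigma2)

# Hypothetical values for illustration.
ybar, sigma2, n = 1.3, 2.0, 8

# The difference should be the same for every mu: 0.5 * log(n / (2*pi*sigma^2)).
diffs = [log_density_ybar(ybar, mu, sigma2, n) - log_kernel_ybar(ybar, mu, sigma2, n)
         for mu in (-3.0, 0.0, 0.5, 4.0)]
print(diffs)
```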

Step 2: Prior distribution

The prior density for \(\mu\) is:

\[ \pi(\mu) = \left( \frac{n_0}{2\pi \sigma^2} \right)^{1/2} \exp\left( -\frac{n_0(\mu - \gamma)^2}{2\sigma^2} \right) \]

Up to a proportionality constant (independent of \(\mu\)), the prior kernel is:

\[ \pi(\mu) \propto \exp\left( -\frac{n_0(\mu - \gamma)^2}{2\sigma^2} \right) \]

Step 3: Posterior is proportional to likelihood × prior

\[ \pi(\mu \mid \bar{y}) \propto \exp\left( -\frac{n(\bar{y} - \mu)^2}{2\sigma^2} \right) \times \exp\left( -\frac{n_0(\mu - \gamma)^2}{2\sigma^2} \right) \]

Combining the exponents:

\[ \pi(\mu \mid \bar{y}) \propto \exp\left[ -\frac{1}{2\sigma^2} \left( n(\bar{y} - \mu)^2 + n_0(\mu - \gamma)^2 \right) \right] \]
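Before completing the square, the unnormalized posterior in Step 3 can be checked numerically. The sketch below evaluates the combined exponent on an evenly spaced grid of \(\mu\) values, normalizes by the sum, and compares the resulting mean and variance with the closed-form answer stated in part (i). The grid approach, the function name `grid_posterior_moments`, the grid limits, and the values \(\bar{y} = 2.4\), \(n = 12\), \(\gamma = 1\), \(n_0 = 4\), \(\sigma^2 = 3\) are all illustrative assumptions.

```python
import math

def grid_posterior_moments(ybar, n, gamma, n0, sigma2, lo, hi, points=10001):
    """Approximate the posterior mean and variance of mu by evaluating
    likelihood kernel x prior kernel on an evenly spaced grid."""
    step = (hi - lo) / (points - 1)
    grid = [lo + i * step for i in range(points)]
    # Unnormalized posterior: exp(-[n(ybar - mu)^2 + n0(mu - gamma)^2] / (2 sigma^2)).
    weights = [math.exp(-(n * (ybar - mu) ** 2 + n0 * (mu - gamma) ** 2) / (2 * sigma2))
               for mu in grid]
    total = sum(weights)  # normalizing constant (the grid step cancels in the ratios)
    mean = sum(w * mu for w, mu in zip(weights, grid)) / total
    var = sum(w * (mu - mean) ** 2 for w, mu in zip(weights, grid)) / total
    return mean, var

# Hypothetical values for illustration.
ybar, n, gamma, n0, sigma2 = 2.4, 12, 1.0, 4, 3.0
mean, var = grid_posterior_moments(ybar, n, gamma, n0, sigma2, lo=-3.0, hi=7.0)

exact_mean = (n0 * gamma + n * ybar) / (n0 + n)  # 2.05 up to rounding
exact_var = sigma2 / (n0 + n)                    # 0.1875
print(mean, exact_mean)
print(var, exact_var)
```

The grid spans more than ten posterior standard deviations on each side of the mean, so truncating the tails has a negligible effect on the comparison.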

Step 4: Expand the quadratic forms

Let

\[ Q = n(\bar{y} - \mu)^2 + n_0(\mu - \gamma)^2 \]

Expand each term:

\[ \begin{aligned}
Q &= n(\bar{y}^2 - 2\bar{y}\mu + \mu^2) + n_0(\mu^2 - 2\gamma\mu + \gamma^2) \\
  &= (n + n_0)\mu^2 - 2(n\bar{y} + n_0\gamma)\mu + (n\bar{y}^2 + n_0\gamma^2)
\end{aligned} \]
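The expansion of \(Q\) is an algebraic identity, so the original and expanded forms should agree for any inputs up to floating-point rounding. The sketch below spot-checks this at random points. The function names and the sampling ranges are hypothetical choices for illustration.

```python
import random

def q_direct(mu, ybar, gamma, n, n0):
    """Q as defined: n(ybar - mu)^2 + n0(mu - gamma)^2."""
    return n * (ybar - mu) ** 2 + n0 * (mu - gamma) ** 2

def q_expanded(mu, ybar, gamma, n, n0):
    """Q after expansion: (n + n0) mu^2 - 2(n ybar + n0 gamma) mu + (n ybar^2 + n0 gamma^2)."""
    return (n + n0) * mu ** 2 - 2 * (n * ybar + n0 * gamma) * mu + (n * ybar ** 2 + n0 * gamma ** 2)

rng = random.Random(1)
max_gap = 0.0
for _ in range(1000):
    mu, ybar, gamma = (rng.uniform(-10, 10) for _ in range(3))
    n, n0 = rng.randint(1, 50), rng.randint(1, 50)
    gap = abs(q_direct(mu, ybar, gamma, n, n0) - q_expanded(mu, ybar, gamma, n, n0))
    max_gap = max(max_gap, gap)

print(max_gap)  # tiny: only floating-point rounding separates the two forms
```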

Step 5: Complete the square in \(\mu\)

Write \(A = n + n_0\), \(B = n\bar{y} + n_0\gamma\), \(C = n\bar{y}^2 + n_0\gamma^2\).

Then:

\[ Q = A\mu^2 - 2B\mu + C \]

Complete the square:

\[ A\mu^2 - 2B\mu = A\left( \mu^2 - \frac{2B}{A}\mu \right) = A\left[ \left(\mu - \frac{B}{A}\right)^2 - \left(\frac{B}{A}\right)^2 \right] = A\left(\mu - \frac{B}{A}\right)^2 - \frac{B^2}{A} \]

So:

\[ Q = A(\mu - m)^2 + \left(C - \frac{B^2}{A}\right) \]

where

\[ m = \frac{B}{A} = \frac{n\bar{y} + n_0\gamma}{n_0 + n} \]

The term \(\left(C - \frac{B^2}{A}\right)\) is constant with respect to \(\mu\) (depends only on \(\bar{y}\), \(\gamma\), \(n_0\), \(n\)).
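The constant term can also be simplified explicitly. It is not needed to identify the posterior, but it shows exactly what is absorbed into the normalizing constant:

\[ \begin{aligned}
C - \frac{B^2}{A}
  &= \frac{(n\bar{y}^2 + n_0\gamma^2)(n + n_0) - (n\bar{y} + n_0\gamma)^2}{n + n_0} \\
  &= \frac{n n_0 \bar{y}^2 - 2 n n_0 \bar{y}\gamma + n n_0 \gamma^2}{n + n_0} \\
  &= \frac{n n_0}{n + n_0}\,(\bar{y} - \gamma)^2
\end{aligned} \]

It depends on the data only through the discrepancy \(\bar{y} - \gamma\) between the sample mean and the prior mean.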

Step 6: Posterior kernel

\[ \pi(\mu \mid \bar{y}) \propto \exp\left( -\frac{A(\mu - m)^2}{2\sigma^2} \right) \times \exp\left( -\frac{\text{constant}}{2\sigma^2} \right) \]

The second exponential does not depend on \(\mu\), so it can be absorbed into the normalizing constant. Substituting \(A = n_0 + n\) gives:

\[ \pi(\mu \mid \bar{y}) \propto \exp\left( -\frac{(n_0 + n)(\mu - m)^2}{2\sigma^2} \right) \]

This is the kernel of a normal distribution with mean \(m\) and variance \(\frac{\sigma^2}{n_0 + n}\).

Step 7: Final result

Therefore, the posterior distribution is:

\[ \mu \mid \bar{y} \sim N\left( \frac{n_0\gamma + n\bar{y}}{n_0 + n} , \frac{\sigma^2}{n_0 + n} \right) \]

That is:

\[ \boxed{E(\mu \mid \bar{y}) = \frac{n_0\gamma + n\bar{y}}{n_0 + n}, \quad \mathrm{Var}(\mu \mid \bar{y}) = \frac{\sigma^2}{n_0 + n}} \]
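The boxed result translates directly into a small helper. This is a minimal sketch: the name `normal_mean_posterior` and its argument order are hypothetical. It accepts `int`, `float`, or `fractions.Fraction` inputs, and Fractions keep the arithmetic exact.

```python
from fractions import Fraction

def normal_mean_posterior(gamma, n0, ybar, n, sigma2):
    """Posterior N(mean, var) for mu, given the prior mu ~ N(gamma, sigma2/n0)
    and Ybar | mu ~ N(mu, sigma2/n), with sigma2 known."""
    if n0 + n <= 0:
        raise ValueError("need n0 + n > 0")
    mean = (n0 * gamma + n * ybar) / (n0 + n)  # weighted average of gamma and ybar
    var = sigma2 / (n0 + n)                    # effective sample size n0 + n
    return mean, var

# Hypothetical values: gamma = 1, n0 = 4, ybar = 2.4, n = 12, sigma^2 = 3.
mean, var = normal_mean_posterior(Fraction(1), 4, Fraction(12, 5), 12, Fraction(3))
print(mean, var)  # 41/20 3/16
```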

Comments

This is a classic conjugate prior example: normal prior for \(\mu\) with known variance leads to a normal posterior.
The posterior mean is a weighted average of the prior mean \(\gamma\) and the sample mean \(\bar{y}\), with weights proportional to \(n_0\) and \(n\).
The posterior precision is prior precision \(+\) data precision: \[ \frac{1}{\mathrm{Var}(\mu \mid \bar{y})} = \frac{n_0}{\sigma^2} + \frac{n}{\sigma^2} \]
As \(n \to \infty\), posterior mean \(\to \bar{y}\) and posterior variance \(\to 0\) (consistency).
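Dividing the numerator and denominator of the posterior mean by \(\sigma^2\) shows that it is a precision-weighted average. Each source of information is weighted by its precision, and the common factor \(1/\sigma^2\) cancels:

\[ E(\mu \mid \bar{y}) = \frac{\dfrac{n_0}{\sigma^2}\,\gamma + \dfrac{n}{\sigma^2}\,\bar{y}}{\dfrac{n_0}{\sigma^2} + \dfrac{n}{\sigma^2}} = \frac{n_0\gamma + n\bar{y}}{n_0 + n} \]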

Part (ii): Interpretation of \(E(\mu \mid \bar{y})\) and \(\mathrm{Var}(\mu \mid \bar{y})\)

1. Interpretation of \(E(\mu \mid \bar{y}) = \frac{n_0 \gamma + n \bar{y}}{n_0 + n}\)

The posterior mean is a weighted average of:

\(\gamma\): the prior mean (our best guess for \(\mu\) before seeing the data)
\(\bar{y}\): the sample mean (the information from the data)

The weights are \(n_0/(n_0 + n)\) and \(n/(n_0 + n)\), i.e., proportional to:
\(n_0\): the prior “sample size” (how much confidence we have in the prior mean)
\(n\): the actual sample size (how much confidence we have in the data)

Interpretation in words:

The posterior mean combines prior information and observed data, giving more weight to the source with greater “sample size” (i.e., higher precision / lower variance).

If \(n_0\) is large relative to \(n\), the prior dominates → posterior mean stays close to \(\gamma\).
If \(n\) is large relative to \(n_0\), the data dominate → posterior mean moves close to \(\bar{y}\).
If \(n_0 = n\), the posterior mean is the simple average \((\gamma + \bar{y}) / 2\).
This is a convex combination of \(\gamma\) and \(\bar{y}\), because the weights are nonnegative and sum to 1:
\[ \frac{n_0}{n_0 + n} + \frac{n}{n_0 + n} = 1 \]
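Rearranging the weighted average gives an equivalent shrinkage form. It makes explicit how far the sample mean is pulled toward the prior mean, or, read the other way, how far the prior mean is moved toward the data:

\[ E(\mu \mid \bar{y}) = \bar{y} - \frac{n_0}{n_0 + n}\,(\bar{y} - \gamma) = \gamma + \frac{n}{n_0 + n}\,(\bar{y} - \gamma) \]

Only the fraction \(n_0/(n_0 + n)\) of the gap \(\bar{y} - \gamma\) is removed, so the posterior mean always lies between \(\gamma\) and \(\bar{y}\).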

2. Interpretation of \(\mathrm{Var}(\mu \mid \bar{y}) = \frac{\sigma^2}{n_0 + n}\)

The posterior variance measures our uncertainty about \(\mu\) after observing the data.

Interpretation in words:

Posterior precision (inverse variance) = prior precision + data precision.

Prior precision:
\[ \frac{1}{\mathrm{Var}_{\text{prior}}} = \frac{n_0}{\sigma^2} \]
Data precision (from \(\bar{y}\)):
\[ \frac{1}{\mathrm{Var}(\bar{Y} \mid \mu)} = \frac{n}{\sigma^2} \]

Thus: \[ \frac{1}{\mathrm{Var}(\mu \mid \bar{y})} = \frac{n_0}{\sigma^2} + \frac{n}{\sigma^2} = \frac{n_0 + n}{\sigma^2} \]

More data (larger \(n\)) → smaller posterior variance → less uncertainty.
Stronger prior (larger \(n_0\)) → smaller posterior variance → more certainty even before seeing data.
If \(n_0 = 0\) (the limiting case of a flat, improper prior with infinite variance, i.e., no prior information), then: \[ \mathrm{Var}(\mu \mid \bar{y}) = \frac{\sigma^2}{n} \] which matches the sampling variance of \(\bar{Y}\).
If \(n = 0\) (no data), then: \[ \mathrm{Var}(\mu \mid \bar{y}) = \frac{\sigma^2}{n_0} \] which matches the prior variance.

3. Summary Table

Quantity	Formula	Interpretation
Prior mean	\(\gamma\)	Best guess before data
Prior “sample size”	\(n_0\)	Confidence in prior mean (higher \(n_0\) = more confidence)
Sample mean	\(\bar{y}\)	Information from data
Posterior mean	\(\frac{n_0 \gamma + n \bar{y}}{n_0 + n}\)	Weighted average of \(\gamma\) and \(\bar{y}\), with weights proportional to \(n_0\) and \(n\)
Posterior variance	\(\frac{\sigma^2}{n_0 + n}\)	Total uncertainty after combining data and prior

4. Example for Intuition

Suppose \(\sigma^2 = 1\), prior mean \(\gamma = 0\), prior sample size \(n_0 = 10\), data: \(\bar{y} = 5\), \(n = 5\).

Then:
\[ E(\mu \mid \bar{y}) = \frac{10 \times 0 + 5 \times 5}{10 + 5} = \frac{25}{15} \approx 1.67 \]
\[ \mathrm{Var}(\mu \mid \bar{y}) = \frac{1}{15} \approx 0.067 \]

The prior says \(\mu\) is near 0; the data say \(\mu\) is near 5.
The prior is stronger (\(n_0 = 10\) vs \(n = 5\)), so it receives weight \(10/15 = 2/3\) and pulls the posterior mean toward 0 rather than 5, giving a final estimate of about 1.67.
The posterior variance (0.067) is smaller than both the prior variance \(\sigma^2/n_0 = 0.1\) and the sampling variance \(\sigma^2/n = 0.2\), reflecting reduced uncertainty.
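The same arithmetic can be reproduced exactly with Python's `fractions.Fraction`, using only the values given in the example. The last check confirms the precision identity: the posterior precision equals the prior precision plus the data precision.

```python
from fractions import Fraction

# Values from the example: sigma^2 = 1, gamma = 0, n0 = 10, ybar = 5, n = 5.
sigma2, gamma, n0, ybar, n = Fraction(1), Fraction(0), 10, Fraction(5), 5

post_mean = (n0 * gamma + n * ybar) / (n0 + n)  # 25/15 = 5/3, about 1.67
post_var = sigma2 / (n0 + n)                    # 1/15, about 0.067
prior_var = sigma2 / n0                         # 1/10
sampling_var = sigma2 / n                       # 1/5
weight_on_prior = Fraction(n0, n0 + n)          # 2/3 of the weight goes to gamma

print(post_mean, float(post_mean))
print(post_var, prior_var, sampling_var)

# Posterior precision = prior precision + data precision (15 = 10 + 5).
assert 1 / post_var == 1 / prior_var + 1 / sampling_var
```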

5. Bayesian Learning Interpretation

Starting prior: \(\mu \sim N(\gamma, \sigma^2 / n_0)\)
Data: \(\bar{Y} \mid \mu \sim N(\mu, \sigma^2 / n)\), with observed value \(\bar{y}\)
Updated posterior: \(\mu \mid \bar{y} \sim N\left( \frac{n_0 \gamma + n \bar{y}}{n_0 + n}, \frac{\sigma^2}{n_0 + n} \right)\)

Thus:
Information from the prior and the data combines additively in precision (1/variance).
The effective total “sample size” for the posterior is \(n_0 + n\).
This is a classic example of conjugate Bayesian updating for the normal mean with known variance.
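One consequence of this additive structure is that updating on the observations one at a time, each time using the previous posterior as the new prior, gives the same posterior as updating on all of them at once. The sketch below demonstrates this with exact `Fraction` arithmetic. The helper name `update` and the individual observations are hypothetical; they were chosen so that \(n = 5\) and \(\bar{y} = 5\), as in the example above. Since \(\sigma^2\) is known and appears only as \(\sigma^2/\text{size}\), the sketch tracks just the mean and the effective sample size.

```python
from fractions import Fraction

def update(prior_mean, prior_size, ybar, n):
    """One conjugate update: the prior N(prior_mean, sigma^2/prior_size) combined with
    n observations whose mean is ybar gives N(new_mean, sigma^2/new_size)."""
    new_size = prior_size + n
    new_mean = (prior_size * prior_mean + n * ybar) / new_size
    return new_mean, new_size

# Hypothetical individual observations (mean 5, n = 5), for illustration only.
ys = [Fraction(x) for x in (3, 7, 4, 6, 5)]
gamma, n0 = Fraction(0), 10

# Batch update: use all observations at once through their sample mean.
batch = update(gamma, n0, sum(ys) / len(ys), len(ys))

# Sequential update: treat each observation as a sample of size 1 and feed
# each posterior back in as the prior for the next observation.
mean, size = gamma, n0
for y in ys:
    mean, size = update(mean, size, y, 1)

print(batch, (mean, size))  # both give mean 5/3 with effective size n0 + n = 15
```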

Conclusion

We have derived and interpreted the posterior distribution for \(\mu\) when the prior is normal and the variance is known. The results show a natural combination of prior and sample information through precision weighting, providing a clear Bayesian learning mechanism.