Suppose \(Y_1, Y_2, \dots, Y_n\) are i.i.d. \(N(\theta, \sigma^2)\) observations, with observed values \(y = (y_1, \dots, y_n)\). The (improper) joint prior density for \(\theta\) and \(\sigma^2\) is
\[ \pi(\theta, \sigma^2) = \frac{1}{\sigma^2}. \]
The likelihood function is:
\[ L(\theta, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i - \theta)^2 \right] \]
The prior is \(\pi(\theta, \sigma^2) \propto \frac{1}{\sigma^2}\).
By Bayes’ theorem:
\[ \pi(\theta, \sigma^2 \mid y) \propto L(\theta, \sigma^2) \times \pi(\theta, \sigma^2) \]
\[ \pi(\theta, \sigma^2 \mid y) \propto (\sigma^2)^{-n/2} \cdot \frac{1}{\sigma^2} \cdot \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i - \theta)^2 \right] \]
\[ \boxed{\pi(\theta, \sigma^2 \mid y) \propto (\sigma^2)^{-(n+2)/2} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i - \theta)^2 \right]} \]
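Below is a minimal Python sketch of the boxed kernel. It works on the log scale to avoid underflow for large \(n\). The function name `log_joint_posterior` and the toy data are illustrative assumptions. Because of the proportionality sign, the value is correct only up to an additive constant.

```python
import math

def log_joint_posterior(theta, sigma2, y):
    """Unnormalized log posterior: -(n+2)/2 * log(sigma2) - Q / (2 * sigma2)."""
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    n = len(y)
    q = sum((yi - theta) ** 2 for yi in y)  # Q = sum of squared deviations from theta
    return -(n + 2) / 2 * math.log(sigma2) - q / (2 * sigma2)

y = [1.0, 2.0, 3.0]
print(log_joint_posterior(2.0, 1.0, y))  # Q = 2 and log(1) = 0, so -1.0
```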
For fixed \(\sigma^2\), the factor \((\sigma^2)^{-(n+2)/2}\) is a constant, so the conditional posterior of \(\theta\) is:
\[ \pi(\theta \mid \sigma^2, y) \propto \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i - \theta)^2 \right] \]
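To identify this kernel, decompose the sum of squares around the sample mean \(\bar{y} = \frac{1}{n}\sum_{i=1}^n Y_i\).
Step 1: Expand each squared term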
Write \(Y_i - \theta = (Y_i - \bar{y}) + (\bar{y} - \theta)\).
Then:
\[ (Y_i - \theta)^2 = (Y_i - \bar{y})^2 + (\bar{y} - \theta)^2 + 2(Y_i - \bar{y})(\bar{y} - \theta) \]
Step 2: Sum over \(i=1\) to \(n\)
\[ \sum_{i=1}^n (Y_i - \theta)^2 = \sum_{i=1}^n (Y_i - \bar{y})^2 + \sum_{i=1}^n (\bar{y} - \theta)^2 + 2\sum_{i=1}^n (Y_i - \bar{y})(\bar{y} - \theta) \]
Step 3: Simplify each term
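First term: \(\sum_{i=1}^n (Y_i - \bar{y})^2\) is the sum of squared deviations from the sample mean. Denote it \(S^2\). Equivalently, \(S^2 = (n-1)s^2\), where \(s^2\) is the sample variance.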
Second term: \(\sum_{i=1}^n (\bar{y} - \theta)^2 = n(\bar{y} - \theta)^2\) because \((\bar{y} - \theta)^2\) is constant with respect to \(i\).
Third term (the cross product term):
\[ 2\sum_{i=1}^n (Y_i - \bar{y})(\bar{y} - \theta) = 2(\bar{y} - \theta) \sum_{i=1}^n (Y_i - \bar{y}) \]
Step 4: Show the cross product is zero
We need to evaluate \(\sum_{i=1}^n (Y_i - \bar{y})\).
\[ \sum_{i=1}^n (Y_i - \bar{y}) = \sum_{i=1}^n Y_i - \sum_{i=1}^n \bar{y} \]
But \(\sum_{i=1}^n Y_i = n\bar{y}\) (by definition of \(\bar{y}\)), and \(\sum_{i=1}^n \bar{y} = n\bar{y}\).
Thus:
\[ \sum_{i=1}^n (Y_i - \bar{y}) = n\bar{y} - n\bar{y} = 0 \]
Therefore:
\[ 2(\bar{y} - \theta) \times 0 = 0 \]
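Steps 1 to 4 can be checked numerically on a small data set. The sketch below computes both sides of \(\sum (Y_i - \theta)^2 = \sum (Y_i - \bar{y})^2 + n(\bar{y} - \theta)^2\), along with the cross term, which should be zero up to rounding. The helper `sum_sq_decomposition` and the data are hypothetical.

```python
def sum_sq_decomposition(y, theta):
    """Return (lhs, rhs, cross) for sum (y_i - theta)^2 = S^2 + n (ybar - theta)^2."""
    n = len(y)
    ybar = sum(y) / n
    lhs = sum((yi - theta) ** 2 for yi in y)
    s2_total = sum((yi - ybar) ** 2 for yi in y)               # S^2
    rhs = s2_total + n * (ybar - theta) ** 2
    cross = 2 * (ybar - theta) * sum(yi - ybar for yi in y)    # vanishes (Step 4)
    return lhs, rhs, cross

lhs, rhs, cross = sum_sq_decomposition([2.0, 4.0, 9.0], 1.5)
print(lhs, rhs, cross)  # 62.75 62.75 0.0
```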
Step 5: Conclusion
The cross product term vanishes, leaving:
\[ \sum_{i=1}^n (Y_i - \theta)^2 = \sum_{i=1}^n (Y_i - \bar{y})^2 + n(\bar{y} - \theta)^2 \]
Thus:
\[ \pi(\theta \mid \sigma^2, y) \propto \exp\left[ -\frac{1}{2\sigma^2} \left( S^2 + n(\bar{y} - \theta)^2 \right) \right] \]
Since \(S^2\) does not depend on \(\theta\):
\[ \pi(\theta \mid \sigma^2, y) \propto \exp\left[ -\frac{n}{2\sigma^2} (\theta - \bar{y})^2 \right] \]
\[ \boxed{\theta \mid \sigma^2, y \sim N\left(\bar{y}, \frac{\sigma^2}{n}\right)} \]
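One way to confirm the completed square: if \(\theta \mid \sigma^2, y \sim N(\bar{y}, \sigma^2/n)\), then the log kernel of the conditional posterior and the normal log density can differ only by a constant in \(\theta\). The helper names, \(\sigma^2\) value and data below are illustrative assumptions.

```python
import math

def log_kernel_theta(theta, sigma2, y):
    # log of exp[-sum (y_i - theta)^2 / (2 sigma2)], the theta-dependent part
    return -sum((yi - theta) ** 2 for yi in y) / (2 * sigma2)

def normal_logpdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

y = [2.0, 4.0, 9.0]
sigma2 = 1.7
n, ybar = len(y), sum(y) / len(y)
# If theta | sigma2, y ~ N(ybar, sigma2 / n), these differences are all equal.
diffs = [log_kernel_theta(t, sigma2, y) - normal_logpdf(t, ybar, sigma2 / n)
         for t in (-1.0, 0.5, 5.0, 7.3)]
print(diffs)
```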
For fixed \(\theta\), let \(Q = \sum_{i=1}^n (Y_i - \theta)^2\). Viewed as a function of \(\sigma^2\) alone, the joint posterior gives:
\[ \pi(\sigma^2 \mid \theta, y) \propto (\sigma^2)^{-(n+2)/2} \exp\left( -\frac{Q}{2\sigma^2} \right) \]
The Inverse-Gamma distribution with shape parameter \(\alpha > 0\) and scale parameter \(\beta > 0\), denoted \(IG(\alpha, \beta)\), has probability density function:
\[ f(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{-\alpha-1} e^{-\beta/x}, \quad x > 0 \]
where \(\Gamma(\alpha)\) is the gamma function.
Key properties:
| Property | Value |
|:--|:--|
| Mean (for \(\alpha > 1\)) | \(E[X] = \frac{\beta}{\alpha - 1}\) |
| Variance (for \(\alpha > 2\)) | \(\operatorname{Var}(X) = \frac{\beta^2}{(\alpha-1)^2(\alpha-2)}\) |
| Mode | \(\frac{\beta}{\alpha+1}\) |
Relationship to Gamma distribution:
If \(X \sim \text{Gamma}(\alpha, \beta)\) with pdf \(f(x) \propto x^{\alpha-1} e^{-\beta x}\), then \(1/X \sim \text{Inverse-Gamma}(\alpha, \beta)\).
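As a sketch of this relationship, and a rough Monte Carlo check of the mean and variance in the table above, Inverse-Gamma variates can be drawn as reciprocals of Gamma variates. Python's `random.gammavariate(alpha, beta)` takes a *scale* parameter, so the rate \(\beta\) is passed as the scale \(1/\beta\). The helper name, seed and parameter values are assumptions for illustration.

```python
import random

def sample_inverse_gamma(alpha, beta, size, rng):
    # Gamma(shape=alpha, rate=beta) has scale 1/beta; its reciprocal is IG(alpha, beta).
    return [1.0 / rng.gammavariate(alpha, 1.0 / beta) for _ in range(size)]

rng = random.Random(0)
alpha, beta = 5.0, 8.0
draws = sample_inverse_gamma(alpha, beta, 100_000, rng)
mc_mean = sum(draws) / len(draws)
mc_var = sum((d - mc_mean) ** 2 for d in draws) / (len(draws) - 1)
print(mc_mean, beta / (alpha - 1))                          # theory: 2.0
print(mc_var, beta ** 2 / ((alpha - 1) ** 2 * (alpha - 2)))  # theory: 4/3
```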
Why does it appear here?
Our kernel is:
\[ \pi(\sigma^2 \mid \theta, y) \propto (\sigma^2)^{-(n+2)/2} \exp\left( -\frac{Q}{2\sigma^2} \right) \]
Compare with the Inverse-Gamma pdf kernel: \(x^{-\alpha-1} e^{-\beta/x}\).
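Matching the two, with \(x = \sigma^2\):
\[ -\alpha - 1 = -\frac{n+2}{2} \;\Longrightarrow\; \alpha = \frac{n}{2}, \qquad \beta = \frac{Q}{2}. \]
Thus: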
\[ \boxed{\sigma^2 \mid \theta, y \sim IG\left(\frac{n}{2}, \frac{Q}{2}\right)} \]
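The kernel matching can also be checked numerically. For fixed \(\theta\), the log kernel in \(\sigma^2\) and the \(IG(n/2, Q/2)\) log density should differ only by a constant. The function names, \(\theta\) value and data are hypothetical, and `math.lgamma` supplies \(\log\Gamma(\alpha)\).

```python
import math

def ig_logpdf(x, alpha, beta):
    # log of beta^alpha / Gamma(alpha) * x^(-alpha-1) * exp(-beta/x)
    return alpha * math.log(beta) - math.lgamma(alpha) - (alpha + 1) * math.log(x) - beta / x

def log_kernel_sigma2(sigma2, theta, y):
    n = len(y)
    q = sum((yi - theta) ** 2 for yi in y)
    return -(n + 2) / 2 * math.log(sigma2) - q / (2 * sigma2)

y = [2.0, 4.0, 9.0]
theta = 3.0
n = len(y)
q = sum((yi - theta) ** 2 for yi in y)
diffs = [log_kernel_sigma2(s2, theta, y) - ig_logpdf(s2, n / 2, q / 2)
         for s2 in (0.5, 2.0, 10.0, 40.0)]
print(diffs)  # constant up to floating-point rounding
```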
Verification: For \(n=1\), this gives \(IG(1/2, Q/2)\), which is still a proper distribution. In general, this conditional posterior is proper for every \(n \ge 1\) as long as \(Q > 0\) (i.e., not all \(Y_i = \theta\)).
Integrate out \(\sigma^2\):
\[ \pi(\theta \mid y) \propto \int_0^\infty (\sigma^2)^{-(n+2)/2} \exp\left( -\frac{Q}{2\sigma^2} \right) d\sigma^2 \]
Using the Inverse-Gamma normalizing constant: \(\int_0^\infty x^{-\alpha-1} e^{-\beta/x} dx = \frac{\Gamma(\alpha)}{\beta^\alpha}\).
Here, \(\alpha = n/2\) and \(\beta = Q/2\), so:
\[ \int_0^\infty (\sigma^2)^{-(n+2)/2} e^{-Q/(2\sigma^2)} d\sigma^2 = \frac{\Gamma(n/2)}{(Q/2)^{n/2}} \]
Thus:
\[ \pi(\theta \mid y) \propto Q^{-n/2} = \left[ \sum_{i=1}^n (y_i - \theta)^2 \right]^{-n/2} \]
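The normalizing-constant identity used above can be verified by quadrature. This sketch uses the substitution \(x = e^t\), which turns \(\int_0^\infty x^{-\alpha-1} e^{-\beta/x}\,dx\) into \(\int_{-\infty}^{\infty} e^{-\alpha t - \beta e^{-t}}\,dt\). That integrand is smooth and decays quickly, so the trapezoidal rule converges fast. The truncation limits, step size and the example \((\alpha, \beta)\) are illustrative choices.

```python
import math

def ig_integral_numeric(alpha, beta, lo=-30.0, hi=60.0, h=1e-3):
    # After x = e^t, the integrand becomes exp(-alpha*t - beta*e^(-t)).
    def f(t):
        return math.exp(-alpha * t - beta * math.exp(-t))

    steps = int(round((hi - lo) / h))
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h  # composite trapezoidal rule

alpha, beta = 1.5, 2.5  # e.g. n = 3 and Q = 5, so alpha = n/2, beta = Q/2
numeric = ig_integral_numeric(alpha, beta)
exact = math.gamma(alpha) / beta ** alpha
print(numeric, exact)
```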
Express in terms of \(\bar{y}\) and sample variance \(s^2 = \frac{1}{n-1}\sum (y_i - \bar{y})^2\):
\[ \sum (y_i - \theta)^2 = (n-1)s^2 + n(\bar{y} - \theta)^2 \]
Factoring out \((n-1)s^2\), which does not depend on \(\theta\):
\[ \pi(\theta \mid y) \propto \left[ 1 + \frac{n(\bar{y} - \theta)^2}{(n-1)s^2} \right]^{-n/2} \]
Let \(t = \frac{\theta - \bar{y}}{s/\sqrt{n}}\). Then:
\[ \pi(\theta \mid y) \propto \left( 1 + \frac{t^2}{n-1} \right)^{-n/2} \]
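This is the kernel of a \(t\)-distribution with \(\nu = n-1\) degrees of freedom, whose density is proportional to \((1 + t^2/\nu)^{-(\nu+1)/2}\); here \(-(\nu+1)/2 = -n/2\). Because \(t\) is a linear function of \(\theta\), the Jacobian is constant, so: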
\[ \boxed{\theta \mid y \sim \bar{y} + \frac{s}{\sqrt{n}} \cdot t_{n-1}} \]
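To see the proportionality concretely, the sketch below evaluates \(Q(\theta)^{-n/2}\) and the density of \(\bar{y} + (s/\sqrt{n})\,t_{n-1}\) at several values of \(\theta\). Their ratio should be constant. The `t_pdf` helper is written out from the standard Student-\(t\) density, and the data are made up.

```python
import math

def t_pdf(x, nu):
    # Student-t density with nu degrees of freedom
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

y = [2.0, 4.0, 9.0, 5.5]
n = len(y)
ybar = sum(y) / n
s = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
scale = s / math.sqrt(n)

ratios = []
for theta in (0.0, 3.0, ybar, 8.0, 12.0):
    q = sum((yi - theta) ** 2 for yi in y)                  # Q(theta)
    marginal_kernel = q ** (-n / 2)                           # pi(theta | y) up to a constant
    scaled_t = t_pdf((theta - ybar) / scale, n - 1) / scale   # density of ybar + scale * t_{n-1}
    ratios.append(marginal_kernel / scaled_t)
print(ratios)  # equal up to floating-point rounding
```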
For \(n > 2\): \(\quad E[\theta \mid y] = \bar{y}\)
For \(n > 3\): \(\quad \operatorname{Var}(\theta \mid y) = \frac{s^2}{n} \cdot \frac{n-1}{n-3}\)
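The two full conditionals are exactly the ingredients of a Gibbs sampler, which gives an independent check of these moments. This technique is swapped in here for illustration and is not part of the derivation; the marginal of \(\theta\) is available in closed form, so no MCMC is needed. The function name, data, seed and iteration counts are assumptions.

```python
import math
import random

def gibbs_normal(y, iters=20_000, burn_in=1_000, seed=1):
    """Alternate theta | sigma2, y ~ N(ybar, sigma2/n) and sigma2 | theta, y ~ IG(n/2, Q/2)."""
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    theta = ybar  # starting value
    thetas = []
    for it in range(iters + burn_in):
        q = sum((yi - theta) ** 2 for yi in y)
        # IG(n/2, Q/2) as 1 / Gamma(shape=n/2, scale=2/Q); scale is 1/rate
        sigma2 = 1.0 / rng.gammavariate(n / 2, 2.0 / q)
        theta = rng.gauss(ybar, math.sqrt(sigma2 / n))  # N(ybar, sigma2/n)
        if it >= burn_in:
            thetas.append(theta)
    return thetas

y = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.9, 4.4]
n = len(y)
ybar = sum(y) / n
s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
draws = gibbs_normal(y)
mc_mean = sum(draws) / len(draws)
mc_var = sum((d - mc_mean) ** 2 for d in draws) / (len(draws) - 1)
print(mc_mean, ybar)                       # posterior mean
print(mc_var, s2 / n * (n - 1) / (n - 3))  # posterior variance from the t result
```

The conditional mean of \(\theta\) is \(\bar{y}\) regardless of \(\sigma^2\), so successive \(\theta\) draws are uncorrelated (though not independent), and the chain mixes quickly in this model.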
| Quantity | Result |
|:--|:--|
| Joint posterior | \(\pi(\theta, \sigma^2 \mid y) \propto (\sigma^2)^{-(n+2)/2} \exp\left[ -\frac{1}{2\sigma^2} \sum (y_i - \theta)^2 \right]\) |
| \(\theta \mid \sigma^2, y\) | \(N\left(\bar{y}, \frac{\sigma^2}{n}\right)\) |
| \(\sigma^2 \mid \theta, y\) | \(IG\left(\frac{n}{2}, \frac{1}{2}\sum (y_i - \theta)^2\right)\) |
| \(\theta \mid y\) | \(\bar{y} + \frac{s}{\sqrt{n}} \cdot t_{n-1}\) |
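The prior \(\pi(\theta, \sigma^2) = 1/\sigma^2\) is improper (it does not integrate to a finite value). The joint posterior is nevertheless proper provided \(n \ge 2\) and the observations are not all equal. Integrating \(Q^{-n/2} = \left[(n-1)s^2 + n(\bar{y} - \theta)^2\right]^{-n/2}\) over \(\theta\) gives a finite value exactly when \(n > 1\) and \(s^2 > 0\). For \(n = 1\), \(\pi(\theta \mid y) \propto |y_1 - \theta|^{-1}\), which is not integrable. This prior can be obtained as a formal limit of the conjugate Normal-Inverse-Gamma prior: the prior sample size for \(\theta\) and the prior sum of squares tend to zero, and the prior degrees of freedom tend to \(-1\).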