Derivation of the Posterior Predictive Mean of \(Y_0 \mid y\)

1. The transformation they define

They define:

\[ \widetilde{Y}_0 = Y_0 - \Sigma_{12} H^{-1} y \]

where \(y\) is the observed data vector.

Then they state that the predictive distribution of \(\widetilde{Y}_0 \mid y\) is:

\[ \widetilde{Y}_0 \mid y \;\sim\; t\left( g\beta^*,\; \frac{2b^*}{n+2a}\delta^2 + g(M^*)^{-1}g,\; n+2a \right). \]

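Here:
- \(g\) is a row vector; in this derivation \(g = x_0 - \Sigma_{12} H^{-1} X\) (see Section 4)
- \(\beta^*\) is the posterior mean of \(\beta\)
- \(M^*\) is the posterior precision matrix of \(\beta\) up to the factor \(\sigma^2\), so that \(\text{Var}(\beta \mid \sigma^2, y) = \sigma^2 (M^*)^{-1}\)
- \(\delta^2 = 1 - \Sigma_{12} H^{-1} \Sigma_{21}\)
- because \(g\) is a row vector, \(g(M^*)^{-1}g\) is to be read as the quadratic form \(g(M^*)^{-1}g^T\)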

2. What is \(g\beta^*\)?

In Bayesian linear models with a conjugate normal-inverse-gamma prior:

\(\beta^* = E[\beta \mid y]\)
\(g\beta^* = E[g\beta \mid y]\) is the posterior mean of the linear combination \(g\beta\).

For prediction in this setting, \(g = x_0 - \Sigma_{12} H^{-1} X\) (derived in Section 4), so \(g\beta = (x_0 - \Sigma_{12} H^{-1} X)\beta\).

Thus \(g\beta^*\) is the plug-in prediction using the posterior mean of \(\beta\), adjusted for the correlation structure.

3. Relationship between \(\widetilde{Y}_0\) and \(Y_0\)

From the definition:

\[ \widetilde{Y}_0 = Y_0 - \Sigma_{12} H^{-1} y. \]

Since \(y\) is observed (given), \(\Sigma_{12} H^{-1} y\) is a known constant once we condition on \(y\): it shifts the location of the predictive distribution without changing its shape.

Therefore:

\[ \text{Mean}(Y_0 \mid y) = \text{Mean}(\widetilde{Y}_0 \mid y) + \Sigma_{12} H^{-1} y. \]

From the \(t\)-distribution given for \(\widetilde{Y}_0 \mid y\):

\[ \text{Mean}(\widetilde{Y}_0 \mid y) = g\beta^* \quad (\text{provided degrees of freedom } n+2a > 1). \]

Hence:

\[ \boxed{\text{Mean}(Y_0 \mid y) = g\beta^* + \Sigma_{12} H^{-1} y}. \]

4. Why did they introduce \(\widetilde{Y}_0\)?

They transformed \(Y_0\) to \(\widetilde{Y}_0\) so that, given \(\beta\) and \(\sigma^2\), the conditional mean no longer depends on \(y\).

Recall from the earlier conditional normal distribution (Equation 6.10):

\[ Y_0 \mid y, \beta, \sigma^2 \sim N\big( x_0\beta + \Sigma_{12} H^{-1} (y - X\beta),\; \sigma^2[1 - \Sigma_{12} H^{-1} \Sigma_{21}] \big). \]

Now compute \(\widetilde{Y}_0 \mid y, \beta, \sigma^2\):

Subtracting the constant \(\Sigma_{12} H^{-1} y\) shifts the mean and leaves the variance unchanged:

\[ \begin{aligned} \widetilde{Y}_0 \mid y, \beta, \sigma^2 &\sim N\big( x_0\beta + \Sigma_{12} H^{-1} (y - X\beta) - \Sigma_{12} H^{-1} y,\; \sigma^2[1 - \Sigma_{12} H^{-1} \Sigma_{21}] \big) \\ &= N\big( x_0\beta - \Sigma_{12} H^{-1} X\beta,\; \sigma^2[1 - \Sigma_{12} H^{-1} \Sigma_{21}] \big) \\ &= N\big( (x_0 - \Sigma_{12} H^{-1} X)\beta,\; \sigma^2\delta^2 \big), \end{aligned} \]

where \(\delta^2 = 1 - \Sigma_{12} H^{-1} \Sigma_{21}\).

Define: \[ g = x_0 - \Sigma_{12} H^{-1} X. \]

Then: \[ \widetilde{Y}_0 \mid y, \beta, \sigma^2 \sim N\big( g\beta,\; \sigma^2\delta^2 \big). \]

Key point: The mean no longer depends on \(y\); that is the purpose of the transformation.
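The cancellation can be checked numerically. The sketch below uses made-up inputs: the exponential correlation function, the locations `s` and `s0`, and the values of `X`, `x0`, and `beta` are all illustrative assumptions, not quantities from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2

# Hypothetical design matrix, new covariate row, and coefficient vector.
X = rng.normal(size=(n, p))
x0 = rng.normal(size=p)
beta = np.array([1.5, -0.7])

# Assumed exponential correlation on fixed 1-D locations (illustration only).
s = np.linspace(0.0, 10.0, n)
s0 = 4.2
H = np.exp(-np.abs(s[:, None] - s[None, :]))   # correlation matrix of Y
Sigma12 = np.exp(-np.abs(s0 - s))              # correlation between Y0 and Y

# w holds the weights Sigma12 H^{-1} (H is symmetric), so w @ y = Sigma12 H^{-1} y.
w = np.linalg.solve(H, Sigma12)
g = x0 - w @ X
delta2 = 1.0 - Sigma12 @ w

def cond_mean_Y0(y):
    """Mean of Y0 | y, beta, sigma^2 from the conditional normal."""
    return x0 @ beta + w @ (y - X @ beta)

def cond_mean_Y0_tilde(y):
    """Mean of Y0_tilde | y, beta, sigma^2 after subtracting Sigma12 H^{-1} y."""
    return cond_mean_Y0(y) - w @ y

# Two very different data vectors give the same transformed mean, g @ beta.
y_a = rng.normal(size=n)
y_b = 10.0 * rng.normal(size=n)
m_a = cond_mean_Y0_tilde(y_a)
m_b = cond_mean_Y0_tilde(y_b)
print(m_a, m_b, g @ beta)
```

The untransformed mean `cond_mean_Y0` does change with `y`, which is why the shift has to be added back at the end.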

5. Marginalizing over \(\beta, \sigma^2\)

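Suppose we put a normal-inverse-gamma prior on \((\beta, \sigma^2)\), say \(\beta \mid \sigma^2 \sim N(\mu_0, \sigma^2 M_0^{-1})\) and \(\sigma^2 \sim \text{Inv-Gamma}(a, b)\). The posterior is then:

- \(\beta \mid \sigma^2, y \sim N(\beta^*, \sigma^2 (M^*)^{-1})\)
- \(\sigma^2 \mid y \sim \text{Inv-Gamma}(a^*, b^*)\), with \(a^* = a + n/2\) and \(b^* = b + \frac12 (y - X\beta^*)^T H^{-1} (y - X\beta^*) + \frac12 (\beta^* - \mu_0)^T M_0 (\beta^* - \mu_0)\)

The \(H^{-1}\) weighting in \(b^*\) reflects \(\text{Var}(y \mid \beta, \sigma^2) = \sigma^2 H\); the exact expressions depend on how the prior is parameterized.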

Then integrating out \(\beta, \sigma^2\) yields a Student-t distribution for \(\widetilde{Y}_0 \mid y\):

\[ \widetilde{Y}_0 \mid y \sim t\left( g\beta^*,\; \frac{2b^*}{n+2a}\delta^2 + g(M^*)^{-1}g,\; n+2a \right). \]

This matches the form given in the text.
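The integration can be sketched in two steps. This is a sketch under an assumed parameterization, \(\beta \mid \sigma^2 \sim N(\mu_0, \sigma^2 M_0^{-1})\) and \(\sigma^2 \sim \text{Inv-Gamma}(a, b)\), which may differ from the one used in the text; \(g^T\) denotes the transpose of the row vector \(g\).

First integrate out \(\beta\) for fixed \(\sigma^2\). Since \(\widetilde{Y}_0 \mid y, \beta, \sigma^2 \sim N(g\beta, \sigma^2\delta^2)\) and \(\beta \mid \sigma^2, y \sim N(\beta^*, \sigma^2 (M^*)^{-1})\), the two variances add:

\[ \widetilde{Y}_0 \mid y, \sigma^2 \sim N\big( g\beta^*,\; \sigma^2[\delta^2 + g(M^*)^{-1}g^T] \big). \]

Then integrate out \(\sigma^2 \mid y \sim \text{Inv-Gamma}(a^*, b^*)\). A normal whose variance is scaled by an inverse-gamma variable is Student-t with \(2a^* = n + 2a\) degrees of freedom:

\[ \widetilde{Y}_0 \mid y \sim t\left( g\beta^*,\; \frac{b^*}{a^*}[\delta^2 + g(M^*)^{-1}g^T],\; 2a^* \right), \qquad \frac{b^*}{a^*} = \frac{2b^*}{n+2a}. \]

Under these assumptions the factor \(\frac{2b^*}{n+2a}\) multiplies the whole bracket, so the grouping of the scale term in the formula above is worth checking against the text. The location \(g\beta^*\), which is all the mean derivation needs, is the same either way.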

6. Back to \(Y_0 \mid y\)

Since \(\widetilde{Y}_0 = Y_0 - \Sigma_{12} H^{-1} y\), we add back the constant:

\[ Y_0 \mid y \;\sim\; t\left( g\beta^* + \Sigma_{12} H^{-1} y,\; \frac{2b^*}{n+2a}\delta^2 + g(M^*)^{-1}g,\; n+2a \right). \]

Thus the mean of \(Y_0 \mid y\) is:

\[ \text{Mean}(Y_0 \mid y) = g\beta^* + \Sigma_{12} H^{-1} y. \]
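A Monte Carlo check of this mean is sketched below. Everything numeric here is an illustrative assumption: the prior \(\beta \mid \sigma^2 \sim N(\mu_0, \sigma^2 M_0^{-1})\), \(\sigma^2 \sim \text{Inv-Gamma}(a, b)\), the likelihood \(y \mid \beta, \sigma^2 \sim N(X\beta, \sigma^2 H)\) with its standard conjugate update, the exponential correlation function, and all data values. The code draws \((\sigma^2, \beta)\) from the posterior, then \(Y_0\) from the conditional normal, and compares the sample mean with the closed form.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 2

# Hypothetical data and an assumed exponential correlation structure.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
x0 = np.array([1.0, 0.5])
s = np.linspace(0.0, 10.0, n)
s0 = 5.0
H = np.exp(-np.abs(s[:, None] - s[None, :]))
Sigma12 = np.exp(-np.abs(s0 - s))
y = X @ np.array([2.0, 1.0]) + rng.normal(size=n)

# Assumed prior hyperparameters (not taken from the text).
mu0, M0 = np.zeros(p), 0.5 * np.eye(p)
a, b = 3.0, 2.0

# Conjugate update under y | beta, sigma^2 ~ N(X beta, sigma^2 H).
M_star = M0 + X.T @ np.linalg.solve(H, X)
beta_star = np.linalg.solve(M_star, M0 @ mu0 + X.T @ np.linalg.solve(H, y))
r = y - X @ beta_star
a_star = a + n / 2
b_star = (b + 0.5 * r @ np.linalg.solve(H, r)
          + 0.5 * (beta_star - mu0) @ M0 @ (beta_star - mu0))

w = np.linalg.solve(H, Sigma12)        # weights Sigma12 H^{-1}
delta2 = 1.0 - Sigma12 @ w
g = x0 - w @ X
closed_form = g @ beta_star + w @ y    # Mean(Y0 | y) from the derivation

# Simulate: sigma^2 | y, then beta | sigma^2, y, then Y0 | y, beta, sigma^2.
N = 200_000
sigma2 = 1.0 / rng.gamma(shape=a_star, scale=1.0 / b_star, size=N)
L = np.linalg.cholesky(np.linalg.inv(M_star))
beta = beta_star + np.sqrt(sigma2)[:, None] * (rng.normal(size=(N, p)) @ L.T)
cond_mean = beta @ x0 + (y - beta @ X.T) @ w
y0 = cond_mean + np.sqrt(sigma2 * delta2) * rng.normal(size=N)

mc_mean = y0.mean()
print(mc_mean, closed_form)
```

The two printed values should agree up to Monte Carlo error, which shrinks like \(1/\sqrt{N}\).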

7. Expanding \(g\) to get the familiar form

Recall \(g = x_0 - \Sigma_{12} H^{-1} X\). Substitute:

\[ \begin{aligned} \text{Mean}(Y_0 \mid y) &= (x_0 - \Sigma_{12} H^{-1} X)\beta^* + \Sigma_{12} H^{-1} y \\ &= x_0\beta^* - \Sigma_{12} H^{-1} X\beta^* + \Sigma_{12} H^{-1} y \\ &= x_0\beta^* + \Sigma_{12} H^{-1} (y - X\beta^*). \end{aligned} \]

This is exactly the expression you asked about:

\[ \boxed{x_0\beta^* + \Sigma_{12} H^{-1} (y - X\beta^*)}. \]
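The algebra in this step is easy to confirm numerically. The sketch below assumes a prior \(\beta \mid \sigma^2 \sim N(\mu_0, \sigma^2 M_0^{-1})\) and the usual conjugate formulas \(M^* = M_0 + X^T H^{-1} X\), \(\beta^* = (M^*)^{-1}(M_0\mu_0 + X^T H^{-1} y)\); these, the correlation function, and all data values are illustrative assumptions. The identity itself holds for any vector used in place of `beta_star`.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3

# Hypothetical data; the exponential correlation with range 2 is an assumption.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
x0 = np.array([1.0, 0.3, -1.2])
s = np.linspace(0.0, 10.0, n)
s0 = 5.0
H = np.exp(-np.abs(s[:, None] - s[None, :]) / 2.0)
Sigma12 = np.exp(-np.abs(s0 - s) / 2.0)
y = rng.normal(size=n)

# Assumed prior mean and (scaled) prior precision for beta.
mu0 = np.zeros(p)
M0 = 0.1 * np.eye(p)

# Posterior mean of beta under the assumed conjugate update.
M_star = M0 + X.T @ np.linalg.solve(H, X)
beta_star = np.linalg.solve(M_star, M0 @ mu0 + X.T @ np.linalg.solve(H, y))

w = np.linalg.solve(H, Sigma12)   # weights Sigma12 H^{-1}
g = x0 - w @ X

mean_via_g = g @ beta_star + w @ y                        # g beta* + Sigma12 H^{-1} y
mean_familiar = x0 @ beta_star + w @ (y - X @ beta_star)  # plug-in + residual correction
print(mean_via_g, mean_familiar)
```

Both expressions print the same number, since they differ only in how the term \(\Sigma_{12} H^{-1} X\beta^*\) is grouped.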

8. Interpretation

\(x_0\beta^*\): the standard plug-in prediction using the posterior mean of \(\beta\).
\(\Sigma_{12} H^{-1} (y - X\beta^*)\): a correction term that adjusts the prediction using the observed residuals \(y - X\beta^*\), weighted by the correlation between \(Y_0\) and \(Y\) (via \(\Sigma_{12}\)) and by the inverse of the correlation matrix of \(Y\) (via \(H^{-1}\)).

If \(Y_0\) is independent of \(Y\) given \(\beta, \sigma^2\), then \(\Sigma_{12} = 0\), and the mean reduces to \(x_0\beta^*\) (the usual predictive mean).
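A small hand-checkable example of both cases follows. The helper `predictive_mean` and all numbers are hypothetical, and `beta_star` is simply taken as given rather than computed from a prior.

```python
import numpy as np

def predictive_mean(x0, X, y, H, Sigma12, beta_star):
    """x0 beta* + Sigma12 H^{-1} (y - X beta*), for a given beta*."""
    w = np.linalg.solve(H, Sigma12)   # weights Sigma12 H^{-1}; H is symmetric
    return x0 @ beta_star + w @ (y - X @ beta_star)

# Hypothetical inputs: three observations, intercept plus one covariate.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.9, 2.1, 2.9])
x0 = np.array([1.0, 3.0])
beta_star = np.array([1.0, 1.0])      # posterior mean, assumed known here
H = np.eye(3)                         # uncorrelated observations, for simplicity

# Independent case: Sigma12 = 0, so only the plug-in term x0 beta* remains.
m_indep = predictive_mean(x0, X, y, H, np.zeros(3), beta_star)

# Correlated case: Y0 has correlation 0.5 with the third observation,
# whose residual is 2.9 - 3.0 = -0.1, so the correction is 0.5 * (-0.1).
m_corr = predictive_mean(x0, X, y, H, np.array([0.0, 0.0, 0.5]), beta_star)
print(m_indep, m_corr)
```

Here the plug-in term is \(1 + 3 = 4\), and the correlated case pulls the prediction down by \(0.05\) because the most relevant observation fell below its fitted value.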

Summary

The transformation \(\widetilde{Y}_0 = Y_0 - \Sigma_{12} H^{-1} y\) removes \(y\) from the conditional mean given \(\beta, \sigma^2\).
Then \(\widetilde{Y}_0 \mid y\) has a simple \(t\)-distribution with mean \(g\beta^*\).
Adding back the constant gives \(\text{Mean}(Y_0 \mid y) = g\beta^* + \Sigma_{12} H^{-1} y\).
Substituting \(g = x_0 - \Sigma_{12} H^{-1} X\) yields the final form \(x_0\beta^* + \Sigma_{12} H^{-1}(y - X\beta^*)\).

This explains why the mean of \(Y_0 \mid y\) is written that way.