They define:
\[ \widetilde{Y}_0 = Y_0 - \Sigma_{12} H^{-1} y \]
where \(y\) is the observed data vector.
Then they state that the predictive distribution of \(\widetilde{Y}_0 \mid y\) is:
\[ \widetilde{Y}_0 \mid y \;\sim\; t\left( g\beta^*,\; \frac{2b^*}{n+2a}\delta^2 + g(M^*)^{-1}g^\top,\; n+2a \right). \]
Here:
- \(g\) is a row vector (often the new predictor \(x_0\) after a transformation)
- \(\beta^*\) is the posterior mean of \(\beta\)
- \(M^*\) is the posterior precision matrix of \(\beta\)
- \(\delta^2 = 1 - \Sigma_{12} H^{-1} \Sigma_{21}\)
In a Bayesian linear model with a conjugate normal-inverse-gamma prior, prediction typically uses \(g = x_0 - \Sigma_{12} H^{-1} X\), so that \(g\beta = (x_0 - \Sigma_{12} H^{-1} X)\beta\).
Thus \(g\beta^*\) is the plug-in prediction using the posterior mean of \(\beta\), adjusted for the correlation structure.
From the definition:
\[ \widetilde{Y}_0 = Y_0 - \Sigma_{12} H^{-1} y. \]
Since \(y\) is observed (given), \(\Sigma_{12} H^{-1} y\) is a constant in the predictive distribution.
Therefore:
\[ \text{Mean}(Y_0 \mid y) = \text{Mean}(\widetilde{Y}_0 \mid y) + \Sigma_{12} H^{-1} y. \]
From the \(t\)-distribution given for \(\widetilde{Y}_0 \mid y\):
\[ \text{Mean}(\widetilde{Y}_0 \mid y) = g\beta^* \quad (\text{provided degrees of freedom } n+2a > 1). \]
Hence:
\[ \boxed{\text{Mean}(Y_0 \mid y) = g\beta^* + \Sigma_{12} H^{-1} y}. \]
The transformation from \(Y_0\) to \(\widetilde{Y}_0\) removes the dependence of the conditional mean on \(y\), given \(\beta\) and \(\sigma^2\).
Recall from the earlier conditional normal distribution (Equation 6.10):
\[ Y_0 \mid y, \beta, \sigma^2 \sim N\big( x_0\beta + \Sigma_{12} H^{-1} (y - X\beta),\; \sigma^2[1 - \Sigma_{12} H^{-1} \Sigma_{21}] \big). \]
Now compute \(\widetilde{Y}_0 \mid y, \beta, \sigma^2\):
\[ \begin{aligned} \widetilde{Y}_0 &= Y_0 - \Sigma_{12} H^{-1} y \\ &\sim N\big( x_0\beta + \Sigma_{12} H^{-1} (y - X\beta) - \Sigma_{12} H^{-1} y,\; \sigma^2[1 - \Sigma_{12} H^{-1} \Sigma_{21}] \big) \\ &= N\big( x_0\beta - \Sigma_{12} H^{-1} X\beta,\; \sigma^2[1 - \Sigma_{12} H^{-1} \Sigma_{21}] \big) \\ &= N\big( (x_0 - \Sigma_{12} H^{-1} X)\beta,\; \sigma^2\delta^2 \big), \end{aligned} \] where \(\delta^2 = 1 - \Sigma_{12} H^{-1} \Sigma_{21}\).
Define: \[ g = x_0 - \Sigma_{12} H^{-1} X. \]
Then: \[ \widetilde{Y}_0 \mid y, \beta, \sigma^2 \sim N\big( g\beta,\; \sigma^2\delta^2 \big). \]
Key point: the mean no longer depends on \(y\), which is the purpose of the transformation.
If we put a normal-inverse-gamma prior on \((\beta, \sigma^2)\), then integrating \(\beta\) and \(\sigma^2\) out against their posterior yields a Student-t distribution for \(\widetilde{Y}_0 \mid y\):
\[ \widetilde{Y}_0 \mid y \sim t\left( g\beta^*,\; \frac{2b^*}{n+2a}\delta^2 + g(M^*)^{-1}g^\top,\; n+2a \right). \]
This matches the form given in the text.
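One way to see where the location and the two scale terms come from is to average over \(\beta\) first. The following is only a sketch: \(V_\beta\) is a symbol introduced here for illustration (it does not appear in the text) and denotes \(\text{Var}(\beta \mid y, \sigma^2)\). By the tower property and the law of total variance,
\[ \begin{aligned} \text{Mean}(\widetilde{Y}_0 \mid y) &= \text{Mean}\big( g\beta \mid y \big) = g\beta^*, \\ \text{Var}(\widetilde{Y}_0 \mid y, \sigma^2) &= \text{Mean}\big( \sigma^2\delta^2 \mid y, \sigma^2 \big) + \text{Var}\big( g\beta \mid y, \sigma^2 \big) = \sigma^2\delta^2 + g V_\beta g^\top. \end{aligned} \]
The first variance term is the uncertainty left in \(Y_0\) even when \(\beta\) is known, and the second is the posterior uncertainty in \(\beta\) propagated through \(g\). These correspond to the two summands in the scale of the \(t\)-distribution; how they combine with the posterior of \(\sigma^2\) depends on the parameterization of the prior used in the text.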
Since \(\widetilde{Y}_0 = Y_0 - \Sigma_{12} H^{-1} y\), we add back the constant:
\[ Y_0 \mid y \;\sim\; t\left( g\beta^* + \Sigma_{12} H^{-1} y,\; \frac{2b^*}{n+2a}\delta^2 + g(M^*)^{-1}g^\top,\; n+2a \right). \]
Thus the mean of \(Y_0 \mid y\) is:
\[ \text{Mean}(Y_0 \mid y) = g\beta^* + \Sigma_{12} H^{-1} y. \]
Recall \(g = x_0 - \Sigma_{12} H^{-1} X\). Substitute:
\[ \begin{aligned} \text{Mean}(Y_0 \mid y) &= (x_0 - \Sigma_{12} H^{-1} X)\beta^* + \Sigma_{12} H^{-1} y \\ &= x_0\beta^* - \Sigma_{12} H^{-1} X\beta^* + \Sigma_{12} H^{-1} y \\ &= x_0\beta^* + \Sigma_{12} H^{-1} (y - X\beta^*). \end{aligned} \]
This is exactly the expression you asked about:
\[ \boxed{x_0\beta^* + \Sigma_{12} H^{-1} (y - X\beta^*)}. \]
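As a quick numerical illustration, consider a hypothetical intercept-only model with two observations. Every number below is made up for the example and none comes from the text. Take \(X = (1, 1)^\top\), \(x_0 = 1\), \(\beta^* = 10\), \(y = (12, 9)^\top\), and
\[ H = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}, \qquad \Sigma_{12} = (0.5,\; 0.25). \]
Then
\[ \Sigma_{12} H^{-1} = (0.5,\; 0), \qquad g = x_0 - \Sigma_{12} H^{-1} X = 1 - 0.5 = 0.5, \qquad \delta^2 = 1 - 0.25 = 0.75, \]
and the two forms of the mean agree:
\[ g\beta^* + \Sigma_{12} H^{-1} y = 5 + 6 = 11, \qquad x_0\beta^* + \Sigma_{12} H^{-1}(y - X\beta^*) = 10 + 0.5 \cdot 2 + 0 \cdot (-1) = 11. \]
The prediction is pulled above \(x_0\beta^* = 10\) because the observation most strongly correlated with \(Y_0\) lies above its fitted mean.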
If \(Y_0\) is independent of \(Y\) given \(\beta, \sigma^2\), then \(\Sigma_{12} = 0\), and the mean reduces to \(x_0\beta^*\) (the usual predictive mean).
This explains why the mean of \(Y_0 \mid y\) is written that way.
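The independent case can be checked directly from the definitions above by substitution. Setting \(\Sigma_{12} = 0\) (and hence \(\Sigma_{21} = 0\)) gives
\[ g = x_0, \qquad \delta^2 = 1, \qquad \widetilde{Y}_0 = Y_0, \]
so the conditional distribution reduces to
\[ Y_0 \mid y, \beta, \sigma^2 \sim N\big( x_0\beta,\; \sigma^2 \big), \]
which is the usual regression form, with no correction from the observed residuals \(y - X\beta\).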