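For any input \(\mathbf x_i\), what determines the gap between the observation to be predicted, \(y_{ie}\) (whose noise has zero mean, so that \(\mathbb E_e[y_{ie}] = t_i\)), and the prediction \(\hat{y}_i\)? For any training set \(d\) (in \(K\)-fold cross-validation, \(d = 1, \cdots,\ K\)), the expectation over \(d\) of the squared residual \((\hat{y}_{di} - y_{die})^2\) on the test samples satisfies: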
\[
\begin{aligned}
\mathbb E_d \big[\mathbb E_{ie} [(\hat{y}_{di} - y_{die})^2| \mathbf x_{di}]\big]
=&
\mathbb E_d \big[\mathbb E_{ie} [(\hat{y}_{di} - t_{di} + t_{di} - y_{die})^2]\big]\\
=&
\mathbb E_d \big[\mathbb E_{i} [(\hat{y}_{di} - t_{di})^2] + \mathbb E_{ie} [(t_{di} - y_{die})^2] + 2 \underbrace{\mathbb E_{ie} [(\hat{y}_{di} - t_{di})(t_{di} - y_{die})]}_0\big] \\
=&
\mathbb E_{di} [(\hat{y}_{di} - \mathbb E_{di} [\hat{y}_{di}| \mathbf x_{di}] + \mathbb E_{di} [\hat{y}_{di} | \mathbf x_{di}] - t_{di})^2] + \mathbb E_{die}[(t_{di} - y_{die})^2]\\
=&
\mathbb E_{di} [(\hat{y}_{di} - \mathbb E_{di} [\hat{y}_{di} | \mathbf x_{di}])^2] + \mathbb E_{di} [(\mathbb E_{di} [\hat{y}_{di} | \mathbf x_{di}] - t_{di})^2] + \mathbb E_{die} [(t_{di} - y_{die})^2] \\
&+ 2 \underbrace{
\mathbb E_{di} [\big(\hat{y}_{di} - \mathbb E_{di} [\hat{y}_{di}| \mathbf x_{di}]\big) \big( \mathbb E_{di} [\hat{y}_{di} | \mathbf x_{di}] - t_{di} \big)]}_{0}.
\end{aligned}
\]
\[
\begin{aligned}
\underbrace{\mathbb E_{die} [(\hat{y}_{di} - y_{die})^2| \mathbf x_{di}]}_{\text{test-set mean squared error (MSE)}}
=&
\underbrace{\mathbb E_{di} \big[(\hat{y}_{di} - \mathbb E_{di} [\hat{y}_{di} | \mathbf x_{di}])^2 \big]}_{\text{variance}}
+
\underbrace{\mathbb E_{di} \big[(\mathbb E_{di} [\hat{y}_{di} | \mathbf x_{di}] - t_{di})^2 \big]}_{\text{bias}^2}
\\
&+ \underbrace{\mathbb E_{die}[(t_{di} - y_{die})^2 | \mathbf x_{di}]}_{\text{noise}}.
\end{aligned}
\]
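The two cross terms marked \(0\) in the derivation vanish for the reasons sketched below. The sketch rests on two assumptions that are not stated explicitly above: the test noise \(e\) is independent of the training set \(d\) (so \(\hat{y}_{di}\) does not depend on \(e\)), and \(t_{di}\) is determined by \(\mathbf x_{di}\) alone. The shorthand \(\bar{y}_{di}\) is introduced here only for readability. For a fixed \(\mathbf x_{di}\), the conditional expectation \(\mathbb E_{di}[\,\cdot\,|\mathbf x_{di}]\) averages over the training sets \(d\) only.
\[
\begin{aligned}
\mathbb E_{ie} [(\hat{y}_{di} - t_{di})(t_{di} - y_{die})]
=&
\mathbb E_{i} \big[(\hat{y}_{di} - t_{di})\, \mathbb E_{e} [t_{di} - y_{die}]\big]
= \mathbb E_{i} \big[(\hat{y}_{di} - t_{di}) \cdot 0\big] = 0, \\
\mathbb E_{d} [(\hat{y}_{di} - \bar{y}_{di})(\bar{y}_{di} - t_{di}) | \mathbf x_{di}]
=&
(\bar{y}_{di} - t_{di})\, \mathbb E_{d} [\hat{y}_{di} - \bar{y}_{di} | \mathbf x_{di}] = 0,
\qquad \bar{y}_{di} := \mathbb E_{di} [\hat{y}_{di} | \mathbf x_{di}].
\end{aligned}
\]
Averaging the second identity over the test inputs \(i\) leaves it at zero, which is why only the variance, squared-bias, and noise terms remain.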