C. Donovan
9 Feb 2018
\[ \mathbf{y} = f(\mathbf{X}) + \mathbf{e} \]
Conceptually:
Key points
Overfitting - which can variously mean (these are different angles of the same thing) the model is:
From our perspective, let's treat overfitting as the model being too complex, and therefore not optimal in terms of generalisation error.
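A minimal sketch of this idea, assuming a made-up signal f(x) = sin(2πx), Gaussian noise and polynomial fits of increasing degree (none of these choices come from the slides). Training error can only fall as the polynomial gets more flexible; error on fresh data from the same process typically does not keep falling.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical "true" signal, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def simulate(n):
    # Draw data from y = f(x) + e with Gaussian noise e.
    x = rng.uniform(0.0, 1.0, size=n)
    y = f(x) + rng.normal(0.0, 0.3, size=n)
    return x, y

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

x_train, y_train = simulate(30)     # data used to fit the models
x_test, y_test = simulate(1000)     # fresh data from the same process

train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    coefs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    train_mse[degree] = mse(y_train, np.polyval(coefs, x_train))
    test_mse[degree] = mse(y_test, np.polyval(coefs, x_test))
    print(f"degree {degree}: train MSE {train_mse[degree]:.3f}, "
          f"test MSE {test_mse[degree]:.3f}")
```

Because the polynomial models are nested, the training MSE is non-increasing in the degree; the test MSE is the quantity that reflects generalisation error.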
\[ R^2=1-\frac{(n-1)^{-1}\sum_{i=1}^n(y_i-\hat{y}_i)^2}{(n-1)^{-1}\sum_{i=1}^n(y_i-\bar{y})^2} = 1-\frac{\mathrm{SSE}/(n-1)}{\mathrm{SST}/(n-1)} \]
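A short sketch of why plain R² is a poor guide to model complexity, assuming simulated data and ordinary least squares via `numpy.linalg.lstsq` (the data and the helper names `r_squared` and `ols_fitted` are illustrative, not from the slides). Adding predictors that are pure noise never lowers R² on the data used for fitting.

```python
import numpy as np

rng = np.random.default_rng(1)

def r_squared(y, y_hat):
    # R^2 = 1 - SSE/SST; the (n-1) factors in the formula cancel.
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - sse / sst)

def ols_fitted(X, y):
    # Least-squares fit with an intercept; returns the fitted values.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ beta

n = 40
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # one genuinely useful predictor
noise = rng.normal(size=(n, 5))          # five predictors unrelated to y

r2_values = []
for k in range(6):
    X = np.column_stack([x, noise[:, :k]])   # x plus the first k noise columns
    r2_values.append(r_squared(y, ols_fitted(X, y)))
    print(f"{k} noise predictors: R^2 = {r2_values[-1]:.4f}")
```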
Can we fix it a bit?…
\( \text{adjusted-}R^2= \)
\[ 1-\frac{(n-p-1)^{-1}\sum_{i=1}^n(y_i-\hat{y}_i)^2}{(n-1)^{-1}\sum_{i=1}^n(y_i-\bar{y})^2} =1-\frac{\mathrm{SSE}/(n-p-1)}{\mathrm{SST}/(n-1)} \]
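A small worked example of the penalty at work, using invented numbers (n = 20, SST = 100, and two hypothetical models) purely for illustration. The extra predictor trims SSE slightly, so R² rises, but the loss of a degree of freedom outweighs it and adjusted R² falls.

```python
def r_squared(sse, sst):
    return 1.0 - sse / sst

def adjusted_r_squared(sse, sst, n, p):
    # SSE is scaled by its residual degrees of freedom n - p - 1.
    return 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))

n, sst = 20, 100.0            # hypothetical sample size and total sum of squares
sse_small, p_small = 30.0, 3  # hypothetical model with 3 predictors
sse_large, p_large = 29.5, 4  # same model plus one weak predictor

r2_small = r_squared(sse_small, sst)
r2_large = r_squared(sse_large, sst)
adj_small = adjusted_r_squared(sse_small, sst, n, p_small)
adj_large = adjusted_r_squared(sse_large, sst, n, p_large)

print(f"R^2:          {r2_small:.3f} -> {r2_large:.3f}")
print(f"adjusted R^2: {adj_small:.3f} -> {adj_large:.3f}")
```

With these numbers R² moves from 0.700 to 0.705, while adjusted R² drops from about 0.644 to about 0.626.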
A penalized Likelihood
\[ \mathrm{AIC}=-2\ell + 2p \qquad (= n\log_e(\mathrm{RSS}/n) + 2p \text{, up to an additive constant, for a Gaussian linear model}) \]
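A sketch of the link between the likelihood form and the RSS form, assuming a Gaussian linear model with the variance estimated by RSS/n. In that case the maximised log-likelihood is \( \ell = -\tfrac{n}{2}\left(\log_e(2\pi\,\mathrm{RSS}/n) + 1\right) \), so \( -2\ell \) equals \( n\log_e(\mathrm{RSS}/n) \) plus a constant that depends only on n. The RSS values and parameter counts below are invented, and conventions differ on whether p also counts the variance parameter.

```python
import math

def gaussian_max_loglik(rss, n):
    # Maximised Gaussian log-likelihood with sigma^2 estimated by RSS/n.
    sigma2_hat = rss / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2_hat) + 1.0)

def aic_from_loglik(rss, n, p):
    return -2.0 * gaussian_max_loglik(rss, n) + 2.0 * p

def aic_from_rss(rss, n, p):
    # Short form: drops the constant n * (log(2*pi) + 1).
    return n * math.log(rss / n) + 2.0 * p

n = 50
rss_a, p_a = 42.0, 3   # hypothetical smaller model
rss_b, p_b = 40.5, 5   # hypothetical larger model

delta_full = aic_from_loglik(rss_b, n, p_b) - aic_from_loglik(rss_a, n, p_a)
delta_short = aic_from_rss(rss_b, n, p_b) - aic_from_rss(rss_a, n, p_a)
print(f"AIC difference (B - A): full {delta_full:.3f}, short {delta_short:.3f}")
```

Only differences in AIC between models fitted to the same data matter, so the dropped constant does not affect model choice. Here the difference is positive, meaning the small drop in RSS does not pay for the two extra parameters.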
Model fit against unseen data – two approaches:
Model fit against 'new' data
[Obvious?] All of these assume our data are similar to future data, i.e. representative of the future signal and noise.
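The two approaches are not spelled out in the text above; the sketch below assumes they are a single held-out validation set and k-fold cross-validation. The simulated data, the polynomial models and the helper names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 60
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=n)  # hypothetical data

def validation_mse(x_fit, y_fit, x_val, y_val, degree):
    # Fit on one part of the data, score on the part not used for fitting.
    coefs = np.polyfit(x_fit, y_fit, degree)
    return float(np.mean((y_val - np.polyval(coefs, x_val)) ** 2))

degrees = (1, 3, 5, 7)
perm = rng.permutation(n)

# Assumed approach 1: a single validation set (20 points held back).
val_idx, fit_idx = perm[:20], perm[20:]
holdout = {d: validation_mse(x[fit_idx], y[fit_idx], x[val_idx], y[val_idx], d)
           for d in degrees}

# Assumed approach 2: 5-fold cross-validation, same folds for every model.
folds = np.array_split(perm, 5)

def kfold_mse(degree):
    errors = []
    for i, val in enumerate(folds):
        fit = np.concatenate([f for j, f in enumerate(folds) if j != i])
        errors.append(validation_mse(x[fit], y[fit], x[val], y[val], degree))
    return float(np.mean(errors))

cv = {d: kfold_mse(d) for d in degrees}
best_degree = min(cv, key=cv.get)

for d in degrees:
    print(f"degree {d}: hold-out MSE {holdout[d]:.3f}, 5-fold CV MSE {cv[d]:.3f}")
print("degree chosen by cross-validation:", best_degree)
```

Both estimates score the model only on observations it was not fitted to, which is why they rely on the data in hand being representative of future data.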