Prove: with Gaussian errors and a linear model, Mallows’ \(C_p\) and AIC are equivalent.
Using the definitions of Mallows’ \(C_p\) and AIC: \[C_p=\frac{1}{n}(RSS+2d\hat\sigma^2)\] \[AIC=-2\log L+2d\] where \(L\) is the maximised value of the likelihood function and \(d\) counts the fitted parameters.
First, find the log-likelihood function for Gaussian errors. In the linear model the errors are assumed i.i.d. Gaussian, so we treat \(Y-\hat Y=\epsilon\sim N(\mu,\sigma^2)\) with \(\mu=0\). (Writing \(\sigma^2=\sigma_1^2+\sigma_2^2\) for \(Y\sim N(\mu_1,\sigma_1^2)\) and \(\hat Y\sim N(\mu_2,\sigma_2^2)\) would require \(Y\) and \(\hat Y\) to be independent; the derivation below only needs a common error variance \(\sigma^2\).)
Then the probability density function \(f\) of the normal distribution can be written as follows: \[f(y-\hat y\mid 0,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(y-\hat y)^2}{2\sigma^2}}\]
The likelihood can be written as: \[L(\theta;\epsilon_1,\dots,\epsilon_n)=\prod_{i=1}^n f(y_i-\hat y_i)=\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n e^{-\frac{\sum_{i=1}^n(y_i-\hat y_i)^2}{2\sigma^2}}\]
Substituting the MLE \(\hat\sigma^2=\frac{RSS}{n}\) for \(\sigma^2\), with \(RSS=\sum_{i=1}^n(y_i-\hat y_i)^2\), the maximised log-likelihood is therefore given by: \[\log L(\theta;\epsilon_1,\dots,\epsilon_n)=-\frac{n}{2}\log 2\pi-\frac{n}{2}\log\hat\sigma^2-\frac{RSS}{2\hat\sigma^2}\]
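As a sanity check on this closed form, here is a minimal numerical sketch. The simulated data, the seed and all variable names are assumptions made for illustration; the point is only that summing the normal log-density over the residuals agrees with the formula once \(\hat\sigma^2=RSS/n\) is plugged in.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3

# Hypothetical design matrix (intercept + 2 predictors) and Gaussian response
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=1.5, size=n)

# Least-squares fit; with Gaussian errors this is also the MLE of beta
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
rss = float(resid @ resid)
sigma2_hat = rss / n  # MLE of the error variance

# Log-likelihood summed term by term from the normal density
loglik_direct = float(np.sum(
    -0.5 * np.log(2 * np.pi * sigma2_hat) - resid**2 / (2 * sigma2_hat)
))

# Closed form from the text
loglik_closed = (-n / 2 * np.log(2 * np.pi)
                 - n / 2 * np.log(sigma2_hat)
                 - rss / (2 * sigma2_hat))

print(loglik_direct, loglik_closed)
```

Note that with \(\hat\sigma^2=RSS/n\) the last term, \(RSS/(2\hat\sigma^2)\), is just \(n/2\), so it does not depend on which model was fitted.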
Plugging this log-likelihood into AIC gives: \[AIC=n\log 2\pi+n\log\hat\sigma^2+\frac{RSS}{\hat\sigma^2}+2d\] In this form AIC and \(C_p\) do not look equivalent. (Can I remove the extraneous factor of \(n\) in front of RSS in the AIC derivation?)
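One standard way to see the equivalence, sketched here under an assumption not made above: treat \(\hat\sigma^2\) as a single fixed variance estimate (for example from the full model) rather than each candidate's \(RSS/n\). Then \(-2\log L=n\log(2\pi\hat\sigma^2)+RSS/\hat\sigma^2\), so \(C_p=\frac{\hat\sigma^2}{n}(AIC-\text{const})\) and the two criteria should rank candidate models identically. The data, the nested model sequence and all names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_full = 150, 6

# Hypothetical data: only the first three columns carry signal
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, p_full - 1))])
beta = np.array([1.0, 2.0, -1.0, 0.0, 0.0, 0.0])
y = X_full @ beta + rng.normal(size=n)

def rss_of(X):
    """Residual sum of squares of the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

# Assumption: one fixed variance estimate, taken from the full model
sigma2_fixed = rss_of(X_full) / (n - p_full)
const = n * np.log(2 * np.pi * sigma2_fixed)

cp, aic = [], []
for d in range(1, p_full + 1):           # nested models: first d columns
    rss = rss_of(X_full[:, :d])
    cp.append((rss + 2 * d * sigma2_fixed) / n)
    # -2 log L + 2d with the variance held at sigma2_fixed
    aic.append(const + rss / sigma2_fixed + 2 * d)

cp, aic = np.array(cp), np.array(aic)

# C_p is a positive rescaling of (AIC - const), so both pick the same model
print(np.allclose(cp, sigma2_fixed / n * (aic - const)))
print(int(np.argmin(cp)) == int(np.argmin(aic)))
```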
So I tried expressing AIC and \(C_p\) in another form.
(Reference: pages 44 to 46 of “Multivariate Statistical Modeling and Data Analysis” by H. Bozdogan & Arjun K. Gupta, 2012. For more details: http://t.cn/RWDtOar)
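Mallows’ \(C_p\) criterion: \[C_p=\frac{1}{\sigma^2}RSS-n+2d\] where \(\sigma^2\) is the error variance (in practice replaced by an estimate, typically from the full model), and \(\hat\sigma^2=\frac{RSS}{n}\) denotes the MLE of the variance under the candidate model.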
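AIC criterion for the linear regression model, after dropping the terms that do not depend on the candidate model (\(n\log 2\pi\), and \(\frac{RSS}{\hat\sigma^2}=n\) when \(\hat\sigma^2=\frac{RSS}{n}\)): \[AIC=n\log\hat\sigma^2+2d\] Subtracting \(n\log\sigma^2\), which is the same for every candidate model, gives an equivalent form: \[AIC=n\log\hat\sigma^2+2d-n\log\sigma^2\] It remains to show that \(C_p\approx AIC\).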
Proof. \[AIC=n\log\frac{\hat\sigma^2}{\sigma^2}+2d=n\log\frac{\sigma^2+(\hat\sigma^2-\sigma^2)}{\sigma^2}+2d=n\log\left(1+\frac{\hat\sigma^2-\sigma^2}{\sigma^2}\right)+2d\]
According to the Taylor expansion: \[\log(1+x)=\sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{k}x^k,\quad\forall x\in(-1,1]\] so to first order \(\log(1+x)\approx x\) for small \(x\).
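Using the first-order approximation \(\log(1+x)\approx x\) with \(x=\frac{\hat\sigma^2-\sigma^2}{\sigma^2}\), and \(n\hat\sigma^2=RSS\), we have: \[AIC\approx n\left(\frac{\hat\sigma^2-\sigma^2}{\sigma^2}\right)+2d=n\frac{\hat\sigma^2}{\sigma^2}-n+2d=\frac{1}{\sigma^2}RSS-n+2d=C_p\] So AIC and \(C_p\) agree to first order, which is accurate when \(\hat\sigma^2\) is close to \(\sigma^2\). The proof is done.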
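To get a feel for how good the approximation is, the sketch below compares the shifted AIC, \(n\log(\hat\sigma^2/\sigma^2)+2d\), with \(C_p=RSS/\sigma^2-n+2d\) for a few values of the ratio \(\hat\sigma^2/\sigma^2\). The function name `criteria` and the values of `n`, `d` and `sigma2` are arbitrary choices for illustration, and \(\sigma^2\) is treated as known.

```python
import numpy as np

def criteria(ratio, n=100, d=4, sigma2=2.0):
    """Return (shifted AIC, C_p) when sigma2_hat = ratio * sigma2."""
    sigma2_hat = ratio * sigma2          # candidate model's RSS / n
    rss = n * sigma2_hat
    aic_shifted = n * np.log(sigma2_hat / sigma2) + 2 * d
    cp = rss / sigma2 - n + 2 * d
    return float(aic_shifted), float(cp)

for ratio in (1.01, 1.1, 1.5):
    aic_shifted, cp = criteria(ratio)
    # The gap n * (x - log(1 + x)) grows as ratio moves away from 1
    print(ratio, aic_shifted, cp, cp - aic_shifted)
```

Since \(x\ge\log(1+x)\), \(C_p\) is never below the shifted AIC; the gap is negligible near a ratio of 1 and grows as the candidate model's variance estimate moves away from \(\sigma^2\), so the two criteria are equivalent only in this approximate sense.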