9.9
(a)
(1) \(R^2_{a,p}\)  (2) \(AIC_p\)  (3) \(C_p\)
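For reference, with \(n\) cases, \(p\) parameters in the subset model, and \(SSE_p\) its error sum of squares, the textbook definitions of these criteria are
\[
R^2_{a,p} = 1 - \frac{n-1}{n-p}\cdot\frac{SSE_p}{SSTO}, \qquad
AIC_p = n\ln SSE_p - n\ln n + 2p, \qquad
C_p = \frac{SSE_p}{MSE(X_1,X_2,X_3)} - (n - 2p).
\]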
library(olsrr)
library(qpcR)
data = read.table("CH06PR15.txt")
n = length(data$V1)
colnames(data)=c("Y","X1","X2","X3")
# all possible regression
g = lm(Y ~ X1+X2+X3, data=data)
aprm <- ols_step_all_possible(g)
plot(aprm)
aprm
## Index N Predictors R-Square Adj. R-Square Mallow's Cp
## 1 1 1 X1 0.6189843 0.6103248 8.353606
## 3 2 1 X3 0.4154975 0.4022134 35.245643
## 2 3 1 X2 0.3635387 0.3490737 42.112324
## 5 4 2 X1 X3 0.6760864 0.6610206 2.807204
## 4 5 2 X1 X2 0.6549559 0.6389073 5.599735
## 6 6 2 X2 X3 0.4684545 0.4437314 30.247056
## 7 7 3 X1 X2 X3 0.6821943 0.6594939 4.000000
best_subset <- ols_step_best_subset(g)
plot(best_subset)
best_subset
## Best Subsets Regression
## -----------------------
## Model Index Predictors
## -----------------------
## 1 X1
## 2 X1 X3
## 3 X1 X2 X3
## -----------------------
##
## Subsets Regression Summary
## ------------------------------------------------------------------------------------------------------------------------------------
## Adj. Pred
## Model R-Square R-Square R-Square C(p) AIC SBIC SBC MSEP FPE HSP APC
## ------------------------------------------------------------------------------------------------------------------------------------
## 1 0.6190 0.6103 0.5834 8.3536 353.0717 222.1686 358.5577 5325.6912 120.8043 2.6923 0.4157
## 2 0.6761 0.6610 0.6333 2.8072 347.6030 217.4970 354.9176 4635.3382 107.2773 2.3978 0.3691
## 3 0.6822 0.6595 0.6217 4.0000 348.7273 218.9287 357.8705 4658.8561 109.9596 2.4674 0.3783
## ------------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria
## SBIC: Sawa's Bayesian Information Criteria
## SBC: Schwarz Bayesian Criteria
## MSEP: Estimated error of prediction, assuming multivariate normality
## FPE: Final Prediction Error
## HSP: Hocking's Sp
## APC: Amemiya Prediction Criteria
From the plots produced by plot(aprm) and plot(best_subset), and from the summary tables above, all three criteria recommend the same subset, \(X_1\) and \(X_3\): it has the largest \(R^2_{a,p}\), the smallest \(AIC_p\), and the smallest \(C_p\) (2.81, close to \(p = 3\)).
(4) \(PRESS_p\)
The olsrr output reports Pred R-Square rather than \(PRESS_p\) itself. Since Pred R-Square \(= 1 - PRESS_p/SSTO\), we can either convert it back to \(PRESS_p\) or compute \(PRESS_p = \sum_i (Y_i - \hat{Y}_{i(i)})^2\) directly.
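A minimal sketch of the conversion route, using the (rounded) Pred R-Square values from the table above, so the results are only approximate:
# Convert Pred R-Square back to PRESS_p:  PRESS_p = (1 - Pred R^2) * SSTO
ssto <- sum((data$Y - mean(data$Y))^2)   # total sum of squares
pred_r2 <- c(0.5834, 0.6333, 0.6217)     # Pred R-Square for X1; X1,X3; X1,X2,X3 (from the table)
(1 - pred_r2) * ssto                     # approximate PRESS_p for the three subset models
Below, \(PRESS_p\) is instead computed exactly with qpcR::PRESS().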
PRESS(lm(Y~X1,data=data))$stat
## [1] 5569.562
PRESS(lm(Y~X1+X3,data=data))$stat
## [1] 4902.751
PRESS(lm(Y~X1+X2+X3,data=data))$stat
## [1] 5057.886
We can see that \(PRESS_p\) is 5569.56 for X1, 4902.75 for X1, X3, and 5057.89 for X1, X2, X3, so this criterion also recommends the same best subset, \(X_1\) and \(X_3\).
(b)
Yes, the four criteria all recommend \(X_1\) and \(X_3\) as the best subset here. However, different criteria need not agree in general.
(c)
No. Here we have only 3 candidate predictors, so there are just \(2^3 - 1 = 7\) possible subset models. Forward stepwise regression was developed mainly to economize on computational effort compared with the all-possible-regressions procedures; since the computational burden here is trivial, all-possible-regressions is the better method.
9.17
(a)
model1 = lm(Y~X1,data = data)
model2 = lm(Y~X2,data = data)
model3 = lm(Y~X3,data = data)
summary(model1)$fstatistic[1]
## value
## 71.48079
summary(model2)$fstatistic[1]
## value
## 25.13225
summary(model3)$fstatistic[1]
## value
## 31.2777
In the first round we regress \(Y\) on each \(X_i\) separately and examine the F statistic of that simple regression. \(X_1\) has the largest F statistic (71.48, well above the F-to-enter limit of 3), so we add \(X_1\) to the model.
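Here the reported value is simply the overall F statistic of the simple regression on \(X_k\),
\[
F^*_k = \frac{MSR(X_k)}{MSE(X_k)} = \frac{SSR(X_k)/1}{SSE(X_k)/(n-2)},
\]
which is what summary()$fstatistic[1] returns.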
model2 = lm(Y~X1+X2,data = data)
model3 = lm(Y~X1+X3,data = data)
F2 = anova(model2)$"Sum Sq"[2]/anova(model2)$"Sum Sq"[3]*(n-3)  # F-to-enter for X2: SSR(X2|X1) / MSE(X1,X2)
F3 = anova(model3)$"Sum Sq"[2]/anova(model3)$"Sum Sq"[3]*(n-3)  # F-to-enter for X3: SSR(X3|X1) / MSE(X1,X3)
Because \(F_3 = 7.58\) is greater than \(F_2 = 4.48\) and exceeds the F-to-enter limit of 3, we add \(X_3\) to the model.
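Written out, the F-to-enter statistic computed above for a candidate \(X_k\), with \(X_1\) already in the model, is
\[
F_k = \frac{SSR(X_k \mid X_1)}{SSE(X_1, X_k)/(n-3)} = \frac{MSR(X_k \mid X_1)}{MSE(X_1, X_k)}.
\]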
model1 = lm(Y~X3+X1,data = data)
F1 = anova(model1)$"Sum Sq"[2]/anova(model1)$"Sum Sq"[3]*(n-3)  # F-to-remove for X1: SSR(X1|X3) / MSE(X1,X3)
model2 = lm(Y~X1+X3+X2,data = data)
F2 = anova(model2)$"Sum Sq"[3]/anova(model2)$"Sum Sq"[4]*(n-4)  # F-to-enter for X2: SSR(X2|X1,X3) / MSE(X1,X2,X3)
model1 = lm(Y~X1+X3,data = data)
beta1 = model1$coefficients[2]  # estimated coefficient of X1
beta2 = model1$coefficients[3]  # estimated coefficient of X3
Because \(F_1 = 34.59 > 2.9\), the F-to-remove limit, we do not delete \(X_1\) from the model. Also, \(F_2 = 0.81 < 3\), so we do not add \(X_2\). The search stops with \(X_1\) and \(X_3\) in the model; the estimated coefficients are \(b_1 = -1.2005\) and \(b_3 = -16.7421\).
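As a cross-check (not part of the original calculation), base R's add1() reports the same F-to-enter statistic for \(X_2\) in one call; a minimal sketch:
# F-to-enter for X2, given that X1 and X3 are already in the model;
# the F value should reproduce F2 computed manually above
add1(lm(Y ~ X1 + X3, data = data), scope = ~ X1 + X2 + X3, test = "F")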
1 - pf(3,1,n-2)
## [1] 0.09027054
1 - pf(3,1,n-3)
## [1] 0.09043337
1 - pf(3,1,n-4)
## [1] 0.09060394
The F limit of 3 for adding a variable corresponds to testing whether the coefficient \(\beta_k\) of the candidate \(X_k\) is zero, and the calculations above show that the implied significance level is approximately 9% at every stage (denominator degrees of freedom \(n-2\), \(n-3\), or \(n-4\)).
model1 = lm(Y~X1,data = data)
model2 = lm(Y~X2,data = data)
model3 = lm(Y~X3,data = data)
"F statistic of X1"
## [1] "F statistic of X1"
summary(model1)$fstatistic[1]
## value
## 71.48079
"F statistic of X2"
## [1] "F statistic of X2"
summary(model2)$fstatistic[1]
## value
## 25.13225
"F statistic of X3"
## [1] "F statistic of X3"
summary(model3)$fstatistic[1]
## value
## 31.2777
Forward selection proceeds like forward stepwise regression except that variables already entered are never dropped. The first step is therefore identical to the one above: we add \(X_1\) to the model.
model2 = lm(Y~X1+X2,data = data)
model3 = lm(Y~X1+X3,data = data)
"F statistic of X2"
## [1] "F statistic of X2"
anova(model2)$"Sum Sq"[2]/anova(model2)$"Sum Sq"[3]*(n-3)
## [1] 4.482843
"F statistic of X3"
## [1] "F statistic of X3"
anova(model3)$"Sum Sq"[2]/anova(model3)$"Sum Sq"[3]*(n-3)
## [1] 7.58039
In the second step, both \(X_2\) and \(X_3\) exceed the required F limit, and \(X_3\) has the larger partial F statistic, so we add \(X_3\) to the model.
model2 = lm(Y~X1+X3+X2,data = data)
"F statistic of X2"
## [1] "F statistic of X2"
anova(model2)$"Sum Sq"[3]/anova(model2)$"Sum Sq"[4]*(n-4)
## [1] 0.8072038
Since the partial F statistic of \(X_2\) (0.81) is below the limit of 3, we do not add \(X_2\) to the model containing \(X_1\) and \(X_3\).
Finally, the subset of variables identified by forward selection is \(X_1\) and \(X_3\), i.e. the model \(\hat{Y} = b_0 + b_1X_1 + b_3X_3\).
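As a sanity check (not part of the textbook procedure, and using AIC rather than the F limits above), base R's step() can run the same forward search; since the AIC ordering of the candidate models matches the table in Problem 9.9, it should arrive at the same subset here. A minimal sketch:
# Forward search by AIC, starting from the intercept-only model
null_model <- lm(Y ~ 1, data = data)
step(null_model, scope = ~ X1 + X2 + X3, direction = "forward")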
model1 = lm(Y~X2+X3+X1,data = data)  # X1 entered last, so anova() gives SSR(X1|X2,X3)
model2 = lm(Y~X1+X3+X2,data = data)  # X2 entered last, so anova() gives SSR(X2|X1,X3)
model3 = lm(Y~X1+X2+X3,data = data)  # X3 entered last, so anova() gives SSR(X3|X1,X2)
"F statistic of X1"
## [1] "F statistic of X1"
anova(model1)$"Sum Sq"[3]/anova(model1)$"Sum Sq"[4]*(n-4)
## [1] 28.24706
"F statistic of X2"
## [1] "F statistic of X2"
anova(model2)$"Sum Sq"[3]/anova(model2)$"Sum Sq"[4]*(n-4)
## [1] 0.8072038
"F statistic of X3"
## [1] "F statistic of X3"
anova(model3)$"Sum Sq"[3]/anova(model3)$"Sum Sq"[4]*(n-4)
## [1] 3.599735
In the first step of backward elimination, the partial F statistic of \(X_2\) (0.81) is below the F-to-remove limit of 2.9, so we drop \(X_2\) from the model.
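The same three F-to-remove statistics can be obtained in a single call with base R's drop1(), shown here only as a cross-check of the manual calculation:
# Partial F statistic for dropping each predictor from the full model
drop1(lm(Y ~ X1 + X2 + X3, data = data), test = "F")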
model1 = lm(Y~X3+X1,data = data)
model3 = lm(Y~X1+X3,data = data)
"F statistic of X1"
## [1] "F statistic of X1"
anova(model1)$"Sum Sq"[2]/anova(model1)$"Sum Sq"[3]*(n-3)
## [1] 34.59354
"F statistic of X3"
## [1] "F statistic of X3"
anova(model3)$"Sum Sq"[2]/anova(model3)$"Sum Sq"[3]*(n-3)
## [1] 7.58039
In the second step, both \(X_1\) and \(X_3\) have partial F statistics greater than 2.9, so we keep them and stop. The model identified by backward elimination is again \(\hat{Y} = b_0 + b_1X_1 + b_3X_3\).
The results are consistent across the three selection procedures: all of them identify the subset \(X_1\), \(X_3\). This also agrees with the all-possible-regressions results in Problem 9.9.