STAT GU4205/GR5205 (Section 004) Linear Regression Models

HW7 by Dongrui Liu, Uni:dl3390

9.9

(a)

(1) \(R^2_{a,p}\) (2) \(AIC_p\) (3) \(C_p\)

library(olsrr)
library(qpcR)
data = read.table("CH06PR15.txt")
n = nrow(data)                          # sample size
colnames(data) = c("Y","X1","X2","X3")  # response Y and predictors X1-X3
# all possible regressions
g = lm(Y ~ X1+X2+X3, data=data)
aprm <- ols_step_all_possible(g)
plot(aprm)

aprm
##   Index N Predictors  R-Square Adj. R-Square Mallow's Cp
## 1     1 1         X1 0.6189843     0.6103248    8.353606
## 3     2 1         X3 0.4154975     0.4022134   35.245643
## 2     3 1         X2 0.3635387     0.3490737   42.112324
## 5     4 2      X1 X3 0.6760864     0.6610206    2.807204
## 4     5 2      X1 X2 0.6549559     0.6389073    5.599735
## 6     6 2      X2 X3 0.4684545     0.4437314   30.247056
## 7     7 3   X1 X2 X3 0.6821943     0.6594939    4.000000
best_subset <- ols_step_best_subset(g)
plot(best_subset)

best_subset
## Best Subsets Regression
## -----------------------
## Model Index    Predictors
## -----------------------
##      1         X1       
##      2         X1 X3    
##      3         X1 X2 X3 
## -----------------------
## 
##                                                      Subsets Regression Summary                                                     
## ------------------------------------------------------------------------------------------------------------------------------------
##                        Adj.        Pred                                                                                              
## Model    R-Square    R-Square    R-Square     C(p)       AIC         SBIC        SBC         MSEP         FPE        HSP       APC  
## ------------------------------------------------------------------------------------------------------------------------------------
##   1        0.6190      0.6103      0.5834    8.3536    353.0717    222.1686    358.5577    5325.6912    120.8043    2.6923    0.4157 
##   2        0.6761      0.6610      0.6333    2.8072    347.6030    217.4970    354.9176    4635.3382    107.2773    2.3978    0.3691 
##   3        0.6822      0.6595      0.6217    4.0000    348.7273    218.9287    357.8705    4658.8561    109.9596    2.4674    0.3783 
## ------------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria 
##  SBIC: Sawa's Bayesian Information Criteria 
##  SBC: Schwarz Bayesian Criteria 
##  MSEP: Estimated error of prediction, assuming multivariate normality 
##  FPE: Final Prediction Error 
##  HSP: Hocking's Sp 
##  APC: Amemiya Prediction Criteria

From the plots and the summary table we can clearly see that all of the criteria recommend the same subset, \(X_1\) and \(X_3\).
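As a cross-check on the olsrr output, \(C_p\) and \(AIC_p\) can also be computed directly from the textbook formulas (a minimal base-R sketch; note that the textbook \(AIC_p = n\ln SSE_p - n\ln n + 2p\) differs from the AIC column above by an additive constant, so only its ranking is comparable, while \(C_p\) should reproduce the table):

full <- lm(Y ~ X1 + X2 + X3, data = data)
mse_full <- sum(residuals(full)^2) / (n - 4)   # MSE of the full model
crit <- function(fit) {
  p   <- length(coef(fit))                     # parameters, including the intercept
  sse <- sum(residuals(fit)^2)
  c(Cp = sse / mse_full - (n - 2 * p),
    AICp = n * log(sse / n) + 2 * p)
}
rbind("X1"       = crit(lm(Y ~ X1, data = data)),
      "X1 X3"    = crit(lm(Y ~ X1 + X3, data = data)),
      "X1 X2 X3" = crit(full))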

(4) \(PRESS_p\)

Because olsrr reports the predicted R-square rather than \(PRESS_p\) itself, we can either convert it back via \(PRESS_p = SSTO\,(1 - R^2_{pred})\) or compute \(PRESS_p\) directly.

PRESS(lm(Y~X1,data=data))$stat
## [1] 5569.562
PRESS(lm(Y~X1+X3,data=data))$stat
## [1] 4902.751
PRESS(lm(Y~X1+X2+X3,data=data))$stat
## [1] 5057.886

The \(PRESS_p\) values are 5569.56 for \(X_1\), 4902.75 for \(X_1, X_3\), and 5057.89 for \(X_1, X_2, X_3\), so this criterion also recommends the subset \(X_1\) and \(X_3\).
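As a further check, \(PRESS_p\) can be computed without qpcR from the deleted residuals \(e_i/(1-h_{ii})\) (a minimal sketch; the value should agree with the qpcR result above):

# PRESS = sum of squared deleted residuals
press <- function(fit) sum((residuals(fit) / (1 - hatvalues(fit)))^2)
press(lm(Y ~ X1 + X3, data = data))   # should reproduce 4902.751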

(b)

Yes, the four criteria all recommend \(X_1\) and \(X_3\) as the best subset. However, this will not always be the case.

(c)

No. Here we have only 3 predictor variables. Forward stepwise regression was developed to economize on computational effort compared with the all-possible-regressions procedures. With so few predictors the computational burden is low, so all-possible-regressions is the better method here.

9.17

(a)

model1 = lm(Y~X1,data = data)
model2 = lm(Y~X2,data = data)
model3 = lm(Y~X3,data = data)
summary(model1)$fstatistic[1]
##    value 
## 71.48079
summary(model2)$fstatistic[1]
##    value 
## 25.13225
summary(model3)$fstatistic[1]
##   value 
## 31.2777

In the first step, we regress \(Y\) on each \(X_i\) separately and examine the F-statistics; \(X_1\) has the largest, so we enter \(X_1\) into the model.

model2 = lm(Y~X1+X2,data = data)
model3 = lm(Y~X1+X3,data = data)
# partial F statistics for adding X2 (resp. X3) to the model containing X1
F2 = anova(model2)$"Sum Sq"[2]/anova(model2)$"Sum Sq"[3]*(n-3)
F3 = anova(model3)$"Sum Sq"[2]/anova(model3)$"Sum Sq"[3]*(n-3)
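Here F2 and F3 are the partial F statistics for adding \(X_k\) to the model already containing \(X_1\): in the sequential ANOVA table the second row gives \(SSR(X_k \mid X_1)\) and the third row gives \(SSE(X_1, X_k)\), so

\[
F_k^* = \frac{SSR(X_k \mid X_1)/1}{SSE(X_1, X_k)/(n-3)} = \frac{MSR(X_k \mid X_1)}{MSE(X_1, X_k)}.
\]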

Because F3 = 7.5803902 is greater than F2 = 4.4828434 and exceeds the F-to-enter limit of 3, we add \(X_3\) to the model.

# partial F for X1 given X3 (X3 entered first, so the second sequential SS is SSR(X1|X3))
model1 = lm(Y~X3+X1,data = data)
F1 = anova(model1)$"Sum Sq"[2]/anova(model1)$"Sum Sq"[3]*(n-3)
# partial F for X2 given X1 and X3
model2 = lm(Y~X1+X3+X2,data = data)
F2 = anova(model2)$"Sum Sq"[3]/anova(model2)$"Sum Sq"[4]*(n-4)
# coefficients of the selected model
model1 = lm(Y~X1+X3,data = data)
beta1 = model1$coefficients[2]   # coefficient of X1
beta3 = model1$coefficients[3]   # coefficient of X3

Because F1 = 34.593544 > 2.9, we do not delete \(X_1\) from the model. Also, F2 = 0.8072038 < 3, so we do not add \(X_2\) to the model. The final model therefore contains \(X_1\) and \(X_3\), with estimated coefficients \(b_1 = -1.2005\) and \(b_3 = -16.7421\).
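For completeness (the intercept was not extracted above), the full coefficient vector of the selected model can be displayed directly:

coef(lm(Y ~ X1 + X3, data = data))   # intercept, b1, b3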

1 - pf(3,1,n-2)
## [1] 0.09027054
1 - pf(3,1,n-3)
## [1] 0.09043337
1 - pf(3,1,n-4)
## [1] 0.09060394

The F-limit for adding a new variable tests whether the new coefficient \(\beta_k\) of \(X_k\) is zero; as the p-values above show, an F-limit of 3 with 1 numerator degree of freedom corresponds to a significance level of approximately 9%.
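Equivalently, inverting this with qf() shows that a roughly 9% entry level corresponds to an F-limit of about 3 for the degrees of freedom used here:

qf(1 - 0.09, 1, n - 3)   # approximately 3, consistent with the p-values above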

model1 = lm(Y~X1,data = data)
model2 = lm(Y~X2,data = data)
model3 = lm(Y~X3,data = data)
"F statistic of X1"
## [1] "F statistic of X1"
summary(model1)$fstatistic[1]
##    value 
## 71.48079
"F statistic of X2"
## [1] "F statistic of X2"
summary(model2)$fstatistic[1]
##    value 
## 25.13225
"F statistic of X3"
## [1] "F statistic of X3"
summary(model3)$fstatistic[1]
##   value 
## 31.2777

Forward selection proceeds like forward stepwise regression except that variables already entered are never dropped. The first step is therefore the same as above, and we add \(X_1\) to the model.

model2 = lm(Y~X1+X2,data = data)
model3 = lm(Y~X1+X3,data = data)
"F statistic of X2"
## [1] "F statistic of X2"
anova(model2)$"Sum Sq"[2]/anova(model2)$"Sum Sq"[3]*(n-3)
## [1] 4.482843
"F statistic of X3"
## [1] "F statistic of X3"
anova(model3)$"Sum Sq"[2]/anova(model3)$"Sum Sq"[3]*(n-3)
## [1] 7.58039

In the second step, both \(X_2\) and \(X_3\) satisfy the F-to-enter requirement, and \(X_3\) has the larger statistic, so we add \(X_3\) to the model.

model2 = lm(Y~X1+X3+X2,data = data)
"F statistic of X2"
## [1] "F statistic of X2"
anova(model2)$"Sum Sq"[3]/anova(model2)$"Sum Sq"[4]*(n-4)
## [1] 0.8072038

Because its partial F statistic (0.81) is below the entry limit, we do not add \(X_2\) to the model containing \(X_1\) and \(X_3\).

Finally, the subset of variables identified by forward selection is \(X_1\) and \(X_3\), i.e. the model \(Y = \beta_0 + \beta_1 X_1 + \beta_3 X_3 + \varepsilon\).
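As a cross-check, base R's step() automates forward selection. It uses AIC rather than the F-to-enter limit of 3 used above, so agreement is not guaranteed in general; here, however, the AIC column in Problem 9.9 shows that adding \(X_2\) to \(X_1, X_3\) increases AIC, so it should stop at the same subset (a minimal sketch):

# forward selection by AIC, starting from the intercept-only model
null_model <- lm(Y ~ 1, data = data)
step(null_model, scope = ~ X1 + X2 + X3, direction = "forward")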

# refit with each predictor entered last, so its final sequential SS equals its partial SS
model1 = lm(Y~X2+X3+X1,data = data)
model2 = lm(Y~X1+X3+X2,data = data)
model3 = lm(Y~X1+X2+X3,data = data)
"F statistic of X1"
## [1] "F statistic of X1"
anova(model1)$"Sum Sq"[3]/anova(model1)$"Sum Sq"[4]*(n-4)
## [1] 28.24706
"F statistic of X2"
## [1] "F statistic of X2"
anova(model2)$"Sum Sq"[3]/anova(model2)$"Sum Sq"[4]*(n-4)
## [1] 0.8072038
"F statistic of X3"
## [1] "F statistic of X3"
anova(model3)$"Sum Sq"[3]/anova(model3)$"Sum Sq"[4]*(n-4)
## [1] 3.599735

In the first step of backward elimination, the partial F statistic of \(X_2\) is below 2.9, so we drop \(X_2\) from the model.

model1 = lm(Y~X3+X1,data = data)
model3 = lm(Y~X1+X3,data = data)
"F statistic of X1"
## [1] "F statistic of X1"
anova(model1)$"Sum Sq"[2]/anova(model1)$"Sum Sq"[3]*(n-3)
## [1] 34.59354
"F statistic of X3"
## [1] "F statistic of X3"
anova(model3)$"Sum Sq"[2]/anova(model3)$"Sum Sq"[3]*(n-3)
## [1] 7.58039

In the second step, both \(X_1\) and \(X_3\) have partial F statistics greater than 2.9, so both are retained and the procedure stops. The model identified is again \(Y = \beta_0 + \beta_1 X_1 + \beta_3 X_3 + \varepsilon\).

The results are consistent across the three selection procedures: all of them identify \(X_1\) and \(X_3\) as the best subset. This also agrees with the all-possible-regressions results in Problem 9.9.
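For reference, olsrr also automates these three procedures. Its functions use default p-value thresholds for entry and removal rather than the F-limits of 3 and 2.9 used above, so results could differ in general; with the defaults they are expected to select the same subset here (a sketch, assuming the installed olsrr version provides these functions with their default arguments):

ols_step_forward_p(g)    # forward selection
ols_step_backward_p(g)   # backward elimination
ols_step_both_p(g)       # forward stepwise regression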