Problem 9.12

bailsofhay — Dec 3, 2013, 10:58 PM

data=read.table("http://www.stat.lsu.edu/exstweb/statlab/datasets/KNNLData/APPENC03.txt")
names(data)=c("ID","y","x1","x2","x3","x4","x5","x6")


### Part a ####
fit=lm(y~x1+x2+I(x3==1)+I(x4==1)+I(x6==1999)+I(x6==2001)+I(x6==2002), data =data)
fit

Call:
lm(formula = y ~ x1 + x2 + I(x3 == 1) + I(x4 == 1) + I(x6 == 
    1999) + I(x6 == 2001) + I(x6 == 2002), data = data)

Coefficients:
      (Intercept)                 x1                 x2  
         3.02e+00          -2.47e-01          -9.65e-05  
   I(x3 == 1)TRUE     I(x4 == 1)TRUE  I(x6 == 1999)TRUE  
         4.09e-01           1.24e-01           1.32e-02  
I(x6 == 2001)TRUE  I(x6 == 2002)TRUE  
        -1.09e-01          -8.31e-02  

library(MASS)
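# k = log(n) with n = 36 makes stepAIC use the SBC (BIC) penalty;
# the trace below still labels the criterion "AIC"
stepAIC(fit, k = log(nrow(data)))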
Start:  AIC=-115.6
y ~ x1 + x2 + I(x3 == 1) + I(x4 == 1) + I(x6 == 1999) + I(x6 == 
    2001) + I(x6 == 2002)

                Df Sum of Sq   RSS    AIC
- I(x6 == 1999)  1     0.000 0.655 -119.2
- x2             1     0.006 0.660 -118.9
- I(x6 == 2002)  1     0.022 0.676 -118.0
- x1             1     0.036 0.691 -117.3
- I(x6 == 2001)  1     0.054 0.709 -116.3
<none>                       0.654 -115.6
- I(x4 == 1)     1     0.119 0.774 -113.2
- I(x3 == 1)     1     1.350 2.004  -78.9

Step:  AIC=-119.2
y ~ x1 + x2 + I(x3 == 1) + I(x4 == 1) + I(x6 == 2001) + I(x6 == 
    2002)

                Df Sum of Sq   RSS    AIC
- x2             1     0.005 0.660 -122.5
- I(x6 == 2002)  1     0.025 0.679 -121.4
- x1             1     0.036 0.691 -120.8
- I(x6 == 2001)  1     0.063 0.717 -119.5
<none>                       0.655 -119.2
- I(x4 == 1)     1     0.123 0.778 -116.6
- I(x3 == 1)     1     1.350 2.005  -82.5

Step:  AIC=-122.5
y ~ x1 + I(x3 == 1) + I(x4 == 1) + I(x6 == 2001) + I(x6 == 2002)

                Df Sum of Sq   RSS    AIC
- I(x6 == 2002)  1     0.020 0.680 -125.0
- x1             1     0.032 0.692 -124.4
- I(x6 == 2001)  1     0.057 0.717 -123.0
<none>                       0.660 -122.5
- I(x4 == 1)     1     0.118 0.778 -120.1
- I(x3 == 1)     1     1.362 2.023  -85.7

Step:  AIC=-125
y ~ x1 + I(x3 == 1) + I(x4 == 1) + I(x6 == 2001)

                Df Sum of Sq   RSS  AIC
- I(x6 == 2001)  1     0.038 0.718 -127
<none>                       0.680 -125
- x1             1     0.087 0.768 -124
- I(x4 == 1)     1     0.119 0.799 -123
- I(x3 == 1)     1     1.361 2.041  -89

Step:  AIC=-126.6
y ~ x1 + I(x3 == 1) + I(x4 == 1)

             Df Sum of Sq   RSS    AIC
<none>                    0.718 -126.6
- x1          1     0.113 0.831 -124.9
- I(x4 == 1)  1     0.118 0.836 -124.7
- I(x3 == 1)  1     1.361 2.079  -91.9

Call:
lm(formula = y ~ x1 + I(x3 == 1) + I(x4 == 1), data = data)

Coefficients:
   (Intercept)              x1  I(x3 == 1)TRUE  I(x4 == 1)TRUE  
         3.185          -0.353           0.399           0.118  
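
# A quick sanity check on the criterion values in the trace above. This is a
# minimal sketch, assuming n = 36 (taken from k = log(36)) and using the rounded
# RSS values printed by stepAIC; the helper name sbc is made up for illustration.

```r
# Criterion used by stepAIC for lm fits: n*log(RSS/n) + k*p, here with k = log(n)
sbc <- function(rss, p, n = 36) n * log(rss / n) + log(n) * p

# p counts the intercept plus the terms still in the model
full  <- sbc(rss = 0.654, p = 8)  # starting model, all seven terms
final <- sbc(rss = 0.718, p = 4)  # x1, I(x3 == 1), I(x4 == 1)

# Should agree with the "Start" and last "Step" lines to one decimal
round(c(full = full, final = final), 1)
```

# Because the RSS values are rounded to three decimals, agreement beyond one
# decimal place should not be expected.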


### Part b ####

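# The model selected above (lowest SBC, -126.6) is y ~ x1 + I(x3 == 1) + I(x4 == 1),
# i.e. Yi = b0 + b1*x1 + b2*x3 + b3*x4 + ei.
# Yes, this agrees with part c of 8.42: the test there supported dropping x2 and x6,
# and the backward search here drops x2 and all of the x6 year indicators as well.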
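
# The comparison with 8.42 can be roughed out as a partial F-test computed from
# the trace in Part a. This is only a sketch: it assumes the full model in 8.42
# is the one fitted in Part a, that n = 36, and it uses the rounded RSS values
# from the stepAIC output rather than refitting the models.

```r
# Partial F-test for dropping x2 and the three x6 indicators together
n <- 36                          # sample size implied by k = log(36)
rss_full <- 0.654; p_full <- 8   # intercept + 7 terms (Start step)
rss_red  <- 0.718; p_red  <- 4   # intercept + x1, x3, x4 terms (final step)

df_num <- p_full - p_red         # 4 terms dropped
df_den <- n - p_full             # 28 residual df in the full model
F_stat <- ((rss_red - rss_full) / df_num) / (rss_full / df_den)
p_value <- pf(F_stat, df_num, df_den, lower.tail = FALSE)
c(F = F_stat, p = p_value)
```

# F comes out below 1, so the reduced model is not rejected at conventional
# levels, which is consistent with the answer above. With the data loaded,
# anova(reduced_fit, fit) on the two lm objects would give the unrounded version.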