1. For the prostate data, fit a model with lpsa as the response and the other variables as predictors.
library(faraway)
data("prostate", package = "faraway")
head(prostate)
##       lcavol lweight age      lbph svi      lcp gleason pgg45     lpsa
## 1 -0.5798185  2.7695  50 -1.386294   0 -1.38629       6     0 -0.43078
## 2 -0.9942523  3.3196  58 -1.386294   0 -1.38629       6     0 -0.16252
## 3 -0.5108256  2.6912  74 -1.386294   0 -1.38629       7    20 -0.16252
## 4 -1.2039728  3.2828  58 -1.386294   0 -1.38629       6     0 -0.16252
## 5  0.7514161  3.4324  62 -1.386294   0 -1.38629       6     0  0.37156
## 6 -1.0498221  3.2288  50 -1.386294   0 -1.38629       6     0  0.76547
lmod<-lm(lpsa~lcavol+lweight+age+lbph+svi+lcp+gleason+pgg45, data=prostate)
summary(lmod)
## 
## Call:
## lm(formula = lpsa ~ lcavol + lweight + age + lbph + svi + lcp + 
##     gleason + pgg45, data = prostate)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7331 -0.3713 -0.0170  0.4141  1.6381 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.669337   1.296387   0.516  0.60693    
## lcavol       0.587022   0.087920   6.677 2.11e-09 ***
## lweight      0.454467   0.170012   2.673  0.00896 ** 
## age         -0.019637   0.011173  -1.758  0.08229 .  
## lbph         0.107054   0.058449   1.832  0.07040 .  
## svi          0.766157   0.244309   3.136  0.00233 ** 
## lcp         -0.105474   0.091013  -1.159  0.24964    
## gleason      0.045142   0.157465   0.287  0.77503    
## pgg45        0.004525   0.004421   1.024  0.30886    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7084 on 88 degrees of freedom
## Multiple R-squared:  0.6548, Adjusted R-squared:  0.6234 
## F-statistic: 20.86 on 8 and 88 DF,  p-value: < 2.2e-16

(a) Compute and comment on the condition numbers.

x <- model.matrix(lmod)[,-1]
e <- eigen(t(x) %*% x)
e$val
## [1] 4.790826e+05 6.190704e+04 2.109042e+02 1.756329e+02 6.479853e+01
## [6] 4.452379e+01 2.023914e+01 8.093145e+00
sqrt(e$val[1]/e$val)
## [1]   1.00000   2.78186  47.66094  52.22787  85.98499 103.73114 153.85414
## [8] 243.30248

The eigenvalues span several orders of magnitude, and six of the eight condition numbers exceed 30 (the largest is about 243), a value commonly treated as large. This means that problems are being caused by more than just one linear combination of the predictors.
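The predictors are on different scales (for example, age in years and pgg45 as a percentage), so part of the spread in the raw eigenvalues may reflect units rather than collinearity. The following is a minimal sketch, not part of the original solution, that repeats the calculation with and without standardizing the columns. The helper name `cond_numbers` and its `standardize` argument are hypothetical.

```r
# Condition numbers sqrt(lambda_1 / lambda_j) of X'X.
# With standardize = TRUE each column is first centered and scaled to unit
# variance (an added assumption; part (a) uses the raw columns).
cond_numbers <- function(x, standardize = FALSE) {
  x <- as.matrix(x)
  if (standardize) x <- scale(x)
  ev <- eigen(crossprod(x), symmetric = TRUE, only.values = TRUE)$values
  sqrt(ev[1] / ev)
}

# Apply to the prostate predictors only if the faraway package is installed.
if (requireNamespace("faraway", quietly = TRUE)) {
  data("prostate", package = "faraway")
  x <- as.matrix(prostate[, names(prostate) != "lpsa"])
  print(cond_numbers(x))                      # raw predictors
  print(cond_numbers(x, standardize = TRUE))  # centered and scaled predictors
}
```

If the standardized condition numbers turn out much smaller than the raw ones, scale differences rather than near-linear dependence would explain much of the raw spread.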

(b) Compute and comment on the correlations between the predictors.
round(cor(prostate[,-9]),2)
##         lcavol lweight  age  lbph   svi   lcp gleason pgg45
## lcavol    1.00    0.19 0.22  0.03  0.54  0.68    0.43  0.43
## lweight   0.19    1.00 0.31  0.43  0.11  0.10    0.00  0.05
## age       0.22    0.31 1.00  0.35  0.12  0.13    0.27  0.28
## lbph      0.03    0.43 0.35  1.00 -0.09 -0.01    0.08  0.08
## svi       0.54    0.11 0.12 -0.09  1.00  0.67    0.32  0.46
## lcp       0.68    0.10 0.13 -0.01  0.67  1.00    0.51  0.63
## gleason   0.43    0.00 0.27  0.08  0.32  0.51    1.00  0.75
## pgg45     0.43    0.05 0.28  0.08  0.46  0.63    0.75  1.00
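Reading the largest entries off an 8 x 8 matrix by eye is error-prone. As a rough sketch, the pairs whose absolute correlation exceeds a cutoff can be extracted programmatically. The helper `high_cor_pairs` and the cutoff of 0.6 are illustrative assumptions, not part of the original solution.

```r
# Return the variable pairs whose absolute correlation exceeds `cutoff`,
# sorted from strongest to weakest. Only the upper triangle is scanned so
# that each pair appears once and the diagonal is skipped.
high_cor_pairs <- function(df, cutoff = 0.6) {
  r <- cor(df)
  idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
  out <- data.frame(var1 = rownames(r)[idx[, "row"]],
                    var2 = colnames(r)[idx[, "col"]],
                    r = r[idx])
  out[order(-abs(out$r)), ]
}

# Apply to the prostate predictors only if the faraway package is installed.
if (requireNamespace("faraway", quietly = TRUE)) {
  data("prostate", package = "faraway")
  print(high_cor_pairs(prostate[, names(prostate) != "lpsa"]))
}
```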

The most highly correlated pairs are gleason:pgg45 (0.75), lcavol:lcp (0.68), svi:lcp (0.67) and lcp:pgg45 (0.63).

(c) Compute the variance inflation factors. Is there any evidence that collinearity causes some predictors not to be significant? Explain.

vif(lmod)
##   lcavol  lweight      age     lbph      svi      lcp  gleason    pgg45 
## 2.054115 1.363704 1.323599 1.375534 1.956881 3.097954 2.473411 2.974361

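All of the variance inflation factors are small: the largest is 3.10 for lcp, so its standard error is only about sqrt(3.10) ≈ 1.76 times what it would be without collinearity. With so little inflation of the coefficient variances, there is little evidence that collinearity is what prevents some predictors from being significant.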
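To make the quantity being reported concrete, the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 is obtained by regressing predictor j on all the other predictors. The sketch below computes this by hand; `manual_vif` is a hypothetical helper written for illustration, and it assumes all columns are numeric with syntactic names.

```r
# VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing
# column j of `x` on all the remaining columns.
manual_vif <- function(x) {
  x <- as.data.frame(x)
  sapply(names(x), function(v) {
    f <- reformulate(setdiff(names(x), v), response = v)
    r2 <- summary(lm(f, data = x))$r.squared
    1 / (1 - r2)
  })
}

# Apply to the prostate predictors only if the faraway package is installed.
if (requireNamespace("faraway", quietly = TRUE)) {
  data("prostate", package = "faraway")
  print(manual_vif(prostate[, names(prostate) != "lpsa"]))
}
```

The square root of each value is the factor by which collinearity inflates that coefficient's standard error.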

(d) Does the removal of insignificant predictors from the model reduce collinearity?

From the summary of the full model, gleason has the largest p-value (0.775), so it is removed first.

prostate1 <- prostate[, -7]  # drop column 7 (gleason)

lmod1<-lm(lpsa~lcavol+lweight+age+lbph+svi+lcp+pgg45, data=prostate1)
summary(lmod1)
## 
## Call:
## lm(formula = lpsa ~ lcavol + lweight + age + lbph + svi + lcp + 
##     pgg45, data = prostate1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.73117 -0.38137 -0.01728  0.43364  1.63513 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.953926   0.829439   1.150  0.25319    
## lcavol       0.591615   0.086001   6.879 8.07e-10 ***
## lweight      0.448292   0.167771   2.672  0.00897 ** 
## age         -0.019336   0.011066  -1.747  0.08402 .  
## lbph         0.107671   0.058108   1.853  0.06720 .  
## svi          0.757734   0.241282   3.140  0.00229 ** 
## lcp         -0.104482   0.090478  -1.155  0.25127    
## pgg45        0.005318   0.003433   1.549  0.12488    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7048 on 89 degrees of freedom
## Multiple R-squared:  0.6544, Adjusted R-squared:  0.6273 
## F-statistic: 24.08 on 7 and 89 DF,  p-value: < 2.2e-16

From the summary, lcavol, lweight and svi remain significant. Of the other predictors, lcp has the largest p-value (0.251), so it is removed next.
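Dropping columns by position and refitting by hand is easy to get wrong. The drop-and-refit step can also be expressed with `update()` on the fitted model. The following is only a sketch: `backward_eliminate` is a hypothetical helper, the cutoff `alpha = 0.05` is an assumed stopping rule, and it assumes numeric predictors so that coefficient names match term names.

```r
# Backward elimination: repeatedly drop the predictor with the largest
# p-value and refit, until every remaining p-value is below `alpha`
# (or no predictors are left).
backward_eliminate <- function(fit, alpha = 0.05) {
  repeat {
    # p-values of all coefficients except the intercept
    p <- summary(fit)$coefficients[-1, "Pr(>|t|)", drop = FALSE]
    if (nrow(p) == 0 || max(p) < alpha) break
    worst <- rownames(p)[which.max(p)]
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

# Apply to the prostate data only if the faraway package is installed.
if (requireNamespace("faraway", quietly = TRUE)) {
  data("prostate", package = "faraway")
  full <- lm(lpsa ~ lcavol + lweight + age + lbph + svi + lcp + gleason + pgg45,
             data = prostate)
  print(summary(backward_eliminate(full)))
}
```

A fixed cutoff may stop at a different model than the manual steps, which rely on judgment about when to stop.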

prostate2 <- prostate1[, -6]  # drop column 6 (lcp)
head(prostate2)
##       lcavol lweight age      lbph svi pgg45     lpsa
## 1 -0.5798185  2.7695  50 -1.386294   0     0 -0.43078
## 2 -0.9942523  3.3196  58 -1.386294   0     0 -0.16252
## 3 -0.5108256  2.6912  74 -1.386294   0    20 -0.16252
## 4 -1.2039728  3.2828  58 -1.386294   0     0 -0.16252
## 5  0.7514161  3.4324  62 -1.386294   0     0  0.37156
## 6 -1.0498221  3.2288  50 -1.386294   0     0  0.76547
lmod2<-lm(lpsa~lcavol+lweight+age+lbph+svi+pgg45, data=prostate2)
summary(lmod2)
## 
## Call:
## lm(formula = lpsa ~ lcavol + lweight + age + lbph + svi + pgg45, 
##     data = prostate2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.77711 -0.41708  0.00002  0.40676  1.59681 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.980085   0.830665   1.180  0.24116    
## lcavol       0.545770   0.076431   7.141 2.31e-10 ***
## lweight      0.449450   0.168078   2.674  0.00890 ** 
## age         -0.017470   0.010967  -1.593  0.11469    
## lbph         0.105755   0.058191   1.817  0.07249 .  
## svi          0.641666   0.219757   2.920  0.00442 ** 
## pgg45        0.003528   0.003068   1.150  0.25331    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7061 on 90 degrees of freedom
## Multiple R-squared:  0.6493, Adjusted R-squared:  0.6259 
## F-statistic: 27.77 on 6 and 90 DF,  p-value: < 2.2e-16

From the summary, lcavol, lweight and svi remain significant. Of the other predictors, pgg45 has the largest p-value (0.253), so it is removed next.

prostate3 <- prostate2[, -6]  # drop column 6 (pgg45)
lmod3<-lm(lpsa~lcavol+lweight+age+lbph+svi, data=prostate3)
summary(lmod3)
## 
## Call:
## lm(formula = lpsa ~ lcavol + lweight + age + lbph + svi, data = prostate3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.83505 -0.39396  0.00414  0.46336  1.57888 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.95100    0.83175   1.143 0.255882    
## lcavol       0.56561    0.07459   7.583 2.77e-11 ***
## lweight      0.42369    0.16687   2.539 0.012814 *  
## age         -0.01489    0.01075  -1.385 0.169528    
## lbph         0.11184    0.05805   1.927 0.057160 .  
## svi          0.72095    0.20902   3.449 0.000854 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7073 on 91 degrees of freedom
## Multiple R-squared:  0.6441, Adjusted R-squared:  0.6245 
## F-statistic: 32.94 on 5 and 91 DF,  p-value: < 2.2e-16

From the summary, lcavol, lweight and svi remain significant. Of the other predictors, age has the largest p-value (0.170), so it is removed next.

prostate4 <- prostate3[, -3]  # drop column 3 (age)
head(prostate4)
##       lcavol lweight      lbph svi     lpsa
## 1 -0.5798185  2.7695 -1.386294   0 -0.43078
## 2 -0.9942523  3.3196 -1.386294   0 -0.16252
## 3 -0.5108256  2.6912 -1.386294   0 -0.16252
## 4 -1.2039728  3.2828 -1.386294   0 -0.16252
## 5  0.7514161  3.4324 -1.386294   0  0.37156
## 6 -1.0498221  3.2288 -1.386294   0  0.76547
lmod4<-lm(lpsa~lcavol+lweight+lbph+svi, data=prostate4)
summary(lmod4)
## 
## Call:
## lm(formula = lpsa ~ lcavol + lweight + lbph + svi, data = prostate4)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82653 -0.42270  0.04362  0.47041  1.48530 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.14554    0.59747   0.244  0.80809    
## lcavol       0.54960    0.07406   7.422 5.64e-11 ***
## lweight      0.39088    0.16600   2.355  0.02067 *  
## lbph         0.09009    0.05617   1.604  0.11213    
## svi          0.71174    0.20996   3.390  0.00103 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7108 on 92 degrees of freedom
## Multiple R-squared:  0.6366, Adjusted R-squared:  0.6208 
## F-statistic: 40.29 on 4 and 92 DF,  p-value: < 2.2e-16

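From the summary, lcavol, lweight and svi remain significant, while lbph (p = 0.112) is not significant at the 5% level.

The three predictors involved in the strongest correlations in the correlation matrix (lcp, gleason and pgg45) have all been removed, and the largest correlation among the four remaining predictors is 0.54 (lcavol:svi). The standard error of lcavol also fell from 0.088 in the full model to 0.074, and that of svi from 0.244 to 0.210. Both observations are consistent with the removal of insignificant predictors having reduced collinearity.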
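The question in this part can be checked directly by recomputing the collinearity diagnostics for the reduced model and comparing them with the full model. A minimal sketch follows; `kappa_max` is a hypothetical helper that repeats the unscaled eigenvalue calculation on a fitted model's design matrix, and the output is not shown because it was not computed in the original solution.

```r
# Largest condition number sqrt(lambda_1 / lambda_p) of X'X for a fitted
# lm, using the raw (unscaled) predictors with the intercept column dropped.
kappa_max <- function(fit) {
  x <- model.matrix(fit)[, -1, drop = FALSE]
  ev <- eigen(crossprod(x), symmetric = TRUE, only.values = TRUE)$values
  sqrt(ev[1] / ev[length(ev)])
}

# Compare the full and reduced models only if faraway is installed.
if (requireNamespace("faraway", quietly = TRUE)) {
  data("prostate", package = "faraway")
  full    <- lm(lpsa ~ lcavol + lweight + age + lbph + svi + lcp + gleason + pgg45,
                data = prostate)
  reduced <- lm(lpsa ~ lcavol + lweight + lbph + svi, data = prostate)
  print(c(full = kappa_max(full), reduced = kappa_max(reduced)))
  print(faraway::vif(full))
  print(faraway::vif(reduced))
}
```

Neither this condition number nor any individual VIF can increase when predictors are dropped, so the size of the decrease is the informative part of the comparison.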