library(faraway)
data("prostate", package = "faraway")
head(prostate)
## lcavol lweight age lbph svi lcp gleason pgg45 lpsa
## 1 -0.5798185 2.7695 50 -1.386294 0 -1.38629 6 0 -0.43078
## 2 -0.9942523 3.3196 58 -1.386294 0 -1.38629 6 0 -0.16252
## 3 -0.5108256 2.6912 74 -1.386294 0 -1.38629 7 20 -0.16252
## 4 -1.2039728 3.2828 58 -1.386294 0 -1.38629 6 0 -0.16252
## 5 0.7514161 3.4324 62 -1.386294 0 -1.38629 6 0 0.37156
## 6 -1.0498221 3.2288 50 -1.386294 0 -1.38629 6 0 0.76547
lmod<-lm(lpsa~lcavol+lweight+age+lbph+svi+lcp+gleason+pgg45, data=prostate)
summary(lmod)
##
## Call:
## lm(formula = lpsa ~ lcavol + lweight + age + lbph + svi + lcp +
## gleason + pgg45, data = prostate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7331 -0.3713 -0.0170 0.4141 1.6381
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.669337 1.296387 0.516 0.60693
## lcavol 0.587022 0.087920 6.677 2.11e-09 ***
## lweight 0.454467 0.170012 2.673 0.00896 **
## age -0.019637 0.011173 -1.758 0.08229 .
## lbph 0.107054 0.058449 1.832 0.07040 .
## svi 0.766157 0.244309 3.136 0.00233 **
## lcp -0.105474 0.091013 -1.159 0.24964
## gleason 0.045142 0.157465 0.287 0.77503
## pgg45 0.004525 0.004421 1.024 0.30886
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7084 on 88 degrees of freedom
## Multiple R-squared: 0.6548, Adjusted R-squared: 0.6234
## F-statistic: 20.86 on 8 and 88 DF, p-value: < 2.2e-16
(a) Compute and comment on the condition numbers.
x <- model.matrix(lmod)[,-1]
e <- eigen(t(x) %*% x)
e$val
## [1] 4.790826e+05 6.190704e+04 2.109042e+02 1.756329e+02 6.479853e+01
## [6] 4.452379e+01 2.023914e+01 8.093145e+00
sqrt(e$val[1]/e$val)
## [1] 1.00000 2.78186 47.66094 52.22787 85.98499 103.73114 153.85414
## [8] 243.30248
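The eigenvalues span a wide range, and six of the eight condition numbers exceed 30, a common rule of thumb for a large value. This suggests that more than one linear combination of the predictors contributes to the collinearity. Note that x is neither centered nor scaled here, so differences in the units of the predictors also inflate these numbers.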
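Because x above is used on its original scale, part of the spread in the eigenvalues reflects units rather than collinearity. The following is a minimal sketch of the same calculation on standardized predictors. The helper name `scaled_condition_numbers` and the synthetic data are illustrative assumptions, not part of the original analysis.

```r
# Condition numbers sqrt(lambda_1 / lambda_j) of X'X after standardizing
# each predictor column (mean 0, sd 1).
# `scaled_condition_numbers` is a hypothetical helper, not a faraway function.
scaled_condition_numbers <- function(model) {
  x <- model.matrix(model)
  x <- x[, colnames(x) != "(Intercept)", drop = FALSE]  # drop the intercept column
  xs <- scale(x)                                         # center and scale each column
  ev <- eigen(crossprod(xs), symmetric = TRUE, only.values = TRUE)$values
  sqrt(ev[1] / ev)
}

# Synthetic check (assumed data): x2 is nearly a copy of x1, so the last
# condition number should stay large even after scaling.
set.seed(1)
d <- data.frame(x1 = rnorm(50))
d$x2 <- d$x1 + rnorm(50, sd = 0.01)
d$y <- d$x1 + rnorm(50)
kappa_demo <- scaled_condition_numbers(lm(y ~ x1 + x2, data = d))
print(kappa_demo)
```

With the model fitted earlier, the call would be `scaled_condition_numbers(lmod)`; comparing its result with the unscaled values shows how much of the spread is due to scale alone.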
(b) Compute and comment on the correlations between the predictors.
round(cor(prostate[,-9]),2)
## lcavol lweight age lbph svi lcp gleason pgg45
## lcavol 1.00 0.19 0.22 0.03 0.54 0.68 0.43 0.43
## lweight 0.19 1.00 0.31 0.43 0.11 0.10 0.00 0.05
## age 0.22 0.31 1.00 0.35 0.12 0.13 0.27 0.28
## lbph 0.03 0.43 0.35 1.00 -0.09 -0.01 0.08 0.08
## svi 0.54 0.11 0.12 -0.09 1.00 0.67 0.32 0.46
## lcp 0.68 0.10 0.13 -0.01 0.67 1.00 0.51 0.63
## gleason 0.43 0.00 0.27 0.08 0.32 0.51 1.00 0.75
## pgg45 0.43 0.05 0.28 0.08 0.46 0.63 0.75 1.00
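The correlation matrix above can also be scanned programmatically instead of by eye. This is a small sketch only: the helper name `high_cor_pairs`, the 0.6 cutoff and the synthetic data are assumptions chosen for illustration.

```r
# List variable pairs whose absolute correlation exceeds a cutoff.
# `high_cor_pairs` and the default cutoff of 0.6 are illustrative choices.
high_cor_pairs <- function(df, cutoff = 0.6) {
  r <- cor(df)
  # upper.tri() keeps each pair once and excludes the diagonal
  idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
  data.frame(var1 = rownames(r)[idx[, "row"]],
             var2 = colnames(r)[idx[, "col"]],
             r    = r[idx],
             row.names = NULL)
}

# Synthetic check (assumed data): only a and b are strongly related.
set.seed(2)
a <- rnorm(100)
demo <- data.frame(a = a, b = a + rnorm(100, sd = 0.1), c = rnorm(100))
pairs_demo <- high_cor_pairs(demo)
print(pairs_demo)
```

For the data used here the call would be `high_cor_pairs(prostate[, -9])`, which excludes the response lpsa in the same way as the `cor()` call above.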
The most highly correlated pairs are gleason:pgg45 (0.75), lcavol:lcp (0.68), svi:lcp (0.67) and lcp:pgg45 (0.63). No pairwise correlation exceeds 0.75.
(c) Compute the variance inflation factors. Is there any evidence that collinearity causes some predictors not to be significant? Explain.
vif(lmod)
## lcavol lweight age lbph svi lcp gleason pgg45
## 2.054115 1.363704 1.323599 1.375534 1.956881 3.097954 2.473411 2.974361
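All of the variance inflation factors are small (the largest is about 3.1, for lcp), so the variances of the coefficient estimates are not much inflated. There is therefore no strong evidence that collinearity is what causes some predictors not to be significant.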
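To make the definition concrete, the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 is obtained by regressing predictor j on all the other predictors. The sketch below computes this by hand; `manual_vif` is a hypothetical helper name and the synthetic data are assumed for illustration (the function expects at least two predictors).

```r
# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j
# on the remaining predictors. `manual_vif` is a hypothetical helper.
manual_vif <- function(model) {
  x <- model.matrix(model)
  x <- x[, colnames(x) != "(Intercept)", drop = FALSE]  # predictors only
  v <- vapply(seq_len(ncol(x)), function(j) {
    r2 <- summary(lm(x[, j] ~ x[, -j, drop = FALSE]))$r.squared
    1 / (1 - r2)
  }, numeric(1))
  setNames(v, colnames(x))
}

# Synthetic check (assumed data): x3 depends on x1 and x2,
# so all three VIFs are pushed above 1.
set.seed(3)
d <- data.frame(x1 = rnorm(80), x2 = rnorm(80))
d$x3 <- d$x1 + d$x2 + rnorm(80, sd = 0.5)
d$y <- d$x1 + rnorm(80)
vif_demo <- manual_vif(lm(y ~ x1 + x2 + x3, data = d))
print(vif_demo)
```

Applied to the model above, `manual_vif(lmod)` should agree with `vif(lmod)` up to rounding. The square root of a VIF gives the factor by which the corresponding standard error is inflated relative to orthogonal predictors.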
(d) Does the removal of insignificant predictors from the model reduce collinearity?
From the summary, gleason is the least significant predictor (p = 0.775), so we remove it from the model first.
prostate1<-prostate[,-7]
lmod1<-lm(lpsa~lcavol+lweight+age+lbph+svi+lcp+pgg45, data=prostate1)
summary(lmod1)
##
## Call:
## lm(formula = lpsa ~ lcavol + lweight + age + lbph + svi + lcp +
## pgg45, data = prostate1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.73117 -0.38137 -0.01728 0.43364 1.63513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.953926 0.829439 1.150 0.25319
## lcavol 0.591615 0.086001 6.879 8.07e-10 ***
## lweight 0.448292 0.167771 2.672 0.00897 **
## age -0.019336 0.011066 -1.747 0.08402 .
## lbph 0.107671 0.058108 1.853 0.06720 .
## svi 0.757734 0.241282 3.140 0.00229 **
## lcp -0.104482 0.090478 -1.155 0.25127
## pgg45 0.005318 0.003433 1.549 0.12488
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7048 on 89 degrees of freedom
## Multiple R-squared: 0.6544, Adjusted R-squared: 0.6273
## F-statistic: 24.08 on 7 and 89 DF, p-value: < 2.2e-16
From the summary, lcavol, lweight and svi remain significant. lcp is now the least significant predictor (p = 0.251), so we remove it next.
prostate2<-prostate1[,-6]
head(prostate2)
## lcavol lweight age lbph svi pgg45 lpsa
## 1 -0.5798185 2.7695 50 -1.386294 0 0 -0.43078
## 2 -0.9942523 3.3196 58 -1.386294 0 0 -0.16252
## 3 -0.5108256 2.6912 74 -1.386294 0 20 -0.16252
## 4 -1.2039728 3.2828 58 -1.386294 0 0 -0.16252
## 5 0.7514161 3.4324 62 -1.386294 0 0 0.37156
## 6 -1.0498221 3.2288 50 -1.386294 0 0 0.76547
lmod2<-lm(lpsa~lcavol+lweight+age+lbph+svi+pgg45, data=prostate2)
summary(lmod2)
##
## Call:
## lm(formula = lpsa ~ lcavol + lweight + age + lbph + svi + pgg45,
## data = prostate2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.77711 -0.41708 0.00002 0.40676 1.59681
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.980085 0.830665 1.180 0.24116
## lcavol 0.545770 0.076431 7.141 2.31e-10 ***
## lweight 0.449450 0.168078 2.674 0.00890 **
## age -0.017470 0.010967 -1.593 0.11469
## lbph 0.105755 0.058191 1.817 0.07249 .
## svi 0.641666 0.219757 2.920 0.00442 **
## pgg45 0.003528 0.003068 1.150 0.25331
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7061 on 90 degrees of freedom
## Multiple R-squared: 0.6493, Adjusted R-squared: 0.6259
## F-statistic: 27.77 on 6 and 90 DF, p-value: < 2.2e-16
From the summary, lcavol, lweight and svi remain significant. pgg45 is now the least significant predictor (p = 0.253), so we remove it next.
prostate3<-prostate2[,-6]
lmod3<-lm(lpsa~lcavol+lweight+age+lbph+svi, data=prostate3)
summary(lmod3)
##
## Call:
## lm(formula = lpsa ~ lcavol + lweight + age + lbph + svi, data = prostate3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.83505 -0.39396 0.00414 0.46336 1.57888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.95100 0.83175 1.143 0.255882
## lcavol 0.56561 0.07459 7.583 2.77e-11 ***
## lweight 0.42369 0.16687 2.539 0.012814 *
## age -0.01489 0.01075 -1.385 0.169528
## lbph 0.11184 0.05805 1.927 0.057160 .
## svi 0.72095 0.20902 3.449 0.000854 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7073 on 91 degrees of freedom
## Multiple R-squared: 0.6441, Adjusted R-squared: 0.6245
## F-statistic: 32.94 on 5 and 91 DF, p-value: < 2.2e-16
From the summary, lcavol, lweight and svi remain significant. age is now the least significant predictor (p = 0.170), so we remove it next.
prostate4<-prostate3[,-3]
head(prostate4)
## lcavol lweight lbph svi lpsa
## 1 -0.5798185 2.7695 -1.386294 0 -0.43078
## 2 -0.9942523 3.3196 -1.386294 0 -0.16252
## 3 -0.5108256 2.6912 -1.386294 0 -0.16252
## 4 -1.2039728 3.2828 -1.386294 0 -0.16252
## 5 0.7514161 3.4324 -1.386294 0 0.37156
## 6 -1.0498221 3.2288 -1.386294 0 0.76547
lmod4<-lm(lpsa~lcavol+lweight+lbph+svi, data=prostate4)
summary(lmod4)
##
## Call:
## lm(formula = lpsa ~ lcavol + lweight + lbph + svi, data = prostate4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.82653 -0.42270 0.04362 0.47041 1.48530
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.14554 0.59747 0.244 0.80809
## lcavol 0.54960 0.07406 7.422 5.64e-11 ***
## lweight 0.39088 0.16600 2.355 0.02067 *
## lbph 0.09009 0.05617 1.604 0.11213
## svi 0.71174 0.20996 3.390 0.00103 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7108 on 92 degrees of freedom
## Multiple R-squared: 0.6366, Adjusted R-squared: 0.6208
## F-statistic: 40.29 on 4 and 92 DF, p-value: < 2.2e-16
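From the summary, lcavol, lweight and svi remain significant at the 5% level, while lbph (p = 0.112) does not. The summaries alone do not settle the question of collinearity: the variance inflation factors and condition numbers of this reduced model still have to be compared with those of the full model.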
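One way to check part (d) directly is to compute the same collinearity diagnostics for the full and the reduced model and compare them. The sketch below is an assumption-laden illustration: `collinearity_summary` is a hypothetical helper, it works on standardized predictors (via the correlation matrix), and the data are synthetic. It also uses `update()` to drop a term, which avoids deleting columns from the data frame by position.

```r
# Largest VIF and largest condition number of the standardized predictors
# for a fitted lm. `collinearity_summary` is a hypothetical helper name.
collinearity_summary <- function(model) {
  x <- model.matrix(model)
  x <- x[, colnames(x) != "(Intercept)", drop = FALSE]
  r <- cor(x)
  ev <- eigen(r, symmetric = TRUE, only.values = TRUE)$values
  c(max_vif   = max(diag(solve(r))),          # VIFs are the diagonal of the inverse correlation matrix
    max_kappa = sqrt(ev[1] / ev[length(ev)])) # ratio of largest to smallest eigenvalue
}

# Synthetic check (assumed data): x3 is close to x1 + x2.
set.seed(4)
d <- data.frame(x1 = rnorm(80), x2 = rnorm(80))
d$x3 <- d$x1 + d$x2 + rnorm(80, sd = 0.3)
d$y <- d$x1 + rnorm(80)
full <- lm(y ~ x1 + x2 + x3, data = d)
reduced <- update(full, . ~ . - x3)  # drop one predictor from the formula
cs_full <- collinearity_summary(full)
cs_reduced <- collinearity_summary(reduced)
print(rbind(full = cs_full, reduced = cs_reduced))
```

For the models fitted here, the comparison would be `collinearity_summary(lmod)` against `collinearity_summary(lmod4)`, or simply `vif(lmod4)` against `vif(lmod)`. Removing predictors cannot increase the VIF of any predictor that remains, since each R_j^2 is then computed from a regression on fewer variables; the size of the reduction is what the comparison shows.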