Use the fat data, fitting the model described in Section 4.2.
#8.9 Fitting as per 4.2 section
data(fat, package = "faraway")
lmod <- lm(brozek ~ age + weight + height + neck + chest + abdom + hip + thigh + knee + ankle + biceps + forearm + wrist, data=fat)
summary(lmod)
##
## Call:
## lm(formula = brozek ~ age + weight + height + neck + chest +
## abdom + hip + thigh + knee + ankle + biceps + forearm + wrist,
## data = fat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.264 -2.572 -0.097 2.898 9.327
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -15.29255 16.06992 -0.952 0.34225
## age 0.05679 0.02996 1.895 0.05929 .
## weight -0.08031 0.04958 -1.620 0.10660
## height -0.06460 0.08893 -0.726 0.46830
## neck -0.43754 0.21533 -2.032 0.04327 *
## chest -0.02360 0.09184 -0.257 0.79740
## abdom 0.88543 0.08008 11.057 < 2e-16 ***
## hip -0.19842 0.13516 -1.468 0.14341
## thigh 0.23190 0.13372 1.734 0.08418 .
## knee -0.01168 0.22414 -0.052 0.95850
## ankle 0.16354 0.20514 0.797 0.42614
## biceps 0.15280 0.15851 0.964 0.33605
## forearm 0.43049 0.18445 2.334 0.02044 *
## wrist -1.47654 0.49552 -2.980 0.00318 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.988 on 238 degrees of freedom
## Multiple R-squared: 0.749, Adjusted R-squared: 0.7353
## F-statistic: 54.63 on 13 and 238 DF, p-value: < 2.2e-16
Fit the same model but now using Huber’s robust method. Comment on any substantial differences between this model and the least squares fit.
library(MASS)
rlmMod <- rlm(brozek ~ age + weight + height + neck + chest + abdom + hip + thigh + knee + ankle + biceps + forearm + wrist, data=fat)
summary(rlmMod)
##
## Call: rlm(formula = brozek ~ age + weight + height + neck + chest +
## abdom + hip + thigh + knee + ankle + biceps + forearm + wrist,
## data = fat)
## Residuals:
## Min 1Q Median 3Q Max
## -10.3964 -2.7352 -0.1171 2.8008 9.4446
##
## Coefficients:
## Value Std. Error t value
## (Intercept) -11.3460 17.1216 -0.6627
## age 0.0650 0.0319 2.0368
## weight -0.0643 0.0528 -1.2163
## height -0.0625 0.0948 -0.6595
## neck -0.4553 0.2294 -1.9846
## chest -0.0256 0.0978 -0.2614
## abdom 0.8778 0.0853 10.2891
## hip -0.2142 0.1440 -1.4872
## thigh 0.2632 0.1425 1.8473
## knee -0.1076 0.2388 -0.4505
## ankle 0.1815 0.2186 0.8306
## biceps 0.1367 0.1689 0.8091
## forearm 0.4152 0.1965 2.1126
## wrist -1.5739 0.5279 -2.9812
##
## Residual standard error: 4.073 on 238 degrees of freedom
Comments…the intercept is lower and each of the predictors (generally) have marginally greater effect on the result. The most predictive predictors in the original lm model are adom, wrist and forearm, and they are as well in the huber model, but what varies are some of the lesser predictors. In the huber model, age, appears to be significant, while it doesn’t in the basic lm model. The std error is actually better in the original, but only modestly so.
Identify which two cases have the lowest weights in the Huber fit. What is unusual about these two points?
wts <- rlmMod$w
names(wts) <- row.names(fat)
head(sort(wts))
## 224 207 39 231 225 81
## 0.5269652 0.5800712 0.5987751 0.6260611 0.6304030 0.6367342
Cases 224 and 207 have lowest weights.
fat[224,]
## brozek siri density age weight height adipos free neck chest abdom
## 224 6.1 5.2 1.0874 55 142.25 67.25 22.2 133.6 35.2 92.7 82.8
## hip thigh knee ankle biceps forearm wrist
## 224 91.9 54.4 35.2 22.5 29.4 26.8 17
fat[207,]
## brozek siri density age weight height adipos free neck chest abdom
## 207 31.7 32.9 1.025 44 166 65.5 27.2 113.5 39.1 100.6 93.9
## hip thigh knee ankle biceps forearm wrist
## 207 100.1 58.9 37.6 21.4 33.1 29.5 17.3
head(fat[order(fat$brozek),])
## brozek siri density age weight height adipos free neck chest abdom
## 182 0.0 0.0 1.1089 40 118.50 68.00 18.1 118.5 33.8 79.3 69.4
## 172 1.9 0.7 1.0983 35 125.75 65.50 20.6 123.4 34.0 90.8 75.0
## 171 4.1 3.0 1.0926 35 152.25 67.75 23.4 146.1 37.0 92.2 81.9
## 26 4.6 3.7 1.0911 27 159.25 71.50 21.9 151.9 35.7 89.6 79.7
## 29 4.7 3.7 1.0910 27 133.25 64.75 22.4 127.0 36.4 93.5 73.9
## 55 4.9 3.9 1.0906 42 136.25 67.50 21.1 129.6 37.8 87.6 77.6
## hip thigh knee ankle biceps forearm wrist
## 182 85.0 47.2 33.5 20.2 27.7 24.6 16.5
## 172 89.2 50.0 34.8 22.0 24.8 25.9 16.9
## 171 92.8 54.7 36.2 22.1 30.4 27.4 17.7
## 26 96.5 55.0 36.7 22.5 29.9 28.2 17.7
## 29 88.5 50.1 34.5 21.3 30.5 27.9 17.2
## 55 88.6 51.9 34.9 22.5 27.7 27.5 18.5
head(fat[order(fat$brozek, decreasing = TRUE),])
## brozek siri density age weight height adipos free neck chest abdom
## 216 45.1 47.5 0.9950 51 219.00 64.00 37.6 120.2 41.2 119.8 122.1
## 36 38.2 40.1 1.0101 49 191.75 65.00 32.0 118.4 38.4 118.5 113.1
## 192 36.5 38.1 1.0140 42 244.25 76.00 29.8 155.2 41.8 115.2 113.7
## 169 34.7 34.3 1.0180 35 228.25 69.50 33.3 149.3 40.4 114.9 115.9
## 39 33.8 35.2 1.0202 46 363.15 72.25 48.9 240.5 51.2 136.2 148.1
## 242 33.6 35.0 1.0207 65 224.50 68.25 33.9 149.2 38.8 119.6 118.0
## hip thigh knee ankle biceps forearm wrist
## 216 112.8 62.5 36.9 23.6 34.7 29.1 18.4
## 36 113.8 61.9 38.3 21.9 32.0 29.8 17.0
## 192 112.4 68.5 45.0 25.5 37.1 31.2 19.9
## 169 111.9 74.4 40.6 24.0 36.1 31.8 18.8
## 39 147.7 87.3 49.1 29.6 45.0 29.0 21.4
## 242 114.3 61.3 42.1 23.4 34.9 30.1 19.4
colMeans(fat)
## brozek siri density age weight height
## 18.938492 19.150794 1.055574 44.884921 178.924405 70.148810
## adipos free neck chest abdom hip
## 25.436905 143.713889 37.992063 100.824206 92.555952 99.904762
## thigh knee ankle biceps forearm wrist
## 59.405952 38.590476 23.102381 32.273413 28.663889 18.229762
What is interesting is they are near the high and low end of the values for the brozek weight but note, neither are all that close to being the lowest for either measure. What is also unusual, is that if one looks at colMeans for the fat dataset, the values for rows 224 and 207 do not have values that seem to vary all that much from the mean, they seem “relatively” typical.
Plot weight (of the man) against height. Identify the two outlying cases. Are these the same as those identified in the previous question? Discuss.
plot(fat$weight ~ fat$height)
identify(fat$weight ~ fat$height)
## integer(0)
Points 39, 42 were identified as outliers. One has the lowest height and the other the highest weight and are substantially separate from the rest of the points.
But, these are not the two lowest weighted points identified in the prior question (b), though “42” with lowest height is the 3rd lowest physical weight. But 39, with highest physical weight, is actually given a model weight of 1, meaning it is giving full weight in the model.
fat[39,]
## brozek siri density age weight height adipos free neck chest abdom
## 39 33.8 35.2 1.0202 46 363.15 72.25 48.9 240.5 51.2 136.2 148.1
## hip thigh knee ankle biceps forearm wrist
## 39 147.7 87.3 49.1 29.6 45 29 21.4
fat[42,]
## brozek siri density age weight height adipos free neck chest abdom
## 42 31.7 32.9 1.025 44 205 29.5 29.9 140.1 36.6 106 104.3
## hip thigh knee ankle biceps forearm wrist
## 42 115.5 70.6 42.5 23.7 33.6 28.7 17.4
wts[39]
## 39
## 0.5987751
wts[42]
## 42
## 1
One might expect these to have the lowest weights in the model. But being a perceived outlier doesn’t necessarily mean that a case isn’t important to the model (at least from a Huber perspective). The huber algorithm only weights values as less important, if they vary substantially from the mean square error.
So if we take the lowest and highest errors (prediction less actual y), and then sort the errors, we see that predictions row 207 and 224 are the farthest away from the true value.
min(predict(lmod) - fat$brozek)
## [1] -9.32674
max(predict(lmod) - fat$brozek)
## [1] 10.26353
errors <- predict(lmod) - fat$brozek
errors[order(errors)]
## 207 81 82 128 135
## -9.326740246 -8.480250444 -8.310690775 -7.937236953 -7.758955591
## 192 249 119 121 33
## -7.323569740 -7.222483537 -7.057953104 -6.922060347 -6.374215955
## 3 24 115 148 38
## -6.252231350 -6.099412184 -6.068221177 -6.048565123 -5.972179308
## 76 62 216 156 202
## -5.905070445 -5.781976643 -5.760728538 -5.753483484 -5.665321123
## 153 23 67 195 208
## -5.607709967 -5.607066085 -5.600336102 -5.567206164 -5.496702951
## 25 86 127 84 17
## -5.408701711 -5.272933099 -5.261188961 -5.239335593 -5.138519868
## 44 215 200 137 104
## -5.128290958 -4.892854878 -4.748428662 -4.747473150 -4.740088194
## 235 134 28 109 197
## -4.700175029 -4.590060672 -4.547318115 -4.520872194 -4.513017439
## 71 120 252 139 143
## -4.466810428 -4.260100079 -4.176970668 -4.051587183 -4.013073267
## 213 6 175 167 46
## -3.856960302 -3.709195183 -3.689518678 -3.672680204 -3.661451588
## 144 240 173 141 66
## -3.658081445 -3.584511978 -3.541652183 -3.538373408 -3.426388034
## 160 228 18 138 234
## -3.292389956 -3.191149954 -3.148293222 -3.055496422 -3.002179264
## 117 94 237 63 100
## -2.949819930 -2.928179976 -2.910291908 -2.894011636 -2.773960791
## 13 47 78 59 122
## -2.766336569 -2.755563313 -2.620682562 -2.620063863 -2.594752168
## 103 196 101 132 7
## -2.531944167 -2.485774218 -2.451910868 -2.313650732 -2.310623343
## 157 36 243 217 241
## -2.308508365 -2.239497617 -2.220884542 -2.016906272 -1.956784895
## 65 129 74 212 178
## -1.943050272 -1.891207796 -1.825364485 -1.793654500 -1.624201324
## 146 179 206 181 10
## -1.621389342 -1.618837460 -1.517487855 -1.412340955 -1.394875743
## 251 5 105 114 246
## -1.371373735 -1.355656321 -1.312001824 -1.273753572 -1.215500947
## 88 168 166 96 113
## -1.198519465 -1.165096790 -1.129126835 -1.012550774 -0.994287810
## 191 145 203 151 199
## -0.979269809 -0.936836446 -0.912585051 -0.783202256 -0.768405351
## 61 219 159 123 92
## -0.699878050 -0.629847610 -0.615146285 -0.614063380 -0.577461426
## 111 102 42 110 58
## -0.461265211 -0.438073261 -0.397557661 -0.348638044 -0.315270117
## 165 116 174 136 106
## -0.283844742 -0.266894612 -0.242014491 -0.226214261 -0.122204790
## 37 99 40 35 130
## -0.039775918 -0.004164290 0.003380545 0.032541671 0.085673834
## 90 154 164 131 244
## 0.090389476 0.103646852 0.161977787 0.188430945 0.198381292
## 77 85 245 226 185
## 0.272175438 0.278121162 0.336389040 0.352395229 0.376412091
## 118 247 19 155 149
## 0.425856244 0.441460975 0.611702470 0.688095922 0.733336556
## 70 56 205 233 124
## 0.771006154 0.802323028 0.886750166 0.929834140 1.067504958
## 133 176 91 27 64
## 1.071808823 1.089736027 1.110899192 1.143719656 1.296101092
## 50 8 169 218 150
## 1.321479840 1.346472137 1.354520090 1.371188704 1.379469960
## 4 188 79 125 60
## 1.397600909 1.463068549 1.470454965 1.535017900 1.543936943
## 242 68 239 21 193
## 1.557764745 1.740667507 1.759099800 1.818916398 1.838677020
## 194 15 190 189 16
## 1.850908182 1.905346337 1.943330571 1.971224424 1.972940322
## 198 11 69 229 161
## 1.992984700 2.063488268 2.131570179 2.162792720 2.228433106
## 93 41 186 34 73
## 2.231387704 2.250878879 2.289392886 2.291908594 2.296554594
## 163 29 152 31 87
## 2.365234487 2.369271885 2.417510053 2.422538099 2.431043322
## 43 214 177 2 52
## 2.487680848 2.530132369 2.535439065 2.559765439 2.609298207
## 30 220 162 142 45
## 2.628319455 2.652202377 2.826430491 2.833008837 2.855322308
## 147 201 126 230 248
## 3.039754680 3.130436174 3.230399417 3.367332565 3.370326238
## 170 210 55 1 108
## 3.383059932 3.426305657 3.474885433 3.564508677 3.564998529
## 236 14 48 72 187
## 3.566693218 3.596004602 3.605176496 3.660259456 3.729414039
## 209 54 227 112 51
## 3.863018374 3.936430547 3.957901882 4.120180330 4.124763307
## 89 49 222 83 183
## 4.220435703 4.233347035 4.235489955 4.260587556 4.361475626
## 22 184 26 12 75
## 4.406119506 4.406373970 4.497675174 4.675528543 4.754878386
## 32 232 98 57 9
## 4.875461736 4.894576083 5.022766447 5.034121476 5.113356443
## 158 95 211 182 223
## 5.175276754 5.231336135 5.258555963 5.309336175 5.335983091
## 80 53 107 20 180
## 5.548460835 5.643654028 6.093045743 6.140421251 6.281388794
## 238 97 172 250 140
## 6.667960176 6.723220533 6.925153108 7.116068918 7.726238505
## 221 171 39 204 231
## 7.831497228 7.911169314 8.170497848 8.371973433 8.609910081
## 225 224
## 8.736079635 10.263532226