For the fat data used in this chapter, a smaller model using only age, weight, height and abdom was proposed on the grounds that these predictors are either known by the individual or easily measured.
data(fat, package='faraway')
lmod_fat <- lm(brozek ~ age + weight + height + abdom, fat)
summary(lmod_fat)
Call:
lm(formula = brozek ~ age + weight + height + abdom, data = fat)
Residuals:
Min 1Q Median 3Q Max
-11.5105 -2.9346 0.0087 2.8942 9.4179
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -32.769636 6.541902 -5.009 1.04e-06 ***
age -0.007051 0.024342 -0.290 0.772
weight -0.123722 0.025046 -4.940 1.44e-06 ***
height -0.116694 0.082727 -1.411 0.160
abdom 0.889704 0.067267 13.226 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.126 on 247 degrees of freedom
Multiple R-squared: 0.7211, Adjusted R-squared: 0.7166
F-statistic: 159.7 on 4 and 247 DF, p-value: < 2.2e-16
lmod_fat_full <- lm(brozek ~ ., fat)
We used anova to compare the full model, lmod_fat_full, and the smaller model, lmod_fat.
anova(lmod_fat, lmod_fat_full)
Analysis of Variance Table
Model 1: brozek ~ age + weight + height + abdom
Model 2: brozek ~ siri + density + age + weight + height + adipos + free +
neck + chest + abdom + hip + thigh + knee + ankle + biceps +
forearm + wrist
Res.Df RSS Df Sum of Sq F Pr(>F)
1 247 4205.0
2 234 6.8 13 4198.2 11096 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Ans: Since the p-value is 2.2e-16, which is < 0.05, we reject null hypothesis. That means there is a statistical difference between the two models.
coef(lmod_fat_full)
(Intercept) siri density age weight height adipos free neck chest
12.1524012873 0.8884084596 -9.8456305062 -0.0005267739 0.0084855268 -0.0005458648 -0.0153248230 -0.0097387615 0.0005001797 0.0021454206
abdom hip thigh knee ankle biceps forearm wrist
0.0014463928 -0.0044514031 0.0156925881 -0.0252125537 0.0027789569 -0.0147133907 0.0149983439 0.0326518280
The results of the interval for median predictor values is [18.65831, 19.33247]
predict(lmod_fat_full, new=data.frame(t(x0)), interval="prediction")
fit lwr upr
1 18.99539 18.65831 19.33247
The 95% prediction interval for median predictor values is [18.96985, 19.02094]
predict(lmod_fat_full, new=data.frame(t(x0)), interval="confidence")
fit lwr upr
1 18.99539 18.96985 19.02094
The 2 intervals do not differ by a significant amount.
fat_2 <- fat[25:50, c(4,5,6,11)]
predict(lmod_fat, new=data.frame(fat_2), interval="predict")
fit lwr upr
25 8.298418 0.08103281 16.51580
26 9.903086 1.71322935 18.09294
27 9.216292 1.01552526 17.41706
28 19.740864 11.48411519 27.99761
29 8.747253 0.49903716 16.99547
30 13.376012 5.19875861 21.55326
31 14.797935 6.62619278 22.96968
32 14.065015 5.88455369 22.24548
33 8.315872 0.10061987 16.53112
34 21.038046 12.84083747 29.23525
35 30.623842 22.38906643 38.85862
36 36.201628 27.83841989 44.56484
37 23.528151 15.36925601 31.68705
38 22.473944 14.31151961 30.63637
39 45.310482 36.47199166 54.14897
40 30.120799 21.92013711 38.32146
41 38.663109 30.34468421 46.98153
42 30.910811 20.40248479 41.41914
43 30.810797 22.61170204 39.00989
44 25.164766 16.99699489 33.33254
45 11.141535 2.93651578 19.34655
46 10.568910 2.38586739 18.75195
47 8.125807 -0.07361926 16.32523
48 10.999713 2.82955826 19.16987
49 16.325612 8.12181572 24.52941
50 5.970276 -2.26886915 14.20942
Ans: There are 2 observations that has lower bound as negative values. They are 47 and 50. Fat can’t be negative.
fat_w_excl <- fat[c(25:46, 48, 49),]
predict(lmod_fat, new=data.frame(fat_w_excl), interval="predict")
fit lwr upr
25 8.298418 0.08103281 16.51580
26 9.903086 1.71322935 18.09294
27 9.216292 1.01552526 17.41706
28 19.740864 11.48411519 27.99761
29 8.747253 0.49903716 16.99547
30 13.376012 5.19875861 21.55326
31 14.797935 6.62619278 22.96968
32 14.065015 5.88455369 22.24548
33 8.315872 0.10061987 16.53112
34 21.038046 12.84083747 29.23525
35 30.623842 22.38906643 38.85862
36 36.201628 27.83841989 44.56484
37 23.528151 15.36925601 31.68705
38 22.473944 14.31151961 30.63637
39 45.310482 36.47199166 54.14897
40 30.120799 21.92013711 38.32146
41 38.663109 30.34468421 46.98153
42 30.910811 20.40248479 41.41914
43 30.810797 22.61170204 39.00989
44 25.164766 16.99699489 33.33254
45 11.141535 2.93651578 19.34655
46 10.568910 2.38586739 18.75195
48 10.999713 2.82955826 19.16987
49 16.325612 8.12181572 24.52941
There seems to be no big difference.