library(faraway)
data(seatpos)
lm <- lm(hipcenter ~ . , seatpos)
summary(lm)
##
## Call:
## lm(formula = hipcenter ~ ., data = seatpos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -73.827 -22.833 -3.678 25.017 62.337
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 436.43213 166.57162 2.620 0.0138 *
## Age 0.77572 0.57033 1.360 0.1843
## Weight 0.02631 0.33097 0.080 0.9372
## HtShoes -2.69241 9.75304 -0.276 0.7845
## Ht 0.60134 10.12987 0.059 0.9531
## Seated 0.53375 3.76189 0.142 0.8882
## Arm -1.32807 3.90020 -0.341 0.7359
## Thigh -1.14312 2.66002 -0.430 0.6706
## Leg -6.43905 4.71386 -1.366 0.1824
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 37.72 on 29 degrees of freedom
## Multiple R-squared: 0.6866, Adjusted R-squared: 0.6001
## F-statistic: 7.94 on 8 and 29 DF, p-value: 1.306e-05
The coefficient of leg is -6.43905, which means when the leg increases by 1, the outcome will decrease 6.43905. Andleg has a p-value of 0.1824, which means that it is not statistically significant at 0.05.
x <- model.matrix(lm)
mean <- apply(x,2,mean)
pred <- predict(lm, data.frame(t(mean)), interval="prediction")
pred
## fit lwr upr
## 1 -164.8849 -243.04 -86.72972
#install.packages("leaps")
library(leaps)
b <- regsubsets(hipcenter ~ ., seatpos)
rs <- summary(b)
rs$which
## (Intercept) Age Weight HtShoes Ht Seated Arm Thigh Leg
## 1 TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## 2 TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
## 3 TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
## 4 TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE TRUE
## 5 TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE
## 6 TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
## 7 TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## 8 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
AIC <- 38 * log(rs$rss/38) + (2:9) * 2
plot(AIC ~ I(1:8), ylab="AIC", xlab="Number of Predictors")
We can see that the model
hipcenter ~ Age + Ht + Leg has the lowest AIC.
lm2 <- lm(hipcenter ~ Age + Ht + Leg, seatpos)
summary(lm2)
##
## Call:
## lm(formula = hipcenter ~ Age + Ht + Leg, data = seatpos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -79.715 -22.758 -4.102 21.394 60.576
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 452.1976 100.9482 4.480 8.04e-05 ***
## Age 0.5807 0.3790 1.532 0.1347
## Ht -2.3254 1.2545 -1.854 0.0725 .
## Leg -6.7390 4.1050 -1.642 0.1099
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 35.12 on 34 degrees of freedom
## Multiple R-squared: 0.6814, Adjusted R-squared: 0.6533
## F-statistic: 24.24 on 3 and 34 DF, p-value: 1.426e-08
Now the coefficient of leg is -6.7390, which is smaller than that in the original model, and it still has a negative impact on the outcome. The p-value is 0.1099, which is smaller than that in the original model but it is still not statistically significant at 0.05.
x2 <- model.matrix(lm2)
mean2 <- apply(x2,2,mean)
pred2 <- predict(lm2, data.frame(t(mean2)), interval="prediction")
pred2
## fit lwr upr
## 1 -164.8849 -237.192 -92.57771
The prediction value is quite same.