(Interpretation and comments follow each step; the conclusion is at the end.)
extract data
#install.packages("mplot")
library(mplot)
data(bodyfat)
Fat = bodyfat$Bodyfat
Neck = bodyfat$Neck
Chest = bodyfat$Chest
Abdo = bodyfat$Abdo
Hip = bodyfat$Hip
Thigh = bodyfat$Thigh
Knee = bodyfat$Knee
Ankle = bodyfat$Ankle
Bic = bodyfat$Bic
Fore = bodyfat$Fore
Wrist = bodyfat$Wrist
Age = bodyfat$Age
Height = bodyfat$Height
Weight = bodyfat$Weight
Bodyfat<-data.frame(Fat, Age, Height, Weight, Neck, Chest, Abdo, Hip, Thigh, Knee, Ankle, Bic, Fore, Wrist)
cor()
corResult<-cor(Bodyfat)
corResult[,1]
##       Fat       Age    Height    Weight      Neck     Chest      Abdo       Hip
## 1.0000000 0.2543748 0.1513761 0.6615254 0.4647812 0.7379384 0.8382541 0.6273400
##     Thigh      Knee     Ankle       Bic      Fore     Wrist
## 0.5225635 0.4876076 0.3595300 0.4915120 0.3403827 0.4200907
These are the correlations between body fat and the other variables. Several are strong (Abdo 0.84, Chest 0.74, Weight 0.66); the weakest are Height (0.15) and Age (0.25).
build the model
Model<-lm(Fat ~ Age+Height+Weight+Neck+Chest+Abdo+Hip+Thigh+Knee+Ankle+Bic+Fore+Wrist)
summary(Model)
##
## Call:
## lm(formula = Fat ~ Age + Height + Weight + Neck + Chest + Abdo +
## Hip + Thigh + Knee + Ankle + Bic + Fore + Wrist)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3767 -2.5514 -0.1723 2.6391 9.1393
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -52.553646 40.062856 -1.312 0.1922
## Age 0.009288 0.043470 0.214 0.8312
## Height 0.258388 0.320810 0.805 0.4223
## Weight -0.271016 0.243569 -1.113 0.2682
## Neck -0.592669 0.322125 -1.840 0.0684 .
## Chest 0.090883 0.164738 0.552 0.5822
## Abdo 0.995184 0.123072 8.086 7.29e-13 ***
## Hip -0.141981 0.204533 -0.694 0.4890
## Thigh 0.101272 0.200714 0.505 0.6148
## Knee -0.096682 0.325889 -0.297 0.7673
## Ankle -0.048017 0.507695 -0.095 0.9248
## Bic 0.075332 0.244105 0.309 0.7582
## Fore 0.412107 0.272144 1.514 0.1327
## Wrist -0.263067 0.745145 -0.353 0.7247
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.081 on 114 degrees of freedom
## Multiple R-squared: 0.7519, Adjusted R-squared: 0.7236
## F-statistic: 26.57 on 13 and 114 DF, p-value: < 2.2e-16
The Abdo coefficient is significantly different from 0 at the 0.05 level (p = 7.29e-13). Neck (p = 0.0684) is significant only at the 0.1 level, but it should be kept in mind too.
Multiple R-squared: 0.7519 shows that the predictors explain 75.19% of the variance in body fat.
Residual standard error: 4.081 shows that the typical prediction error of this model is about 4.081 percentage points of body fat.
Most predictors are not significant, so the model needs to be simplified later.
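The adjusted R-squared in the summary can be checked from the printed R-squared. The following is a minimal sketch in base R; the helper name `adj_r2` is made up for illustration, and n = 128 is inferred from the output (114 residual degrees of freedom plus 14 estimated coefficients), not stated in the report.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p.
# adj_r2 is a hypothetical helper, not part of any package.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Full model: R-squared 0.7519, 13 predictors.
# n = 128 is an assumption inferred from 114 residual df + 14 coefficients.
full_adj <- adj_r2(r2 = 0.7519, n = 128, p = 13)
print(round(full_adj, 4))
```

The penalty factor (n - 1) / (n - p - 1) grows with the number of predictors, which is why dropping weak predictors later can raise the adjusted value even though the plain R-squared falls slightly.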
diagnose
library(car)
## Loading required package: carData
plot(Model)
qqPlot(Model,id.method='identify',simulate = TRUE,labels=row.names(Bodyfat),main='Q-Q plot')
## [1] 60 73
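Residuals vs Fitted: the residuals scatter around 0 with no clear pattern, so linearity looks fine --> OK
Normal QQ: together with qqPlot, the points stay within the confidence band, so the residuals look normal --> OK
Scale-Location: the variance is roughly constant --> OK
Residuals vs Leverage: all points lie inside the Cook's distance 0.5 contour, so there are no influential points --> OK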
independence
durbinWatsonTest(Model)
##  lag Autocorrelation D-W Statistic p-value
##    1     -0.07255836      2.121203    0.55
##  Alternative hypothesis: rho != 0
p = 0.55 > 0.05 (the p-value is bootstrapped, so it changes slightly between runs). There is no evidence of autocorrelation.
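The Durbin-Watson statistic can be computed by hand from the residuals. This is a sketch on simulated data in base R, not on the body fat model; the variables `x`, `y` and the seed are arbitrary choices. It shows why the statistic sits near 2 when the lag-1 autocorrelation is near 0.

```r
# Simulated regression with independent errors (illustrative data only).
set.seed(1)
n <- 128
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

e <- unname(residuals(fit))

# Durbin-Watson statistic: squared successive differences over squared residuals.
dw <- sum(diff(e)^2) / sum(e^2)

# Lag-1 autocorrelation of the residuals (same normalisation as above).
r1 <- sum(e[-1] * e[-n]) / sum(e^2)

# DW is approximately 2 * (1 - r1); the small gap comes from the two end terms.
print(c(DW = dw, approx = 2 * (1 - r1)))
```

With the values reported above, 2 * (1 + 0.0726) is about 2.145, close to the printed statistic 2.121.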
homoscedasticity
ncvTest(Model)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 0.8365218, Df = 1, p = 0.36039
p = 0.36 > 0.05, so there is no evidence against constant variance.
VIF multicollinearity
vif(Model)
##       Age    Height    Weight      Neck     Chest      Abdo       Hip     Thigh
##  2.256602  4.323426 64.038005  3.849251 13.233597 10.759846 12.318911  7.432930
##      Knee     Ankle       Bic      Fore     Wrist
##  4.506660  3.382574  4.086372  2.344915  3.662724
The VIF of Weight (64.04) is far above 10 --> remove it. Chest, Hip and Abdo are also above 10, so the VIFs are checked again after refitting.
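The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. A base R sketch of that definition follows; it uses the built-in `mtcars` data so it runs without `car`, and `vif_manual` is a hypothetical helper, not the `car::vif` implementation.

```r
# VIF from its definition: regress each predictor on the remaining ones.
# vif_manual is an illustrative helper; mtcars stands in for the body fat data.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

preds <- c("wt", "disp", "hp", "qsec")
v <- vif_manual(mtcars, preds)
print(round(v, 2))
```

Read the other way round, a VIF of 64.04 for Weight means that about 1 - 1/64.04 = 98.4% of its variance is explained by the other predictors, which is why it is the first candidate for removal.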
Model<-lm(Fat ~ Age+Height+Neck+Chest+Abdo+Hip+Thigh+Knee+Ankle+Bic+Fore+Wrist)
summary(Model)
##
## Call:
## lm(formula = Fat ~ Age + Height + Neck + Chest + Abdo + Hip +
## Thigh + Knee + Ankle + Bic + Fore + Wrist)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3267 -2.4316 -0.1254 2.7091 9.3941
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.31143 12.80995 -0.805 0.4225
## Age 0.01850 0.04272 0.433 0.6658
## Height -0.01202 0.20965 -0.057 0.9544
## Neck -0.67730 0.31334 -2.162 0.0327 *
## Chest -0.03454 0.12026 -0.287 0.7745
## Abdo 0.94212 0.11358 8.295 2.33e-13 ***
## Hip -0.25301 0.17872 -1.416 0.1596
## Thigh 0.06148 0.19771 0.311 0.7564
## Knee -0.14806 0.32293 -0.458 0.6475
## Ankle -0.21031 0.48680 -0.432 0.6665
## Bic 0.03261 0.24132 0.135 0.8927
## Fore 0.34778 0.26621 1.306 0.1940
## Wrist -0.39793 0.73598 -0.541 0.5898
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.085 on 115 degrees of freedom
## Multiple R-squared: 0.7492, Adjusted R-squared: 0.723
## F-statistic: 28.62 on 12 and 115 DF, p-value: < 2.2e-16
vif(Model)
##      Age   Height     Neck    Chest     Abdo      Hip    Thigh     Knee
## 2.174732 1.842510 3.634663 7.037885 9.144421 9.386768 7.196930 4.416175
##    Ankle      Bic     Fore    Wrist
## 3.103387 3.985286 2.239099 3.565813
All VIFs are now below 10 --> OK
Computing best subsets regression
1. regression
2. plot
3. Model selection criteria: Adjusted R2, Cp and BIC
library(leaps)
leaps1<-regsubsets(Fat ~ Age+Height+Neck+Chest+Abdo+Hip+Thigh+Knee+Ankle+Bic+Fore+Wrist, data = Bodyfat, nvmax = 12)
leaps2<-regsubsets(Fat ~ Age+Height+Neck+Chest+Abdo+Hip+Thigh+Knee+Ankle+Bic+Fore+Wrist, data = Bodyfat, nbest = 12)
res.sum <- summary(leaps1)
res.sum
## Subset selection object
## Call: regsubsets.formula(Fat ~ Age + Height + Neck + Chest + Abdo +
## Hip + Thigh + Knee + Ankle + Bic + Fore + Wrist, data = Bodyfat,
## nvmax = 12)
## 12 Variables (and intercept)
## Forced in Forced out
## Age FALSE FALSE
## Height FALSE FALSE
## Neck FALSE FALSE
## Chest FALSE FALSE
## Abdo FALSE FALSE
## Hip FALSE FALSE
## Thigh FALSE FALSE
## Knee FALSE FALSE
## Ankle FALSE FALSE
## Bic FALSE FALSE
## Fore FALSE FALSE
## Wrist FALSE FALSE
## 1 subsets of each size up to 12
## Selection Algorithm: exhaustive
## Age Height Neck Chest Abdo Hip Thigh Knee Ankle Bic Fore Wrist
## 1 ( 1 ) " " " " " " " " "*" " " " " " " " " " " " " " "
## 2 ( 1 ) " " " " "*" " " "*" " " " " " " " " " " " " " "
## 3 ( 1 ) " " " " "*" " " "*" "*" " " " " " " " " " " " "
## 4 ( 1 ) " " " " "*" " " "*" "*" " " " " " " " " "*" " "
## 5 ( 1 ) " " " " "*" " " "*" "*" " " " " "*" " " "*" " "
## 6 ( 1 ) " " " " "*" " " "*" "*" " " " " "*" " " "*" "*"
## 7 ( 1 ) " " " " "*" " " "*" "*" " " "*" "*" " " "*" "*"
## 8 ( 1 ) "*" " " "*" " " "*" "*" " " "*" "*" " " "*" "*"
## 9 ( 1 ) "*" " " "*" " " "*" "*" "*" "*" "*" " " "*" "*"
## 10 ( 1 ) "*" " " "*" "*" "*" "*" "*" "*" "*" " " "*" "*"
## 11 ( 1 ) "*" " " "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
## 12 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
data.frame(
  Adj.R2 = which.max(res.sum$adjr2),
  CP = which.min(res.sum$cp),
  BIC = which.min(res.sum$bic)
)
##   Adj.R2 CP BIC
## 1      4  3   3
plot(leaps1,scale = 'adjr2')
leaps1 keeps the single best subset of each size up to 12 (nvmax = 12). Adjusted R2 selects the 4-variable model; Cp and BIC select the 3-variable model.
The 4-variable model is Neck+Abdo+Hip+Fore.
The 3-variable model is the same without Fore: Neck+Abdo+Hip.
res.sum <- summary(leaps2)
res.sum
## Subset selection object
## Call: regsubsets.formula(Fat ~ Age + Height + Neck + Chest + Abdo +
## Hip + Thigh + Knee + Ankle + Bic + Fore + Wrist, data = Bodyfat,
## nbest = 12)
## 12 Variables (and intercept)
## Forced in Forced out
## Age FALSE FALSE
## Height FALSE FALSE
## Neck FALSE FALSE
## Chest FALSE FALSE
## Abdo FALSE FALSE
## Hip FALSE FALSE
## Thigh FALSE FALSE
## Knee FALSE FALSE
## Ankle FALSE FALSE
## Bic FALSE FALSE
## Fore FALSE FALSE
## Wrist FALSE FALSE
## 12 subsets of each size up to 8
## Selection Algorithm: exhaustive
## Age Height Neck Chest Abdo Hip Thigh Knee Ankle Bic Fore Wrist
## 1 ( 1 ) " " " " " " " " "*" " " " " " " " " " " " " " "
## 1 ( 2 ) " " " " " " "*" " " " " " " " " " " " " " " " "
## 1 ( 3 ) " " " " " " " " " " "*" " " " " " " " " " " " "
## 1 ( 4 ) " " " " " " " " " " " " "*" " " " " " " " " " "
## 1 ( 5 ) " " " " " " " " " " " " " " " " " " "*" " " " "
## 1 ( 6 ) " " " " " " " " " " " " " " "*" " " " " " " " "
## 1 ( 7 ) " " " " "*" " " " " " " " " " " " " " " " " " "
## 1 ( 8 ) " " " " " " " " " " " " " " " " " " " " " " "*"
## 1 ( 9 ) " " " " " " " " " " " " " " " " "*" " " " " " "
## 1 ( 10 ) " " " " " " " " " " " " " " " " " " " " "*" " "
## 1 ( 11 ) "*" " " " " " " " " " " " " " " " " " " " " " "
## 1 ( 12 ) " " "*" " " " " " " " " " " " " " " " " " " " "
## 2 ( 1 ) " " " " "*" " " "*" " " " " " " " " " " " " " "
## 2 ( 2 ) " " " " " " " " "*" "*" " " " " " " " " " " " "
## 2 ( 3 ) " " " " " " " " "*" " " " " " " " " " " " " "*"
## 2 ( 4 ) " " " " " " " " "*" " " " " " " "*" " " " " " "
## 2 ( 5 ) " " " " " " " " "*" " " " " "*" " " " " " " " "
## 2 ( 6 ) " " " " " " " " "*" " " "*" " " " " " " " " " "
## 2 ( 7 ) " " "*" " " " " "*" " " " " " " " " " " " " " "
## 2 ( 8 ) " " " " " " " " "*" " " " " " " " " "*" " " " "
## 2 ( 9 ) " " " " " " "*" "*" " " " " " " " " " " " " " "
## 2 ( 10 ) "*" " " " " " " "*" " " " " " " " " " " " " " "
## 2 ( 11 ) " " " " " " " " "*" " " " " " " " " " " "*" " "
## 2 ( 12 ) " " " " "*" "*" " " " " " " " " " " " " " " " "
## 3 ( 1 ) " " " " "*" " " "*" "*" " " " " " " " " " " " "
## 3 ( 2 ) " " " " " " " " "*" "*" " " " " " " " " " " "*"
## 3 ( 3 ) " " " " "*" " " "*" " " " " " " "*" " " " " " "
## 3 ( 4 ) " " " " "*" " " "*" " " " " "*" " " " " " " " "
## 3 ( 5 ) " " " " "*" " " "*" " " "*" " " " " " " " " " "
## 3 ( 6 ) " " "*" "*" " " "*" " " " " " " " " " " " " " "
## 3 ( 7 ) "*" " " "*" " " "*" " " " " " " " " " " " " " "
## 3 ( 8 ) " " " " "*" " " "*" " " " " " " " " " " " " "*"
## 3 ( 9 ) " " " " "*" " " "*" " " " " " " " " " " "*" " "
## 3 ( 10 ) " " " " " " " " "*" "*" " " " " "*" " " " " " "
## 3 ( 11 ) " " " " " " " " "*" "*" " " "*" " " " " " " " "
## 3 ( 12 ) " " " " " " " " "*" " " "*" " " " " " " " " "*"
## 4 ( 1 ) " " " " "*" " " "*" "*" " " " " " " " " "*" " "
## 4 ( 2 ) " " " " "*" " " "*" "*" " " " " " " "*" " " " "
## 4 ( 3 ) " " " " "*" " " "*" "*" " " "*" " " " " " " " "
## 4 ( 4 ) " " " " "*" " " "*" "*" " " " " "*" " " " " " "
## 4 ( 5 ) " " " " "*" " " "*" "*" " " " " " " " " " " "*"
## 4 ( 6 ) " " " " "*" " " "*" "*" "*" " " " " " " " " " "
## 4 ( 7 ) " " "*" "*" " " "*" "*" " " " " " " " " " " " "
## 4 ( 8 ) "*" " " "*" " " "*" "*" " " " " " " " " " " " "
## 4 ( 9 ) " " " " "*" "*" "*" "*" " " " " " " " " " " " "
## 4 ( 10 ) " " " " "*" " " "*" " " " " " " "*" " " "*" " "
## 4 ( 11 ) " " " " "*" " " "*" " " " " "*" " " " " "*" " "
## 4 ( 12 ) " " " " "*" " " "*" " " " " "*" "*" " " " " " "
## 5 ( 1 ) " " " " "*" " " "*" "*" " " " " "*" " " "*" " "
## 5 ( 2 ) " " " " "*" " " "*" "*" " " "*" " " " " "*" " "
## 5 ( 3 ) " " " " "*" " " "*" "*" " " " " " " " " "*" "*"
## 5 ( 4 ) " " "*" "*" " " "*" "*" " " " " " " " " "*" " "
## 5 ( 5 ) " " " " "*" "*" "*" "*" " " " " " " " " "*" " "
## 5 ( 6 ) " " " " "*" " " "*" "*" " " " " " " "*" "*" " "
## 5 ( 7 ) " " " " "*" " " "*" "*" "*" " " " " " " "*" " "
## 5 ( 8 ) "*" " " "*" " " "*" "*" " " " " " " " " "*" " "
## 5 ( 9 ) " " " " "*" " " "*" "*" " " "*" " " "*" " " " "
## 5 ( 10 ) " " " " "*" " " "*" "*" " " " " "*" "*" " " " "
## 5 ( 11 ) " " " " "*" " " "*" "*" " " " " " " "*" " " "*"
## 5 ( 12 ) " " " " "*" " " "*" "*" " " "*" " " " " " " "*"
## 6 ( 1 ) " " " " "*" " " "*" "*" " " " " "*" " " "*" "*"
## 6 ( 2 ) " " " " "*" " " "*" "*" " " "*" "*" " " "*" " "
## 6 ( 3 ) " " " " "*" " " "*" "*" " " "*" " " " " "*" "*"
## 6 ( 4 ) " " " " "*" " " "*" "*" "*" " " "*" " " "*" " "
## 6 ( 5 ) " " " " "*" "*" "*" "*" " " " " "*" " " "*" " "
## 6 ( 6 ) " " "*" "*" " " "*" "*" " " " " "*" " " "*" " "
## 6 ( 7 ) " " " " "*" " " "*" "*" " " " " "*" "*" "*" " "
## 6 ( 8 ) "*" " " "*" " " "*" "*" " " " " "*" " " "*" " "
## 6 ( 9 ) " " "*" "*" " " "*" "*" " " " " " " " " "*" "*"
## 6 ( 10 ) " " " " "*" " " "*" "*" "*" "*" " " " " "*" " "
## 6 ( 11 ) "*" " " "*" " " "*" "*" " " " " " " " " "*" "*"
## 6 ( 12 ) " " " " "*" "*" "*" "*" " " "*" " " " " "*" " "
## 7 ( 1 ) " " " " "*" " " "*" "*" " " "*" "*" " " "*" "*"
## 7 ( 2 ) " " "*" "*" " " "*" "*" " " " " "*" " " "*" "*"
## 7 ( 3 ) " " " " "*" " " "*" "*" "*" "*" "*" " " "*" " "
## 7 ( 4 ) " " " " "*" "*" "*" "*" " " " " "*" " " "*" "*"
## 7 ( 5 ) "*" " " "*" " " "*" "*" " " "*" " " " " "*" "*"
## 7 ( 6 ) " " " " "*" " " "*" "*" "*" " " "*" " " "*" "*"
## 7 ( 7 ) "*" " " "*" " " "*" "*" " " " " "*" " " "*" "*"
## 7 ( 8 ) " " " " "*" "*" "*" "*" " " "*" "*" " " "*" " "
## 7 ( 9 ) " " " " "*" " " "*" "*" " " " " "*" "*" "*" "*"
## 7 ( 10 ) " " " " "*" " " "*" "*" " " "*" "*" "*" "*" " "
## 7 ( 11 ) " " " " "*" "*" "*" "*" " " "*" " " " " "*" "*"
## 7 ( 12 ) " " "*" "*" " " "*" "*" " " "*" "*" " " "*" " "
## 8 ( 1 ) "*" " " "*" " " "*" "*" " " "*" "*" " " "*" "*"
## 8 ( 2 ) " " " " "*" " " "*" "*" "*" "*" "*" " " "*" "*"
## 8 ( 3 ) " " " " "*" "*" "*" "*" " " "*" "*" " " "*" "*"
## 8 ( 4 ) "*" " " "*" " " "*" "*" "*" "*" " " " " "*" "*"
## 8 ( 5 ) " " "*" "*" "*" "*" "*" " " " " "*" " " "*" "*"
## 8 ( 6 ) " " "*" "*" " " "*" "*" " " "*" "*" " " "*" "*"
## 8 ( 7 ) " " " " "*" " " "*" "*" " " "*" "*" "*" "*" "*"
## 8 ( 8 ) "*" "*" "*" " " "*" "*" " " " " "*" " " "*" "*"
## 8 ( 9 ) " " " " "*" "*" "*" "*" "*" "*" "*" " " "*" " "
## 8 ( 10 ) "*" " " "*" " " "*" "*" "*" "*" "*" " " "*" " "
## 8 ( 11 ) "*" " " "*" " " "*" "*" "*" " " "*" " " "*" "*"
## 8 ( 12 ) "*" " " "*" " " "*" "*" " " "*" " " "*" "*" "*"
data.frame(
  Adj.R2 = which.max(res.sum$adjr2),
  CP = which.min(res.sum$cp),
  BIC = which.min(res.sum$bic)
)
##   Adj.R2 CP BIC
## 1     37 25  25
plot(leaps2,scale = 'adjr2')
leaps2 records the 12 best subsets of each size (nbest = 12, sizes 1 to 8). Adjusted R2 selects row 37; Cp and BIC select row 25.
Row 37 is Neck+Abdo+Hip+Fore.
Row 25 is the same model without Fore: Neck+Abdo+Hip.
Both searches agree: the best choice is Neck+Abdo+Hip, plus Fore if adjusted R2 is the criterion.
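The row numbers 37 and 25 index the long nbest table, so they have to be mapped back to a subset size and a rank within that size. A small base R sketch of that arithmetic follows. It assumes every size contributes exactly 12 rows, which holds here because there are at least 12 candidate subsets for each size from 1 to 8.

```r
# Map a row of the nbest = 12 table back to (subset size, rank within size).
# Assumes each size has exactly nbest rows, as in the output above.
nbest <- 12
rows <- c(Adj.R2 = 37, CP = 25, BIC = 25)

size <- (rows - 1) %/% nbest + 1   # which block of 12 the row falls in
rank <- (rows - 1) %% nbest + 1    # position inside that block

print(data.frame(row = rows, size = size, rank = rank))
```

Row 37 is therefore the top-ranked 4-variable model and row 25 the top-ranked 3-variable model, the same two models that leaps1 picked.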
stepwise method
AIC | backward
library(MASS)
stepAIC(Model,direction = "backward")
## Start: AIC=372.58
## Fat ~ Age + Height + Neck + Chest + Abdo + Hip + Thigh + Knee +
## Ankle + Bic + Fore + Wrist
##
## Df Sum of Sq RSS AIC
## - Height 1 0.05 1919.3 370.58
## - Bic 1 0.30 1919.5 370.60
## - Chest 1 1.38 1920.6 370.67
## - Thigh 1 1.61 1920.8 370.69
## - Ankle 1 3.11 1922.3 370.79
## - Age 1 3.13 1922.3 370.79
## - Knee 1 3.51 1922.7 370.81
## - Wrist 1 4.88 1924.1 370.90
## - Fore 1 28.48 1947.7 372.46
## <none> 1919.2 372.58
## - Hip 1 33.45 1952.7 372.79
## - Neck 1 77.97 1997.2 375.68
## - Abdo 1 1148.35 3067.6 430.61
##
## Step: AIC=370.58
## Fat ~ Age + Neck + Chest + Abdo + Hip + Thigh + Knee + Ankle +
## Bic + Fore + Wrist
##
## Df Sum of Sq RSS AIC
## - Bic 1 0.32 1919.6 368.60
## - Chest 1 1.34 1920.6 368.67
## - Thigh 1 1.95 1921.2 368.71
## - Ankle 1 3.12 1922.4 368.79
## - Age 1 3.44 1922.7 368.81
## - Wrist 1 4.82 1924.1 368.90
## - Knee 1 4.93 1924.2 368.91
## - Fore 1 28.48 1947.7 370.47
## <none> 1919.3 370.58
## - Hip 1 37.35 1956.6 371.05
## - Neck 1 80.13 1999.4 373.82
## - Abdo 1 1158.13 3077.4 429.02
##
## Step: AIC=368.6
## Fat ~ Age + Neck + Chest + Abdo + Hip + Thigh + Knee + Ankle +
## Fore + Wrist
##
## Df Sum of Sq RSS AIC
## - Chest 1 1.16 1920.8 366.68
## - Thigh 1 2.98 1922.6 366.80
## - Age 1 3.38 1923.0 366.83
## - Ankle 1 3.62 1923.2 366.84
## - Wrist 1 4.64 1924.2 366.91
## - Knee 1 4.86 1924.5 366.93
## <none> 1919.6 368.60
## - Fore 1 36.09 1955.7 368.99
## - Hip 1 37.04 1956.6 369.05
## - Neck 1 79.81 1999.4 371.82
## - Abdo 1 1164.23 3083.8 427.28
##
## Step: AIC=366.68
## Fat ~ Age + Neck + Abdo + Hip + Thigh + Knee + Ankle + Fore +
## Wrist
##
## Df Sum of Sq RSS AIC
## - Thigh 1 3.26 1924.0 364.90
## - Age 1 3.33 1924.1 364.90
## - Ankle 1 3.60 1924.3 364.92
## - Wrist 1 4.73 1925.5 364.99
## - Knee 1 4.82 1925.6 365.00
## <none> 1920.7 366.68
## - Fore 1 34.94 1955.7 366.99
## - Hip 1 38.40 1959.1 367.21
## - Neck 1 85.07 2005.8 370.23
## - Abdo 1 1924.58 3845.3 453.53
##
## Step: AIC=364.9
## Fat ~ Age + Neck + Abdo + Hip + Knee + Ankle + Fore + Wrist
##
## Df Sum of Sq RSS AIC
## - Age 1 1.52 1925.5 363.00
## - Ankle 1 3.17 1927.2 363.11
## - Knee 1 3.32 1927.3 363.12
## - Wrist 1 5.13 1929.1 363.24
## <none> 1924.0 364.90
## - Hip 1 36.09 1960.1 365.28
## - Fore 1 37.24 1961.2 365.35
## - Neck 1 82.15 2006.2 368.25
## - Abdo 1 1961.69 3885.7 452.87
##
## Step: AIC=363
## Fat ~ Neck + Abdo + Hip + Knee + Ankle + Fore + Wrist
##
## Df Sum of Sq RSS AIC
## - Knee 1 2.62 1928.1 361.17
## - Wrist 1 3.70 1929.2 361.24
## - Ankle 1 4.36 1929.9 361.29
## <none> 1925.5 363.00
## - Fore 1 35.85 1961.4 363.36
## - Hip 1 56.44 1982.0 364.70
## - Neck 1 84.61 2010.1 366.50
## - Abdo 1 2556.02 4481.5 469.13
##
## Step: AIC=361.17
## Fat ~ Neck + Abdo + Hip + Ankle + Fore + Wrist
##
## Df Sum of Sq RSS AIC
## - Wrist 1 3.79 1931.9 359.42
## - Ankle 1 9.03 1937.2 359.77
## <none> 1928.1 361.17
## - Fore 1 36.30 1964.4 361.56
## - Hip 1 67.91 1996.1 363.60
## - Neck 1 87.51 2015.7 364.85
## - Abdo 1 2570.36 4498.5 467.61
##
## Step: AIC=359.42
## Fat ~ Neck + Abdo + Hip + Ankle + Fore
##
## Df Sum of Sq RSS AIC
## - Ankle 1 14.86 1946.8 358.40
## <none> 1931.9 359.42
## - Fore 1 36.06 1968.0 359.79
## - Hip 1 67.31 1999.2 361.81
## - Neck 1 140.04 2072.0 366.38
## - Abdo 1 2574.16 4506.1 465.83
##
## Step: AIC=358.4
## Fat ~ Neck + Abdo + Hip + Fore
##
## Df Sum of Sq RSS AIC
## - Fore 1 28.63 1975.4 358.27
## <none> 1946.8 358.40
## - Hip 1 126.23 2073.0 364.45
## - Neck 1 147.87 2094.7 365.78
## - Abdo 1 2670.93 4617.7 466.96
##
## Step: AIC=358.27
## Fat ~ Neck + Abdo + Hip
##
## Df Sum of Sq RSS AIC
## <none> 1975.4 358.27
## - Hip 1 107.53 2083.0 363.06
## - Neck 1 119.24 2094.7 363.78
## - Abdo 1 2642.74 4618.2 464.97
##
## Call:
## lm(formula = Fat ~ Neck + Abdo + Hip)
##
## Coefficients:
## (Intercept) Neck Abdo Hip
## -14.2955 -0.6266 0.9290 -0.2863
Lowest AIC (358.27): Neck+Abdo+Hip
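The AIC values printed by stepAIC are on the `extractAIC()` scale, which drops a constant that `AIC()` keeps, so the two functions give different numbers for the same model. The sketch below shows the gap on a built-in data set in base R; the `mtcars` model is only a stand-in for the body fat models.

```r
# Illustrative fit on built-in data (not the body fat model).
fit <- lm(mpg ~ wt + hp, data = mtcars)
n <- nobs(fit)

aic_step <- extractAIC(fit)[2]  # scale used in the stepwise tables
aic_full <- AIC(fit)            # based on the full Gaussian log-likelihood

# For a linear model the two differ by a constant that depends only on n.
offset <- n * (1 + log(2 * pi)) + 2

print(c(extractAIC = aic_step, AIC = aic_full,
        difference = aic_full - aic_step, offset = offset))
```

Assuming n = 128 for the body fat data (inferred from the degrees of freedom in the summaries), the constant is about 365.25, so 358.27 on the stepwise scale corresponds to roughly 723.52 from `AIC()`. Because the constant is the same for every model fitted to the same data, the ranking of models is unaffected.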
anova and AIC: test whether Fore is needed (and Chest, out of interest)
Model1<-lm(Fat ~ Neck+Abdo+Hip)
Model2<-lm(Fat ~ Neck+Abdo+Hip+Fore)
#Tested out of interest:
#Chest has a high correlation with Fat in cor(),
#but every selection result above says it can be dropped.
Model3<-lm(Fat ~ Chest+Neck+Abdo+Hip)
anova(Model1,Model2)
## Analysis of Variance Table
##
## Model 1: Fat ~ Neck + Abdo + Hip
## Model 2: Fat ~ Neck + Abdo + Hip + Fore
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    124 1975.4
## 2    123 1946.8  1    28.626 1.8086 0.1812
AIC(Model1,Model2,Model3)
##        df      AIC
## Model1  5 723.5212
## Model2  6 723.6528
## Model3  6 725.5200
p = 0.1812 > 0.05, so adding Fore does not significantly improve the fit and it can be dropped. AIC agrees: Model1 has the lowest value (723.52), and adding Chest (Model3, 725.52) is worse still.
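The F statistic in the anova table can be reproduced from the printed sums of squares. A minimal sketch in base R follows; the numbers are copied from the table above and the variable names are arbitrary.

```r
# Values copied from the anova(Model1, Model2) output.
rss2 <- 1946.8      # residual sum of squares of the larger model (Model2)
df1 <- 124          # residual df, Model1
df2 <- 123          # residual df, Model2
ss_diff <- 28.626   # "Sum of Sq": reduction in RSS from adding Fore

# Partial F test: extra explained variation per added parameter,
# relative to the residual variance of the larger model.
f_stat <- (ss_diff / (df1 - df2)) / (rss2 / df2)
p_val <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```

Adding Fore removes only 28.6 of roughly 1975 units of residual variation, about one and a half percent, which is why the test does not reject the smaller model.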
Result
summary(Model1)
##
## Call:
## lm(formula = Fat ~ Neck + Abdo + Hip)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.0590 -2.5669 -0.0023  2.6107  8.7289
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -14.29546    7.49107  -1.908  0.05866 .
## Neck         -0.62659    0.22903  -2.736  0.00713 **
## Abdo          0.92901    0.07213  12.880  < 2e-16 ***
## Hip          -0.28631    0.11020  -2.598  0.01051 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.991 on 124 degrees of freedom
## Multiple R-squared:  0.7418, Adjusted R-squared:  0.7356
## F-statistic: 118.8 on 3 and 124 DF,  p-value: < 2.2e-16
So the regression equation is:
bodyfat = -14.29546 - 0.62659 * Neck + 0.92901 * Abdo - 0.28631 * Hip
This is the same result as when only the circumference measurements are considered, without age, height and weight. (https://rpubs.com/YifeiLiu/RegressionBodyfat)
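To show how the equation is applied, here is a small sketch that wraps it in a function. The name `predict_bodyfat` and the example measurements are hypothetical, and the inputs are assumed to be in the same units as the `bodyfat` data.

```r
# Fitted equation from summary(Model1), wrapped as a plain function.
# predict_bodyfat is an illustrative name; inputs must use the data set's units.
predict_bodyfat <- function(neck, abdo, hip) {
  -14.29546 - 0.62659 * neck + 0.92901 * abdo - 0.28631 * hip
}

# Hypothetical measurements, chosen only to demonstrate the call.
example_fat <- predict_bodyfat(neck = 38, abdo = 90, hip = 100)
print(round(example_fat, 2))
```

With the fitted model object available, `predict(Model1, newdata = ...)` gives the same point estimate and can also return prediction intervals. With a residual standard error of about 4, those intervals are wide for any single person.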
Consider weight (same result)
Weight is tested separately because common sense says it should matter for body fat.
newModel1<-lm(Fat ~ Weight+Neck+Abdo+Hip)
vif(newModel1)
##    Weight      Neck      Abdo       Hip
## 13.310543  2.631873  4.694637  8.183684
newModel2<-lm(Fat ~ Weight+Neck+Abdo+Fore)
vif(newModel2)
##   Weight     Neck     Abdo     Fore
## 7.058795 2.774132 4.979725 1.899964
summary(newModel2)
##
## Call:
## lm(formula = Fat ~ Weight + Neck + Abdo + Fore)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.1637 -2.8476 0.0877 2.7522 8.3342
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.88325 8.30967 -5.040 1.62e-06 ***
## Weight -0.22932 0.07868 -2.915 0.00423 **
## Neck -0.61856 0.26606 -2.325 0.02172 *
## Abdo 0.98046 0.08146 12.036 < 2e-16 ***
## Fore 0.43409 0.23834 1.821 0.07099 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.971 on 123 degrees of freedom
## Multiple R-squared: 0.7466, Adjusted R-squared: 0.7383
## F-statistic: 90.58 on 4 and 123 DF, p-value: < 2.2e-16
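plot(newModel2)
Including both Weight and Hip pushes the VIF of Weight above 10 (13.31).
With Weight and Fore instead, the VIFs are acceptable, but the fit is only marginally different from the model without Weight (adjusted R-squared 0.7383 vs 0.7356, residual standard error 3.971 vs 3.991), and Fore is not significant at the 0.05 level (p = 0.071). That is not enough to justify the extra variable.
So the regression equation is still:
bodyfat = -14.29546 - 0.62659 * Neck + 0.92901 * Abdo - 0.28631 * Hip
This is the same result as when only the circumference measurements are considered, without age, height and weight. (https://rpubs.com/YifeiLiu/RegressionBodyfat)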