Predicting body fat with lm

library(mplot)
data(bodyfat)

bodyfat.lm <- lm(Bodyfat ~ Neck + Chest + Abdo + Hip + Thigh + Knee + Ankle + Bic + Fore + Wrist,
                 data = subset(bodyfat, select = -Id))

summary(bodyfat.lm)
## 
## Call:
## lm(formula = Bodyfat ~ Neck + Chest + Abdo + Hip + Thigh + Knee + 
##     Ankle + Bic + Fore + Wrist, data = subset(bodyfat, select = -Id))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.0414 -2.5037 -0.1783  2.7166  9.5073 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -10.34859    9.07550  -1.140   0.2565    
## Neck         -0.68341    0.30776  -2.221   0.0283 *  
## Chest        -0.03297    0.11898  -0.277   0.7822    
## Abdo          0.96013    0.10562   9.090 3.01e-15 ***
## Hip          -0.27401    0.16515  -1.659   0.0998 .  
## Thigh         0.03553    0.17675   0.201   0.8410    
## Knee         -0.12257    0.27596  -0.444   0.6577    
## Ankle        -0.24841    0.47587  -0.522   0.6027    
## Bic           0.03025    0.23880   0.127   0.8994    
## Fore          0.33694    0.26239   1.284   0.2016    
## Wrist        -0.26845    0.67340  -0.399   0.6909    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.054 on 117 degrees of freedom
## Multiple R-squared:  0.7487, Adjusted R-squared:  0.7272 
## F-statistic: 34.86 on 10 and 117 DF,  p-value: < 2.2e-16

Plots

plot(bodyfat$Bodyfat, main = "Real bodyfat", ylab="Bodyfat", xlab="Number")

plot(predict(bodyfat.lm), main = "Predicted bodyfat", ylab = "Bodyfat", xlab = "Number")

plot(bodyfat$Bodyfat, main = "Real and predicted bodyfat", ylab = "Bodyfat", xlab = "Number", ylim = c(0, 40))
# Overlay the fitted values as blue diamonds
points(predict(bodyfat.lm), pch = 18, col = "blue")
# Legend symbols match the plotted point types
legend("topleft", legend = c("Real", "Predicted"), pch = c(1, 18), col = c("black", "blue"), cex = 0.8)

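As we can see, the predicted bodyfat does not match the real data perfectly, but the fit is reasonably good: the model explains about 75% of the variance in bodyfat (multiple R-squared 0.7487), with a residual standard error of about 4.05 percentage points. Some predictions match almost perfectly, and others differ only slightly. There are few cases where the predicted value differs greatly from the real one; the largest residuals are about 9 to 9.5 percentage points. With more data than the 128 cases used here, the coefficient estimates would likely be more precise.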
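The visual comparison above can also be put into numbers. The following is a minimal sketch, not part of the original analysis: it refits the same model so that it runs on its own, then computes the root mean squared error and the mean absolute error of the predictions. The names `fit`, `res`, `rmse`, `mae` and `n_far` are introduced here for illustration, and the 5-point cut-off is an arbitrary assumption.

```r
library(mplot)   # provides the bodyfat data used above
data(bodyfat)

# Same model as above, refit so that this block runs on its own
fit <- lm(Bodyfat ~ Neck + Chest + Abdo + Hip + Thigh + Knee + Ankle +
            Bic + Fore + Wrist, data = bodyfat)

observed  <- bodyfat$Bodyfat
predicted <- fitted(fit)          # fitted values for the training data
res       <- observed - predicted

# Typical size of the prediction error, in percentage points of bodyfat
rmse <- sqrt(mean(res^2))
mae  <- mean(abs(res))

# Assumed cut-off: count cases predicted more than 5 points off
n_far <- sum(abs(res) > 5)

print(c(RMSE = rmse, MAE = mae, n_far = n_far))

# Real against predicted: points near the dashed line are well predicted
plot(predicted, observed, xlab = "Predicted bodyfat", ylab = "Real bodyfat",
     main = "Real vs. predicted bodyfat")
abline(0, 1, lty = 2)
```

Plotting real against predicted values, rather than both against the case number, makes the size of each error easier to read: the vertical distance from the dashed line is the residual. These errors are measured on the same data the model was fitted to, so they probably understate the error on new cases.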

Multiple regression equation for the percentage of body fat