Linear Regression in R

We’ll use the bodyfat dataset from the mfp package, which provides body measurements from 252 men. The columns we care about are:

siri: body fat %, derived from body density measured by underwater weighing (our response, \(y\))
abdomen: abdominal circumference in cm (our first predictor)
chest: chest circumference in cm (our second predictor)
hip: hip circumference in cm (our third predictor)

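library(mfp)                                # provides the bodyfat dataset
data(bodyfat)
mod <- lm(siri ~ abdomen, data = bodyfat)   # regress body fat % on abdominal circumference
summary(mod)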

## 
## Call:
## lm(formula = siri ~ abdomen, data = bodyfat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.0160  -3.7557   0.0554   3.4215  12.9007 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -39.28018    2.66034  -14.77   <2e-16 ***
## abdomen       0.63130    0.02855   22.11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.877 on 250 degrees of freedom
## Multiple R-squared:  0.6617, Adjusted R-squared:  0.6603 
## F-statistic: 488.9 on 1 and 250 DF,  p-value: < 2.2e-16

Statistic	Value
Estimate (Slope)	0.6313
R-Squared	0.6617
p-value	< 2.2e-16

Intercept: The intercept estimate is \(\beta_0\) in the equation \(\hat{y} = \beta_0 + \beta_1 x\). It is the model's prediction when abdomen is \(0\) cm: a siri of about \(-39.28\%\). Since no one has an abdominal circumference of \(0\) cm and negative body fat is impossible, the intercept has no real-world meaning for our purposes; it only positions the line.

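Estimate: The abdomen estimate is the slope of our function, or \(\beta_1\) in the equation. In our case it is about \(0.6313\), so each additional centimeter of abdominal circumference is associated with a predicted body fat % about \(0.63\) percentage points higher. The positive sign means abdominal circumference and body fat % are positively correlated.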
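To make the intercept and slope concrete, here is a minimal sketch that plugs the coefficients from the summary output into the fitted line. The helper name `predict_siri` and the 100 cm and 110 cm inputs are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied (rounded) from the summary(mod) output
b0 <- -39.28018   # intercept
b1 <- 0.63130     # slope: percentage points of body fat per cm of abdomen

# Hypothetical helper: predicted siri (body fat %) for a given abdomen in cm
predict_siri <- function(abdomen_cm) b0 + b1 * abdomen_cm

predict_siri(100)                        # about 23.85 for an illustrative 100 cm abdomen
predict_siri(110) - predict_siri(100)    # 10 extra cm adds 10 * b1, about 6.31 points
```

With a fitted model object in hand, `predict(mod, newdata = data.frame(abdomen = 100))` should give essentially the same number, up to rounding of the coefficients.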

R^2: \(R^2\) tells us what fraction of the variation in body fat % our model explains. Our \(R^2\) of \(0.66\) means the model explains about \(66\%\) of the variation in body fat %. The remaining \(34\%\) is left unexplained: it comes from variables that are not in the model and from random noise.
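The "fraction of variation explained" can be computed directly as one minus the ratio of residual variation to total variation. The sketch below does that for any fitted `lm` object. `r_squared_by_hand` is a hypothetical helper written for illustration, and the usage at the bottom assumes the mfp package is installed.

```r
# Hypothetical helper (not from base R or mfp): R-squared from its definition
r_squared_by_hand <- function(model) {
  y <- model.response(model.frame(model))   # observed response values
  ss_res <- sum(residuals(model)^2)         # variation left in the residuals
  ss_tot <- sum((y - mean(y))^2)            # total variation around the mean
  1 - ss_res / ss_tot
}

# Assumes the mfp package is installed; skipped otherwise
if (requireNamespace("mfp", quietly = TRUE)) {
  data(bodyfat, package = "mfp")
  mod <- lm(siri ~ abdomen, data = bodyfat)
  print(r_squared_by_hand(mod))             # compare with summary(mod)$r.squared
}
```

For a model with an intercept, the result should agree with the "Multiple R-squared" line that `summary()` prints.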

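p-value: A number from 0 to 1 giving the probability of seeing a relationship at least this strong if abdomen and body fat % were actually unrelated. The smaller it is, the harder the result is to attribute to chance. The conventional threshold is a p-value below \(0.05\). Our value of about \(9.09 \times 10^{-61}\) (printed by R as <2e-16) is far below that, so the relationship is very unlikely to be a chance artifact.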
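As a rough check on where the p-value comes from, the sketch below rebuilds the slope's t statistic and two-sided p-value from the rounded numbers in the Coefficients table. Because the inputs are rounded, the result should only be expected to match the value quoted above in order of magnitude.

```r
est <- 0.63130   # slope estimate from the Coefficients table
se  <- 0.02855   # its standard error
df  <- 250       # residual degrees of freedom

t_value <- est / se                     # about 22.11, matching the "t value" column
p_value <- 2 * pt(-abs(t_value), df)    # two-sided tail probability if the true slope were 0

t_value
p_value
```

The test asks how surprising a t statistic this large would be under a true slope of zero. At roughly 22 standard errors from zero, it would be extremely surprising.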

Hypothesis: The more body circumferences we add to the model, the more accurately we can predict the siri body-fat percentage.

Process: With a single predictor we fit a line; with two we fit a plane. Beyond that we can no longer visualize the fit, but the math works the same way.
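The process above can be sketched by fitting the three nested models and collecting their fit statistics. `compare_models` is a hypothetical helper introduced here for illustration, and the usage assumes the mfp package is installed. The original analysis may have produced its numbers differently.

```r
# Hypothetical helper: fit each formula and collect R-squared statistics
compare_models <- function(formulas, data) {
  rows <- lapply(formulas, function(f) {
    s <- summary(lm(f, data = data))
    data.frame(
      model         = paste(attr(terms(f), "term.labels"), collapse = " + "),
      r_squared     = s$r.squared,
      adj_r_squared = s$adj.r.squared
    )
  })
  do.call(rbind, rows)
}

# One, two, and three circumference predictors
formulas <- list(
  siri ~ abdomen,
  siri ~ abdomen + chest,
  siri ~ abdomen + chest + hip
)

# Assumes the mfp package is installed; skipped otherwise
if (requireNamespace("mfp", quietly = TRUE)) {
  data(bodyfat, package = "mfp")
  print(compare_models(formulas, bodyfat))
}
```

Adding a predictor to a least-squares model can never lower \(R^2\), so the `r_squared` column will be non-decreasing by construction. That is why the adjusted column is the more informative one for comparing these models.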

Model	R-Squared
abdomen	0.6617
abdomen + chest	0.6728
abdomen + chest + hip	0.6993

Model	R-Squared	Adjusted R-Squared
abdomen	0.6617	0.6603
abdomen + chest	0.6728	0.6702
abdomen + chest + hip	0.6993	0.6956
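Adjusted \(R^2\) penalizes each extra predictor using the standard formula \(1 - (1 - R^2)\frac{n - 1}{n - p - 1}\), where \(n\) is the number of observations and \(p\) the number of predictors. The sketch below applies it to the rounded \(R^2\) values from the table with \(n = 252\). The helper name `adj_r_squared` is an assumption made for illustration.

```r
# Hypothetical helper: adjusted R-squared from R-squared, sample size n,
# and number of predictors p
adj_r_squared <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n  <- 252                          # men in the bodyfat dataset
r2 <- c(0.6617, 0.6728, 0.6993)    # R-squared values from the table
p  <- 1:3                          # predictors in each model

round(adj_r_squared(r2, n, p), 4)
```

Because the inputs are already rounded to four decimals, the results may differ from the table in the last digit.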
