Women’s Height and Weight Data

I chose to work with R's built-in women data set, which gives the average heights (in inches) and weights (in pounds) of American women aged 30 to 39. I will use height as the predictor variable and weight as the response variable.

First I will attach the data set so that R can find its columns, height and weight, by name.

attach(women)
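Because women ships with R's datasets package, there is nothing to read from disk. The short sketch below is an addition, not part of my original workflow. It checks the structure of the data and shows an alternative to attach(): passing the data frame through a function's data argument. This avoids masking any other objects named height or weight. The name fit is just an illustrative choice.

```r
# women ships with R (datasets package), so no file needs to be read
str(women)          # 15 rows, 2 numeric columns: height and weight
head(women, 3)

# Alternative to attach(): leave the columns inside the data frame and
# pass it to functions through their data argument
fit <- lm(weight ~ height, data = women)
coef(fit)

# with() evaluates an expression using the data frame's columns
with(women, range(height))
```

If attach() is used, detach(women) removes the data set from the search path again when you are done.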

Now I can run my plot command and look at the scatter plot.

plot(height,weight)

This plot shows that the two variables have a strong linear relationship. The plot looks good, but I want to fix the axis labels and add a title.
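A number can be attached to "strong linear relationship" by computing the Pearson correlation. This is a small added sketch using base R's cor() and cor.test(). The variable name r is my own choice.

```r
# Pearson correlation between height and weight
r <- cor(women$height, women$weight)
r

# Test of H0: correlation = 0, with a 95% confidence interval
cor.test(women$height, women$weight)
```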

plot(height, weight, ylab = "Women's Weight (lb)",
     xlab = "Women's Height (in)", 
     main = "Women's Height and Weight Data")

Now that looks better and is easier to read.

Next I will fit the regression and add the fitted line to the plot.

model <- lm(weight~height)
model
## 
## Call:
## lm(formula = weight ~ height)
## 
## Coefficients:
## (Intercept)       height  
##      -87.52         3.45
plot(height, weight, ylab = "Women's Weight (lb)",
     xlab = "Women's Height (in)", 
     main = "Women's Height and Weight Data")
# abline() accepts the fitted lm object and uses its exact coefficients
abline(model)
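The coefficients that lm() reports come from ordinary least squares. The slope is Sxy / Sxx, and the intercept is chosen so that the line passes through the point of means. As a sketch, the same numbers can be computed by hand. The variable names sxx, sxy, b0 and b1 are my own.

```r
x <- women$height
y <- women$weight

# Sums of squares and cross-products about the means
sxx <- sum((x - mean(x))^2)
sxy <- sum((x - mean(x)) * (y - mean(y)))

# Least-squares slope and intercept
b1 <- sxy / sxx
b0 <- mean(y) - b1 * mean(x)   # line passes through (mean(x), mean(y))

c(intercept = b0, slope = b1)

# Compare with lm()
coef(lm(weight ~ height, data = women))
```

For this data Sxx = 280 and Sxy = 966, so the slope is exactly 966 / 280 = 3.45.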

Now I will look at a single data point, the 3rd row, and compare the predicted weight with the actual weight. The difference between them is the residual.

women[3, ]
##   height weight
## 3     60    120
-87.52+60*3.45
## [1] 119.48
120-119.48
## [1] 0.52
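The 0.52 above uses the rounded coefficients. Here is a sketch of the same calculation at full precision. It uses the standard extractor functions fitted(), residuals() and predict() on a refitted model, so it runs on its own.

```r
fit <- lm(weight ~ height, data = women)

# Fitted (predicted) value and residual for row 3, at full precision
fitted(fit)[3]
residuals(fit)[3]

# The same prediction from predict() with a new data frame
predict(fit, newdata = data.frame(height = 60))

# Residual = actual - predicted
women$weight[3] - fitted(fit)[3]
```

The exact residual is about 0.517, so the rounding makes little difference here.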

One of the assumptions of linear regression is that the errors are normally distributed, so I will make a histogram of the residuals.

resid <- model$residuals
hist(resid)

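The histogram looks acceptable. Counting the bars shows 8 negative and 7 positive residuals, so the residuals are centered reasonably well around 0. However, the distribution is skewed to the right. The largest positive residual is 3.12, while the most negative is only -1.73; these are the Max and Min in the model summary below.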
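The counts and the asymmetry can also be checked numerically. This sketch refits the model so that it runs on its own. The Shapiro-Wilk test is an addition that was not in my original analysis. With only 15 points it has limited power, so it complements the plots rather than replacing them.

```r
fit <- lm(weight ~ height, data = women)
res <- residuals(fit)   # named res to avoid masking the resid() function

# Count residuals below and above 0
c(negative = sum(res < 0), positive = sum(res > 0))

# The tails are uneven: compare the smallest and largest residual
range(res)

# Shapiro-Wilk test of normality (base R stats package);
# a small p-value would indicate departure from normality
shapiro.test(res)
```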

I will make two more plots to examine the residuals.

qqnorm(resid)
qqline(resid)

plot(model$residuals ~ height)
abline(0,0)

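The first plot is a normal Q-Q plot. It compares the residuals with the quantiles of a normal distribution, and points lying close to the line suggest that the errors are approximately normal. Departures at the ends of the line would be consistent with the right skew seen in the histogram. The second plot shows a potential problem. The residuals are positive at both ends of the height range and negative in the middle, which makes a curved, U-shaped pattern. Residuals should scatter randomly around 0. This pattern suggests the relationship is slightly curved rather than perfectly linear, and it also casts some doubt on the assumption that the errors are independent.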
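One way to probe the curved pattern is to add a squared height term and compare the two nested models with an F-test. This is offered as an extra check and was not part of my original analysis. The model names fit_lin and fit_quad are my own.

```r
fit_lin  <- lm(weight ~ height, data = women)
# I() makes ^ mean arithmetic squaring inside the formula
fit_quad <- lm(weight ~ height + I(height^2), data = women)

# Signs of the linear-model residuals, in order of increasing height
sign(residuals(fit_lin))

coef(fit_quad)

# F-test comparing the nested models: does the squared term help?
anova(fit_lin, fit_quad)
```

A significant squared term supports reading the U-shape as curvature in the mean, rather than as random noise.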

Finally, I will look at the summary statistics for the model.

summary(model)
## 
## Call:
## lm(formula = weight ~ height)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

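The summary output shows several results in one place:

- the residual quantiles
- the coefficient estimates with their standard errors, t values, and p-values
- the residual standard error and its degrees of freedom
- R^2 and adjusted R^2
- the overall F-statistic

Note that summary() reports R^2 rather than r. In simple regression, r is the square root of R^2, taking the sign of the slope.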
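The individual numbers in the printed summary can also be pulled out of the summary object, which is handy for reporting. This sketch uses the standard components of a summary.lm object. The names fit and s are my own.

```r
fit <- lm(weight ~ height, data = women)
s <- summary(fit)

s$coefficients     # estimates, std. errors, t values, p-values
s$sigma            # residual standard error
s$df[2]            # residual degrees of freedom
s$r.squared        # multiple R-squared
s$adj.r.squared    # adjusted R-squared
s$fstatistic       # F value with its numerator and denominator df

# summary() reports R^2, not r; in simple regression r = sign(slope) * sqrt(R^2)
sign(coef(fit)[["height"]]) * sqrt(s$r.squared)
```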

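As we can see, R^2 is 0.991, so height explains about 99% of the variation in weight. The residual standard error is 1.525 pounds, which is small relative to weights that range from 115 to 164 pounds. The model therefore fits well, and the two variables are strongly linearly related. The slope means that for every one-inch increase in height, predicted weight increases by 3.45 pounds.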
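The slope interpretation can be checked directly by predicting weight at two heights one inch apart. confint() then gives a range of plausible slopes. Here is a minimal sketch; the heights 65 and 66 are an arbitrary choice.

```r
fit <- lm(weight ~ height, data = women)

# Predicted weights at two heights one inch apart
p <- predict(fit, newdata = data.frame(height = c(65, 66)))
diff(p)   # equals the slope

# 95% confidence intervals for the intercept and slope
confint(fit, level = 0.95)
```

The 95% interval for the slope works out to roughly 3.25 to 3.65 pounds per inch.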