data(women)
attach(women)
plot(height, weight, xlab = "Height (in)", ylab = "Weight (lbs)")
There is a positive linear relationship between height and weight in this dataset.
mod <- lm(weight~height)
mod
##
## Call:
## lm(formula = weight ~ height)
##
## Coefficients:
## (Intercept) height
## -87.52 3.45
The fitted regression line is predicted weight = -87.52 + 3.45(height).
Interpreting the y-intercept would be extrapolation, since no woman in our dataset has a height anywhere near 0 (the shortest woman is 58 in). The slope means that for every additional inch of height, we would expect a woman to weigh 3.45 more lbs.
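As a sanity check, the two coefficients can be reproduced from the usual least-squares formulas: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). This is a minimal sketch; the names `slope_hand`, `intercept_hand`, and `fit` are illustrative and not part of the original analysis, and the model is refit with `data = women` so the block stands on its own.

```r
# Least-squares estimates computed by hand from the built-in women data.
slope_hand <- with(women, cov(height, weight) / var(height))
intercept_hand <- with(women, mean(weight) - slope_hand * mean(height))

# Refit the same model so this block does not depend on attach().
fit <- lm(weight ~ height, data = women)

# TRUE if the hand-computed values match lm() up to numerical tolerance.
all.equal(unname(coef(fit)), c(intercept_hand, slope_hand))
```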
In our dataset, there’s a woman who is 65 in tall.
coef(mod)%*%c(1, 65) #the coefficients of your model matrix multiplied by [1, x]
## [,1]
## [1,] 136.7333
We would predict her weight to be 136.7333 lbs (her actual weight is 135 lbs).
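The same prediction can also be obtained with `predict()`, which avoids building the `[1, x]` vector by hand. A small sketch, assuming the model is refit with `data = women` so that `newdata` can be matched by column name; `fit` and `pred_65` are illustrative names.

```r
# Refit with an explicit data argument so predict() can use newdata.
fit <- lm(weight ~ height, data = women)

# Predicted weight (lbs) for a woman who is 65 in tall.
pred_65 <- predict(fit, newdata = data.frame(height = 65))
pred_65
```

This should agree with the matrix-multiplication result of about 136.73 lbs.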
Here’s the residual for this one value:
woman.actual <- women[8, 2]
woman.height <- women[8, 1]
woman.predicted <- coef(mod)%*%c(1, woman.height)
woman.resid <- woman.actual - woman.predicted
woman.resid
## [,1]
## [1,] -1.733333
The residual is -1.733333 lbs for this one observation, meaning the model overpredicts her weight by about 1.73 lbs.
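For comparison, the fitted model already stores every residual, so the value for this woman can be pulled out directly. A minimal sketch under the assumption that she is row 8 of `women`, as in the code above; `fit`, `resid_8`, and `manual_8` are illustrative names.

```r
# Refit so the block is self-contained.
fit <- lm(weight ~ height, data = women)

# Residual for observation 8, taken straight from the model object.
resid_8 <- residuals(fit)[8]

# The same quantity by definition: actual minus fitted value.
manual_8 <- women$weight[8] - fitted(fit)[8]

# TRUE if both routes give the same residual.
all.equal(unname(resid_8), unname(manual_8))
```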
In retrospect, this should have been done before I went throwing predictions around, but here are the steps taken to check whether our linear model is “legal”, i.e., whether it meets the normally distributed error and homoscedasticity conditions.
Checking out error distribution:
resids <- mod$residuals
qqnorm(resids)
qqline(resids)
Not perfect, but good enough (I think).
Checking out variance spread:
plot(resids ~ height)
abline(0,0)
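Bad news. The residuals follow a clear U-shaped pattern: the model underpredicts for very short and very tall women (positive residuals) and overpredicts for medium-height women (negative residuals). Residuals from a good fit should scatter evenly around zero; a systematic pattern like this means the straight line is missing curvature in the data. So this isn’t a usable linear model, my prediction above is pretty bogus, and this model should be trashed.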
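The direction of the misses can be read off the signs of the residuals: since a residual is actual minus predicted, a positive value means the model predicted too low. The sketch below tabulates this by height; `fit` and `resid_table` are illustrative names, and the model is refit with `data = women` so the block runs on its own.

```r
# Refit so the block is self-contained.
fit <- lm(weight ~ height, data = women)

# One row per woman: height, rounded residual, and direction of the miss.
resid_table <- data.frame(
  height    = women$height,
  residual  = round(residuals(fit), 2),
  direction = ifelse(residuals(fit) > 0, "underpredicted", "overpredicted")
)
resid_table
```

Reading down the table, the shortest and tallest women come out as underpredicted and the middle of the height range as overpredicted.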