In class on Thursday we reviewed the linear regression parameters B0 and B1 as an introduction to hypothesis tests on these parameters, confidence and prediction intervals, correlation, and F-tests.
data(women)
attach(women)
LL4 <- lm(weight ~ height, data = women )
confint(LL4, level=.9)
##                    5 %       95 %
## (Intercept) -98.030599 -77.002734
## height        3.288603   3.611397
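The code outputs the lower and upper bounds of a 90% confidence interval for each parameter. For B0 (the y-intercept) the interval is roughly [-98, -77]. The row associated with height gives the interval for B1 (the slope), roughly [3.3, 3.6]. Since the interval for B1 does not contain 0, a two-sided test of B1 = 0 would reject at the 10% level.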
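To see where these bounds come from, here is a minimal sketch that rebuilds the interval by hand as estimate ± t quantile × standard error. The names `fit`, `est`, `se`, `tq`, and `manual` are my own for illustration and are not from class; the model is refit so the block runs on its own.

```r
# Rebuild the 90% confidence interval for B0 and B1 by hand.
fit <- lm(weight ~ height, data = women)   # same model as LL4

coefs <- coef(summary(fit))                # coefficient table
est   <- coefs[, "Estimate"]
se    <- coefs[, "Std. Error"]

# 90% two-sided interval: 5% in each tail, with n - 2 residual df
tq <- qt(0.95, df = df.residual(fit))

manual <- cbind(lower = est - tq * se, upper = est + tq * se)
manual

# Compare with confint(); differences should be floating-point noise only
all.equal(unname(manual), unname(confint(fit, level = 0.90)))
```

The `0.95` passed to `qt()` is the upper tail cutoff for a 90% interval, which matches the `5 %` and `95 %` column labels that `confint()` prints.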
cor(height, weight)
## [1] 0.9954948
Height and weight are very strongly positively correlated: r is about 0.995, which is close to the maximum possible value of 1.
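As a quick cross-check, in a regression with one predictor the squared correlation should equal the R-squared of the fitted model. The sketch below assumes the same `women` data; `r` and `fit` are names I chose for illustration.

```r
# In simple linear regression, r^2 equals the model's R-squared.
r   <- cor(women$height, women$weight)
fit <- lm(weight ~ height, data = women)

r^2
summary(fit)$r.squared
all.equal(r^2, summary(fit)$r.squared)

# cor.test() adds a hypothesis test of zero correlation and an interval for it
cor.test(women$height, women$weight)
```

Using `women$height` instead of relying on `attach(women)` keeps the block independent of what is currently attached.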
LL4_63in <- data.frame(height=63)
(LL4_63in_Pred <- predict(LL4, LL4_63in, interval="predict") )
## fit lwr upr
## 1 129.8333 126.408 133.2587
(LL4_63in_Conf <- predict(LL4, LL4_63in, interval="confidence") )
## fit lwr upr
## 1 129.8333 128.896 130.7707
Using a new data frame that contains only a height of 63 in, I created a prediction interval for the weight of one 63 in tall woman and a confidence interval for the mean weight of all 63 in tall women. Both use the predict() default level of 95%. As expected, the interval for the one woman is wider than the interval for the mean, meaning it is harder to predict the weight of a single woman than it is to predict the mean weight of all women of that height. Another way to confirm this is to compute the width of each interval directly.
LL4_63in_Pred %*% c(0, -1, 1)
## [,1]
## 1 6.850661
LL4_63in_Conf %*% c(0, -1, 1)
## [,1]
## 1 1.874753
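The difference in width can be read off the standard textbook formulas for the two intervals in simple linear regression. The notation here is my own and not from class: x_0 is the new height (63), \hat{y}_0 is the fitted weight at x_0, s is the residual standard error, n is the sample size, and S_{xx} is the sum of squared deviations of height from its mean. The 0.975 quantile reflects the 95% default level of predict().

```latex
\begin{align*}
\text{Confidence interval (mean weight at } x_0\text{):}\quad
  & \hat{y}_0 \pm t_{0.975,\,n-2}\; s\,
    \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\[6pt]
\text{Prediction interval (one woman at } x_0\text{):}\quad
  & \hat{y}_0 \pm t_{0.975,\,n-2}\; s\,
    \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\end{align*}
```

The only difference is the extra 1 under the square root, which accounts for the variability of an individual observation around the regression line. Both intervals share the same center \hat{y}_0.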
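The prediction interval is about 6.85 lbs wide, while the confidence interval is only about 1.87 lbs wide. Both intervals also appear to be centered at the same fitted weight of 129.8 lbs. To have R check for me: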
LL4_63in_Conf[1] == LL4_63in_Pred[1]
## [1] TRUE
all.equal(LL4_63in_Conf[1], LL4_63in_Pred[1])
## [1] TRUE
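Both checks agree here because the two fitted values are computed the same way. In general, though, `==` on decimal numbers can fail where `all.equal()` succeeds, which is why the second check is the safer habit. A small illustration with made-up values `x` and `y` (not from the regression):

```r
# Exact vs. tolerance-based comparison of floating-point numbers.
x <- 0.1 + 0.2
y <- 0.3

x == y                     # exact comparison, sensitive to rounding error
isTRUE(all.equal(x, y))    # comparison within a small numerical tolerance
```

Wrapping `all.equal()` in `isTRUE()` is useful inside `if` statements, since `all.equal()` returns a description of the difference rather than `FALSE` when the values differ.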