Today we learned how to use R to find confidence and prediction intervals. I will find these intervals for the women data set that we used last class.
data(women)
mod2 <- lm(weight ~ height, data = women)
cor(women$weight, women$height)
## [1] 0.9954948
summary(mod2)
##
## Call:
## lm(formula = weight ~ height, data = women)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7333 -1.1333 -0.3833 0.7417 3.1167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -87.51667 5.93694 -14.74 1.71e-09 ***
## height 3.45000 0.09114 37.85 1.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared: 0.991, Adjusted R-squared: 0.9903
## F-statistic: 1433 on 1 and 13 DF, p-value: 1.091e-14
Here is the regression model that uses height to predict weight. As the output shows, our regression equation has the form \[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\] and is: \[\hat{y}_i = -87.51667 + 3.45x_i\] which means that for each additional inch of height, a woman's predicted weight increases by 3.45 pounds on average.
Other important pieces of our output are the Multiple \(R^2\) value, which is very high at 0.991 (about 99.1% of the variation in weight is explained by height), and our p-values, which are significant because they are very small. Another cool thing is that the F-statistic and the t-statistic for height have exactly the same p-value. That is no coincidence: with a single predictor, F is the square of t (37.85 squared is about 1433). Also included in this chunk of code is the correlation between x and y, otherwise known as r, and squaring it (0.9954948 squared is about 0.991) gives back the \(R^2\) value.
Next I will compute confidence intervals for our coefficients at two confidence levels, showing how the level argument follows from your alpha value (level = 1 - alpha).
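The two relationships in the summary output (F and t for height, and r and Multiple R-squared) can be checked directly in R. This is a minimal sketch, assuming the same `women` data and model; the names `s`, `t_height`, `f_stat`, and `r` are my own and not part of the class code.

```r
# Refit the model so this chunk runs on its own
mod2 <- lm(weight ~ height, data = women)
s <- summary(mod2)

t_height <- s$coefficients["height", "t value"]  # t statistic for the slope
f_stat   <- unname(s$fstatistic["value"])         # overall F statistic
r        <- cor(women$weight, women$height)       # sample correlation

# In simple linear regression, F is the square of the slope's t statistic
all.equal(t_height^2, f_stat)

# With one predictor, R-squared is the square of the correlation
all.equal(r^2, s$r.squared)
```

Both comparisons use `all.equal()` rather than `==` so that tiny floating-point differences do not matter.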
confint(mod2, level=.95)
## 2.5 % 97.5 %
## (Intercept) -100.342655 -74.690679
## height 3.253112 3.646888
confint(mod2, level=.85)
## 7.5 % 92.5 %
## (Intercept) -96.599714 -78.433620
## height 3.310568 3.589432
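To see where these numbers come from, the interval for the slope can be rebuilt by hand as estimate plus or minus a t critical value times the standard error. This is a sketch under the usual linear regression assumptions; the variable names (`alpha`, `est`, `se`, `t_crit`, `manual_ci`) are mine.

```r
# Refit the model so this chunk runs on its own
mod2  <- lm(weight ~ height, data = women)
coefs <- summary(mod2)$coefficients

alpha  <- 0.05                                        # level = 1 - alpha = 0.95
est    <- coefs["height", "Estimate"]                 # slope estimate
se     <- coefs["height", "Std. Error"]               # its standard error
t_crit <- qt(1 - alpha / 2, df = df.residual(mod2))   # t critical value, 13 df

# estimate +/- critical value * standard error
manual_ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
manual_ci
```

Changing `alpha` to 0.15 should reproduce the 85% interval in the same way.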
Now I will create a confidence interval and a prediction interval at a height of 65 inches.
ndata <- data.frame(height = 65)
predI <- predict(mod2, ndata, interval="prediction")
confI <- predict(mod2, ndata, interval="confidence")
confI
## fit lwr upr
## 1 136.7333 135.8827 137.584
predI
## fit lwr upr
## 1 136.7333 133.3307 140.136
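Above are the intervals with their centers (fit) and their boundaries (lwr and upr). To see which one is wider without doing the math in your head, the next chunk of code is very useful: multiplying by c(0, -1, 1) computes upr minus lwr, which is the width of the interval. Note: our prediction interval should be wider than our confidence interval, because it accounts for the variability of an individual weight in addition to the uncertainty in the estimated mean.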
confI %*% c(0, -1, 1)
## [,1]
## 1 1.70131
predI %*% c(0, -1, 1)
## [,1]
## 1 6.805242
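The difference in width comes from an extra "1 +" under the square root in the prediction interval's standard error. Here is a sketch that rebuilds both intervals from the standard textbook formulas for simple linear regression; the formulas and names (`x0`, `sxx`, `se_mean`, `se_pred`, and so on) are my additions, not something from the class code.

```r
# Refit the model so this chunk runs on its own
mod2 <- lm(weight ~ height, data = women)

x0     <- 65                                        # height of interest
n      <- nrow(women)                               # sample size
s      <- sigma(mod2)                               # residual standard error
xbar   <- mean(women$height)
sxx    <- sum((women$height - xbar)^2)              # spread of the x values
fit    <- unname(coef(mod2)[1] + coef(mod2)[2] * x0)  # predicted weight at x0
t_crit <- qt(0.975, df = n - 2)                     # 95% level

# Standard error for the mean response at x0
se_mean <- s * sqrt(1 / n + (x0 - xbar)^2 / sxx)
# Standard error for one new observation at x0 (note the extra 1)
se_pred <- s * sqrt(1 + 1 / n + (x0 - xbar)^2 / sxx)

conf_manual <- c(fit = fit, lwr = fit - t_crit * se_mean, upr = fit + t_crit * se_mean)
pred_manual <- c(fit = fit, lwr = fit - t_crit * se_pred, upr = fit + t_crit * se_pred)
conf_manual
pred_manual
```

Because 65 happens to be the mean height in this data set, the `(x0 - xbar)^2` term is zero here, so both intervals are at their narrowest.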
Finally, to check that their centers are equal, we can use this last piece of code.
confI[1] == predI[1]
## [1] TRUE
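Comparing decimal numbers with `==` works here because both fits come from the same calculation, but in general it can fail on tiny rounding differences. A slightly safer sketch of the same check uses `all.equal()`, which compares within a small tolerance; the column lookup by name is my own choice.

```r
# Rebuild the two intervals so this chunk runs on its own
mod2  <- lm(weight ~ height, data = women)
ndata <- data.frame(height = 65)
confI <- predict(mod2, ndata, interval = "confidence")
predI <- predict(mod2, ndata, interval = "prediction")

# Compare the centers by column name, allowing for floating-point error
same_center <- isTRUE(all.equal(confI[, "fit"], predI[, "fit"]))
same_center
```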
We learned the importance of these intervals and the difference between them today. My big takeaway is that a confidence interval is an interval for the mean weight of all people at \(x_0\), whereas a prediction interval is for the weight of one person at \(x_0\).