Confidence Intervals for Regression Coefficients

data(women)
attach(women)
names(women)
## [1] "height" "weight"

I want to create a 95% confidence interval for each coefficient of the linear relationship between women’s heights (the predictor) and women’s weights (the response).

mymod<- lm(weight ~ height)
confint(mymod,level=.95)
##                   2.5 %     97.5 %
## (Intercept) -100.342655 -74.690679
## height         3.253112   3.646888

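As with other datasets, the confidence interval for the intercept doesn’t necessarily make sense. Both of its limits are negative, which is impossible for a woman’s weight. The intercept is the predicted weight at a height of 0 inches, far below any height in the data, so it should not be interpreted on its own.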
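To make clear where these limits come from, here is a minimal sketch that rebuilds the interval by hand as estimate ± t × standard error. The names fit, est, se, tcrit and manual_ci are my own, and the model is refit with data = women so that the block runs on its own.

```r
# Refit the model with an explicit data argument so the block is self-contained.
fit <- lm(weight ~ height, data = women)

# Coefficient estimates and their standard errors from the summary table.
est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]

# Two-sided 95% interval: critical value from the t distribution
# with the model's residual degrees of freedom (n - 2 here).
tcrit <- qt(0.975, df = df.residual(fit))

manual_ci <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
manual_ci

# Should agree with confint() up to floating-point error.
all.equal(unname(manual_ci), unname(confint(fit, level = 0.95)))
```

The height row is the one worth reading: each additional inch of height is associated with roughly 3.25 to 3.65 additional pounds of weight.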

Testing Correlation

First, I will compute the correlation coefficient as a quick check of whether the two variables are linearly related. I am expecting this number to be close to 1.

cor(weight,height)
## [1] 0.9954948

This value is a welcome sign: it says that women’s heights are strongly correlated with their weights, which makes sense conceptually.
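cor() only returns the coefficient itself. If a formal test is wanted, one option is cor.test(), sketched below under the assumption that the default Pearson test is appropriate here. The names ct and fit are my own.

```r
# cor.test() tests H0: correlation = 0 (Pearson's method by default).
ct <- cor.test(women$height, women$weight)

ct$estimate   # sample correlation, the same value cor() returns
ct$conf.int   # 95% confidence interval for the correlation
ct$p.value    # p-value for H0: correlation = 0

# In simple linear regression, the squared correlation equals R-squared.
fit <- lm(weight ~ height, data = women)
all.equal(unname(ct$estimate)^2, summary(fit)$r.squared)
```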

Prediction Intervals and Confidence Intervals

Now I will create a prediction interval and a confidence interval for a woman with a height of 66 inches. I chose this value arbitrarily, but I do know that it lies within the range of heights in the dataset.

newdata<-data.frame(height = 66)
(predy <- predict(mymod, newdata, interval="predict") )
##        fit     lwr      upr
## 1 140.1833 136.775 143.5916

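This produces our prediction interval, which is an interval for the weight of a single woman of this height. We expect it to be wider than the confidence interval, because it has to account for the variation of an individual weight around the mean as well as the uncertainty in the estimated mean.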

(confy <- predict(mymod, newdata, interval="confidence") )
##        fit      lwr      upr
## 1 140.1833 139.3102 141.0565

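The confidence interval provides a narrower window, because it only describes the mean weight of all women of this height, so it does not include the variation of individual weights. If the difference is not apparent, we can compute the width of both intervals and compare them. Multiplying by c(0, -1, 1) subtracts lwr from upr.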

confy %*% c(0, -1, 1)  
##       [,1]
## 1 1.746287
predy %*% c(0, -1, 1)
##       [,1]
## 1 6.816625

This confirms that the CI width (about 1.75) is much smaller than the PI width (about 6.82), which is to be expected.
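The gap between the two widths can be traced to the standard errors behind them. The sketch below is my own reconstruction, not part of the original analysis: it assumes the usual formulas, where the prediction interval adds the residual variance to the variance of the estimated mean. The names fit, new_height, p, se_mean, sigma and se_pred are hypothetical.

```r
fit <- lm(weight ~ height, data = women)
new_height <- data.frame(height = 66)

# se.fit = TRUE also returns the standard error of the fitted mean,
# the residual degrees of freedom, and the residual standard error.
p <- predict(fit, new_height, se.fit = TRUE)
tcrit <- qt(0.975, df = p$df)

se_mean <- unname(p$se.fit)            # uncertainty in the estimated mean
sigma   <- p$residual.scale            # spread of individual weights around the mean
se_pred <- sqrt(se_mean^2 + sigma^2)   # both sources combined, for one new woman

ci_width <- 2 * tcrit * se_mean
pi_width <- 2 * tcrit * se_pred
c(ci = ci_width, pi = pi_width)
```

These two numbers should reproduce the widths computed from confy and predy. Because se_pred always contains the extra sigma^2 term, the prediction interval can never be narrower than the confidence interval at the same height.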

Last, we will check to make sure that both intervals are centered at the same fitted value.

confy[1]==predy[1]
## [1] TRUE
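Comparing floating-point numbers with == works here because both calls compute the fit in the same way, but it can be fragile in general. A slightly more defensive version, sketched with my own names cint and pint, selects the fit column by name and compares with a tolerance:

```r
fit <- lm(weight ~ height, data = women)
new_height <- data.frame(height = 66)

cint <- predict(fit, new_height, interval = "confidence")
pint <- predict(fit, new_height, interval = "prediction")

# Compare the fitted values by column name, allowing for rounding error.
same_center <- isTRUE(all.equal(cint[, "fit"], pint[, "fit"]))
same_center

# Each interval is also symmetric around the fitted value.
ci_mid <- unname((cint[, "lwr"] + cint[, "upr"]) / 2)
pi_mid <- unname((pint[, "lwr"] + pint[, "upr"]) / 2)
all.equal(ci_mid, unname(cint[, "fit"]))
all.equal(pi_mid, unname(pint[, "fit"]))
```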

Based on these checks, the model looks good and does not appear to violate any of the assumptions of a linear regression model.