Learning Log 4

Confidence and Prediction Intervals

Today we learned how to use R to find confidence and prediction intervals. I will find these intervals for the women data set that we used last class.

data(women)
attach(women)
mod2 <-  lm(weight ~ height, data = women )
cor(weight, height)

## [1] 0.9954948

summary(mod2)

## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

Above is the regression model that uses height to predict weight. The output gives a regression equation of the form \[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\] which here is \[\hat{y}_i = -87.51667 + 3.45x_i\] so for each additional inch of height, the predicted weight increases by 3.45 pounds on average.
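As a quick check of the fitted equation, the coefficients can be pulled out of the model and plugged in by hand. This is only a sketch: the names `b`, `x0`, and `y_hat` are mine, and the height of 65 inches is chosen for illustration.

```r
# Refit the model so this chunk runs on its own
mod2 <- lm(weight ~ height, data = women)

b  <- coef(mod2)   # named vector: "(Intercept)" and "height"
x0 <- 65           # illustrative height in inches

# Plug x0 into y-hat = b0 + b1 * x
y_hat <- b[["(Intercept)"]] + b[["height"]] * x0
y_hat              # about 136.73 pounds
```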

Other important pieces of the output are the Multiple R^2 value, which is very high at 0.991, and the p-values, which are far below any usual significance level. Another thing to notice is that the F-statistic and the t-statistic for height have exactly the same p-value (1.09e-14); with a single predictor, the F-statistic is the square of the t-statistic. The chunk also computes the correlation between x and y, otherwise known as r, and squaring r = 0.9954948 gives the R^2 of 0.991.
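The two relationships above (F equals t squared, and R^2 equals r squared) can be checked directly from the summary object. This is a minimal sketch; the variable names `s`, `t_height`, `f_stat`, and `r` are my own.

```r
# Refit the model so this chunk runs on its own
mod2 <- lm(weight ~ height, data = women)
s    <- summary(mod2)

t_height <- s$coefficients["height", "t value"]  # t-statistic for the slope
f_stat   <- s$fstatistic[["value"]]              # overall F-statistic
r        <- cor(women$weight, women$height)      # correlation between x and y

# With one predictor these pairs should agree up to rounding error
all.equal(t_height^2, f_stat)
all.equal(r^2, s$r.squared)
```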

Next I will run a couple of confidence intervals for the coefficients, showing how the confidence level (1 minus alpha) can be adjusted with the level argument.

confint(mod2, level=.95)

##                   2.5 %     97.5 %
## (Intercept) -100.342655 -74.690679
## height         3.253112   3.646888

confint(mod2, level=.85)

##                  7.5 %     92.5 %
## (Intercept) -96.599714 -78.433620
## height        3.310568   3.589432
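To see where these numbers come from, the 95% interval can be rebuilt by hand as estimate plus or minus a t critical value times the standard error. This is a sketch under the usual t-based formula; `alpha`, `est`, `se`, `tcrit`, and `manual` are names I introduced.

```r
# Refit the model so this chunk runs on its own
mod2  <- lm(weight ~ height, data = women)
alpha <- 0.05                                   # 1 - confidence level

est   <- coef(summary(mod2))[, "Estimate"]
se    <- coef(summary(mod2))[, "Std. Error"]
tcrit <- qt(1 - alpha / 2, df = df.residual(mod2))  # 13 degrees of freedom

# estimate +/- t * standard error for each coefficient
manual <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
manual
```

Changing `alpha` to 0.15 should give the 85% interval in the same way.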

Now I will create a confidence interval and a prediction interval at a height of 65 inches.

ndata <- data.frame(height = 65)
predI <- predict(mod2, ndata, interval="prediction") 
confI <- predict(mod2, ndata, interval="confidence") 
confI

##        fit      lwr     upr
## 1 136.7333 135.8827 137.584

predI

##        fit      lwr     upr
## 1 136.7333 133.3307 140.136

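Above are the intervals, with their centers (fit) and their lower and upper bounds (lwr, upr). To check which one is wider without doing the math in your head, the next chunk of code is very useful: multiplying by c(0, -1, 1) computes upr minus lwr. Note: the prediction interval should be wider than the confidence interval, because it also accounts for the variability of an individual observation around the mean.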

confI %*% c(0, -1, 1)

##      [,1]
## 1 1.70131

predI %*% c(0, -1, 1)

##       [,1]
## 1 6.805242
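The two widths can also be reproduced from the standard simple-regression formulas, which shows that the only difference is an extra 1 under the square root for the prediction interval. This is a sketch assuming those textbook formulas; `x0`, `sxx`, `lev`, `conf_width`, and `pred_width` are names I made up.

```r
# Refit the model so this chunk runs on its own
mod2  <- lm(weight ~ height, data = women)
x0    <- 65
n     <- nrow(women)
s     <- sigma(mod2)                                   # residual standard error
sxx   <- sum((women$height - mean(women$height))^2)    # spread of the x values
tcrit <- qt(0.975, df = df.residual(mod2))

# Shared term: 1/n + (x0 - xbar)^2 / Sxx
lev <- 1 / n + (x0 - mean(women$height))^2 / sxx

conf_width <- 2 * tcrit * s * sqrt(lev)       # interval for the mean response
pred_width <- 2 * tcrit * s * sqrt(1 + lev)   # interval for one new observation
c(confidence = conf_width, prediction = pred_width)
```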

Finally, to check that their centers are equal to each other, we can use this last piece of code.

confI[1] == predI[1]

## [1] TRUE
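Comparing decimals with == works here, but in general a tolerance-based comparison is safer for floating-point numbers. A possible alternative, sketched with `all.equal` (the name `same_center` is mine):

```r
# Rebuild both intervals so this chunk runs on its own
mod2  <- lm(weight ~ height, data = women)
ndata <- data.frame(height = 65)
confI <- predict(mod2, ndata, interval = "confidence")
predI <- predict(mod2, ndata, interval = "prediction")

# all.equal() allows for tiny rounding differences; isTRUE() turns it into TRUE/FALSE
same_center <- isTRUE(all.equal(confI[, "fit"], predI[, "fit"]))
same_center
```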

Today we learned why these intervals matter and how they differ. My big takeaway is that a confidence interval is an interval for the mean weight of all people at \(x_0\), whereas a prediction interval is for the weight of one person at \(x_0\).
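In formula form, the difference looks like this. The notation is the standard textbook one and is my assumption, not something taken from the R output: \(s\) is the residual standard error, \(n\) the sample size, \(\bar{x}\) the mean height, \(S_{xx} = \sum (x_i - \bar{x})^2\), and \(t^*\) the critical value with \(n - 2\) degrees of freedom.

Confidence interval for the mean response at \(x_0\): \[\hat{y}_0 \pm t^* \, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\]

Prediction interval for one new observation at \(x_0\): \[\hat{y}_0 \pm t^* \, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\]

Both are centered at the same \(\hat{y}_0\), and the extra 1 under the square root is what makes the prediction interval wider.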


Stenroos

February 8, 2018
