What I learned today: I learned the difference between the confidence interval for the mean response and the prediction interval for a single new observation. When I was reading the textbook, I was confused about why the equations differed, but after class today it makes much more sense.

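The prediction interval for y is wider than the confidence interval for the mean of y. This is because it is easier to predict the mean weight at a given height than a specific individual’s weight at that height: the prediction interval has to cover the individual’s scatter around the mean on top of the uncertainty in the estimated mean.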
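To make the difference in the equations concrete, here are the two standard simple linear regression formulas side by side. The notation is my own assumption, since the textbook's symbols are not quoted here: s is the residual standard error, n the sample size, x_0 the new height, and S_xx the sum of squared deviations of x from its mean.

```latex
% Confidence interval for the mean response at x_0
\hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% Prediction interval for a single new observation at x_0
\hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% where S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
```

The only difference is the extra 1 under the square root, which accounts for the variance of an individual observation around the regression line. That term does not shrink as n grows, so the prediction interval is always the wider of the two.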

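data(women)
attach(women)  # makes weight and height available by name
# Note: this reuses the name "women" for the fitted model, replacing the data frame in the workspace
women <- lm(weight ~ height)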
women
## 
## Call:
## lm(formula = weight ~ height)
## 
## Coefficients:
## (Intercept)       height  
##      -87.52         3.45
newData <- data.frame(height = 63)

newData
##   height
## 1     63
conf <- predict(women, newData, interval = "confidence")
conf
##        fit     lwr      upr
## 1 129.8333 128.896 130.7707
pred <- predict(women, newData, interval = "prediction")
pred
##        fit     lwr      upr
## 1 129.8333 126.408 133.2587
all.equal(conf[1], pred[1])
## [1] TRUE
conf[1] == pred[1]
## [1] TRUE
widthPred = pred %*% c(0,-1,1)

widthConf = conf %*% c(0,-1,1)

widthPred - widthConf
##       [,1]
## 1 4.975908

This confirms that the prediction interval is wider than the confidence interval, by about 4.98 pounds at a height of 63. I also checked that the two intervals share the same center (the fitted value, 129.8333) using all.equal(conf[1], pred[1]) and conf[1] == pred[1].
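As a cross-check, the two intervals can be rebuilt by hand from the standard simple linear regression formulas. This is only a sketch: the names fit, lev, conf_manual and pred_manual are my own, and the model is refit from datasets::women under a separate name so the code does not depend on the overwritten women object.

```r
# Refit under a separate name so the built-in data frame stays intact
fit <- lm(weight ~ height, data = datasets::women)

x  <- datasets::women$height
x0 <- 63                           # the new height used above
n  <- length(x)
s  <- summary(fit)$sigma           # residual standard error
tcrit <- qt(0.975, df = n - 2)     # two-sided 95% critical value

yhat <- sum(coef(fit) * c(1, x0))  # fitted value at x0

# Term shared by both intervals: 1/n + (x0 - xbar)^2 / Sxx
lev <- 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2)

conf_manual <- yhat + c(-1, 1) * tcrit * s * sqrt(lev)      # mean response
pred_manual <- yhat + c(-1, 1) * tcrit * s * sqrt(1 + lev)  # single observation

conf_manual
pred_manual
```

If the formulas are right, these should reproduce the lwr and upr values that predict() returned for conf and pred.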

corr1 = cor(weight, height)

corr1  
## [1] 0.9954948
corr2 = cor(height, weight)

corr2
## [1] 0.9954948
corr1 == corr2
## [1] TRUE

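This confirms that the order of the arguments does not matter: cor(weight, height) and cor(height, weight) give the same correlation. It also shows that the cor function makes it very quick to calculate the correlation between two variables.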

summary(women)
## 
## Call:
## lm(formula = weight ~ height)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

The F-statistic is 1433 with 1 and 13 degrees of freedom.

R-squared is 0.991, so height explains about 99.1% of the variation in weight in this sample.
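Two identities tie these numbers to the earlier output, and they hold for simple linear regression with one predictor: R-squared equals the squared correlation, and the F-statistic equals the squared t value of the slope. A small sketch to check them, where fit, sm and r are names I introduced and the model is refit from datasets::women:

```r
# Refit under a separate name so the built-in data frame stays intact
fit <- lm(weight ~ height, data = datasets::women)
sm  <- summary(fit)

# R-squared vs. squared correlation between height and weight
r <- cor(datasets::women$height, datasets::women$weight)
all.equal(sm$r.squared, r^2)

# F-statistic vs. squared t value of the slope
f_value <- unname(sm$fstatistic["value"])
t_slope <- coef(sm)["height", "t value"]
all.equal(f_value, t_slope^2)
```

Using all.equal() here rather than == avoids false negatives from floating-point rounding.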

confint(women)
##                   2.5 %     97.5 %
## (Intercept) -100.342655 -74.690679
## height         3.253112   3.646888
confint(women, level = .9)
##                    5 %       95 %
## (Intercept) -98.030599 -77.002734
## height        3.288603   3.611397

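The first table says that we are 95% confident that the intercept is between about -100.3 and -74.7 pounds and that the slope is between about 3.25 and 3.65 pounds per inch of height. At the 90% level the intervals are narrower: about -98.0 to -77.0 pounds for the intercept and 3.288603 to 3.611397 for the slope.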
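The confint() limits can also be rebuilt as estimate plus or minus a t critical value times the standard error. The helper manual_ci below is hypothetical (my own name, not part of R), and the model is again refit from datasets::women as fit:

```r
# Refit under a separate name so the built-in data frame stays intact
fit <- lm(weight ~ height, data = datasets::women)

est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]

# Hypothetical helper: two-sided t interval for each coefficient
manual_ci <- function(level) {
  tcrit <- qt(1 - (1 - level) / 2, df = df.residual(fit))
  cbind(lower = est - tcrit * se, upper = est + tcrit * se)
}

manual_ci(0.95)  # compare with confint(women)
manual_ci(0.90)  # compare with confint(women, level = .9)
```

A lower confidence level uses a smaller critical value, which is why the 90% limits sit inside the 95% limits.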