What I learned today: I learned the difference between the confidence interval for the mean response and the prediction interval for a single new observation. When I was reading the textbook, I was confused about why the equations differed, but after class today it makes much more sense.
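The prediction interval for y is wider than the confidence interval for the mean of y. This is because it is easier to predict the mean weight at a given height than a specific individual’s weight at that height: the prediction interval has to cover the individual's scatter around the mean in addition to the uncertainty in the estimated mean.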
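To make the difference in the equations concrete, here are the two intervals in the usual simple-linear-regression form. The notation is an assumption on my part (the textbook's symbols may differ): $s$ is the residual standard error, $S_{xx} = \sum_i (x_i - \bar{x})^2$, and $x_0$ is the height at which we predict.

```latex
% Confidence interval for the mean of y at x_0
\[
\hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s\,\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\]
% Prediction interval for a single new y at x_0
\[
\hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\]
```

The only difference is the extra 1 under the square root, which accounts for the variance of one individual around the mean. That term does not shrink as n grows, so the prediction interval is always the wider of the two.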
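data(women)     # built-in data set: height (in) and weight (lb) of 15 women
attach(women)   # makes height and weight visible as bare names
# Note: this reuses the name `women`, so from here on `women` is the fitted model
women <- lm(weight~height)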
women
##
## Call:
## lm(formula = weight ~ height)
##
## Coefficients:
## (Intercept) height
## -87.52 3.45
newData <- data.frame(height = 63)
newData
## height
## 1 63
conf <- predict(women, newData, interval = "confidence")
conf
## fit lwr upr
## 1 129.8333 128.896 130.7707
pred <- predict(women, newData, interval = "prediction")
pred
## fit lwr upr
## 1 129.8333 126.408 133.2587
all.equal(conf[1], pred[1])
## [1] TRUE
conf[1] == pred[1]
## [1] TRUE
# Multiplying by c(0, -1, 1) gives upr - lwr, i.e. the width of each interval
widthPred = pred %*% c(0,-1,1)
widthConf = conf %*% c(0,-1,1)
widthPred - widthConf
## [,1]
## 1 4.975908
This confirms that the prediction interval is wider than the confidence interval, by about 4.98 pounds at a height of 63 inches. I also checked that both intervals are centered on the same fitted value (129.8333) by comparing conf[1] and pred[1].
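The comparison above can also be reproduced by hand, which shows where the extra width comes from. This is only a sketch: `dat`, `fit`, `x0`, `se_mean`, `se_pred`, `conf_hand`, and `pred_hand` are names I made up for illustration, and I refit the model from `datasets::women` so the block does not depend on the `women` object having been overwritten.

```r
# Sketch: rebuild both intervals at height = 63 from the standard formulas.
# All object names below are illustrative, not from the notes above.
dat <- datasets::women                       # original data frame (15 rows)
fit <- lm(weight ~ height, data = dat)

x0    <- 63
n     <- nrow(dat)
s     <- summary(fit)$sigma                  # residual standard error
xbar  <- mean(dat$height)
sxx   <- sum((dat$height - xbar)^2)
yhat  <- unname(coef(fit)[1] + coef(fit)[2] * x0)
tcrit <- qt(0.975, df = n - 2)               # 95% two-sided critical value

# Standard error for the mean response vs. for one new observation;
# the only difference is the extra "1 +" under the square root.
se_mean <- s * sqrt(1 / n + (x0 - xbar)^2 / sxx)
se_pred <- s * sqrt(1 + 1 / n + (x0 - xbar)^2 / sxx)

conf_hand <- c(fit = yhat, lwr = yhat - tcrit * se_mean, upr = yhat + tcrit * se_mean)
pred_hand <- c(fit = yhat, lwr = yhat - tcrit * se_pred, upr = yhat + tcrit * se_pred)

conf_hand
pred_hand
```

Both vectors should agree with the `conf` and `pred` rows printed earlier, since `predict()` uses the same fitted value, critical value, and residual standard error.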
corr1 = cor(weight, height)
corr1
## [1] 0.9954948
corr2 = cor(height, weight)
corr2
## [1] 0.9954948
corr1 == corr2
## [1] TRUE
This confirms that the order of the arguments does not matter: correlation is symmetric, so cor(weight, height) and cor(height, weight) give the same value (0.9954948). The cor function computes it in a single call.
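To see why the order cannot matter, here is a small sketch that computes the Pearson correlation directly from its definition. The names `x`, `y`, and `r_hand` are my own, and the data are read from `datasets::women` so the block runs on its own.

```r
# Sketch: Pearson correlation from its definition (illustrative names).
x <- datasets::women$height
y <- datasets::women$weight

# Swapping x and y leaves both the numerator and the denominator unchanged,
# which is why cor(x, y) and cor(y, x) agree.
r_hand <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_hand
all.equal(r_hand, cor(x, y))
all.equal(cor(x, y), cor(y, x))
```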
summary(women)
##
## Call:
## lm(formula = weight ~ height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7333 -1.1333 -0.3833 0.7417 3.1167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -87.51667 5.93694 -14.74 1.71e-09 ***
## height 3.45000 0.09114 37.85 1.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared: 0.991, Adjusted R-squared: 0.9903
## F-statistic: 1433 on 1 and 13 DF, p-value: 1.091e-14
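The F-statistic is 1433 on 1 and 13 degrees of freedom, with a p-value of 1.091e-14.
R-squared is 0.991, so the linear fit on height accounts for about 99.1% of the variation in weight.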
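With a single predictor, these summary numbers are tied to values already seen: the F-statistic is the square of the slope's t value (37.85 squared is roughly 1433), and R-squared is the square of the correlation. A quick sketch to check this, using illustrative names (`dat`, `fit`, `sm`, `t_slope`, `f_stat`) and a model refit from `datasets::women`:

```r
# Sketch: in simple linear regression, F = t^2 and R^2 = r^2.
dat <- datasets::women
fit <- lm(weight ~ height, data = dat)
sm  <- summary(fit)

t_slope <- sm$coefficients["height", "t value"]
f_stat  <- unname(sm$fstatistic["value"])

all.equal(t_slope^2, f_stat)                              # F equals t squared
all.equal(cor(dat$height, dat$weight)^2, sm$r.squared)    # R^2 equals r squared
```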
confint(women)
## 2.5 % 97.5 %
## (Intercept) -100.342655 -74.690679
## height 3.253112 3.646888
confint(women, level = .9)
## 5 % 95 %
## (Intercept) -98.030599 -77.002734
## height 3.288603 3.611397
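This is saying that we are 90% confident that the weight-intercept is between about -98 and -77 pounds (the second call uses level = .9; the default 95% interval from the first call is about -100.3 to -74.7). We are likewise 90% confident that the slope coefficient is between 3.288603 and 3.611397 pounds per inch.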
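The intervals from confint can be rebuilt as estimate plus or minus a t critical value times the standard error, which also shows why the 90% interval is narrower than the 95% one. This is a sketch with a hypothetical helper, `ci_by_hand`, and a model refit from `datasets::women`; it is not how confint is written internally, only the same arithmetic.

```r
# Sketch: coefficient intervals as estimate +/- t * standard error.
# `ci_by_hand` is a hypothetical helper written for this note.
fit <- lm(weight ~ height, data = datasets::women)
est <- coef(fit)
se  <- summary(fit)$coefficients[, "Std. Error"]
df  <- df.residual(fit)                      # 13 for this model

ci_by_hand <- function(level) {
  tcrit <- qt(1 - (1 - level) / 2, df)       # smaller level -> smaller tcrit
  cbind(lower = est - tcrit * se, upper = est + tcrit * se)
}

ci_by_hand(0.95)
ci_by_hand(0.90)
```

Lowering the level from 0.95 to 0.90 shrinks the critical value, so the interval tightens around the same estimate.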