Today in class we learned how to test the significance of the slope and y-intercept in simple linear regression, how to find confidence intervals for them, and how to find confidence and prediction intervals for the response. We also talked about how to measure the usefulness of our model and how to code each of these in R.
First, I loaded the data and looked it over. The predictor is the average height (in.) for American women and the response is the average weight (lbs.).
data(women)
head(women)
## height weight
## 1 58 115
## 2 59 117
## 3 60 120
## 4 61 123
## 5 62 126
## 6 63 129
attach(women)
Next, I fit the regression line to find the slope and y-intercept.
mod1 <- lm(weight ~ height, data = women)
mod1
##
## Call:
## lm(formula = weight ~ height, data = women)
##
## Coefficients:
## (Intercept) height
## -87.52 3.45
R makes it easy to find confidence intervals for the intercept and slope with confint(). Here I ask for 90% intervals.
confint(mod1, level=.9)
## 5 % 95 %
## (Intercept) -98.030599 -77.002734
## height 3.288603 3.611397
I am 90% confident the true slope lies between 3.289 and 3.611 lbs. per inch.
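To see where confint() gets its numbers, here is a minimal sketch that rebuilds the slope interval by hand as estimate ± t critical value × standard error. The names fit, b1, se_b1, t_crit, and slope_ci are my own for illustration, and I refit the model so the block runs on its own.

```r
# Sketch: rebuild the 90% confidence interval for the slope by hand.
fit <- lm(weight ~ height, data = women)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))
b1    <- coefs["height", "Estimate"]
se_b1 <- coefs["height", "Std. Error"]

# A two-sided 90% interval leaves 5% in each tail, so use the 0.95 quantile
# of the t distribution with n - 2 residual degrees of freedom.
t_crit <- qt(0.95, df = df.residual(fit))

slope_ci <- c(lower = b1 - t_crit * se_b1, upper = b1 + t_crit * se_b1)
slope_ci
```

The result should agree with the height row of the confint() output above.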
Correlation measures strength and direction of the linear relationship between the predictor and response.
cor(weight, height)
## [1] 0.9954948
There is very strong positive correlation between average height and average weight.
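In simple linear regression, squaring the correlation gives the r^2 of the model, so the two measures tell the same story. A small sketch of that check, using my own variable names (r, fit, r_squared) and women$ prefixes so it does not depend on attach():

```r
# Sketch: the squared correlation equals R-squared in simple linear regression.
r   <- cor(women$weight, women$height)
fit <- lm(weight ~ height, data = women)

r_squared <- summary(fit)$r.squared

# Both entries should be the same number.
c(r_squared_from_cor = r^2, r_squared_from_lm = r_squared)

# The sign of r matches the sign of the fitted slope.
sign(r) == sign(coef(fit)[["height"]])
```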
Next, we look at a prediction interval for the weight of an individual woman whose height is 65 in.
newdata1 <- data.frame(height=65)
predy1 <- predict(mod1, newdata1, interval="predict")
For an individual woman who is 65 in. tall, we are 95% confident that her weight is between 133.33 and 140.14 lbs.
Now we consider a confidence interval for the mean weight of all women who are 65 inches tall.
confy1 <- predict(mod1, newdata1, interval="confidence")
When women’s average height is 65 inches, we are 95% confident the true average weight is between 135.88 and 137.58 lbs. The prediction interval is wider than the confidence interval, as expected, because the prediction interval is for a single woman and so includes her individual variation around the mean, while the confidence interval is for the mean weight of all women of that height.
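The difference in width comes from the standard errors: the prediction interval adds an extra 1 under the square root for the individual's own variation. Here is a rough sketch that rebuilds both intervals at 65 inches from the usual textbook formulas. All names (fit, x0, sxx, se_mean, se_pred, and so on) are my own, not anything predict() exposes.

```r
# Sketch: confidence and prediction intervals at height = 65, by hand.
fit <- lm(weight ~ height, data = women)

x0    <- 65
n     <- nrow(women)
x_bar <- mean(women$height)
sxx   <- sum((women$height - x_bar)^2)
s     <- sigma(fit)                      # residual standard error

# Standard error for the mean response and for a single new response.
se_mean <- s * sqrt(1 / n + (x0 - x_bar)^2 / sxx)
se_pred <- s * sqrt(1 + 1 / n + (x0 - x_bar)^2 / sxx)

t_crit <- qt(0.975, df = df.residual(fit))   # 95% two-sided
y_hat  <- unname(predict(fit, data.frame(height = x0)))

conf_by_hand <- y_hat + c(-1, 1) * t_crit * se_mean
pred_by_hand <- y_hat + c(-1, 1) * t_crit * se_pred

rbind(confidence = conf_by_hand, prediction = pred_by_hand)
```

Both rows share the same center, y_hat, and only the margin differs.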
Before we accept our intervals, we need to check that both are centered on the same fitted value.
confy1[1] == predy1[1]
## [1] TRUE
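The check above picks out the fitted value by position and compares with ==. A slightly more defensive version, sketched here with my own names (fit, new_height, pred_int, conf_int, same_center), selects the "fit" column by name and compares with all.equal(), which tolerates tiny floating-point differences.

```r
# Sketch: confirm both intervals are centered on the same fitted value.
fit        <- lm(weight ~ height, data = women)
new_height <- data.frame(height = 65)

pred_int <- predict(fit, new_height, interval = "prediction")
conf_int <- predict(fit, new_height, interval = "confidence")

# Compare the "fit" column by name rather than by position.
same_center <- isTRUE(all.equal(pred_int[, "fit"], conf_int[, "fit"]))
same_center
```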
Finally, the summary reports the t and F test statistics, their p-values, and r^2.
summary(mod1)
##
## Call:
## lm(formula = weight ~ height, data = women)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7333 -1.1333 -0.3833 0.7417 3.1167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -87.51667 5.93694 -14.74 1.71e-09 ***
## height 3.45000 0.09114 37.85 1.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared: 0.991, Adjusted R-squared: 0.9903
## F-statistic: 1433 on 1 and 13 DF, p-value: 1.091e-14
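The pieces of the summary can also be pulled out individually instead of read off the printout. In this sketch (names fit, s, t_slope, f_stat, and f_p are my own), I also check a fact that holds with a single predictor: the F statistic is the square of the slope's t statistic, and the two tests share the same p-value.

```r
# Sketch: extract test statistics and r^2 from the model summary.
fit <- lm(weight ~ height, data = women)
s   <- summary(fit)

r_squared <- s$r.squared
t_slope   <- coef(s)["height", "t value"]
p_slope   <- coef(s)["height", "Pr(>|t|)"]

# s$fstatistic is a named vector: value, numdf, dendf.
f_stat <- unname(s$fstatistic["value"])
f_p    <- unname(pf(f_stat,
                    df1 = s$fstatistic["numdf"],
                    df2 = s$fstatistic["dendf"],
                    lower.tail = FALSE))

# With one predictor, t^2 should equal F and the p-values should match.
c(t_squared = t_slope^2, F = f_stat)
c(p_from_t = p_slope, p_from_F = f_p)
```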