Today in class we learned how to test the significance of the slope and how to find confidence intervals for the slope and y-intercept in simple linear regression, as well as confidence and prediction intervals for the response. We also talked about how to measure the usefulness of our model and how to code each of these in R.

First, I loaded the data and looked it over. The predictor is the average height (in.) of American women and the response is their average weight (lbs.).

data(women)
head(women)
##   height weight
## 1     58    115
## 2     59    117
## 3     60    120
## 4     61    123
## 5     62    126
## 6     63    129
attach(women)

Next I found the slope and y-intercept for the regression line.

mod1 <- lm(weight ~ height, data = women)
mod1
## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Coefficients:
## (Intercept)       height  
##      -87.52         3.45
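To see where these two numbers come from, the least-squares formulas can be applied directly. This is only a sketch: the names x, y, sxy, sxx, b1, and b0 are my own and are not part of the class code.

```r
# Least-squares estimates computed by hand (variable names are illustrative)
x <- women$height
y <- women$weight

sxy <- sum((x - mean(x)) * (y - mean(y)))  # joint variation of x and y
sxx <- sum((x - mean(x))^2)                # variation in x alone

b1 <- sxy / sxx                # slope
b0 <- mean(y) - b1 * mean(x)   # y-intercept

round(c(intercept = b0, slope = b1), 2)
```

The rounded values should agree with the coefficients that lm() reports.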

Confidence Interval

R makes it extremely easy to find confidence intervals for the slope and y-intercept.

confint(mod1, level=.9)
##                    5 %       95 %
## (Intercept) -98.030599 -77.002734
## height        3.288603   3.611397

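I am 90% confident the true slope lies between 3.289 and 3.611. In other words, each additional inch of average height is associated with an increase of between 3.289 and 3.611 lbs. in average weight.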
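For reference, the same interval can be rebuilt from the estimate, its standard error, and a t critical value. This is a rough sketch of the usual formula (estimate plus or minus t times standard error); fit_ci, coefs, t_crit, and slope_ci are names I made up for illustration.

```r
# 90% confidence interval for the slope, built by hand
fit_ci <- lm(weight ~ height, data = women)
coefs  <- coef(summary(fit_ci))

est <- coefs["height", "Estimate"]
se  <- coefs["height", "Std. Error"]

# A 90% interval leaves 5% in each tail, so use the 0.95 quantile
t_crit <- qt(0.95, df = df.residual(fit_ci))

slope_ci <- est + c(-1, 1) * t_crit * se
round(slope_ci, 3)
```

This should match the height row of the confint() output.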

Correlation

Correlation measures the strength and direction of the linear relationship between the predictor and response.

cor(weight, height)
## [1] 0.9954948

There is a very strong positive correlation between average height and average weight.
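In simple linear regression the correlation is tied to the model in two ways: its square equals the R-squared of the fit, and the slope equals r times sd(y) / sd(x). A quick check of both, where fit_r, r, r_squared, and slope_from_r are names chosen just for this sketch:

```r
# Link between the correlation and the fitted model
fit_r <- lm(weight ~ height, data = women)
r     <- cor(women$weight, women$height)

# Squared correlation versus the model's R-squared
r_squared <- r^2
c(r_squared = r_squared, model_r_squared = summary(fit_r)$r.squared)

# Slope recovered from the correlation and the two standard deviations
slope_from_r <- r * sd(women$weight) / sd(women$height)
slope_from_r
```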

Confidence Intervals and Prediction

Next, we look at a prediction interval for the weight of an individual woman whose height is 65 in.

newdata1 <- data.frame(height = 65)
predy1 <- predict(mod1, newdata1, interval = "prediction")

For a woman who is 65 in. tall, we are 95% confident that her weight is between 133.33 and 140.14 lbs.
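The prediction interval can also be worked out from the textbook formula, which adds the variability of a single observation (the leading 1 under the square root) to the uncertainty in the fitted line. A sketch, assuming the default 95% level; fit_p, x0, se_pred, and pi_hand are illustrative names, not part of the class code.

```r
# 95% prediction interval at height = 65, computed by hand
fit_p <- lm(weight ~ height, data = women)
x0    <- 65

n   <- nrow(women)
s   <- sigma(fit_p)                                 # residual standard error
sxx <- sum((women$height - mean(women$height))^2)

fit_x0  <- sum(coef(fit_p) * c(1, x0))              # fitted value at x0
se_pred <- s * sqrt(1 + 1 / n + (x0 - mean(women$height))^2 / sxx)
t_crit  <- qt(0.975, df = n - 2)

pi_hand <- fit_x0 + c(-1, 1) * t_crit * se_pred
round(pi_hand, 2)
```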

Now we consider a confidence interval for the mean weight of all women who are 65 inches tall.

confy1 <- predict(mod1, newdata1, interval="confidence") 

When women’s average height is 65 inches, we are 95% confident the true average weight is between 135.88 and 137.58 lbs. The prediction interval is wider than the confidence interval, as expected, because the prediction interval covers the weight of an individual woman, which varies around the mean, rather than the mean weight of all women of that height.
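To check that this pattern is not special to 65 inches, the two interval widths can be compared at several heights. This is just a sketch; the three heights and the names fit_w, grid, conf_w, pred_w, and widths are my own choices.

```r
# Compare confidence and prediction interval widths at a few heights
fit_w <- lm(weight ~ height, data = women)
grid  <- data.frame(height = c(58, 65, 72))   # low end, mean, high end

conf_w <- predict(fit_w, grid, interval = "confidence")
pred_w <- predict(fit_w, grid, interval = "prediction")

widths <- data.frame(
  height   = grid$height,
  ci_width = conf_w[, "upr"] - conf_w[, "lwr"],
  pi_width = pred_w[, "upr"] - pred_w[, "lwr"]
)
widths
```

The prediction interval should be wider in every row, and both intervals should be narrowest at 65 inches, which is the mean height in this data set.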

Before we accept our intervals, we need to check that they are centered at the same fitted value.

confy1[1] == predy1[1]
## [1] TRUE
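Since == compares floating-point numbers exactly, a slightly more defensive version of this check uses all.equal() and selects the fit column by name. A sketch, with fit_c, new_height, conf65, and pred65 as made-up names:

```r
# Tolerant check that both intervals share the same center
fit_c      <- lm(weight ~ height, data = women)
new_height <- data.frame(height = 65)

conf65 <- predict(fit_c, new_height, interval = "confidence")
pred65 <- predict(fit_c, new_height, interval = "prediction")

# Same fitted value in both results
same_center <- isTRUE(all.equal(conf65[, "fit"], pred65[, "fit"]))

# The fitted value is also the midpoint of each interval
ci_mid <- mean(conf65[, c("lwr", "upr")])
pi_mid <- mean(pred65[, c("lwr", "upr")])

c(same_center = same_center,
  ci_centered = isTRUE(all.equal(ci_mid, unname(conf65[, "fit"]))),
  pi_centered = isTRUE(all.equal(pi_mid, unname(pred65[, "fit"]))))
```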

Finally, we can find information like the F and t test statistics, R^2, and p-values in the summary.

summary(mod1)
## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
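Reading the output: the t value of 37.85 for height, with a p-value of 1.09e-14, says the slope is significantly different from zero, and the R-squared of 0.991 says about 99.1% of the variation in average weight is explained by average height. With a single predictor, the F statistic is just the square of the slope's t statistic. The pieces can be pulled out of the summary object like this (a sketch; fit_s, s, t_slope, f_stat, and p_slope are names I picked):

```r
# Extract the test statistics and R-squared from the summary object
fit_s <- lm(weight ~ height, data = women)
s     <- summary(fit_s)

t_slope <- coef(s)["height", "t value"]
p_slope <- coef(s)["height", "Pr(>|t|)"]
f_stat  <- unname(s$fstatistic["value"])

# In simple linear regression, F equals t squared
c(t_squared = t_slope^2, f_stat = f_stat)

# Is the slope significant at the 5% level?
p_slope < 0.05

# Proportion of variation in weight explained by height
s$r.squared
```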