In this guide we will cover: the t-distribution, confidence intervals for the intercept and slope estimates, a confidence interval for the mean of our response variable at a fixed value of the predictor, a prediction interval for our response variable at a fixed value of the predictor, correlation, and F-tests. For any test statistic and/or interval calculations, assume the following hypotheses: \[H_0: \beta_j = 0\] \[H_A: \beta_j \neq 0\] where \(\beta_j\) is the coefficient (intercept or slope) being tested.

We will be working with the women dataset where height will be our independent variable and weight will be our response variable.

To start, load the data and familiarize yourself with it.

data(women)
attach(women)
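
Before fitting anything, it can help to look at the data. The calls below are one possible way to do that. They are a suggestion rather than part of the original walkthrough, and the name n.obs is introduced here only for illustration.

```r
# Load the built-in dataset (ships with base R in the datasets package)
data(women)

# Structure: number of rows, column names and types
str(women)

# First few observations
head(women)

# Summary statistics for each column (height in inches, weight in lbs)
summary(women)

# Number of observations; the residual degrees of freedom will be n.obs - 2
n.obs <- nrow(women)
n.obs
```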

We will create a 99% confidence interval for our intercept and slope estimates. First create a model for the data. Then create the confidence interval.

w.mod <- lm(weight ~ height, data = women)
confint(w.mod, level = .99)
##                   0.5 %     99.5 %
## (Intercept) -105.400380 -69.632954
## height         3.175472   3.724528

The “confint” command gives us the lower and upper bounds of the CIs for our intercept and slope estimates. Our 99% CI for the intercept is [-105.400380, -69.632954], which means that we are 99% confident that the true intercept is between -105.400380 and -69.632954. However, since no woman could have a height of 0 inches, the intercept has no meaningful interpretation here. In addition, our 99% CI for the slope is [3.175472, 3.724528], which means that we are 99% confident that the true slope is between 3.175472 and 3.724528. In other words, for every one-inch increase in height, we would expect a woman’s mean weight to be between 3.175472 lbs. and 3.724528 lbs. higher.

By using the “summary” command we can see the individual parts that go into the CIs. Estimate gives us our estimates for the intercept and slope, Std. Error gives us the standard errors of those estimates, t value gives us our test statistics, and Pr(>|t|) gives us our p-values.

summary(w.mod)
## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
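
This is where the t-distribution comes in. As a minimal sketch, the 99% CIs can be rebuilt by hand as estimate ± t* × standard error, where t* is the 0.995 quantile of a t-distribution with the model's residual degrees of freedom (13). The names coefs, t.crit, and manual.ci are introduced here for illustration only.

```r
# Refit the model so this block runs on its own
w.mod <- lm(weight ~ height, data = women)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(w.mod)$coefficients

# Critical value for a 99% two-sided interval: 0.5% in each tail
t.crit <- qt(1 - 0.01 / 2, df = df.residual(w.mod))

# estimate -/+ t* x standard error, for intercept and slope
manual.ci <- cbind(
  lower = coefs[, "Estimate"] - t.crit * coefs[, "Std. Error"],
  upper = coefs[, "Estimate"] + t.crit * coefs[, "Std. Error"]
)
manual.ci

# Should agree with confint() up to floating-point error
all.equal(unname(manual.ci), unname(confint(w.mod, level = 0.99)))
```

The t value column is built from the same pieces: each entry is Estimate divided by Std. Error.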

Next, let’s calculate the correlation between our response and predictor variables.

cor(height, weight)
## [1] 0.9954948

Our output is 0.9954948, so there is a strong, positive linear correlation between our two variables: an increase in height is associated with an increase in weight.
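
In simple linear regression the correlation ties directly back to the model output. The sketch below checks two standard identities: the squared correlation equals Multiple R-squared, and the t statistic from a Pearson correlation test equals the t statistic for the slope. It uses women$height and women$weight so that it does not depend on attach(); the variable names are illustrative.

```r
# Refit the model so this block runs on its own
w.mod <- lm(weight ~ height, data = women)

# Pearson correlation between predictor and response
r <- cor(women$height, women$weight)

# Squared correlation vs. the model's Multiple R-squared
r.squared <- summary(w.mod)$r.squared
all.equal(r^2, r.squared)

# Test of H0: correlation = 0 (Pearson is the default method)
r.test <- cor.test(women$height, women$weight)
t.cor <- unname(r.test$statistic)

# t statistic for the slope from the regression summary
t.slope <- summary(w.mod)$coefficients["height", "t value"]
all.equal(t.cor, t.slope)
```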

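Next, let’s construct a CI for the mean value of our response variable at a fixed value of the predictor. Since “predict” uses a 95% confidence level unless a level argument is supplied, the interval below is a 95% CI. First, set our predictor (height) equal to 66 inches and create a new dataset.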

new.w.data <- data.frame(height = 66)

Now we can construct the 95% CI.

confy <- predict(w.mod, new.w.data, interval = "confidence")
confy
##        fit      lwr      upr
## 1 140.1833 139.3102 141.0565

We can see that our estimated mean weight for a woman who is 66 inches tall is 140.1833 pounds. Because we did not pass a level argument, “predict” used its default confidence level of 95%, so our 95% CI is [139.3102, 141.0565]. Thus, we are 95% confident that the mean weight for women who are 66 inches tall is between 139.3102 lbs. and 141.0565 lbs.
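
The confidence level of “predict” is controlled by its level argument, which defaults to 0.95. As a sketch, a 90% CI for the mean weight at 66 inches can be requested explicitly and cross-checked by hand from the fitted value, its standard error, and a t quantile. The names conf90, conf95, and manual90 are illustrative.

```r
# Refit the model and rebuild the new data so this block runs on its own
w.mod <- lm(weight ~ height, data = women)
new.w.data <- data.frame(height = 66)

# Explicit 90% interval vs. the default (level = 0.95)
conf90 <- predict(w.mod, new.w.data, interval = "confidence", level = 0.90)
conf95 <- predict(w.mod, new.w.data, interval = "confidence")
conf90

# By hand: fit -/+ t* x se.fit, with 5% in each tail for a 90% interval
p <- predict(w.mod, new.w.data, se.fit = TRUE)
t.crit <- qt(0.95, df = p$df)
manual90 <- as.numeric(p$fit) + c(-1, 1) * t.crit * as.numeric(p$se.fit)
manual90
```

The 90% interval is narrower than the 95% one because it uses a smaller t quantile with the same standard error.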

Now, let’s create a prediction interval for a woman’s weight when her height is 66 inches.

predy <- predict(w.mod, new.w.data, interval = "prediction")
predy
##        fit     lwr      upr
## 1 140.1833 136.775 143.5916

We see that our estimated weight is the same as it was for our confidence interval, but the prediction interval (95%, the default level for “predict” since no level argument was given) is [136.775, 143.5916]. Therefore, we are 95% confident that a woman who is 66 inches tall will weigh between 136.775 lbs. and 143.5916 lbs.

Let’s compare the width of our confidence interval with the width of our prediction interval.

confy %*% c(0, -1, 1)
##       [,1]
## 1 1.746287
predy %*% c(0, -1, 1)
##       [,1]
## 1 6.816625

We can see that the width of our confidence interval is 1.746287, and the width of our prediction interval is 6.816625. This makes sense because the prediction interval must account for the variability of an individual y value around its mean in addition to the uncertainty in the estimated mean itself.
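
To make the difference in widths concrete, the sketch below rebuilds both from their standard errors. It assumes the usual formulas: the standard error for the mean response is se.fit, and the standard error for a new observation adds the residual variance, sqrt(se.fit^2 + sigma^2). The names se.mean, se.pred, and widths are illustrative.

```r
# Refit the model and rebuild the new data so this block runs on its own
w.mod <- lm(weight ~ height, data = women)
new.w.data <- data.frame(height = 66)

# se.fit is the standard error of the estimated mean at height = 66;
# residual.scale is the residual standard error (sigma hat)
p <- predict(w.mod, new.w.data, se.fit = TRUE)
se.mean <- as.numeric(p$se.fit)
se.pred <- sqrt(se.mean^2 + p$residual.scale^2)

# Both 95% intervals share the same t quantile; only the SE differs
t.crit <- qt(0.975, df = p$df)
widths <- c(conf.width = 2 * t.crit * se.mean,
            pred.width = 2 * t.crit * se.pred)
widths
```

Because the residual variance term does not shrink as more data are collected, the prediction interval stays wide even when the mean is estimated very precisely.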

Let’s confirm that our CI and prediction interval are centered at the same point.

confy[1] == predy[1]
## [1] TRUE

Indeed they are.
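
An optional variation on this check: indexing by column name makes it clearer which value is being compared, and all.equal() tolerates tiny floating-point differences that == does not. The sketch also confirms that the midpoint of each interval is the fitted value. The names same.center, mid.conf, and mid.pred are illustrative.

```r
# Refit the model and rebuild both intervals so this block runs on its own
w.mod <- lm(weight ~ height, data = women)
new.w.data <- data.frame(height = 66)
confy <- predict(w.mod, new.w.data, interval = "confidence")
predy <- predict(w.mod, new.w.data, interval = "prediction")

# Compare the fitted values by column name, with a numeric tolerance
same.center <- isTRUE(all.equal(unname(confy[, "fit"]), unname(predy[, "fit"])))
same.center

# Each interval is symmetric, so its midpoint is the fitted value
mid.conf <- unname((confy[, "lwr"] + confy[, "upr"]) / 2)
mid.pred <- unname((predy[, "lwr"] + predy[, "upr"]) / 2)
c(mid.conf, mid.pred)
```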

Lastly, conducting a separate F-test in this case isn’t necessary because our model is a simple linear regression with a single predictor, so the overall F-test is equivalent to the t-test on the slope. However, we can still read the F-test results from the “summary” command.

summary(w.mod)
## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

The summary function shows us that our F statistic is 1433, the degrees of freedom are 1 and 13, and our p-value is 1.091 * 10^-14. Since our p-value is so small, we reject our null hypothesis in favor of the alternative and conclude that the slope is not zero.
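
With a single predictor, the F statistic is the square of the slope's t statistic, and the two tests give the same p-value. The sketch below pulls the F statistic out of the summary object, checks that identity, and recomputes the p-value from the upper tail of the F-distribution. The names f.stat, t.slope, and p.val are illustrative.

```r
# Refit the model so this block runs on its own
w.mod <- lm(weight ~ height, data = women)
s <- summary(w.mod)

# Named vector with elements "value", "numdf", "dendf"
f.stat <- s$fstatistic
f.stat

# In simple linear regression, F equals the squared slope t statistic
t.slope <- s$coefficients["height", "t value"]
all.equal(unname(f.stat["value"]), t.slope^2)

# p-value: upper-tail area of the F(1, 13) distribution beyond the F statistic
p.val <- unname(pf(f.stat["value"], f.stat["numdf"], f.stat["dendf"],
                   lower.tail = FALSE))
p.val

# The same F-test presented as an ANOVA table
anova(w.mod)
```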