Today we learned how to conduct a hypothesis test of whether the slope \(\beta_1\) of the simple linear regression model is significantly different from zero. This matters because the fitted regression line below is not useful unless there is a significant linear relationship between y and x.
\[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\]

Hypotheses

The hypotheses are:
\[H_0:\beta_1=0\]

Under the null hypothesis, the slope of the regression line is zero, meaning y has no linear relationship with x.
\[H_A:\beta_1\ne0\]

Under the alternative hypothesis, the slope of the regression line is not zero.
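The test compares the estimated slope with its standard error: under \(H_0\), the statistic \(t = \hat{\beta}_1 / \mathrm{SE}(\hat{\beta}_1)\) follows a t distribution with \(n-2\) degrees of freedom, and the two-sided p-value is \(2P(T_{n-2} > |t|)\). Below is a minimal sketch of that calculation in base R, assuming the same women model used in the Application section (refit as wom.mod so the block runs on its own). The names coefs, b1, se.b1, t.stat, and p.val are illustrative, not from the original notes.

```r
# Refit the Application-section model so this block is self-contained
# (coefs, b1, se.b1, t.stat, p.val are illustrative names)
data(women)
wom.mod <- lm(weight ~ height, data = women)

# Slope estimate and its standard error from the coefficient table
coefs <- summary(wom.mod)$coefficients
b1    <- coefs["height", "Estimate"]
se.b1 <- coefs["height", "Std. Error"]

# Test statistic under H0: beta_1 = 0, with n - 2 residual degrees of freedom
n      <- nrow(women)
t.stat <- (b1 - 0) / se.b1
p.val  <- 2 * pt(abs(t.stat), df = n - 2, lower.tail = FALSE)

c(t = t.stat, df = n - 2, p.value = p.val)
```

The t value and p-value should match the height row of summary(wom.mod).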

Application

First, I applied this to the built-in dataset “women”, which I loaded with data() and attached. I used the lm function to fit a model with weight as the response variable (y-axis) and height as the predictor variable (x-axis). Then I used the confint function to compute 90% confidence intervals for the regression coefficients.

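data(women)                              # load the built-in women data set
attach(women)                            # lets later calls refer to height and weight directly
wom.mod<-lm(weight~height,data=women)    # weight (response) regressed on height (predictor)
confint(wom.mod,level=0.9)               # 90% confidence intervals for beta_0 and beta_1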
##                    5 %       95 %
## (Intercept) -98.030599 -77.002734
## height        3.288603   3.611397

Thus we are 90% confident that the intercept of the model is between -98.030599 and -77.002734, and that the slope is between 3.288603 and 3.611397. The intercept cannot be meaningfully interpreted: a height of 0 inches is impossible and far outside the observed heights (58 to 72 inches), so interpreting it would be extrapolation. The slope, however, means that each one-inch increase in height is associated with an increase in mean weight of between 3.288603 and 3.611397 pounds. Because this interval does not contain 0, we would also reject \(H_0:\beta_1=0\) at \(\alpha=0.10\).
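Each interval from confint has the form \(\hat{\beta}_j \pm t_{\alpha/2,\,n-2}\,\mathrm{SE}(\hat{\beta}_j)\). As a rough check, the sketch below rebuilds the 90% intervals by hand with base R's qt and compares them with confint; the names est, se, t.crit, and ci.manual are illustrative assumptions.

```r
# Rebuild the 90% coefficient intervals by hand
# (est, se, t.crit, ci.manual are illustrative names)
data(women)
wom.mod <- lm(weight ~ height, data = women)

coefs <- summary(wom.mod)$coefficients
est   <- coefs[, "Estimate"]
se    <- coefs[, "Std. Error"]

# 90% level: alpha = 0.10, so use the 1 - alpha/2 = 0.95 quantile of t on n - 2 df
alpha  <- 0.10
t.crit <- qt(1 - alpha / 2, df = df.residual(wom.mod))

# estimate +/- t * SE for each coefficient
ci.manual <- cbind(lower = est - t.crit * se, upper = est + t.crit * se)
ci.manual

# Should agree with confint(wom.mod, level = 0.9) up to floating-point error
all.equal(unname(ci.manual), unname(confint(wom.mod, level = 0.9)))
```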

Correlation

Next, I used the cor function to see the correlation between height and weight.

cor(height,weight)
## [1] 0.9954948

The result, 0.9954948, indicates a very strong positive linear correlation between height and weight, since it is very close to 1, which represents a perfect positive correlation.
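In simple linear regression the correlation is tied directly to the slope test: \(R^2 = r^2\), and the slope's t statistic can be written as \(t = r\sqrt{n-2}/\sqrt{1-r^2}\). A small sketch checking both identities on the women data (r, n, and t.from.r are illustrative names):

```r
# Relate the correlation r to R-squared and to the slope's t statistic
# (r, n, t.from.r are illustrative names)
data(women)
wom.mod <- lm(weight ~ height, data = women)

r <- cor(women$height, women$weight)
n <- nrow(women)

# With one predictor, R-squared is the square of the correlation coefficient
c(r.squared = r^2, R2.from.lm = summary(wom.mod)$r.squared)

# The slope's t statistic rewritten in terms of r alone
t.from.r <- r * sqrt(n - 2) / sqrt(1 - r^2)
c(t.from.r = t.from.r,
  t.from.lm = summary(wom.mod)$coefficients["height", "t value"])
```

This is why a correlation this close to 1 goes hand in hand with a very large t statistic for the slope.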

Data Frame

I created a new data frame containing a single height of 67 inches, the value at which to predict weight.

newdata<-data.frame(height=67)

Prediction Interval

The prediction interval is an interval that contains a new individual observation of y at a given x with probability \(1-\alpha\). To calculate it, I used the predict function with interval set to “predict” (R partially matches this to “prediction”). By default, predict uses a 95% level.

(predy<-predict(wom.mod,newdata,interval="predict"))   # 95% prediction interval at height = 67
##        fit     lwr      upr
## 1 143.6333 140.208 147.0587
predy%*%c(0,-1,1)                                      # interval width: upr - lwr
##       [,1]
## 1 6.850661

We are 95% confident that a new woman who is 67 inches tall will weigh between 140.208 and 147.0587 pounds. The width of this interval is 6.850661 pounds.
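The prediction interval at \(x_0\) is \(\hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}\), where \(s\) is the residual standard error and \(S_{xx}=\sum (x_i-\bar{x})^2\). A minimal sketch reproducing predict's result by hand, assuming the default 95% level; x0, y0.hat, sxx, se.pred, and pi.manual are illustrative names.

```r
# Hand-computed 95% prediction interval at height = 67
# (x0, y0.hat, sxx, se.pred, pi.manual are illustrative names)
data(women)
wom.mod <- lm(weight ~ height, data = women)

x  <- women$height
n  <- length(x)
x0 <- 67

y0.hat <- sum(coef(wom.mod) * c(1, x0))   # b0 + b1 * x0
s      <- sigma(wom.mod)                  # residual standard error
sxx    <- sum((x - mean(x))^2)
t.crit <- qt(0.975, df = n - 2)           # 95% level, matching predict()'s default

# Standard error for a new observation: the leading "1 +" adds the error variance
se.pred <- s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sxx)
pi.manual <- c(fit = y0.hat,
               lwr = y0.hat - t.crit * se.pred,
               upr = y0.hat + t.crit * se.pred)
pi.manual
unname(pi.manual["upr"] - pi.manual["lwr"])   # interval width
```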

Confidence Interval

The confidence interval is an interval that captures the mean value of y at a given x with confidence level \(1-\alpha\). To calculate it, I used the predict function with interval set to “confidence”.

(confy<-predict(wom.mod,newdata,interval="confidence"))   # 95% confidence interval for the mean at height = 67
##        fit     lwr      upr
## 1 143.6333 142.696 144.5707
confy%*%c(0,-1,1)                                         # interval width: upr - lwr
##       [,1]
## 1 1.874753

We are 95% confident that the mean weight of women who are 67 inches tall is between 142.696 and 144.5707 pounds. The width of this interval is 1.874753, much smaller than the width of the prediction interval. This is because the confidence interval only accounts for uncertainty in the estimated mean, whereas the prediction interval must also account for the variance \(\sigma^2\) of an individual observation around that mean.
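The confidence interval for the mean response uses the same ingredients but drops the individual-observation term: \(\hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}\). The sketch below computes it by hand at 67 inches, assuming the default 95% level; x0, y0.hat, sxx, se.mean, and ci.manual are illustrative names.

```r
# Hand-computed 95% confidence interval for the mean weight at height = 67
# (x0, y0.hat, sxx, se.mean, ci.manual are illustrative names)
data(women)
wom.mod <- lm(weight ~ height, data = women)

x  <- women$height
n  <- length(x)
x0 <- 67

y0.hat <- sum(coef(wom.mod) * c(1, x0))   # b0 + b1 * x0
s      <- sigma(wom.mod)                  # residual standard error
sxx    <- sum((x - mean(x))^2)
t.crit <- qt(0.975, df = n - 2)

# SE of the estimated mean at x0; a prediction interval would use sqrt(1 + 1/n + ...)
se.mean <- s * sqrt(1 / n + (x0 - mean(x))^2 / sxx)
ci.manual <- c(fit = y0.hat,
               lwr = y0.hat - t.crit * se.mean,
               upr = y0.hat + t.crit * se.mean)
ci.manual
unname(ci.manual["upr"] - ci.manual["lwr"])   # interval width
```

Since \(1/n + (x_0-\bar{x})^2/S_{xx}\) is well below 1 here, dropping the leading 1 is what makes this interval so much narrower.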

We also want to confirm that the confidence interval and the prediction interval are centered at the same point.

confy[1]==predy[1]
## [1] TRUE

Since this returns TRUE, both intervals are centered at the same point: the fitted value \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1(67) = 143.6333\).

F-test

Another way to test for a linear relationship between x and y is an F-test. In simple linear regression, however, it adds nothing new: the overall F statistic is the square of the slope's t statistic, so the F-test gives the same p-value as the t-test for the slope.

summary(wom.mod)
## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

As the output shows, the p-value for the slope's t-test is 1.09e-14, the same as the p-value for the F-test. Either way, the p-value is far smaller than any conventional significance level \(\alpha\) (for example, 0.05), so we reject the null hypothesis in favor of the alternative: there is a significant linear relationship between height and weight.
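To see the equivalence directly, the sketch below pulls the slope's t statistic and the overall F statistic out of summary(wom.mod) and checks that \(F = t^2\) (here \(37.85^2 \approx 1433\)) and that the two p-values agree. The names s.mod, t.slope, f.stat, p.F, and p.t are illustrative.

```r
# Check that F = t^2 and the p-values match for a one-predictor model
# (s.mod, t.slope, f.stat, df1, df2, p.F, p.t are illustrative names)
data(women)
wom.mod <- lm(weight ~ height, data = women)
s.mod   <- summary(wom.mod)

t.slope <- s.mod$coefficients["height", "t value"]
f.stat  <- s.mod$fstatistic[["value"]]
df1     <- s.mod$fstatistic[["numdf"]]   # 1 predictor
df2     <- s.mod$fstatistic[["dendf"]]   # n - 2 = 13

# With a single predictor, the overall F statistic equals the slope's t statistic squared
c(t.squared = t.slope^2, F = f.stat)

# Upper-tail F probability vs. two-sided t p-value for the slope
p.F <- pf(f.stat, df1, df2, lower.tail = FALSE)
p.t <- s.mod$coefficients["height", "Pr(>|t|)"]
c(p.F = p.F, p.t = p.t)
```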