Today we learned how to conduct a hypothesis test to determine whether the slope (i.e. \[\beta_1\]) is significantly different from zero. This is important because the linear regression model isn’t useful unless there is a significant linear relationship between y and x.
\[\hat{y_i}= \hat{\beta_0}+\hat{\beta_1} x_i\]
The hypotheses for this are as follows: \[H_0:\beta_1=0\]
In other words, the slope of the regression line is equal to zero under the null hypothesis. \[H_A:\beta_1\ne0\]
In other words, the slope of the regression line is not equal to zero under the alternative hypothesis.
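For reference, the test statistic usually paired with these hypotheses in simple linear regression is shown below. This is the standard textbook form rather than something derived in class, and it assumes the usual model with independent, normally distributed errors; n denotes the number of observations.

\[t=\frac{\hat{\beta_1}-0}{SE(\hat{\beta_1})}\]

Under the null hypothesis this statistic follows a t distribution with n-2 degrees of freedom, so the null hypothesis is rejected when \[|t|\] exceeds the critical value \[t_{\alpha/2,\,n-2}\].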
Initially, I applied this to the dataset “women”, which I loaded and attached. I used the lm function to fit a model of the data, with weight as the response variable (y-axis) and height as the predictor variable (x-axis). Then I used the confint function to compute 90% confidence intervals for the regression coefficients.
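data(women)    # load the built-in dataset (height in inches, weight in pounds)
attach(women)  # make height and weight available by name
wom.mod <- lm(weight ~ height, data = women)  # regress weight on height
confint(wom.mod, level = 0.9)                 # 90% intervals for both coefficients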
##                    5 %       95 %
## (Intercept) -98.030599 -77.002734
## height        3.288603   3.611397
Thus we are 90% confident that the intercept of the model is between -98.030599 and -77.002734, and that the slope is between 3.288603 and 3.611397. The intercept has no meaningful interpretation here: a height of 0 inches is not possible and lies far outside the observed heights, so interpreting it would be extrapolation. For the slope, the interval means that each additional inch of height is associated with an increase in mean weight of between 3.288603 and 3.611397 pounds. Because this interval does not contain 0, we reject the null hypothesis at the \[\alpha=0.10\] level.
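To make the link between the interval and the t distribution concrete, here is a minimal sketch that rebuilds the 90% interval for the slope by hand as estimate ± critical value × standard error. The names coefs, est, se, tcrit and ci_manual are my own and are not part of the original analysis; the model is refit so the block runs on its own.

```r
# Refit the model from the text so this sketch is self-contained.
wom.mod <- lm(weight ~ height, data = women)

# Coefficient table: columns Estimate, Std. Error, t value, Pr(>|t|).
coefs <- coef(summary(wom.mod))
est   <- coefs["height", "Estimate"]
se    <- coefs["height", "Std. Error"]

# A 90% two-sided interval leaves 5% in each tail, hence the 0.95 quantile.
tcrit <- qt(0.95, df = df.residual(wom.mod))

# Lower and upper limits for the slope.
ci_manual <- est + c(-1, 1) * tcrit * se
ci_manual
```

The result should agree with the height row returned by confint(wom.mod, level = 0.9).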
Next, I used the cor function to see the correlation between height and weight.
cor(height,weight)
## [1] 0.9954948
The resulting value is 0.9954948, which indicates a very strong positive linear correlation between height and weight, since it is very close to 1, the value for a perfect positive linear correlation.
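The correlation also ties back to the slope test. With a single predictor, the t-test on the Pearson correlation and the t-test on the slope give the same statistic, and the squared correlation equals the model's R-squared. The sketch below checks this with cor.test; the object names ct and t_slope are my own.

```r
# Self-contained: use the built-in women data directly instead of attach().
wom.mod <- lm(weight ~ height, data = women)

# Test of H0: correlation = 0 (Pearson is the default method).
ct <- cor.test(women$height, women$weight)

# t statistic for the slope from the regression summary.
t_slope <- coef(summary(wom.mod))["height", "t value"]

# The two t statistics side by side.
c(t_cor = unname(ct$statistic), t_slope = t_slope)

# Squared correlation versus the model's R-squared.
c(r_squared = unname(ct$estimate)^2,
  model_r_squared = summary(wom.mod)$r.squared)
```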
I created a new data frame containing a single height of 67 inches, so that I could predict the weight of a woman of that height.
newdata<-data.frame(height=67)
The prediction interval is the interval that contains an individual value of y at a given x with probability \[1-\alpha\]. To calculate this I used the predict function with the interval specified as “prediction”. Since I did not set the level argument, predict uses its default of 0.95.
(predy<-predict(wom.mod,newdata,interval="prediction"))
##        fit     lwr      upr
## 1 143.6333 140.208 147.0587
predy%*%c(0,-1,1)  # upr - lwr, i.e. the width of the interval
##       [,1]
## 1 6.850661
We are 95% confident that the weight of an individual woman who is 67 inches tall falls between 140.208 and 147.0587 pounds. Also, the width of the interval is 6.850661.
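As a check on what predict is doing, the prediction interval can be rebuilt from the usual textbook formula, fit ± t × s × sqrt(1 + 1/n + (x0 - x̄)² / Sxx). This is a sketch under the standard simple linear regression assumptions; the names x0, s, se_pred, tcrit and pi_manual are mine.

```r
# Refit the model so this sketch is self-contained.
wom.mod <- lm(weight ~ height, data = women)

x  <- women$height
x0 <- 67                      # height at which we predict
n  <- length(x)
s  <- summary(wom.mod)$sigma  # residual standard error

# Point prediction: intercept + slope * x0.
fit0 <- sum(coef(wom.mod) * c(1, x0))

# Standard error for predicting one new observation at x0.
se_pred <- s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))

# 95% two-sided interval with n - 2 degrees of freedom.
tcrit <- qt(0.975, df = n - 2)

pi_manual <- c(fit = fit0,
               lwr = fit0 - tcrit * se_pred,
               upr = fit0 + tcrit * se_pred)
pi_manual
```

The three values should agree with the fit, lwr and upr columns that predict returns for a height of 67.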
The confidence interval is the interval that contains the mean value of y at a given x with probability \[1-\alpha\]. To calculate this I used the predict function with the interval specified as “confidence”.
(confy<-predict(wom.mod,newdata,interval="confidence"))
##        fit     lwr      upr
## 1 143.6333 142.696 144.5707
confy%*%c(0,-1,1)  # upr - lwr, i.e. the width of the interval
##       [,1]
## 1 1.874753
We are 95% confident that the mean weight of women who are 67 inches tall falls between 142.696 and 144.5707 pounds. Also, the width of the interval is 1.874753, which is much smaller than the width of the prediction interval. This is because the prediction interval has to account for the variability of an individual value of y around the mean in addition to the uncertainty in the estimated mean, so its variance is larger.
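The difference in width can be seen directly from the two standard errors. In this sketch, predict is called with se.fit = TRUE, which returns the standard error of the estimated mean and the residual standard error; the prediction standard error then adds the residual variance on top. The names p, se_mean, se_pred and widths are my own.

```r
# Refit the model so this sketch is self-contained.
wom.mod <- lm(weight ~ height, data = women)

# se.fit = TRUE returns a list with fit, se.fit, df and residual.scale.
p <- predict(wom.mod, data.frame(height = 67), se.fit = TRUE)

se_mean <- unname(p$se.fit)     # uncertainty in the estimated mean weight
s       <- p$residual.scale     # spread of individual weights around the mean

# Predicting an individual adds the residual variance to the mean's variance.
se_pred <- sqrt(se_mean^2 + s^2)

# Widths of the 95% intervals: 2 * critical value * standard error.
tcrit  <- qt(0.975, df = p$df)
widths <- c(confidence = 2 * tcrit * se_mean,
            prediction = 2 * tcrit * se_pred)
widths
```

Both intervals use the same critical value, so the ratio of the widths is just the ratio of the two standard errors.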
Also, we want to confirm that the confidence interval and the prediction interval are centered at the same point, the fitted value.
confy[1]==predy[1]
## [1] TRUE
Since this statement is true, both intervals are in fact centered at the same point.
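Exact comparison with == works here because both calls compute the fitted value the same way, but comparing floating-point numbers with all.equal is generally safer. The sketch below, with names of my own choosing, also checks that the fitted value is the midpoint of each interval.

```r
# Rebuild both intervals so this sketch is self-contained.
wom.mod <- lm(weight ~ height, data = women)
newdata <- data.frame(height = 67)
predy <- predict(wom.mod, newdata, interval = "prediction")
confy <- predict(wom.mod, newdata, interval = "confidence")

# Tolerant comparison of the two fitted values.
same_fit <- isTRUE(all.equal(unname(confy[, "fit"]), unname(predy[, "fit"])))

# Midpoint of each interval, which should equal the fitted value.
mid_pred <- unname((predy[, "lwr"] + predy[, "upr"]) / 2)
mid_conf <- unname((confy[, "lwr"] + confy[, "upr"]) / 2)
centered <- isTRUE(all.equal(mid_pred, unname(predy[, "fit"]))) &&
            isTRUE(all.equal(mid_conf, unname(confy[, "fit"])))

c(same_fit = same_fit, centered = centered)
```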
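Another way to test for a linear relationship between x and y is an F-test. However, in simple linear regression with a single predictor it isn’t necessary, since the F-statistic is the square of the t-statistic for the slope, so its p-value is the same as the p-value of the t-test on the slope that summary reports.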
summary(wom.mod)
##
## Call:
## lm(formula = weight ~ height, data = women)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.7333 -1.1333 -0.3833  0.7417  3.1167
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared: 0.991, Adjusted R-squared: 0.9903
## F-statistic: 1433 on 1 and 13 DF, p-value: 1.091e-14
As you can see, the p-value for the slope is 1.09e-14, which is the same as the p-value for the F-test (1.091e-14). Either way, the p-value is much smaller than any conventional alpha, so we reject the null hypothesis and conclude that the slope is not equal to zero, meaning there is a significant linear relationship between height and weight.
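The agreement between the two tests is not a coincidence: with one predictor, the F-statistic is the square of the slope's t-statistic (37.85 squared is about 1433). A minimal sketch that checks this from the summary object, using names of my own choosing:

```r
# Refit the model so this sketch is self-contained.
wom.mod <- lm(weight ~ height, data = women)
sm <- summary(wom.mod)

# t statistic and p-value for the slope.
t_slope <- sm$coefficients["height", "t value"]
p_t     <- sm$coefficients["height", "Pr(>|t|)"]

# F statistic with its degrees of freedom (named value, numdf, dendf).
f   <- sm$fstatistic
p_f <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# The squared t statistic next to the F statistic.
c(t_squared = t_slope^2, F = unname(f["value"]))

# The two p-values next to each other.
c(p_t = p_t, p_F = p_f)
```

With more than one predictor the two tests answer different questions, so this equivalence only holds for simple linear regression.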