\[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\]
Today we talked about testing the significance of the slope (\(\beta_1\)). This is important because it tells us whether x and y are linearly related. We also talked about how to do this for the intercept estimator, but found that calculation less helpful. We also talked about confidence and prediction intervals when x is a chosen value, \(x_0\). We learned about r-squared and r and when each can be used. Finally, we touched on the F-test, which we will use more with multiple linear regression models.
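For reference, the test of the slope's significance uses the usual t statistic: the estimate divided by its standard error, compared to a t distribution with n - 2 degrees of freedom.
\[t = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)} \sim t_{n-2} \quad \text{under } H_0\!: \beta_1 = 0\]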
I decided to use the iris data to explore what we talked about today.
data(iris)
attach(iris)
mod <- lm(Sepal.Length ~ Petal.Length)
confint(mod, level = 0.9)
## 5 % 95 %
## (Intercept) 4.1768529 4.4363540
## Petal.Length 0.3776531 0.4401915
cor(Petal.Length, Sepal.Length)
## [1] 0.8717538
Above, I created a 90% confidence interval for the slope and intercept of the linear regression model. I also found the correlation (r) between the two variables. It is pretty high, meaning sepal length and petal length are strongly positively correlated. Keep in mind that r can only be used for SIMPLE linear regression models.
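As a quick sanity check (re-using the mod object from above, nothing new), squaring r should reproduce the R-squared value that summary(mod) reports further down; both come out to about 0.76:

cor(Petal.Length, Sepal.Length)^2   # r squared by hand, about 0.76
summary(mod)$r.squared              # same quantity pulled from the model summary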
Here I created a new data frame, looking specifically at when petal length is 1.4. Then I created a prediction interval for sepal length when petal length (our \(x_0\)) is 1.4.
new.data <- data.frame(Petal.Length = 1.4)
(predsl <- predict(mod, new.data, interval = "prediction"))
## fit lwr upr
## 1 4.879095 4.067202 5.690987
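The "fit" column is just the point estimate from the fitted line at \(x_0 = 1.4\); a quick way to confirm that, using the coefficients stored in mod, is:

coef(mod)[1] + coef(mod)[2] * 1.4   # beta0-hat + beta1-hat * x0, about 4.879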
Next, I made a confidence interval for the mean sepal length when petal length is 1.4.
(confsl <- predict(mod, new.data, interval = "confidence"))
## fit lwr upr
## 1 4.879095 4.769263 4.988926
We can check that the confidence interval is narrower than the prediction interval, which makes sense because the variance of an estimated mean is smaller than the variance of a single new observation. We can also see that both intervals are centered at the same fitted value.
confsl %*% c(0, -1, 1)   # width of the confidence interval (upr - lwr)
## [,1]
## 1 0.219663
predsl %*% c(0, -1, 1)   # width of the prediction interval (upr - lwr)
## [,1]
## 1 1.623785
confsl[1] == predsl[1]   # both intervals are centered at the same fitted value
## [1] TRUE
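These widths line up with the standard interval formulas (writing \(s\) for the residual standard error and \(S_{xx} = \sum (x_i - \bar{x})^2\)); the prediction interval carries an extra 1 under the square root for the variance of a single new observation, which is exactly why it is wider:
\[\text{CI for the mean: } \hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\]
\[\text{PI for a new observation: } \hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\]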
We also talked a lot today about what information we can get from the command summary(mod), seen below.
summary(mod)
##
## Call:
## lm(formula = Sepal.Length ~ Petal.Length)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.24675 -0.29657 -0.01515 0.27676 1.00269
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.30660 0.07839 54.94 <2e-16 ***
## Petal.Length 0.40892 0.01889 21.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4071 on 148 degrees of freedom
## Multiple R-squared: 0.76, Adjusted R-squared: 0.7583
## F-statistic: 468.6 on 1 and 148 DF, p-value: < 2.2e-16
It has a lot of info about what we talked about today. "Multiple R-squared" gives us the r-squared statistic we talked about, which is the proportion of the total variation in y that is explained by our simple linear regression model. It also gives our F-statistic. For simple linear regression models, the F-test is equivalent to the t-test on the slope (the F statistic is the square of the slope's t value). When we get into multiple linear regression models, apparently this will not be the case.
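We can see that equivalence directly in the output above (21.65 squared is about 468.6), or pull both numbers from the summary object as a quick check:

summary(mod)$coefficients["Petal.Length", "t value"]^2   # t value squared for the slope
summary(mod)$fstatistic["value"]                         # the F statistic reported above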
Overall, today's concepts fit in well with what we have been doing because they give us more ways to evaluate simple linear regression models. They also include some concepts that will be helpful when we get into multiple linear regression models!