\[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\]

Today we talked about testing the significance of the slope (beta1). This is important because it tells us whether x and y are linearly related: if beta1 = 0, knowing x does not help us predict y. We also talked about how to do the same test for the intercept estimator, but found this calculation less helpful. We then talked about confidence intervals (CIs) and prediction intervals when x is a chosen value, x0. We learned about r-squared and r and when each can be used. Finally, we touched on the F-test, which we will use more in multiple linear regression models.
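For reference, here are the standard simple linear regression formulas behind these topics. The symbols beyond the fitted line above are my own labels, not from the lecture: \(\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0\), \(s^2 = SSE/(n-2)\) is the residual variance estimate, and \(S_{xx} = \sum (x_i - \bar{x})^2\). The last identity, \(F = t^2\), holds only when there is a single predictor.

\[
\begin{aligned}
t &= \frac{\hat{\beta}_1}{\operatorname{se}(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \quad (n-2 \text{ df}) \\
\text{CI for } E(y \mid x_0)&: \ \hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}} \\
\text{PI for a new } y \text{ at } x_0&: \ \hat{y}_0 \pm t_{\alpha/2,\,n-2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}} \\
R^2 &= \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad F = \frac{MSR}{MSE} = t^2
\end{aligned}
\]

The extra "1 +" under the square root in the prediction interval is what makes it wider than the CI at the same x0.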

I decided to use the iris data to explore what we talked about today.

data(iris)    # load the built-in iris data set
attach(iris)  # lets us refer to columns (e.g., Petal.Length) directly

mod <- lm(Sepal.Length ~ Petal.Length)  # regress sepal length on petal length
confint(mod, level = .9)                # 90% CIs for the intercept and slope
##                    5 %      95 %
## (Intercept)  4.1768529 4.4363540
## Petal.Length 0.3776531 0.4401915
cor(Petal.Length, Sepal.Length)  # sample correlation r
## [1] 0.8717538
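To see where the confint() numbers come from, here is a minimal sketch that recomputes the 90% intervals from the least-squares formulas (estimate plus or minus t times its standard error). It uses iris$ columns instead of attach() so it runs on its own; the variable names (Sxx, se_b1, by_hand, etc.) are just my labels.

```r
# Recompute the 90% CIs for the intercept and slope by hand.
data(iris)
x <- iris$Petal.Length
y <- iris$Sepal.Length
n <- length(x)

# Least-squares estimates
Sxx <- sum((x - mean(x))^2)
b1  <- sum((x - mean(x)) * (y - mean(y))) / Sxx
b0  <- mean(y) - b1 * mean(x)

# Residual standard error s, with n - 2 degrees of freedom
s <- sqrt(sum((y - (b0 + b1 * x))^2) / (n - 2))

# Standard errors of the two estimates
se_b1 <- s / sqrt(Sxx)
se_b0 <- s * sqrt(1 / n + mean(x)^2 / Sxx)

# 90% interval: estimate +/- t_{0.95, n-2} * SE
t_crit <- qt(0.95, df = n - 2)
by_hand <- rbind("(Intercept)"  = b0 + c(-1, 1) * t_crit * se_b0,
                 "Petal.Length" = b1 + c(-1, 1) * t_crit * se_b1)
by_hand
```

The rows should match the confint(mod, level = .9) output above.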

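Above, I created 90% confidence intervals for the slope and intercept of the linear regression model. I also found the correlation (r) between the two variables. At r = 0.87 it is pretty high, meaning petal length and sepal length have a strong positive linear relationship. *Keep in mind that (r) can only be used to describe the model in SIMPLE linear regression, since it measures the association between just two variables.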
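A quick sketch of why r is tied to the simple model: with one predictor, r squared equals the model's R-squared, and the slope is r rescaled by the two standard deviations. These identities are specific to a single predictor; fit and the other names below are my own.

```r
# Check how r relates to the simple linear regression fit.
data(iris)
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
r <- cor(iris$Petal.Length, iris$Sepal.Length)

# r^2 should equal "Multiple R-squared" from summary(fit)
r_squared <- r^2
model_r2  <- summary(fit)$r.squared

# The slope equals r * sd(y) / sd(x)
slope_from_r <- r * sd(iris$Sepal.Length) / sd(iris$Petal.Length)

c(r_squared = r_squared, model_r2 = model_r2,
  slope_from_r = slope_from_r, slope = unname(coef(fit)[2]))
```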

Here I created a new data frame for the case where petal length is 1.4. Then I created a prediction interval for sepal length when petal length (our x0) is 1.4.

new.data <- data.frame(Petal.Length = 1.4)  # the chosen value x0
(predsl <- predict(mod, new.data, interval = "prediction"))  # 95% prediction interval
##        fit      lwr      upr
## 1 4.879095 4.067202 5.690987
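As a sketch, the same 95% prediction interval can be built directly from the formula yhat0 +/- t * s * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx). The names x0, Sxx, and pi_by_hand are my own.

```r
# 95% prediction interval at x0 = 1.4, computed from the formula.
data(iris)
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
x  <- iris$Petal.Length
n  <- length(x)
x0 <- 1.4

yhat0  <- unname(coef(fit)[1] + coef(fit)[2] * x0)  # fitted value at x0
s      <- summary(fit)$sigma                        # residual standard error
Sxx    <- sum((x - mean(x))^2)
t_crit <- qt(0.975, df = n - 2)

# The leading 1 accounts for the error of a single new observation
half_width <- t_crit * s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / Sxx)
pi_by_hand <- c(fit = yhat0, lwr = yhat0 - half_width, upr = yhat0 + half_width)
pi_by_hand
```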

Next, I made a CI for the mean sepal length when petal length is 1.4.

(confsl <- predict(mod, new.data, interval = "confidence"))  # 95% CI for the mean response
##        fit      lwr      upr
## 1 4.879095 4.769263 4.988926
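Another way to get this CI, sketched below: ask predict() for the standard error of the fitted mean with se.fit = TRUE, then form fit +/- t * se.fit. The names p, fit0, se0, and ci_by_hand are my own.

```r
# 95% CI for the mean response at x0 = 1.4 from predict()'s standard error.
data(iris)
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
p <- predict(fit, data.frame(Petal.Length = 1.4), se.fit = TRUE)

fit0 <- unname(p$fit)     # fitted mean at x0
se0  <- unname(p$se.fit)  # s * sqrt(1/n + (x0 - xbar)^2 / Sxx)

# p$df is the residual degrees of freedom, n - 2 = 148
t_crit <- qt(0.975, df = p$df)
ci_by_hand <- c(lwr = fit0 - t_crit * se0, upr = fit0 + t_crit * se0)
ci_by_hand
```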

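We can check that the confidence interval is narrower than the prediction interval. This makes sense: the CI only has to account for uncertainty in the estimated mean, while the prediction interval must also account for how far a single new observation can fall from that mean, so its variance is larger. We can also see that both intervals are centered at the same point (the fitted value).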

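confsl %*% c(0, -1, 1)  # CI width: upr - lwr
##       [,1]
## 1 0.219663
predsl %*% c(0, -1, 1)  # prediction interval width: upr - lwr
##       [,1]
## 1 1.623785
confsl[1] == predsl[1]  # both intervals share the same fitted value
## [1] TRUE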
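To check the reasoning about variances numerically, here is a small sketch: dividing each half-width by t and squaring gives the estimated variance behind each interval, and the prediction variance should exceed the mean-response variance by exactly s^2 (the variance of a new error term). Variable names are my own.

```r
# Compare the variances behind the CI and the prediction interval at x0 = 1.4.
data(iris)
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
new.data <- data.frame(Petal.Length = 1.4)
conf <- predict(fit, new.data, interval = "confidence")
pred <- predict(fit, new.data, interval = "prediction")

# Half-width / t gives the standard error; square it for the variance
t_crit   <- qt(0.975, df = df.residual(fit))
var_mean <- ((conf[1, "upr"] - conf[1, "lwr"]) / (2 * t_crit))^2
var_new  <- ((pred[1, "upr"] - pred[1, "lwr"]) / (2 * t_crit))^2

# The difference should equal the residual variance s^2
c(difference = unname(var_new - var_mean), s_squared = summary(fit)$sigma^2)
```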

We also talked a lot today about the information we can get from the command summary(mod), shown below.

summary(mod)
## 
## Call:
## lm(formula = Sepal.Length ~ Petal.Length)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.24675 -0.29657 -0.01515  0.27676  1.00269 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.30660    0.07839   54.94   <2e-16 ***
## Petal.Length  0.40892    0.01889   21.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4071 on 148 degrees of freedom
## Multiple R-squared:   0.76,  Adjusted R-squared:  0.7583 
## F-statistic: 468.6 on 1 and 148 DF,  p-value: < 2.2e-16
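As a sketch of what the main summary() numbers mean, the block below recomputes the slope's t value and p-value, Multiple R-squared, and the residual standard error from their definitions. The names coefs, t_slope, and so on are my own.

```r
# Reproduce several summary(fit) numbers from their definitions.
data(iris)
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
coefs <- coef(summary(fit))  # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t value = Estimate / Std. Error; two-sided p-value with n - 2 df
t_slope <- coefs["Petal.Length", "Estimate"] / coefs["Petal.Length", "Std. Error"]
p_slope <- 2 * pt(abs(t_slope), df = df.residual(fit), lower.tail = FALSE)

# Multiple R-squared = 1 - SSE / SST
y   <- iris$Sepal.Length
SSE <- sum(residuals(fit)^2)
SST <- sum((y - mean(y))^2)
r2  <- 1 - SSE / SST

# Residual standard error = sqrt(SSE / (n - 2))
s <- sqrt(SSE / df.residual(fit))

c(t_slope = t_slope, p_slope = p_slope, r2 = r2, s = s)
```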

It covers a lot of what we talked about today. “Multiple R-squared” gives the r-squared statistic, which is the proportion of the total variation in sepal length that is explained by our simple linear regression model (here 0.76, or 76%). It also gives our F-statistic. For simple linear regression models, the F-test is equivalent to the t-test for the slope: F = t^2 (here 21.65^2 ≈ 468.7, which matches the reported 468.6 up to rounding of t), and the p-values are the same. Apparently this will not be the case when we get into multiple linear regression models, where the F-test checks all of the slopes at once.
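A short sketch confirming the F = t^2 relationship with one predictor, and showing that anova() builds the same F as MSR / MSE. Names are my own.

```r
# With a single predictor, the overall F-statistic is the squared slope t value.
data(iris)
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
t_val <- coef(summary(fit))["Petal.Length", "t value"]
f_val <- unname(summary(fit)$fstatistic["value"])

# anova() gives the same F as MSR / MSE
tab <- anova(fit)
f_from_anova <- tab["Petal.Length", "Mean Sq"] / tab["Residuals", "Mean Sq"]

c(t_squared = t_val^2, F = f_val, F_anova = f_from_anova)
```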

Overall, today’s concepts fit in well with what we have been doing because they give us more ways to evaluate simple linear regression models. They also include some concepts that will be helpful when we get into multiple linear regression models!