Concept and General Idea

Today in class we covered a few different concepts. First was a significance test for the slope of our linear regression, along with how to create a confidence interval for the slope. We also discussed the distance value, which is used to build prediction intervals and confidence intervals for estimates. Finally, we talked about r and r^2, as well as the F-test, to understand how well our model fits our data.

When/why do we use this?

We use these concepts when working with linear regression models because we need to know how well a model actually fits the data, i.e. whether it is useful for making predictions. R can fit a model to any pair of quantitative variables, but the only way to tell whether that model is useful is to run statistical tests on it to see how well it describes the variation in the data.

Example

The following example will illustrate how to use some useful functions when dealing with these concepts:

First, we create our model and look at our 95% confidence interval.

library(resampledata)
## 
## Attaching package: 'resampledata'
## The following object is masked from 'package:datasets':
## 
##     Titanic
mod <- lm(Beer ~ Hotwings, data = Beerwings)
confint(mod)
##                 2.5 %    97.5 %
## (Intercept) -4.586851 10.667590
## Hotwings     1.346131  2.535371
summary(mod)
## 
## Call:
## lm(formula = Beer ~ Hotwings, data = Beerwings)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.566  -4.537  -0.122   3.671  17.789 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0404     3.7235   0.817    0.421    
## Hotwings      1.9408     0.2903   6.686 2.95e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.479 on 28 degrees of freedom
## Multiple R-squared:  0.6148, Adjusted R-squared:  0.6011 
## F-statistic:  44.7 on 1 and 28 DF,  p-value: 2.953e-07

This output gives confidence intervals for both our intercept and our slope. Looking at the summary of our model, we see that the number of hot wings consumed is a significant contributor to the ounces of beer consumed, which agrees with our confidence interval for the slope, since that interval does not contain 0. Still in the summary, we have a significant F-statistic, meaning our slope is significant in modeling the data, and our r^2 is 0.6148, meaning the number of hot wings consumed accounts for 61.48% of the variation in the ounces of beer consumed.
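As a quick check (ours, not part of the class output) on where the slope's t value, p-value, and confidence interval come from, here is a small sketch; the object names b1, se, and t_stat are our own:

b1 <- coef(summary(mod))["Hotwings", "Estimate"]
se <- coef(summary(mod))["Hotwings", "Std. Error"]
t_stat <- b1 / se                     # estimate / standard error; should match 6.686
2 * pt(abs(t_stat), df = df.residual(mod), lower.tail = FALSE)  # should match 2.95e-07
b1 + c(-1, 1) * qt(0.975, df = df.residual(mod)) * se  # should match confint(mod) above

Next, we look at the correlation between our two variables.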

cor(Beerwings$Beer, Beerwings$Hotwings)
## [1] 0.7841224
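In simple linear regression, the Multiple R-squared reported above is just the square of this correlation, which we can verify in one line:

cor(Beerwings$Beer, Beerwings$Hotwings)^2  # should match the Multiple R-squared, 0.6148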

This correlation coefficient is pretty high, meaning that our variables have a fairly strong positive linear relationship. Next, we can look at a prediction interval for someone who consumed 14 hot wings.

new <- data.frame(Hotwings = 14)
predict(mod, new, interval = "predict")
##        fit      lwr      upr
## 1 30.21089 14.58848 45.83329
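These limits come from the distance value we discussed in class, 1/n + (x* - xbar)^2 / Sxx. Here is a sketch that rebuilds the interval by hand (variable names such as dist_val are our own):

x <- Beerwings$Hotwings
n <- length(x)                                             # 30, so n - 2 = 28 df
s <- summary(mod)$sigma                                    # residual standard error, 7.479
dist_val <- 1/n + (14 - mean(x))^2 / sum((x - mean(x))^2)  # the distance value
yhat <- predict(mod, new)                                  # the fit, 30.21
tstar <- qt(0.975, df = n - 2)
yhat + c(-1, 1) * tstar * s * sqrt(1 + dist_val)           # should match lwr and upr above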

We are 95% confident that the interval 14.59 to 45.83 contains the ounces of beer consumed by a single individual who eats 14 hot wings; unlike a confidence interval, a prediction interval is for one new observation, not the true mean. Next, we create a confidence interval for our estimate.

predict(mod, new, interval = "confidence")
##        fit      lwr     upr
## 1 30.21089 27.15567 33.2661

We are 95% confident that the interval 27.16 to 33.27 contains the true mean ounces of beer consumed by everyone who eats 14 hot wings. The prediction interval is wider than the confidence interval because it must account for the variability of a single individual rather than just the uncertainty in the population mean.
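The only difference for the confidence interval of the mean response is that the "1 +" under the square root drops out, which is exactly why it is narrower. Reusing the quantities from the sketch above:

yhat + c(-1, 1) * tstar * s * sqrt(dist_val)  # should match 27.16 and 33.27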

Comparison to Other Topics

This topic fits really well with the others we’ve learned because we’re continuing to build on how to properly use and analyze linear regression models, which is one of the main goals of this course.