Learning Log 4

library(resampledata)
## 
## Attaching package: 'resampledata'
## The following object is masked from 'package:datasets':
## 
##     Titanic
attach(Beerwings)
head(Beerwings)
##   ID Hotwings Beer Gender
## 1  1        4   24      F
## 2  2        5    0      F
## 3  3        5   12      F
## 4  4        6   12      F
## 5  5        7   12      F
## 6  6        7   12      F

Predictor: number of hotwings eaten. Response: oz of beer consumed.

H(o): b(1) = 0 (no linear relationship between hotwings and beer)

H(a): b(1) ≠ 0

beer.mod <- lm(Beer ~ Hotwings, data = Beerwings)
beer.mod
## 
## Call:
## lm(formula = Beer ~ Hotwings, data = Beerwings)
## 
## Coefficients:
## (Intercept)     Hotwings  
##       3.040        1.941

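b(0) = 3.040 (intercept) and b(1) = 1.941 (slope): each additional hotwing is associated with about 1.941 more oz of beer, on average.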
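
The hypotheses about the slope can be checked from the coefficient table of the fitted model. The following is a minimal sketch rather than part of the original log; the names coef.table, slope.row, t.stat and p.value are made up for illustration, and the model is refit so the block runs on its own.

```r
library(resampledata)  # provides the Beerwings data

beer.mod <- lm(Beer ~ Hotwings, data = Beerwings)

# Coefficient table: estimate, standard error, t value, p-value
coef.table <- coef(summary(beer.mod))
slope.row <- coef.table["Hotwings", ]

# t statistic for H(o): slope = 0 is estimate / standard error
t.stat <- unname(slope.row["Estimate"] / slope.row["Std. Error"])

# Two-sided p-value on the residual degrees of freedom (n - 2)
p.value <- 2 * pt(abs(t.stat), df = df.residual(beer.mod), lower.tail = FALSE)

t.stat
p.value
```

A small p-value here would be evidence against H(o), which agrees with the slope interval from confint() not containing 0.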

summary(Beerwings)
##        ID           Hotwings          Beer      Gender
##  Min.   : 1.00   Min.   : 4.00   Min.   : 0.0   F:15  
##  1st Qu.: 8.25   1st Qu.: 8.00   1st Qu.:24.0   M:15  
##  Median :15.50   Median :12.50   Median :30.0         
##  Mean   :15.50   Mean   :11.93   Mean   :26.2         
##  3rd Qu.:22.75   3rd Qu.:15.50   3rd Qu.:36.0         
##  Max.   :30.00   Max.   :21.00   Max.   :48.0

Confidence Intervals for Regression Coefficients

confint(beer.mod, level = .9)
##                   5 %     95 %
## (Intercept) -3.293772 9.374511
## Hotwings     1.446940 2.434562

We are 90% confident that each additional hotwing consumed is associated with an increase of between 1.4469 and 2.4346 oz in mean beer consumption.
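
To see where the slope interval comes from, it can be rebuilt by hand as estimate plus or minus a t critical value times the standard error. This is a sketch under the usual linear model assumptions; est, se, t.crit and manual.ci are illustrative names, not objects from the log above.

```r
library(resampledata)  # provides the Beerwings data

beer.mod <- lm(Beer ~ Hotwings, data = Beerwings)
coef.table <- coef(summary(beer.mod))

est <- coef.table["Hotwings", "Estimate"]
se  <- coef.table["Hotwings", "Std. Error"]

# A 90% interval leaves 5% in each tail, so use the 0.95 quantile
t.crit <- qt(0.95, df = df.residual(beer.mod))

manual.ci <- c(lower = est - t.crit * se, upper = est + t.crit * se)
manual.ci
```

The result should match the Hotwings row of confint(beer.mod, level = .9).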

Correlation between Beer and Hotwings

cor(Beer, Hotwings)
## [1] 0.7841224
cor(Hotwings, Beer)
## [1] 0.7841224

The correlation is 0.7841, which indicates a fairly strong, positive linear relationship: people who eat more hotwings tend to drink more beer. The order of the arguments does not matter, which is why both calls give the same value.
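
The correlation is tied directly to the regression fit: in simple linear regression the slope equals r times sd(y) / sd(x), and r squared equals the model's R-squared. A short sketch of that check follows; slope.from.r and r.squared are illustrative names.

```r
library(resampledata)  # provides the Beerwings data

r <- cor(Beerwings$Hotwings, Beerwings$Beer)

# Slope recovered from the correlation and the two standard deviations
slope.from.r <- r * sd(Beerwings$Beer) / sd(Beerwings$Hotwings)

beer.mod <- lm(Beer ~ Hotwings, data = Beerwings)
r.squared <- summary(beer.mod)$r.squared

# Each pair should agree
c(slope.from.r, unname(coef(beer.mod)["Hotwings"]))
c(r^2, r.squared)
```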

Creating a New Dataframe

newdata <- data.frame(Hotwings = 18)

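We created the new data frame so that the number of hotwings consumed is fixed at 18 for both intervals below. The column must be named Hotwings so that predict() can match it to the predictor in the model.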

Prediction interval for oz of beer when a person ate 18 hotwings

pred.int <- predict(beer.mod, newdata, interval="predict")
pred.int
##        fit      lwr      upr
## 1 37.97389 21.98757 53.96021

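The bounds are [21.9876, 53.9602], with a point estimate (fit) of 37.9739. predict() uses a 95% level by default, so we are 95% confident that an individual who eats 18 hotwings will drink between 21.9876 and 53.9602 oz of beer.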

Confidence interval for oz of beer when a person ate 18 hotwings

confid.int <- predict(beer.mod, newdata, interval="confidence")
confid.int
##        fit      lwr      upr
## 1 37.97389 33.40911 42.53867

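The bounds are [33.4091, 42.5387], with a point estimate (fit) of 37.9739. We are 95% confident that the mean beer consumption of all people who eat 18 hotwings is between 33.4091 and 42.5387 oz.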

Which interval is wider?

confid.int %*% c(0, -1, 1)
##      [,1]
## 1 9.12956
pred.int %*% c(0, -1, 1)
##       [,1]
## 1 31.97263

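The prediction interval has a width of 31.9726 and the confidence interval has a width of 9.1296. We expect the prediction interval to be wider because it covers the beer consumption of a single new person, so it includes person-to-person variability around the regression line on top of the uncertainty in the estimated mean. The confidence interval only covers the mean for everyone who eats 18 hotwings.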
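
The difference in width can be seen in the standard errors: the prediction interval adds an extra 1 under the square root for the variability of a single new observation. The sketch below rebuilds both 95% intervals by hand using the standard simple linear regression formulas; x0, sxx, se.mean, se.pred, manual.conf and manual.pred are illustrative names.

```r
library(resampledata)  # provides the Beerwings data

beer.mod <- lm(Beer ~ Hotwings, data = Beerwings)

x   <- Beerwings$Hotwings
n   <- length(x)
x0  <- 18                        # number of hotwings we predict at
s   <- summary(beer.mod)$sigma   # residual standard error
sxx <- sum((x - mean(x))^2)

# Standard error for the mean response at x0
se.mean <- s * sqrt(1 / n + (x0 - mean(x))^2 / sxx)
# Standard error for a single new response at x0 (note the extra 1)
se.pred <- s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sxx)

t.crit <- qt(0.975, df = n - 2)  # 95% level, the predict() default
fit <- unname(predict(beer.mod, data.frame(Hotwings = x0)))

manual.conf <- c(fit = fit, lwr = fit - t.crit * se.mean, upr = fit + t.crit * se.mean)
manual.pred <- c(fit = fit, lwr = fit - t.crit * se.pred, upr = fit + t.crit * se.pred)

manual.conf
manual.pred
```

Both hand-built intervals should agree with the predict() output, and se.pred is always larger than se.mean, so the prediction interval is always the wider of the two.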

We want to make sure that both intervals are centered at the same point.

confid.int[1] == pred.int[1]
## [1] TRUE

The result is TRUE, so both intervals are centered at the same fitted value, 37.9739.

Our final equation!

\[\hat{y}_i = 3.040 + 1.941 x_i\]
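
As a quick check, plugging 18 hotwings into the line with the coefficients reported by lm() (3.040 and 1.941) should land close to the fit value from predict(). This is only a sketch with the rounded coefficients; b0, b1 and y.hat are illustrative names.

```r
# Rounded coefficients as printed by lm()
b0 <- 3.040
b1 <- 1.941

# Predicted oz of beer for someone who eats 18 hotwings
y.hat <- b0 + b1 * 18
y.hat  # about 37.978
```

The small gap from 37.97389 comes from rounding the coefficients to three decimal places.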