library(resampledata)
##
## Attaching package: 'resampledata'
## The following object is masked from 'package:datasets':
##
## Titanic
attach(Beerwings)
head(Beerwings)
## ID Hotwings Beer Gender
## 1 1 4 24 F
## 2 2 5 0 F
## 3 3 5 12 F
## 4 4 6 12 F
## 5 5 7 12 F
## 6 6 7 12 F
Predictor: number of hotwings eaten
Response: oz of beer consumed
H(0): b(1) = 0 (no linear relationship between hotwings and beer)
H(a): b(1) ≠ 0
beer.mod <- lm(Beer ~ Hotwings, data = Beerwings)
beer.mod
##
## Call:
## lm(formula = Beer ~ Hotwings, data = Beerwings)
##
## Coefficients:
## (Intercept) Hotwings
## 3.040 1.941
The estimated slope is b(1) = 1.941 and the estimated intercept is b(0) = 3.040.
summary(Beerwings)
## ID Hotwings Beer Gender
## Min. : 1.00 Min. : 4.00 Min. : 0.0 F:15
## 1st Qu.: 8.25 1st Qu.: 8.00 1st Qu.:24.0 M:15
## Median :15.50 Median :12.50 Median :30.0
## Mean :15.50 Mean :11.93 Mean :26.2
## 3rd Qu.:22.75 3rd Qu.:15.50 3rd Qu.:36.0
## Max. :30.00 Max. :21.00 Max. :48.0
Confidence Intervals for Regression Coefficients
confint(beer.mod, level = .9)
## 5 % 95 %
## (Intercept) -3.293772 9.374511
## Hotwings 1.446940 2.434562
We are 90% confident that each additional hotwing consumed is associated with an average increase of between 1.4469 and 2.4346 oz of beer.
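The interval above has the form estimate ± t* × standard error, so the pieces can be backed out from the printed endpoints. The sketch below uses only base R and the numbers shown by `confint()`; the sample size n = 30 is read off the `summary(Beerwings)` output, and the variable names are illustrative, not part of the original analysis.

```r
# Endpoints of the 90% interval for the Hotwings slope, copied from confint().
ci <- c(lower = 1.446940, upper = 2.434562)

# Assumption: n = 30 observations (IDs run from 1 to 30), so df = n - 2.
n  <- 30
df <- n - 2

midpoint   <- mean(ci)               # should reproduce the slope estimate
half_width <- unname(diff(ci)) / 2   # margin of error
t_star     <- qt(0.95, df = df)      # critical value for a two-sided 90% interval
se_slope   <- half_width / t_star    # implied standard error of the slope

# Test statistic and two-sided p-value for H(0): b(1) = 0.
t_stat  <- midpoint / se_slope
p_value <- 2 * pt(-abs(t_stat), df = df)

print(round(c(midpoint = midpoint, half_width = half_width,
              t_star = t_star, se_slope = se_slope, t_stat = t_stat), 4))
print(p_value)
```

The midpoint rounds to 1.941, matching the slope from `lm()`. With the fitted model available, `summary(beer.mod)$coefficients` reports the standard error, t value, and p-value directly.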
Correlation between Beer and Hotwings
cor(Beer, Hotwings)
## [1] 0.7841224
cor(Hotwings, Beer)
## [1] 0.7841224
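The correlation is 0.7841, which indicates a fairly strong positive linear relationship: the more hotwings a person eats, the more beer that person tends to drink. The order of the arguments does not matter, since cor(Beer, Hotwings) and cor(Hotwings, Beer) give the same value.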
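In simple linear regression the squared correlation equals the coefficient of determination, so the printed correlation can be converted into a share of variation explained. A quick worked calculation from the value above:

\[r^2 = (0.7841224)^2 \approx 0.6148\]

So roughly 61% of the variation in beer consumption is accounted for by its linear relationship with the number of hotwings eaten.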
Creating a New Dataframe
newdata <- data.frame(Hotwings = 18)
The new data frame fixes the number of hotwings consumed at 18, the value at which both intervals below are computed.
Prediction interval for oz of beer when a person ate 18 hotwings
pred.int <- predict(beer.mod, newdata, interval="predict")
pred.int
## fit lwr upr
## 1 37.97389 21.98757 53.96021
The 95% prediction interval (the default level in predict) is [21.9876, 53.9602] oz, with a point estimate (fit) of 37.9739 oz.
Confidence interval for oz of beer when a person ate 18 hotwings
confid.int <- predict(beer.mod, newdata, interval="confidence")
confid.int
## fit lwr upr
## 1 37.97389 33.40911 42.53867
The 95% confidence interval for the mean response is [33.4091, 42.5387] oz, with the same point estimate (fit) of 37.9739 oz.
Which interval is wider?
confid.int %*% c(0, -1, 1)
## [,1]
## 1 9.12956
pred.int %*% c(0, -1, 1)
## [,1]
## 1 31.97263
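The prediction interval has a width of 31.9726 and the confidence interval has a width of 9.1296. The prediction interval is wider, as expected: it covers the beer consumption of a single new person who eats 18 hotwings, which includes individual variation around the regression line, while the confidence interval only covers the mean beer consumption of all people who eat 18 hotwings.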
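For reference, the standard textbook forms of the two intervals at a new predictor value \(x_0\) are sketched below. The symbols are notation introduced here rather than taken from the output: \(s\) is the residual standard error, \(n\) the sample size, \(\bar{x}\) the mean number of hotwings, \(S_{xx} = \sum (x_i - \bar{x})^2\), and \(t^*\) the 0.975 quantile of the t distribution with \(n - 2\) degrees of freedom.

\[\text{Confidence interval: } \hat{y}_0 \pm t^* \, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\]

\[\text{Prediction interval: } \hat{y}_0 \pm t^* \, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\]

The only difference is the extra 1 under the square root, which accounts for the variability of an individual observation. That term is why the prediction interval is always the wider of the two.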
Both intervals should be centered at the same fitted value, so we check that their first entries (the fit) are equal.
confid.int[1] == pred.int[1]
## [1] TRUE
The comparison returns TRUE, so the two intervals are centered at the same fitted value.
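The width and centering checks can also be written with named columns and a tolerance-based comparison, which some readers find easier to follow than the matrix product and exact `==`. This is only a sketch: the two matrices are rebuilt here from the rounded values printed above so that the block runs on its own, and `ci_width`, `pi_width`, and `same_center` are illustrative names.

```r
# Interval matrices rebuilt from the printed predict() output (rounded values).
cols <- c("fit", "lwr", "upr")
confid.int <- matrix(c(37.97389, 33.40911, 42.53867), nrow = 1,
                     dimnames = list("1", cols))
pred.int   <- matrix(c(37.97389, 21.98757, 53.96021), nrow = 1,
                     dimnames = list("1", cols))

# Width = upper bound minus lower bound, selected by column name.
ci_width <- unname(confid.int[, "upr"] - confid.int[, "lwr"])
pi_width <- unname(pred.int[, "upr"] - pred.int[, "lwr"])

# Compare the two centers with a numerical tolerance instead of exact ==.
same_center <- isTRUE(all.equal(unname(confid.int[, "fit"]),
                                unname(pred.int[, "fit"])))

print(c(ci_width = ci_width, pi_width = pi_width))
print(same_center)
```

With the real objects in the workspace, the matrix-building lines can be dropped and the remaining lines applied to `confid.int` and `pred.int` directly.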
\[\hat{y}_i = 3.040 + 1.941 x_i\]
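As a quick check of the fitted line, plugging x = 18 hotwings into the equation with the rounded coefficients from the lm output (intercept 3.040, slope 1.941) gives

\[\hat{y} = 3.040 + 1.941(18) = 37.978\]

which agrees with the fit of 37.97389 reported by predict, up to rounding of the coefficients.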