The data set “Beerwings” records the number of hotwings each person ate and the amount of beer they drank, as well as the person’s gender.
library(resampledata)
##
## Attaching package: 'resampledata'
## The following object is masked from 'package:datasets':
##
## Titanic
data(Beerwings)
attach(Beerwings)
We will examine whether the number of hotwings a person eats is associated with the amount of beer they drink. To do this, we fit a simple linear regression model with Beer as the response and Hotwings as the predictor.
mod <- lm(Beer~Hotwings, data = Beerwings)
mod
##
## Call:
## lm(formula = Beer ~ Hotwings, data = Beerwings)
##
## Coefficients:
## (Intercept)     Hotwings  
##       3.040        1.941  
Then \(\hat{\beta}_0=3.040\) and \(\hat{\beta}_1=1.941\), so the fitted regression equation is \(\hat{y}_i=3.040+1.941 x_i\). Next, we use a hypothesis test to check whether beer and hotwings are linearly related. The null hypothesis \(H_0\) is that they are not, that is, \(\beta_1=0\). The alternative hypothesis \(H_A\) is that \(\beta_1\neq 0\). We look at the summary of the model to determine the results of the hypothesis test:
summary(mod)
##
## Call:
## lm(formula = Beer ~ Hotwings, data = Beerwings)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.566  -4.537  -0.122   3.671  17.789 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0404     3.7235   0.817    0.421    
## Hotwings      1.9408     0.2903   6.686 2.95e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.479 on 28 degrees of freedom
## Multiple R-squared: 0.6148, Adjusted R-squared: 0.6011
## F-statistic: 44.7 on 1 and 28 DF, p-value: 2.953e-07
The p-value for the slope (\(2.95 \times 10^{-7}\)) is very small, so we reject the null hypothesis. Thus, we have sufficient evidence to conclude that \(\beta_1\neq 0\), so the number of hotwings eaten and the amount of beer drunk are linearly related. The estimated slope is positive, so the two variables are positively correlated.
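The test statistic and p-value in the summary can be reproduced by hand. The following is a minimal sketch that assumes only the rounded estimate, standard error, and degrees of freedom printed above, so the results agree with the summary only up to rounding; the variable names are illustrative.

```r
# Rounded values copied from the summary(mod) output
b1  <- 1.9408   # estimated slope for Hotwings
se1 <- 0.2903   # standard error of the slope
df  <- 28       # residual degrees of freedom (n - 2)

# Test statistic for H0: beta_1 = 0
t_stat <- b1 / se1

# Two-sided p-value from the t distribution with 28 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

t_stat
p_value
```

Both values should be close to the `t value` and `Pr(>|t|)` entries in the Hotwings row of the coefficient table.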
We can create confidence intervals for \(\beta_0\) and \(\beta_1\):
(CI<-confint(mod, level = .95))
##                 2.5 %    97.5 %
## (Intercept) -4.586851 10.667590
## Hotwings     1.346131  2.535371
We are 95% confident that \(\beta_0\) lies in the interval:
CI[1, ]
## 2.5 % 97.5 %
## -4.586851 10.667590
and that \(\beta_1\) lies in the interval:
CI[2, ]
## 2.5 % 97.5 %
## 1.346131 2.535371
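These intervals have the form estimate \(\pm\) critical value \(\times\) standard error, where the critical value is a quantile of the t distribution with 28 degrees of freedom. As a rough check, the sketch below rebuilds them from the rounded estimates and standard errors in the model summary, so it matches `confint(mod)` only to about three decimal places; `ci_by_hand` is an illustrative name.

```r
# Rounded estimates and standard errors copied from summary(mod)
est <- c("(Intercept)" = 3.0404, Hotwings = 1.9408)
se  <- c("(Intercept)" = 3.7235, Hotwings = 0.2903)
df  <- 28                      # residual degrees of freedom

# Critical value for a 95% two-sided interval
t_crit <- qt(0.975, df = df)

# estimate +/- t_crit * standard error, one row per coefficient
ci_by_hand <- cbind(lower = est - t_crit * se,
                    upper = est + t_crit * se)
ci_by_hand
```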
Next we’ll look at a specific number of hotwings eaten and the corresponding amount of beer consumed. Consider people who have eaten 9 hotwings. We can create a 95% confidence interval for the mean amount of beer they drank:
newdata <- data.frame(Hotwings = 9)
predict(mod, newdata, interval = "confidence")
## fit lwr upr
## 1 20.50713 17.2107 23.80356
So, we are 95% confident that the mean amount of beer drunk by people who ate 9 hotwings is between 17.211 and 23.804 oz. Similarly, we can create a prediction interval:
newdata <- data.frame(Hotwings = 9)
predict(mod, newdata, interval = "prediction")
## fit lwr upr
## 1 20.50713 4.835769 36.17849
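Thus, we predict with 95% confidence that an individual who ate 9 hotwings drank between 4.836 and 36.178 oz of beer. This interval is wider than the confidence interval because it accounts for the variability of an individual around the mean, not only the uncertainty in the estimated mean.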
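The link between the two intervals can be checked numerically. The sketch below assumes only the printed confidence interval, the fitted value, and the residual standard error (7.479) from the model summary. It backs out the standard error of the estimated mean and then adds the residual variance to obtain the prediction standard error. The names `se_mean`, `se_pred`, and `pi_by_hand` are illustrative, and the result agrees with the `predict()` output only up to rounding.

```r
# Values copied from the printed output
fit <- 20.50713                 # fitted value at Hotwings = 9
ci  <- c(17.2107, 23.80356)     # 95% confidence interval for the mean
s   <- 7.479                    # residual standard error
df  <- 28                       # residual degrees of freedom

t_crit <- qt(0.975, df = df)

# Standard error of the estimated mean, recovered from the CI half-width
se_mean <- (ci[2] - ci[1]) / (2 * t_crit)

# A new observation adds the residual variance to that uncertainty
se_pred <- sqrt(s^2 + se_mean^2)

# 95% prediction interval rebuilt by hand
pi_by_hand <- fit + c(-1, 1) * t_crit * se_pred
pi_by_hand
```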
Next, we look at the variability of the amount of beer consumed. Total variation is the sum of explained variation and unexplained variation. We look at \(r^2\), the explained variation divided by the total variation, to find the proportion of the variability of the response that is explained by its linear relationship with the number of hotwings eaten.
summary(mod)
##
## Call:
## lm(formula = Beer ~ Hotwings, data = Beerwings)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.566  -4.537  -0.122   3.671  17.789 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0404     3.7235   0.817    0.421    
## Hotwings      1.9408     0.2903   6.686 2.95e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.479 on 28 degrees of freedom
## Multiple R-squared: 0.6148, Adjusted R-squared: 0.6011
## F-statistic: 44.7 on 1 and 28 DF, p-value: 2.953e-07
From the summary, we see that \(r^2=0.6148\), so 61.48% of the variability in the amount of beer consumed is explained by the number of hotwings a person ate.
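A few related quantities follow directly from \(r^2\). The sketch below assumes the printed value 0.6148 and a sample size of 30, which is inferred here from the 28 residual degrees of freedom plus the two estimated coefficients. It recovers the sample correlation, the unexplained share of the variation, and the adjusted \(r^2\); the variable names are illustrative.

```r
r2 <- 0.6148     # Multiple R-squared from summary(mod)
b1 <- 1.9408     # estimated slope, used only for its sign
n  <- 30         # inferred: 28 residual df + 2 estimated coefficients
p  <- 1          # number of predictors

# In simple linear regression the correlation is the signed square root of r^2
r <- sign(b1) * sqrt(r2)

# Share of the total variation left unexplained by the model
unexplained <- 1 - r2

# Adjusted r^2 penalizes for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

c(r = r, unexplained = unexplained, adj_r2 = adj_r2)
```

The adjusted value should agree with the `Adjusted R-squared` line of the summary up to rounding of the inputs.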
Lastly, we use an F-test to check whether there is a linear relationship between the number of hotwings eaten and the amount of beer consumed. The null hypothesis \(H_0\) is \(\beta_1=0\) and the alternative hypothesis \(H_A\) is \(\beta_1\neq 0\).
summary(mod)
##
## Call:
## lm(formula = Beer ~ Hotwings, data = Beerwings)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.566  -4.537  -0.122   3.671  17.789 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0404     3.7235   0.817    0.421    
## Hotwings      1.9408     0.2903   6.686 2.95e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.479 on 28 degrees of freedom
## Multiple R-squared: 0.6148, Adjusted R-squared: 0.6011
## F-statistic: 44.7 on 1 and 28 DF, p-value: 2.953e-07
The F-statistic is 44.7, with 1 and 28 degrees of freedom. The p-value is \(2.953 \times 10^{-7}\). Since the p-value is small, we have sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. Thus, our data suggest that there is a linear relationship between the number of hotwings eaten and the amount of beer consumed.
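With a single predictor, the F-test and the t-test for the slope are the same test: the F-statistic is the square of the t-statistic, and the two p-values coincide. The sketch below illustrates this using the rounded t value from the coefficient table, so the F-statistic matches the printed 44.7 only up to rounding.

```r
t_stat <- 6.686   # t value for Hotwings, copied from summary(mod)
df1    <- 1       # numerator degrees of freedom (one predictor)
df2    <- 28      # residual degrees of freedom

# F-statistic as the squared t-statistic
f_stat <- t_stat^2

# Upper-tail p-value from the F distribution
p_f <- pf(f_stat, df1 = df1, df2 = df2, lower.tail = FALSE)

# Two-sided p-value from the t distribution, for comparison
p_t <- 2 * pt(abs(t_stat), df = df2, lower.tail = FALSE)

c(f_stat = f_stat, p_f = p_f, p_t = p_t)
```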