In the learning log for today I will be examining the relationship between amount of beer consumed based on number of hotwings eaten. I think that a larger amount of hotwings counsumed will lead to more beer consumed in order to keep a cool mouth. I’ll start with a summary to get a feel for the data.

library(resampledata)
## 
## Attaching package: 'resampledata'
## The following object is masked from 'package:datasets':
## 
##     Titanic
data(Beerwings)
summary(Beerwings)
##        ID           Hotwings          Beer      Gender
##  Min.   : 1.00   Min.   : 4.00   Min.   : 0.0   F:15  
##  1st Qu.: 8.25   1st Qu.: 8.00   1st Qu.:24.0   M:15  
##  Median :15.50   Median :12.50   Median :30.0         
##  Mean   :15.50   Mean   :11.93   Mean   :26.2         
##  3rd Qu.:22.75   3rd Qu.:15.50   3rd Qu.:36.0         
##  Max.   :30.00   Max.   :21.00   Max.   :48.0
attach(Beerwings)

Next we want to make a plot to visualize the data. A scatterplot will help us see if there is a correlation between number of hotwings eaten and and amount of beer consumed.

plot(Hotwings,Beer)

The plot shows a general upward trend in beer consumed as more hotwings are eaten. Furthermore, the variance in beer consumed stays relatively constant as the number of hotwings increases. Next we should create a linear model and put it on the plot as a line of best fit.

mymod <- lm(Beer ~ Hotwings)
mymod
## 
## Call:
## lm(formula = Beer ~ Hotwings)
## 
## Coefficients:
## (Intercept)     Hotwings  
##       3.040        1.941
plot(Hotwings,Beer)
abline(3.040, 1.941)

This output means that our intercept for eating zero wings is 3.04 fluid ounces of beer and for ever hot wing eaten we expect that person to drink 1.941 more fluid ounces of beer. It is also seen on the plot that this is an accurate line of best fit so we know everything was inputed correctly. Now we can use this regression model to predict the amount of beer consumed by a person. Perhaps we pick contestant thirteen. To get the amount of hotwings consumed by person 13 in the study we simply input

Beerwings[13,2]
## [1] 11

We get 11 hotwings consumed by this person. We can use our model to predict the amount of beer consumed by this person.

3.040 + 1.941*11
## [1] 24.391

We end up with a prediction of 24.391 fluid ounces of beer consumed. We can check the residual to see how close our preiction was.

act.val <- Beerwings[13,3]
act.val - 24.391
## [1] -0.391

In this case our prediction was about .4 fluid ounces too high, which is very close. It would be helpful to see if our residuals are normally distributed or not. to do this we can make a histogram of them.

myresids <- mymod$residuals
hist(myresids)

We can see that this histogram does follow the shape of a normal distribution pretty closely. We can actually plot the quantiles of a normal distribution against the quantiles of our residuals to see how close it really is.

qqnorm(myresids)
qqline(myresids)

Most of the dots are on or very near the line which is a very good. As mentioned earlier, the variance of the graph seems to stay pretty constant, which means homoscedasticity. Still, we should make another graph to double check this assumption.

plot(mymod$residuals ~ Hotwings)
abline(0,0)

Upon further glance we can see that there is definotely clear evidence of heteroscedasticity. One last thing we can check is the mean squared error which can be done simply by summarizing our model.

summary(mymod)
## 
## Call:
## lm(formula = Beer ~ Hotwings)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.566  -4.537  -0.122   3.671  17.789 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0404     3.7235   0.817    0.421    
## Hotwings      1.9408     0.2903   6.686 2.95e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.479 on 28 degrees of freedom
## Multiple R-squared:  0.6148, Adjusted R-squared:  0.6011 
## F-statistic:  44.7 on 1 and 28 DF,  p-value: 2.953e-07