Learning Log Day 4

First we need to load the data and attach it. I decided to work with the Beerwings data set.

library(resampledata)

## 
## Attaching package: 'resampledata'

## The following object is masked from 'package:datasets':
## 
##     Titanic

data(Beerwings)
attach(Beerwings)

Then we can look at the names of the variables and the summary of the data so that we can decide which variables to use.

names(Beerwings)

## [1] "ID"       "Hotwings" "Beer"     "Gender"

summary(Beerwings)

##        ID           Hotwings          Beer      Gender
##  Min.   : 1.00   Min.   : 4.00   Min.   : 0.0   F:15  
##  1st Qu.: 8.25   1st Qu.: 8.00   1st Qu.:24.0   M:15  
##  Median :15.50   Median :12.50   Median :30.0         
##  Mean   :15.50   Mean   :11.93   Mean   :26.2         
##  3rd Qu.:22.75   3rd Qu.:15.50   3rd Qu.:36.0         
##  Max.   :30.00   Max.   :21.00   Max.   :48.0

I picked Beer to be the predictor and Hotwings to be the response.

The regression equation is: \[\hat{y_i}= \hat{\beta_0}+\hat{\beta_1} x_i\] The regression equation for this data is: \[\hat{y_i}= 3.63293+0.31681 x_i\] This means that someone who drinks 0 ounces of beer is still predicted to eat about 3.6 hotwings. Each additional ounce of beer is associated with an average increase of about 0.32 hotwings.
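As a quick check of how the equation is used, the sketch below plugs a few Beer values into the fitted line. The coefficients are copied (rounded) from the model summary further down, and `predict_hotwings` is a hypothetical helper name chosen for illustration, not a function from any package.

```r
# Rounded coefficients copied from the summary(mod3) output
b0 <- 3.63293  # intercept
b1 <- 0.31681  # slope for Beer

# Hypothetical helper: predicted Hotwings for a given Beer amount (ounces)
predict_hotwings <- function(beer) b0 + b1 * beer

# Predictions at the minimum, first quartile, and median of Beer
predict_hotwings(c(0, 24, 30))
```

At 30 ounces this gives roughly 13.1 hotwings, which agrees with the `fit` value from `predict()` later in the log up to rounding of the coefficients.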

Now we can create a linear model for the data and look at the summary for it.

mod3 <- lm(Hotwings ~ Beer)
summary(mod3)

## 
## Call:
## lm(formula = Hotwings ~ Beer)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.2364 -1.3603 -0.1868  1.2658  5.9619 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.63293    1.35859   2.674   0.0124 *  
## Beer         0.31681    0.04739   6.686 2.95e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.022 on 28 degrees of freedom
## Multiple R-squared:  0.6148, Adjusted R-squared:  0.6011 
## F-statistic:  44.7 on 1 and 28 DF,  p-value: 2.953e-07

The summary can show a lot of useful information. This is where you can find the slope and intercept, the standard errors, the degrees of freedom, R-squared, the F-statistic and p-value and more.
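To see how some of these numbers relate to each other, here is a small sketch that rebuilds the t value, the F-statistic, and the p-value for Beer from the estimate and standard error. The inputs are the rounded values printed in the summary, so the results should only match the table approximately.

```r
# Rounded values copied from the Beer row of the coefficient table
est <- 0.31681
se  <- 0.04739
df  <- 28       # residual degrees of freedom from the summary

# t value is the estimate divided by its standard error
t_beer <- est / se
t_beer

# With a single predictor, the F-statistic is the square of the slope's t value
f_stat <- t_beer^2
f_stat

# Two-sided p-value from the t distribution
p_val <- 2 * pt(abs(t_beer), df = df, lower.tail = FALSE)
p_val
```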

We can create a 95% confidence interval for the regression coefficients. confint() uses a 95% level by default, so we don’t need to specify a level.

We can also find the correlation between Hotwings and Beer. Correlation is symmetric, so either order of the arguments gives the same result.

confint(mod3)

##                 2.5 %    97.5 %
## (Intercept) 0.8499936 6.4158665
## Beer        0.2197432 0.4138753
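The interval from confint() can be reproduced by hand as estimate ± t critical value × standard error. The sketch below assumes the rounded estimates and standard errors from the summary table, so it should agree with the output above only to about four decimal places.

```r
# Rounded estimates and standard errors copied from summary(mod3)
est <- c(Intercept = 3.63293, Beer = 0.31681)
se  <- c(Intercept = 1.35859, Beer = 0.04739)

# Critical value for a 95% interval with 28 residual degrees of freedom
t_crit <- qt(0.975, df = 28)

# Lower and upper limits for each coefficient
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci
```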

cor(Hotwings,Beer)

## [1] 0.7841224

cor(Beer,Hotwings)

## [1] 0.7841224
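In simple linear regression the squared correlation equals the Multiple R-squared from the model summary. A minimal check, using the correlation value printed above:

```r
# Correlation copied from the cor() output
r <- 0.7841224

# Squared correlation; compare with "Multiple R-squared: 0.6148"
r_squared <- r^2
r_squared
```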

Now we can make a new data frame with the Beer variable set to a value within the observed range (0 to 48). This lets us predict the number of Hotwings for a Beer amount of 30 fluid ounces.

newdata3 <- data.frame(Beer=30)

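We can see the prediction interval and the confidence interval. The prediction interval will always be wider because it covers the Hotwings value for a single new observation, which includes that individual's random variation, while the confidence interval only covers the average Hotwings value for everyone at that Beer amount.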

predy3 <- predict(mod3, newdata3, interval="predict")
predy3

##        fit      lwr      upr
## 1 13.13721 6.834041 19.44038

confy3 <- predict(mod3, newdata3, interval="confidence")
confy3

##        fit     lwr      upr
## 1 13.13721 11.9484 14.32602
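The link between the two intervals can be sketched numerically: the prediction standard error combines the residual standard error with the standard error of the fitted mean. The code below works backwards from the printed confidence limits, so it is an approximate check under rounding, not the exact computation predict() performs internally.

```r
# Values copied from the outputs above
s      <- 3.022                           # residual standard error
ci     <- c(lwr = 11.9484, upr = 14.32602)  # confidence interval at Beer = 30
t_crit <- qt(0.975, df = 28)              # 95% critical value, 28 df

# Standard error of the fitted mean, recovered from the CI width
se_fit <- (ci[["upr"]] - ci[["lwr"]]) / (2 * t_crit)

# A new observation adds the residual variance to the variance of the mean
se_pred <- sqrt(s^2 + se_fit^2)

# Implied prediction interval width; compare with 19.44038 - 6.834041
pi_width <- 2 * t_crit * se_pred
pi_width
```

Because `s^2` is added under the square root, the prediction interval cannot be narrower than the confidence interval at the same Beer value.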

We can also look at the width of each interval and check that the fitted value is the same for both.

predy3%*%c(0,-1,1)

##       [,1]
## 1 12.60634

confy3%*%c(0,-1,1)

##       [,1]
## 1 2.377623
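The matrix product with c(0,-1,1) is a compact way to compute upr minus lwr: the fit column is multiplied by 0, lwr by -1, and upr by 1. The sketch below rebuilds the prediction output as a plain matrix (the name `predy` is just for illustration) and shows that direct subtraction gives the same width.

```r
# One-row matrix with the same layout as the predict() output above
predy <- matrix(c(13.13721, 6.834041, 19.44038), nrow = 1,
                dimnames = list("1", c("fit", "lwr", "upr")))

# Width via matrix multiplication: 0*fit - 1*lwr + 1*upr
width_matrix <- predy %*% c(0, -1, 1)
width_matrix

# Same width via direct subtraction of the named columns
width_direct <- predy[, "upr"] - predy[, "lwr"]
width_direct
```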

confy3[1]==predy3[1]

## [1] TRUE

Megan Blaschko

February 8, 2018