This week we learned about logistic regression, which can be used to model a binary response variable. A logistic regression equation has the form log(p/(1-p)) = B0 + B1*x1 + B2*x2 + …, where p is the probability of a success. Below are a few practice problems that we worked through to understand the process.
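
To make the link between probability and log odds concrete, here is a tiny sketch of my own (not part of any problem): the logit log(p/(1-p)) turns a probability into log odds, and the inverse logit turns log odds back into a probability.

p <- 0.75
logodds <- log(p / (1 - p))        # logit: log(p/(1-p))
logodds
## [1] 1.098612
exp(logodds) / (1 + exp(logodds))  # inverse logit recovers p
## [1] 0.75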

5.3.1 Female Horseshoe Crab Weight

Our logistic regression equation is: log(p/(1-p)) = -3.6947264 + .0018151*weight
An increase of one gram in weight adds .0018151 to the log odds, which corresponds to a multiplicative change of 1.00181679 in the odds of having satellites. This means that heavier females have a higher chance of having satellites, albeit only slightly higher per gram.

A female weighing 2000 grams has a .483874 probability of having one or more satellites. Looks like our girl gotta get more thicc.

Conducting a Wald test, the p-value is very small (1.45e-06), so there is strong evidence that weight (thicc-ness) is related to the log odds of having satellites.
library(faraway)
crabs <- read.csv("http://www.cknudson.com/data/crabs.csv")
mod0 <- glm(y ~ weight, family = binomial, data = crabs)
summary(mod0)
## 
## Call:
## glm(formula = y ~ weight, family = binomial, data = crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1108  -1.0749   0.5426   0.9122   1.6285  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.6947264  0.8801975  -4.198 2.70e-05 ***
## weight       0.0018151  0.0003767   4.819 1.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 195.74  on 171  degrees of freedom
## AIC: 199.74
## 
## Number of Fisher Scoring iterations: 4
exp(coef(mod0))
## (Intercept)      weight 
##  0.02485425  1.00181679
ilogit(-3.6947264 + .0018151*2000) # ilogit() is the inverse logit: it converts the log odds B0 + B1*weight back into a probability
## [1] 0.483874
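As a quick check of my own (not required by the problem), predict() with type = "response" gives the fitted probability directly from mod0, so it should roughly match the hand calculation above; exponentiating a Wald confidence interval gives a range for the odds multiplier per gram.

predict(mod0, newdata = data.frame(weight = 2000), type = "response")
# should land near the 0.483874 computed by hand (tiny differences come from rounding the printed coefficients)
exp(confint.default(mod0))  # Wald-based 95% CI for the multiplicative change in the odds per gram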

5.3.2 Boundary Water Blowdown

Our logistic regression equation is: log(p/(1-p)) = -1.702112 + .097556*D
An increase of one centimeter in diameter adds about .0976 to the log odds, which corresponds to a multiplicative change of 1.1024759 in the odds of blowdown.

Thicker trees have a higher chance of dying (RIP), since the multiplicative change in the odds is above 1.

A 20 cm tree has a .5619323 probability of having died.

There does appear to be a linear relationship between a tree's diameter and its log odds of dying: a Wald test gives an extremely small p-value (< 2e-16), so the slope on D is clearly nonzero.
blowdown <- read.csv("http://www.cknudson.com/data/blowdown.csv")
head(blowdown)
##    D         S y SPP
## 1  9 0.0217509 0  BA
## 2 14 0.0217509 0  BA
## 3 18 0.0217509 0  BA
## 4 23 0.0217509 0  BA
## 5  9 0.0217509 0  BA
## 6 16 0.0217509 0  BA
mod1 <- glm(y ~ D, family = binomial, data = blowdown)
exp(coef(mod1))
## (Intercept)           D 
##   0.1822981   1.1024759
summary(mod1)
## 
## Call:
## glm(formula = y ~ D, family = binomial, data = blowdown)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.6309  -0.9616  -0.7211   1.1495   1.7172  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.702112   0.082798  -20.56   <2e-16 ***
## D            0.097558   0.004846   20.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5057.9  on 3665  degrees of freedom
## Residual deviance: 4555.6  on 3664  degrees of freedom
## AIC: 4559.6
## 
## Number of Fisher Scoring iterations: 4
ilogit(-1.702112 + .097556*20) # probability that a 20 cm diameter tree blew down
## [1] 0.5619323
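The Wald test quoted above can also be reproduced by hand from mod1's coefficient and its standard error; this is just my own sanity check, not part of the assignment.

z <- coef(mod1)["D"] / sqrt(vcov(mod1)["D", "D"])  # Wald z statistic for the diameter slope
z                                                  # should be about 20.13, matching the summary
2 * pnorm(-abs(z))                                 # two-sided p-value, effectively zero
predict(mod1, newdata = data.frame(D = 20), type = "response")  # fitted probability that a 20 cm tree died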

5.3.3 Beer

Our logistic regression equation is: log(p/(1-p)) = -9.7881 + 1.4662*ABV

An increase of one percentage point in ABV is associated with a multiplicative change of 4.332636 in the odds of being a good beer. This means that beers with a higher ABV are more likely to be good beers.

The beer I chose was the only sour on the list, which happened to have the lowest ABV (4.2). The probability of my beer being a good beer to that friend is thus quite low, at .02582973.
beer <- read.csv("http://www.cknudson.com/data/MNbeer.csv")
head(beer)
##       Brewery         Beer              Description Style ABV IBU Rating Good
## 1     Bauhaus  Wonderstuff     New Bohemian Pilsner Lager 5.4  48     88    0
## 2     Bauhaus    Stargazer German Style Schwarzbier Lager 5.0  28     87    0
## 3     Bauhaus  Wagon Party    West Cost Style Lager Lager 5.4  55     86    0
## 4     Bauhaus    Sky-Five!        Midwest Coast IPA   IPA 6.7  70     86    0
## 5 Bent Paddle         Kanu         Session Pale Ale   Ale 4.8  48     85    0
## 6 Bent Paddle Venture Pils            Pilsner Lager Lager 5.0  38     87    0
mod2 <- glm(Good ~ ABV, family = binomial, data = beer)    
exp(coef(mod2))
##  (Intercept)          ABV 
## 5.611666e-05 4.332636e+00
summary(mod2)
## 
## Call:
## glm(formula = Good ~ ABV, family = binomial, data = beer)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3847  -0.8206  -0.4663   0.7230   2.1320  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  -9.7881     3.3959  -2.882  0.00395 **
## ABV           1.4662     0.5481   2.675  0.00747 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 51.564  on 43  degrees of freedom
## Residual deviance: 42.108  on 42  degrees of freedom
## AIC: 46.108
## 
## Number of Fisher Scoring iterations: 5
ilogit(-9.7881 + 1.4662*4.2) # probability that my 4.2 percent ABV sour is a good beer
## [1] 0.02582973
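As one last check of my own, predict() on a one-row data frame with my beer's ABV (the same 4.2 I plugged into ilogit above) should return roughly the same fitted probability without retyping the coefficients.

predict(mod2, newdata = data.frame(ABV = 4.2), type = "response")
# should land near the 0.02582973 computed by hand above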