Logistic Regression

crabs <- read.csv("http://www.cknudson.com/data/crabs.csv")
library(faraway)
head(crabs)

##    color spine width satell weight y
## 1 medium   bad  28.3      8   3050 1
## 2   dark   bad  22.5      0   1550 0
## 3  light  good  26.0      9   2300 1
## 4   dark   bad  24.8      0   2100 0
## 5   dark   bad  26.0      4   2600 1
## 6 medium   bad  23.8      0   2100 0

mod <- glm(y ~ width, family = binomial, data = crabs)
coef(mod)

## (Intercept)       width 
## -12.3508177   0.4972306

summary(mod)

## 
## Call:
## glm(formula = y ~ width, family = binomial, data = crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0281  -1.0458   0.5480   0.9066   1.6942  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -12.3508     2.6287  -4.698 2.62e-06 ***
## width         0.4972     0.1017   4.887 1.02e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 194.45  on 171  degrees of freedom
## AIC: 198.45
## 
## Number of Fisher Scoring iterations: 4

Half-Time Exercises

5.3.1 Female Horseshoe Crab Weight

Our logistic regression equation is: log(p/(1-p)) = -3.6947264 + .0018151*weight
\nAn increase of one gram of weight is associated with a multiplicative change of 1.00181679 of the logodds. This means that heavier females have a higher chance of having satellites, albeit small.
\n A female weighing 2000 grams has a .483874 probability of having one of more satellites. Looks like our girl gotta get more thicc.
\nConducting a Wald test, the pvalue is very small (1.45e-06), meaning that there is a linear relationship between thiccy crabs and having satellites or not.

mod0 <- glm(y ~ weight, family = binomial, data = crabs)
summary(mod0)

## 
## Call:
## glm(formula = y ~ weight, family = binomial, data = crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1108  -1.0749   0.5426   0.9122   1.6285  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.6947264  0.8801975  -4.198 2.70e-05 ***
## weight       0.0018151  0.0003767   4.819 1.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 195.74  on 171  degrees of freedom
## AIC: 199.74
## 
## Number of Fisher Scoring iterations: 4

exp(coef(mod0))

## (Intercept)      weight 
##  0.02485425  1.00181679

ilogit(-3.6947264 + .0018151*2000)

## [1] 0.483874

5.3.2 Boundary Water Blowdown

Our logistic regression equation is: log(p/(1-p)) = -1.702112 + .097556*D
\nAn increase of one centimeter of diameter is associated with a multiplicative change of 1.1024759 in the logodds.
\nThicker trees have a higher chance of dying (RIP) since the multiplicative change is above 1. 
\nA 20 cm tree has a .5619323 probability that it died
\nThere do appear to be a linear relationship between the tree's diameter and its log odds of dying because we find a very very VERY VEEERRRYYYY small pvalue after conducting a Wald test (<2e-16)

blowdown <- read.csv("http://www.cknudson.com/data/blowdown.csv")
head(blowdown)

##    D         S y SPP
## 1  9 0.0217509 0  BA
## 2 14 0.0217509 0  BA
## 3 18 0.0217509 0  BA
## 4 23 0.0217509 0  BA
## 5  9 0.0217509 0  BA
## 6 16 0.0217509 0  BA

mod1 <- glm(y ~ D, family = binomial, data = blowdown)
exp(coef(mod1))

## (Intercept)           D 
##   0.1822981   1.1024759

summary(mod1)

## 
## Call:
## glm(formula = y ~ D, family = binomial, data = blowdown)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.6309  -0.9616  -0.7211   1.1495   1.7172  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.702112   0.082798  -20.56   <2e-16 ***
## D            0.097558   0.004846   20.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5057.9  on 3665  degrees of freedom
## Residual deviance: 4555.6  on 3664  degrees of freedom
## AIC: 4559.6
## 
## Number of Fisher Scoring iterations: 4

ilogit(-1.702112 + .097556*20)

## [1] 0.5619323

5.3.3 Beer

An increase in one ABV rating is associated with a multiplicative change of 4.332626 in the logodds of being a good beer. This would mean that beers with a higher ABV are more likely to be good beers.
\nThe beer I chose was the only sour on the list, which happened to have the lowest ABV. The probability of my beer being a good beer to that friend is thus quite low at .02582973.

beer <- read.csv("http://www.cknudson.com/data/MNbeer.csv")
head(beer)

##       Brewery         Beer              Description Style ABV IBU Rating Good
## 1     Bauhaus  Wonderstuff     New Bohemian Pilsner Lager 5.4  48     88    0
## 2     Bauhaus    Stargazer German Style Schwarzbier Lager 5.0  28     87    0
## 3     Bauhaus  Wagon Party    West Cost Style Lager Lager 5.4  55     86    0
## 4     Bauhaus    Sky-Five!        Midwest Coast IPA   IPA 6.7  70     86    0
## 5 Bent Paddle         Kanu         Session Pale Ale   Ale 4.8  48     85    0
## 6 Bent Paddle Venture Pils            Pilsner Lager Lager 5.0  38     87    0

mod2 <- glm(Good ~ ABV, family = binomial, data = beer)    
exp(coef(mod2))

##  (Intercept)          ABV 
## 5.611666e-05 4.332636e+00

summary(mod2)

## 
## Call:
## glm(formula = Good ~ ABV, family = binomial, data = beer)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3847  -0.8206  -0.4663   0.7230   2.1320  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  -9.7881     3.3959  -2.882  0.00395 **
## ABV           1.4662     0.5481   2.675  0.00747 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 51.564  on 43  degrees of freedom
## Residual deviance: 42.108  on 42  degrees of freedom
## AIC: 46.108
## 
## Number of Fisher Scoring iterations: 5

ilogit(-9.7881+1.4662*4.2)

## [1] 0.02582973

Logistic Regression

StatsWithSarah

3/11/2021

Half-Time Exercises

5.3.1 Female Horseshoe Crab Weight

5.3.2 Boundary Water Blowdown

5.3.3 Beer