This week we learned about logistic regression, which is used to model a binary response variable. Logistic regression equations have the form log(p/(1-p)) = B0 + B1x1 + B2x2 + ..., where p is the probability of a "success." Below are a few practice problems that we did to understand the process.
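Before diving in, here is a minimal sketch in R of the logit and its inverse (the helper names logit() and inv_logit() are just illustrative; the faraway package's ilogit() used later plays the role of inv_logit()):

# The logit maps a probability p in (0, 1) to the whole real line;
# the inverse logit maps a log odds value back to a probability.
logit <- function(p) log(p / (1 - p))
inv_logit <- function(x) exp(x) / (1 + exp(x))
logit(0.5)             # 0: a probability of 0.5 means even odds
inv_logit(0)           # 0.5
inv_logit(logit(0.25)) # round trip recovers 0.25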
Our logistic regression equation is: log(p/(1-p)) = -3.6947264 + 0.0018151*weight
An increase of one gram of weight is associated with a multiplicative change of 1.00181679 in the odds of having satellites (equivalently, it adds 0.0018151 to the log odds). This means that heavier females have a higher chance of having satellites, albeit only slightly higher per gram.
A female weighing 2000 grams has a 0.483874 probability of having one or more satellites. Looks like our girl gotta get more thicc.
Conducting a Wald test, the p-value is very small (1.45e-06), giving strong evidence of a linear relationship between a female's weight and her log odds of having satellites.
library(faraway) # provides ilogit(), the inverse logit function
crabs <- read.csv("http://www.cknudson.com/data/crabs.csv")
mod0 <- glm(y ~ weight, family = binomial, data = crabs)
summary(mod0)
##
## Call:
## glm(formula = y ~ weight, family = binomial, data = crabs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1108 -1.0749 0.5426 0.9122 1.6285
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.6947264 0.8801975 -4.198 2.70e-05 ***
## weight 0.0018151 0.0003767 4.819 1.45e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 225.76 on 172 degrees of freedom
## Residual deviance: 195.74 on 171 degrees of freedom
## AIC: 199.74
##
## Number of Fisher Scoring iterations: 4
exp(coef(mod0))
## (Intercept) weight
## 0.02485425 1.00181679
ilogit(-3.6947264 + .0018151*2000) # ilogit is the inverse logit: it converts a log odds value into a probability
## [1] 0.483874
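As a quick check (assuming the mod0 fit above), predict() with type = "response" applies the inverse logit for us and should return the same probability, about 0.484:

# Predicted probability of at least one satellite for a 2000-gram female
predict(mod0, newdata = data.frame(weight = 2000), type = "response")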
Our logistic regression equation is: log(p/(1-p)) = -1.702112 + 0.097558*D
An increase of one centimeter in diameter is associated with a multiplicative change of 1.1024759 in the odds of dying.
Thicker trees have a higher chance of dying (RIP), since the multiplicative change in the odds is above 1.
A 20 cm tree has a 0.5619323 probability of having died.
There does appear to be a linear relationship between a tree's diameter and its log odds of dying, because we find a very, very small p-value (< 2e-16) after conducting a Wald test.
blowdown <- read.csv("http://www.cknudson.com/data/blowdown.csv")
head(blowdown)
## D S y SPP
## 1 9 0.0217509 0 BA
## 2 14 0.0217509 0 BA
## 3 18 0.0217509 0 BA
## 4 23 0.0217509 0 BA
## 5 9 0.0217509 0 BA
## 6 16 0.0217509 0 BA
mod1 <- glm(y ~ D, family = binomial, data = blowdown) # y = 1 if the tree died in the blowdown
exp(coef(mod1))
## (Intercept) D
## 0.1822981 1.1024759
summary(mod1)
##
## Call:
## glm(formula = y ~ D, family = binomial, data = blowdown)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.6309 -0.9616 -0.7211 1.1495 1.7172
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.702112 0.082798 -20.56 <2e-16 ***
## D 0.097558 0.004846 20.13 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 5057.9 on 3665 degrees of freedom
## Residual deviance: 4555.6 on 3664 degrees of freedom
## AIC: 4559.6
##
## Number of Fisher Scoring iterations: 4
ilogit(-1.702112 + .097556*20)
## [1] 0.5619323
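To go beyond the point estimate, here is a sketch (again assuming the mod1 fit above) of a 95% Wald confidence interval for the diameter coefficient, exponentiated into an interval for the odds multiplier; confint.default() does the same Wald arithmetic for both coefficients at once:

# 95% Wald CI for the slope on D: estimate +/- 1.96 * SE, then exponentiate
est <- coef(summary(mod1))["D", "Estimate"]
se  <- coef(summary(mod1))["D", "Std. Error"]
exp(est + c(-1, 1) * qnorm(0.975) * se) # interval for the multiplicative change in odds
exp(confint.default(mod1))              # Wald intervals for both coefficients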
An increase of one percentage point in ABV is associated with a multiplicative change of 4.332636 in the odds of being a good beer. This means that beers with a higher ABV are more likely to be good beers.
The beer I chose was the only sour on the list, which happened to have the lowest ABV (4.2). The probability of my beer being a good beer to that friend is thus quite low, at 0.02582973.
beer <- read.csv("http://www.cknudson.com/data/MNbeer.csv")
head(beer)
## Brewery Beer Description Style ABV IBU Rating Good
## 1 Bauhaus Wonderstuff New Bohemian Pilsner Lager 5.4 48 88 0
## 2 Bauhaus Stargazer German Style Schwarzbier Lager 5.0 28 87 0
## 3 Bauhaus Wagon Party West Cost Style Lager Lager 5.4 55 86 0
## 4 Bauhaus Sky-Five! Midwest Coast IPA IPA 6.7 70 86 0
## 5 Bent Paddle Kanu Session Pale Ale Ale 4.8 48 85 0
## 6 Bent Paddle Venture Pils Pilsner Lager Lager 5.0 38 87 0
mod2 <- glm(Good ~ ABV, family = binomial, data = beer) # Good = 1 if the beer is rated good
exp(coef(mod2))
## (Intercept) ABV
## 5.611666e-05 4.332636e+00
summary(mod2)
##
## Call:
## glm(formula = Good ~ ABV, family = binomial, data = beer)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3847 -0.8206 -0.4663 0.7230 2.1320
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.7881 3.3959 -2.882 0.00395 **
## ABV 1.4662 0.5481 2.675 0.00747 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 51.564 on 43 degrees of freedom
## Residual deviance: 42.108 on 42 degrees of freedom
## AIC: 46.108
##
## Number of Fisher Scoring iterations: 5
ilogit(-9.7881+1.4662*4.2)
## [1] 0.02582973
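The same probability can come straight from the fitted model; as a sketch (assuming the mod2 fit above and the sour's ABV of 4.2), predict() handles the inverse logit:

# Predicted probability that a 4.2% ABV beer is rated good, about 0.026
predict(mod2, newdata = data.frame(ABV = 4.2), type = "response")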