crabs <- read.csv("http://www.cknudson.com/data/crabs.csv")
library(faraway)
head(crabs)
## color spine width satell weight y
## 1 medium bad 28.3 8 3050 1
## 2 dark bad 22.5 0 1550 0
## 3 light good 26.0 9 2300 1
## 4 dark bad 24.8 0 2100 0
## 5 dark bad 26.0 4 2600 1
## 6 medium bad 23.8 0 2100 0
mod <- glm(y ~ width, family = binomial, data = crabs)
coef(mod)
## (Intercept) width
## -12.3508177 0.4972306
summary(mod)
##
## Call:
## glm(formula = y ~ width, family = binomial, data = crabs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0281 -1.0458 0.5480 0.9066 1.6942
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -12.3508 2.6287 -4.698 2.62e-06 ***
## width 0.4972 0.1017 4.887 1.02e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 225.76 on 172 degrees of freedom
## Residual deviance: 194.45 on 171 degrees of freedom
## AIC: 198.45
##
## Number of Fisher Scoring iterations: 4
Half-Time Exercises
5.3.1 Female Horseshoe Crab Weight
Our logistic regression equation is: log(p/(1-p)) = -3.6947264 + .0018151*weight
\nAn increase of one gram of weight is associated with a multiplicative change of 1.00181679 of the logodds. This means that heavier females have a higher chance of having satellites, albeit small.
\n A female weighing 2000 grams has a .483874 probability of having one of more satellites. Looks like our girl gotta get more thicc.
\nConducting a Wald test, the pvalue is very small (1.45e-06), meaning that there is a linear relationship between thiccy crabs and having satellites or not.
mod0 <- glm(y ~ weight, family = binomial, data = crabs)
summary(mod0)
##
## Call:
## glm(formula = y ~ weight, family = binomial, data = crabs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1108 -1.0749 0.5426 0.9122 1.6285
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.6947264 0.8801975 -4.198 2.70e-05 ***
## weight 0.0018151 0.0003767 4.819 1.45e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 225.76 on 172 degrees of freedom
## Residual deviance: 195.74 on 171 degrees of freedom
## AIC: 199.74
##
## Number of Fisher Scoring iterations: 4
exp(coef(mod0))
## (Intercept) weight
## 0.02485425 1.00181679
ilogit(-3.6947264 + .0018151*2000)
## [1] 0.483874
5.3.2 Boundary Water Blowdown
Our logistic regression equation is: log(p/(1-p)) = -1.702112 + .097556*D
\nAn increase of one centimeter of diameter is associated with a multiplicative change of 1.1024759 in the logodds.
\nThicker trees have a higher chance of dying (RIP) since the multiplicative change is above 1.
\nA 20 cm tree has a .5619323 probability that it died
\nThere do appear to be a linear relationship between the tree's diameter and its log odds of dying because we find a very very VERY VEEERRRYYYY small pvalue after conducting a Wald test (<2e-16)
blowdown <- read.csv("http://www.cknudson.com/data/blowdown.csv")
head(blowdown)
## D S y SPP
## 1 9 0.0217509 0 BA
## 2 14 0.0217509 0 BA
## 3 18 0.0217509 0 BA
## 4 23 0.0217509 0 BA
## 5 9 0.0217509 0 BA
## 6 16 0.0217509 0 BA
mod1 <- glm(y ~ D, family = binomial, data = blowdown)
exp(coef(mod1))
## (Intercept) D
## 0.1822981 1.1024759
summary(mod1)
##
## Call:
## glm(formula = y ~ D, family = binomial, data = blowdown)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.6309 -0.9616 -0.7211 1.1495 1.7172
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.702112 0.082798 -20.56 <2e-16 ***
## D 0.097558 0.004846 20.13 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 5057.9 on 3665 degrees of freedom
## Residual deviance: 4555.6 on 3664 degrees of freedom
## AIC: 4559.6
##
## Number of Fisher Scoring iterations: 4
ilogit(-1.702112 + .097556*20)
## [1] 0.5619323
5.3.3 Beer
An increase in one ABV rating is associated with a multiplicative change of 4.332626 in the logodds of being a good beer. This would mean that beers with a higher ABV are more likely to be good beers.
\nThe beer I chose was the only sour on the list, which happened to have the lowest ABV. The probability of my beer being a good beer to that friend is thus quite low at .02582973.
beer <- read.csv("http://www.cknudson.com/data/MNbeer.csv")
head(beer)
## Brewery Beer Description Style ABV IBU Rating Good
## 1 Bauhaus Wonderstuff New Bohemian Pilsner Lager 5.4 48 88 0
## 2 Bauhaus Stargazer German Style Schwarzbier Lager 5.0 28 87 0
## 3 Bauhaus Wagon Party West Cost Style Lager Lager 5.4 55 86 0
## 4 Bauhaus Sky-Five! Midwest Coast IPA IPA 6.7 70 86 0
## 5 Bent Paddle Kanu Session Pale Ale Ale 4.8 48 85 0
## 6 Bent Paddle Venture Pils Pilsner Lager Lager 5.0 38 87 0
mod2 <- glm(Good ~ ABV, family = binomial, data = beer)
exp(coef(mod2))
## (Intercept) ABV
## 5.611666e-05 4.332636e+00
summary(mod2)
##
## Call:
## glm(formula = Good ~ ABV, family = binomial, data = beer)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3847 -0.8206 -0.4663 0.7230 2.1320
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.7881 3.3959 -2.882 0.00395 **
## ABV 1.4662 0.5481 2.675 0.00747 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 51.564 on 43 degrees of freedom
## Residual deviance: 42.108 on 42 degrees of freedom
## AIC: 46.108
##
## Number of Fisher Scoring iterations: 5
ilogit(-9.7881+1.4662*4.2)
## [1] 0.02582973