Logistic Regression
This week we worked on Logistic Regression, which is useful for modeling a binary (or, more generally, binomial) response variable. With one predictor, Logistic Regression takes the following form:
\[ \log\left(\frac{p}{1-p}\right) = \beta_0+\beta_1X_1 \]
Here p is the probability of success, so p/(1-p) is the odds of success and the left-hand side of the model is the log-odds of success:
\[ \log\left(\frac{p}{1-p}\right) \]
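Solving the model equation for p shows how a value on the log-odds scale maps back to a probability. This is the standard inverse-logit step, written with the same symbols as the model above: exponentiating both sides gives the odds, and dividing the odds by one plus the odds gives p.
\[ \frac{p}{1-p} = e^{\beta_0+\beta_1X_1} \quad\Longrightarrow\quad p = \frac{e^{\beta_0+\beta_1X_1}}{1+e^{\beta_0+\beta_1X_1}} \]
Whatever the value of the linear predictor, the resulting p always lies strictly between 0 and 1.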
Here is an example of how to make a Logistic Regression Model in R. Below is a model for the probability of a tree dying during a windstorm given its diameter.
blowdown <- read.csv("http://www.cknudson.com/data/blowdown.csv")
mod2 <- glm(y ~ D, data = blowdown, family = "binomial")
summary(mod2)
##
## Call:
## glm(formula = y ~ D, family = "binomial", data = blowdown)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -3.6309  -0.9616  -0.7211   1.1495   1.7172
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.702112   0.082798  -20.56   <2e-16 ***
## D            0.097558   0.004846   20.13   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 5057.9 on 3665 degrees of freedom
## Residual deviance: 4555.6 on 3664 degrees of freedom
## AIC: 4559.6
##
## Number of Fisher Scoring iterations: 4
Note that y is 1 if a tree died and 0 if it lived, and D is the tree's diameter. The glm() call has the same form as for Poisson Regression, except that the family is "binomial". When each row of a dataset records counts of successes and failures rather than a single outcome, the response in glm() is written as cbind(successes, failures). The dataset above has one tree per observation, so cbind() was not needed.
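To illustrate the cbind() form, here is a minimal sketch using a small made-up dataset. The data frames `trees` and `grouped`, and the columns `died` and `survived`, are hypothetical and are not part of the blowdown data. The sketch fits the same trees twice, once with one row per tree and once with counts per diameter, and checks that the coefficient estimates agree.

```r
# Hypothetical individual-level data: one row per tree (invented for illustration)
trees <- data.frame(
  D = rep(c(10, 20, 30), each = 50),
  y = c(rep(c(1, 0), c(12, 38)),   # D = 10: 12 died, 38 survived
        rep(c(1, 0), c(25, 25)),   # D = 20: 25 died, 25 survived
        rep(c(1, 0), c(40, 10)))   # D = 30: 40 died, 10 survived
)

# The same hypothetical trees, aggregated to one row per diameter
grouped <- data.frame(
  D        = c(10, 20, 30),
  died     = c(12, 25, 40),
  survived = c(38, 25, 10)
)

# One outcome per row: the response is just y
fit_individual <- glm(y ~ D, data = trees, family = "binomial")

# Counts per row: the response is cbind(successes, failures)
fit_grouped <- glm(cbind(died, survived) ~ D, data = grouped, family = "binomial")

# The two fits should give the same coefficient estimates
same_coefs <- isTRUE(all.equal(coef(fit_individual), coef(fit_grouped),
                               tolerance = 1e-4))
print(same_coefs)
```

The second column inside cbind() is the number of failures, not the total number of trials.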
The p-values in the coefficient table test whether each coefficient is zero, that is, whether the predictor has a linear relationship with the log-odds of the response. Because the p-value for diameter is very small (less than 2e-16), we have strong evidence of a linear relationship between diameter and the log-odds of a tree dying.
To interpret the coefficients, it helps to exponentiate them. An exponentiated slope gives the multiplicative change in the odds of success for a one-unit increase in the predictor. Here is an example to illustrate.
exp(mod2$coefficients)
## (Intercept)           D
##   0.1822981   1.1024759
A 1 cm increase in the diameter of a tree is associated with the odds of the tree dying being multiplied by about 1.10, that is, roughly a 10% increase in the odds. In other words, thicker trees have a higher estimated probability of dying.
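As a quick check of where this interpretation comes from, the ratio of the odds at diameters D+1 and D can be worked out from the model. The estimate 0.097558 is the coefficient for D from the summary output. The 10 cm case is an extra illustration computed by hand from that estimate, not a result from the original analysis.
\[ \frac{\text{odds}(D+1)}{\text{odds}(D)} = \frac{e^{\beta_0+\beta_1(D+1)}}{e^{\beta_0+\beta_1D}} = e^{\beta_1} \approx e^{0.097558} \approx 1.10 \]
\[ \frac{\text{odds}(D+10)}{\text{odds}(D)} = e^{10\beta_1} \approx e^{0.97558} \approx 2.65 \]
The effect is multiplicative on the odds scale, so a 10 cm increase multiplies the odds by the 1 cm factor raised to the tenth power rather than by ten times that factor.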
To calculate the probability of a tree dying at a specific diameter, apply the inverse logit to the linear predictor. The ilogit() function from the faraway package does this, as shown below.
library(faraway)
ilogit(mod2$coefficients[1]+mod2$coefficients[2]*20)
## (Intercept)
## 0.5619444
The probability that a tree with a 20 cm diameter dies is estimated to be about 0.562.
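The same estimate can be reproduced without the faraway package. The sketch below uses plogis(), base R's inverse logit, and hard-codes the two coefficients copied from the summary output so that it runs without downloading the data. The names `b0`, `b1` and `p20` are chosen here for illustration.

```r
# Coefficients copied from the summary(mod2) output above
b0 <- -1.702112  # intercept
b1 <- 0.097558   # slope for diameter D

# plogis() is base R's inverse logit: plogis(x) = exp(x) / (1 + exp(x))
p20 <- plogis(b0 + b1 * 20)
print(round(p20, 3))
```

With the fitted model object available, predict(mod2, newdata = data.frame(D = 20), type = "response") should give the same probability and extends more easily to several diameters at once.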