Logistic Regression Continued
We practiced interpreting the coefficients of logistic regression models this week. Odds can be tricky to interpret, especially when an odds ratio is less than 1. In such a case, it may be easier to invert the odds ratio you are interpreting. I will give examples of this below.
Here is a logistic model fit to the Titanic data; its regression equation is given after the R output:
library(stableGR)
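# Load the Titanic data set that ships with stableGR
data("titanic.complete")
titanic <- titanic.complete
# Logistic regression of survival on passenger class (1st class is the reference level)
m4 <- glm(Survived ~ Pclass, data = titanic, family = binomial)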
summary(m4)
##
## Call:
## glm(formula = Survived ~ Pclass, family = binomial, data = titanic)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.4533  -0.7399  -0.7399   0.9246   1.6908
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   0.6286     0.1548   4.061 4.88e-05 ***
## Pclass2      -0.7096     0.2171  -3.269  0.00108 **
## Pclass3      -1.7844     0.1986  -8.987  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 960.90 on 711 degrees of freedom
## Residual deviance: 868.11 on 709 degrees of freedom
## AIC: 874.11
##
## Number of Fisher Scoring iterations: 4
# Exponentiate to move from the log-odds scale to odds (intercept) and odds ratios (class terms)
exp(m4$coefficients)
## (Intercept)     Pclass2     Pclass3
##   1.8750000   0.4918519   0.1679012
# Invert the class odds ratios so that 1st class is compared against 2nd and 3rd class
1 / exp(m4$coefficients[-1])
##  Pclass2  Pclass3
## 2.033133 5.955882
\[\log\left(\frac{p}{1-p}\right)=0.6286-0.7096\,I(\text{2nd Class})-1.7844\,I(\text{3rd Class})\]
Here we are modeling p, the probability that a Titanic passenger survived. The sole predictor for this model is the class of the passenger: 1st, 2nd, or 3rd. Note that 1st class is the reference level, so it is built into the intercept.
By exponentiating the coefficients, we move from the log-odds scale to the odds scale. The exponentiated intercept is 1.875. This means the odds that a 1st class passenger survived were 1.875, that is, the probability they survived divided by the probability they did not survive.
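To connect these odds back to a probability, one can solve the definition of odds for p. A short worked version, using only the exponentiated intercept reported above:

\[\frac{p}{1-p}=1.875 \quad\Longrightarrow\quad p=\frac{1.875}{1+1.875}\approx 0.652\]

So odds of 1.875 correspond to roughly a 65% chance of survival for a 1st class passenger.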
The exponentiated coefficients for 2nd and 3rd class can be interpreted as odds ratios: the odds that a 2nd or 3rd class passenger survived are 0.4919 and 0.1679 times the odds that a 1st class passenger survived, respectively.
Alternatively, we could flip it around and say: the odds that a 1st class passenger survived were 2.033 and 5.956 times the odds of survival for a 2nd and 3rd class passenger, respectively. This is done by inverting the exponentiated coefficients. Interpreted this way, it is easy to see that passengers in higher classes were more likely to survive.
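The inversion works because the reciprocal of an exponentiated coefficient is the exponential of the negated coefficient, which swaps which class sits in the numerator of the odds ratio. Using the 2nd class coefficient from the output:

\[\frac{\text{odds}_{\text{1st}}}{\text{odds}_{\text{2nd}}}=\frac{1}{e^{-0.7096}}=e^{0.7096}\approx 2.033\]

The same step with the 3rd class coefficient gives \(e^{1.7844}\approx 5.956\).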
We also may want to find the actual probability of survival for a given passenger. For example, we can calculate the probability a 2nd class passenger survives as follows:
library(faraway)
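# c(1, 1, 0) picks out the intercept plus the 2nd class coefficient
logodds <- m4$coefficients %*% c(1, 1, 0)
# ilogit() from faraway converts log odds to a probability
ilogit(logodds)
##           [,1]
## [1,] 0.4797688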
The probability that a 2nd class passenger survived was about 0.4798.
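As a rough cross-check, the same kind of calculation can be sketched in base R for all three classes without the fitted model object. This is a minimal sketch that assumes the rounded coefficients printed in the summary output; the names `coefs`, `design`, and `probs` are illustrative and not part of the original analysis, and base R's `plogis()` (the inverse logit) stands in for `ilogit()`.

```r
# Rounded coefficients copied from the summary output (an assumption:
# small rounding differences from the fitted model are expected).
coefs <- c("(Intercept)" = 0.6286, Pclass2 = -0.7096, Pclass3 = -1.7844)

# One row per class; columns are the intercept and the two class indicators.
design <- rbind(
  "1st" = c(1, 0, 0),
  "2nd" = c(1, 1, 0),
  "3rd" = c(1, 0, 1)
)

# Log odds for each class, then the inverse logit to get probabilities.
logodds <- drop(design %*% coefs)
probs <- plogis(logodds)
round(probs, 4)
```

The 2nd class value should agree with the `ilogit()` result to about three decimal places; the 1st and 3rd class values follow from the same calculation.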
Testing for Linear Relationships
Say we add another predictor to this model: gender. We could then test whether class has a significant linear relationship with the log odds of survival after accounting for gender.
This can be done with the anova() function, using the drop in deviance as the test statistic. We will compare the model with both predictors to one that uses only gender. Since class is represented by more than one coefficient, a single-coefficient Wald test is not applicable here.
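# Reduced model: gender only
m6 <- glm(Survived ~ Sex, data = titanic, family = binomial)
# Full model: class and gender
m5 <- glm(Survived ~ Pclass + Sex, data = titanic, family = binomial)
# Drop-in-deviance (likelihood ratio) test comparing the two nested models
anova(m6, m5, test = "Chisq")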
## Analysis of Deviance Table
##
## Model 1: Survived ~ Sex
## Model 2: Survived ~ Pclass + Sex
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
## 1       710     749.57
## 2       708     672.06  2   77.511 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This shows that adding class results in a significant drop in deviance (77.51 on 2 degrees of freedom). Thus, after accounting for gender, class appears to have a significant linear relationship with the log odds of survival.
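The p-value in the table can be reproduced by hand from the two residual deviances. The sketch below assumes the rounded deviances and degrees of freedom printed in the analysis of deviance table; the variable names are illustrative only.

```r
# Residual deviances and degrees of freedom copied from the anova() output.
dev_reduced <- 749.57  # Survived ~ Sex
dev_full    <- 672.06  # Survived ~ Pclass + Sex
df_reduced  <- 710
df_full     <- 708

# Drop in deviance and the number of extra parameters in the full model.
drop_in_dev <- dev_reduced - dev_full
df_diff     <- df_reduced - df_full

# The upper-tail chi-squared probability is the p-value for the test.
p_value <- pchisq(drop_in_dev, df = df_diff, lower.tail = FALSE)
p_value
```

Under the null hypothesis that both class coefficients are zero, the drop in deviance is approximately chi-squared with 2 degrees of freedom, which is why `pchisq()` is used here.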