Here we practice doing logistic regression with the beer data.
beer <- read.csv("http://www.cknudson.com/data/MNbeer.csv")
First, let’s just compare the styles of beer with respect to their chances of being “Good.”
mod <- glm(Good ~ Style, data = beer, family = binomial)
summary(mod)
##
## Call:
## glm(formula = Good ~ Style, family = binomial, data = beer)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0302 -0.8067 -0.8067 1.3321 1.6006
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.9555 0.5262 -1.816 0.0694 .
## StyleIPA 0.5988 0.7210 0.831 0.4062
## StyleLager -17.6106 2174.2129 -0.008 0.9935
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 51.564 on 43 degrees of freedom
## Residual deviance: 44.305 on 41 degrees of freedom
## AIC: 50.305
##
## Number of Fisher Scoring iterations: 17
Let’s look at each style’s log odds of being good. We need to work with these regression coefficients.
coef(mod)
## (Intercept) StyleIPA StyleLager
## -0.9555114 0.5988365 -17.6105571
Since ales are built into the intercept, here are ale’s log odds of being good:
exp(coef(mod)[1])
## (Intercept)
## 0.3846154
Next, here are IPA’s log odds of being good:
sum(coef(mod)[c(1,2)])
## [1] -0.3566749
Last and also least, here are the lager’s log odds of being good:
sum(coef(mod)[c(1,3)])
## [1] -18.56607
We can exponentiate each of those log odds to get the plain old odds of a beer style being good. For example, here is the odds of an IPA being good:
exp(sum(coef(mod)[c(1,2)]))
## [1] 0.7
library(faraway)
Let’s calculate the probability of each style of beer being good. We can use the ilogit command or calculate it by hand. It’s easy either way. First we calculate an ale’s probability of being good.
ilogit(coef(mod)[1])
## (Intercept)
## 0.2777778
Next, let’s calculate an IPA’s probability of being good.
ilogit(sum(coef(mod)[c(1,2)]))
## [1] 0.4117647
Finally, let’s calculate a lager’s probability of being good.
ilogit(sum(coef(mod)[c(1,3)]))
## [1] 8.646869e-09
Now let’s start comparing the odds using ODDS RATIOS. An odds ratio is just a ratio of two odds.
For example, the odds of an IPA being good are 1.82 times the odds of an ale being good.
exp(coef(mod)[2])
## StyleIPA
## 1.82
The odds of a lager being good are 2.2 * 10^-8 times the odds of an ale being good.
exp(coef(mod)[3])
## StyleLager
## 2.248186e-08
Most people like odds ratios greater than or equal to one, so let’s rework this sentence. The odds of an ale being good are about 4 million times the odds of a lager being good
exp(-coef(mod)[3])
## StyleLager
## 44480305
In the previous model, ale was the reference group; it was built into the intercept. If you do not want a reference group, tweak the formula using 0+.
mod2 <- glm(Good ~ 0+ Style, data = beer, family = binomial)
summary(mod2)
##
## Call:
## glm(formula = Good ~ 0 + Style, family = binomial, data = beer)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0302 -0.8067 -0.8067 1.3321 1.6006
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## StyleAle -0.9555 0.5262 -1.816 0.0694 .
## StyleIPA -0.3567 0.4928 -0.724 0.4692
## StyleLager -18.5661 2174.2129 -0.009 0.9932
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 60.997 on 44 degrees of freedom
## Residual deviance: 44.305 on 41 degrees of freedom
## AIC: 50.305
##
## Number of Fisher Scoring iterations: 17
The log odds of ale being good are still contained in the first regression coefficient:
coef(mod2)[1]
## StyleAle
## -0.9555114
But now the second regression coefficient tells us the log odds of an IPA being good.
coef(mod2)[2]
## StyleIPA
## -0.3566749
Similarly, the third regression coefficient shows us the log odds of a lager being good. We can see these log odds don’t change; just the model’s way of representing the numbers has changed slightly.
coef(mod2)[3]
## StyleLager
## -18.56607
How do we find odds ratios when we don’t have a reference group? Simply calculate two odds and then find the ratio. Here the numerator is the log odds of an IPA being good and the denominator is the odds of a lager being good.
exp(coef(mod2)[2])/exp(coef(mod2)[3])
## StyleIPA
## 80954155
The odds of an IPA being good are about 8 million times the odds of a lager being good
Say we want to account for ABV and then compare a couple styles.
mod3 <- glm(Good ~ Style + ABV, data = beer, family = binomial)
summary(mod3)
##
## Call:
## glm(formula = Good ~ Style + ABV, family = binomial, data = beer)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7351 -0.8053 -0.3668 0.7564 1.8907
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.8628 4.3839 -2.250 0.0245 *
## StyleIPA -0.9435 1.0476 -0.901 0.3678
## StyleLager -16.7872 2105.6351 -0.008 0.9936
## ABV 1.5882 0.7657 2.074 0.0381 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 51.564 on 43 degrees of freedom
## Residual deviance: 39.097 on 40 degrees of freedom
## AIC: 47.097
##
## Number of Fisher Scoring iterations: 17
exp(coef(mod3)[2])
## StyleIPA
## 0.3892449
After accounting for the ABV of the beer, the odds of an IPA being good are 0.3892449 times the odds of an ale being good.
Alternatively, after accounting for the ABV of the beer, the odds of an ale being good are 2.5 times the odds of an IPA being good.
exp(-coef(mod3)[2])
## StyleIPA
## 2.569077