Here we practice doing logistic regression with the beer data.

beer <- read.csv("http://www.cknudson.com/data/MNbeer.csv")

Basic model construction

First, let’s just compare the styles of beer with respect to their chances of being “Good.”

mod <- glm(Good ~ Style, data = beer, family = binomial)

summary(mod)
## 
## Call:
## glm(formula = Good ~ Style, family = binomial, data = beer)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0302  -0.8067  -0.8067   1.3321   1.6006  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   -0.9555     0.5262  -1.816   0.0694 .
## StyleIPA       0.5988     0.7210   0.831   0.4062  
## StyleLager   -17.6106  2174.2129  -0.008   0.9935  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 51.564  on 43  degrees of freedom
## Residual deviance: 44.305  on 41  degrees of freedom
## AIC: 50.305
## 
## Number of Fisher Scoring iterations: 17

Log odds and odds

Let’s look at each style’s log odds of being good. We need to work with these regression coefficients.

coef(mod)
## (Intercept)    StyleIPA  StyleLager 
##  -0.9555114   0.5988365 -17.6105571

Since ales are built into the intercept, here are ale’s log odds of being good:

exp(coef(mod)[1])
## (Intercept) 
##   0.3846154

Next, here are IPA’s log odds of being good:

sum(coef(mod)[c(1,2)]) 
## [1] -0.3566749

Last and also least, here are the lager’s log odds of being good:

sum(coef(mod)[c(1,3)])
## [1] -18.56607

We can exponentiate each of those log odds to get the plain old odds of a beer style being good. For example, here is the odds of an IPA being good:

exp(sum(coef(mod)[c(1,2)]))
## [1] 0.7
library(faraway)

Probabilities (from odds)

Let’s calculate the probability of each style of beer being good. We can use the ilogit command or calculate it by hand. It’s easy either way. First we calculate an ale’s probability of being good.

ilogit(coef(mod)[1]) 
## (Intercept) 
##   0.2777778

Next, let’s calculate an IPA’s probability of being good.

ilogit(sum(coef(mod)[c(1,2)])) 
## [1] 0.4117647

Finally, let’s calculate a lager’s probability of being good.

ilogit(sum(coef(mod)[c(1,3)])) 
## [1] 8.646869e-09

Odds ratios

Now let’s start comparing the odds using ODDS RATIOS. An odds ratio is just a ratio of two odds.

For example, the odds of an IPA being good are 1.82 times the odds of an ale being good.

exp(coef(mod)[2])
## StyleIPA 
##     1.82

The odds of a lager being good are 2.2 * 10^-8 times the odds of an ale being good.

exp(coef(mod)[3])
##   StyleLager 
## 2.248186e-08

Most people like odds ratios greater than or equal to one, so let’s rework this sentence. The odds of an ale being good are about 4 million times the odds of a lager being good

exp(-coef(mod)[3])
## StyleLager 
##   44480305

Reference groups

In the previous model, ale was the reference group; it was built into the intercept. If you do not want a reference group, tweak the formula using 0+.

mod2 <- glm(Good ~ 0+ Style, data = beer, family = binomial)
summary(mod2)
## 
## Call:
## glm(formula = Good ~ 0 + Style, family = binomial, data = beer)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0302  -0.8067  -0.8067   1.3321   1.6006  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## StyleAle     -0.9555     0.5262  -1.816   0.0694 .
## StyleIPA     -0.3567     0.4928  -0.724   0.4692  
## StyleLager  -18.5661  2174.2129  -0.009   0.9932  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 60.997  on 44  degrees of freedom
## Residual deviance: 44.305  on 41  degrees of freedom
## AIC: 50.305
## 
## Number of Fisher Scoring iterations: 17

The log odds of ale being good are still contained in the first regression coefficient:

coef(mod2)[1]
##   StyleAle 
## -0.9555114

But now the second regression coefficient tells us the log odds of an IPA being good.

coef(mod2)[2]
##   StyleIPA 
## -0.3566749

Similarly, the third regression coefficient shows us the log odds of a lager being good. We can see these log odds don’t change; just the model’s way of representing the numbers has changed slightly.

coef(mod2)[3]
## StyleLager 
##  -18.56607

How do we find odds ratios when we don’t have a reference group? Simply calculate two odds and then find the ratio. Here the numerator is the log odds of an IPA being good and the denominator is the odds of a lager being good.

exp(coef(mod2)[2])/exp(coef(mod2)[3])
## StyleIPA 
## 80954155

The odds of an IPA being good are about 8 million times the odds of a lager being good

Multiple predictors

Say we want to account for ABV and then compare a couple styles.

mod3 <- glm(Good ~ Style + ABV, data = beer, family = binomial)
summary(mod3)
## 
## Call:
## glm(formula = Good ~ Style + ABV, family = binomial, data = beer)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7351  -0.8053  -0.3668   0.7564   1.8907  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   -9.8628     4.3839  -2.250   0.0245 *
## StyleIPA      -0.9435     1.0476  -0.901   0.3678  
## StyleLager   -16.7872  2105.6351  -0.008   0.9936  
## ABV            1.5882     0.7657   2.074   0.0381 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 51.564  on 43  degrees of freedom
## Residual deviance: 39.097  on 40  degrees of freedom
## AIC: 47.097
## 
## Number of Fisher Scoring iterations: 17
exp(coef(mod3)[2])
##  StyleIPA 
## 0.3892449

After accounting for the ABV of the beer, the odds of an IPA being good are 0.3892449 times the odds of an ale being good.

Alternatively, after accounting for the ABV of the beer, the odds of an ale being good are 2.5 times the odds of an IPA being good.

exp(-coef(mod3)[2])
## StyleIPA 
## 2.569077