- model definition ✅
- relating log-odds to probabilities ✅
- prediction ✅
This is the format we’ll be applying for the upcoming weeks:
When \[y_i \in \{0, 1\}, \quad \text{for all } i \in 1, \ldots, n.\]
(i.e., each \(y_i\) is a binary variable that can take a value of 0 or 1, for every observation \(i\) of the \(n\) observations)
We can model binary data by assuming that for all \(i \in 1 \ldots n\) (i.e., for every observation \(i\) of \(n\) samples),
\[ y_i \sim \text{Bernoulli}(\theta_i), \\ \log\left(\frac{\theta_i}{1 - \theta_i}\right) = \beta_0 + \beta_1 \times x_i. \]
\(\dots\) each observed outcome variable value \(y_1, y_2, \ldots, y_n\) is a sample from a Bernoulli distribution with parameter \(\theta_i\), and the log odds of \(\theta_i\) is a linear function of the predictor variable’s values \(x_1, x_2, \ldots, x_n\).
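To see what this model means generatively, here is a minimal sketch (with made-up values for \(\beta_0\), \(\beta_1\), and the predictor) that simulates binary data from it:
set.seed(101)
n <- 10
x <- rnorm(n)                          # made-up predictor values
beta_0 <- 0.5                          # made-up intercept
beta_1 <- 1.2                          # made-up slope
theta <- plogis(beta_0 + beta_1 * x)   # inverse logit: from log odds to probability
y <- rbinom(n, size = 1, prob = theta) # each y_i is a Bernoulli(theta_i) sample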
# load packages
library(tidyverse)
# load smoking data (link can be found on NOW)
smoking_df <- read_csv("https://raw.githubusercontent.com/mark-andrews/ntupsychology-data/main/data-sets/smoking.csv")
# select relevant variables
smoking_df <- select(smoking_df, cigs, age, educ)
# Make the cigs a binary variable
smoking_df <- mutate(smoking_df, is_smoker = cigs > 0)
# Show data
smoking_df
# A tibble: 807 × 4
cigs age educ is_smoker
<dbl> <dbl> <dbl> <lgl>
1 0 46 16 FALSE
2 0 40 16 FALSE
3 3 58 12 TRUE
4 0 30 13.5 FALSE
5 0 17 10 FALSE
6 0 86 6 FALSE
7 0 35 12 FALSE
8 0 48 15 FALSE
9 0 48 12 FALSE
10 0 31 12 FALSE
# ℹ 797 more rows
# Whether or not someone is a smoker, modelled as a function of age
m_age <- glm(is_smoker ~ age,
             data = smoking_df,
             family = binomial(link = 'logit'))
# Show model coefficients
summary(m_age)$coef
               Estimate  Std. Error    z value    Pr(>|z|)
(Intercept)  0.02396177 0.190206271  0.1259778 0.899749472
age         -0.01215431 0.004355148 -2.7907922 0.005257921
Probabilities \(\rightarrow\) odds \(\rightarrow\) log odds \(\rightarrow\) odds \(\rightarrow\) probabilities
# A probability of
p <- 0.75
# corresponds to odds of
odds <- p / (1 - p)
odds
[1] 3
# which correspond to log odds / logits of
logodds <- log(odds) # the natural logarithm of the odds
logodds
[1] 1.098612
# which correspond to odds of
odds <- exp(logodds) # exponentiation of the logit (log odds) to get the odds
odds
[1] 3
# which correspond to a probability of
odds / (1 + odds)
[1] 0.75
Odds are always ratios: an odds of 3 is a ratio which means there are 3 chances of something occurring for every 1 chance of it not occurring.
# Use plogis to get the probability of the log odds
plogis(logodds)
[1] 0.75
# Use qlogis to get the log odds of a probability
qlogis(p)
[1] 1.098612
# Here are some low, medium and high probability values:
theta_low <- 0.01   # low prob of success
theta_mid <- 0.5    # equal prob of success
theta_high <- 0.99  # high prob of success
# Logits of low probability
log( theta_low / ( 1 - theta_low ) )
[1] -4.59512
# Note: plogis() converts log odds to probabilities, so applying it to a
# probability does not give the logit; qlogis() does
plogis( theta_low )
[1] 0.5025
Work out the logit values for theta_mid and theta_high using the formula and the appropriate one of the two functions (plogis() or qlogis()).
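One possible solution sketch, using the formula for theta_mid and qlogis() for theta_high:
# Logit via the formula
log( theta_mid / ( 1 - theta_mid ) ) # 0: a probability of 0.5 is a log odds of 0
# Logit via qlogis()
qlogis( theta_high ) # ~4.59512, mirroring the low-probability logit above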
If the probability of bad weather is 0.75, what are the odds that we’ll have bad weather?
\[ \text{odds} = \frac{\theta}{1 - \theta} \]
Use \(R\) as a calculator or solve it by hand!
If the probability of bad weather is \(\theta = 0.75\), what are the odds that we’ll have bad weather?
p <- 0.75
p / ( 1 - p )
[1] 3
A probability of \(\theta = 0.75\) means that we are three times more likely to have bad weather than good weather.
If the odds are 4 (i.e. 4:1) to have bad weather, what is the probability that we will have bad weather?
\[ \theta = \frac{\text{odds}}{1 + \text{odds}} \]
Use \(R\) as a calculator!
If the odds are 4 (i.e. 4:1) to have bad weather, what is the probability that we will have bad weather?
odds <- 4
odds / ( 1 + odds )
[1] 0.8
If there is a probability of \(\theta = 0.9\) that an intervention improves children’s reading, what is its log odds?
Compute the odds:
\[ \text{odds} = \frac{\theta}{1 - \theta} \]
Take the logarithm of the odds to get the log-odds (also called logit):
\[ \text{log-odds} = \log(\text{odds}) \]
Use \(R\) as a calculator!
If there is a probability of \(\theta = 0.9\) that an intervention improves children’s reading, what is its log odds?
\[ \text{odds} = \frac{\theta}{1 - \theta} = \frac{0.9}{1 - 0.9} = \frac{0.9}{0.1} = 9 \]
p <- 0.9
p / (1 - p)
[1] 9
Take the logarithm of the odds to get the log-odds (also called logit):
\[ \text{log-odds} = \log(\text{odds}) = \log(9) \approx 2.197225 \]
log(9)
[1] 2.197225
The log-odds on the previous slide were 2.1972246. Convert this value into
- Odds (the opposite of log(x) is exp(x))
- Probabilities (using plogis(x))
(replace x correctly)
The log-odds on the previous slide were 2.1972246. Convert this value into
Odds (the opposite of log(x) is exp(x)):
exp(2.197225)
[1] 9.000004
Probabilities (using plogis(x)):
plogis(2.197225)
[1] 0.9
# Step 1. Make a new data frame with a range of values of age.
smoking_df2 <- tibble(age = seq(from = 15, to = 80, by = 10))
# Alternative to Step 2 (also log-odds)
library(modelr)
add_predictions(data = smoking_df2, # name of data frame
model = m_age) # name of model
Use add_predictions for ages 20, 21, 22, 23, 24, 25, 26, 27, \(\dots\), 75, 76, 77, 78, 79, 80, 81
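A possible solution sketch; the data frame name smoking_df2b is just an example:
# One row per year of age from 20 to 81
smoking_df2b <- tibble(age = seq(from = 20, to = 81, by = 1))
add_predictions(data = smoking_df2b, model = m_age)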
add_predictions(data = smoking_df2, # name of data frame
model = m_age, # name of model
type = 'response') # return probabilities
# A tibble: 7 × 2
age pred
<dbl> <dbl>
1 15 0.460
2 25 0.430
3 35 0.401
4 45 0.372
5 55 0.344
6 65 0.317
7 75 0.292
# Generate predictions
d_predict <- add_predictions(data = smoking_df2,
model = m_age,
type = 'response')
# Plot predicted values
library(psyntur)
scatterplot(data = d_predict, x = age, y = pred)
For ease of illustration, consider a binary logistic regression with educ as a binary categorical predictor:
# Create a categorical variable for `educ`
smoking_df <- mutate(smoking_df,
                     educ = case_when(educ > 12 ~ "o12y",
                                      TRUE ~ "u12y"))
# Binary logistic regression
m_educ_age <- glm(is_smoker ~ educ + age,
                  data = smoking_df,
                  family = binomial(link = "logit"))
# Inferential statistics for model coefficients
summary(m_educ_age)$coef
               Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -0.24790348 0.20534132 -1.207275 0.2273261975
educu12y     0.56029969 0.15267484  3.669889 0.0002426559
age         -0.01387864 0.00439195 -3.160019 0.0015775895
coef(m_educ_age)
(Intercept)    educu12y         age 
-0.24790348  0.56029969 -0.01387864 
The educu12y coefficient of 0.5602997 suggests that for people with less than 12 years of education (u12y), the log odds of being a smoker are 0.5602997 higher than for people with more than 12 years (o12y). The difference can also be expressed in terms of probabilities: ratios of probabilities are referred to as risk ratios.
\[
\text{risk ratio} = \frac{p(\text{is smoker if less than 12 years of education})}{p(\text{is smoker if more than 12 years of education})}
\]
Create a grid for all combinations of educ and age.
smoking_df3 <- expand_grid(educ = c('o12y', 'u12y'), age = seq(18, 80, 20))
Obtain predictions as probabilities.
pred_df <- add_predictions(data = smoking_df3, model = m_educ_age, type='response')
Convert data frame to wide format (one column per level of educ).
pred_df_wide <- pivot_wider(pred_df, names_from = educ, values_from = pred)
Calculate the risk ratios
\[ \text{risk ratio} = \frac{p(\text{is smoker if less than 12 years of education})}{p(\text{is smoker if more than 12 years of education})} \]
Note, these values aren’t constant across age!
mutate(pred_df_wide, prob_ratio = u12y / o12y)
# A tibble: 4 × 4
age o12y u12y prob_ratio
<dbl> <dbl> <dbl> <dbl>
1 18 0.378 0.516 1.36
2 38 0.315 0.446 1.42
3 58 0.259 0.379 1.47
4 78 0.209 0.316 1.51
mutate(pred_df_wide,
odds_o12y = o12y / (1 - o12y),
odds_u12y = u12y / (1 - u12y),
odds_ratio = odds_u12y / odds_o12y)
# A tibble: 4 × 6
age o12y u12y odds_o12y odds_u12y odds_ratio
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 18 0.378 0.516 0.608 1.06 1.75
2 38 0.315 0.446 0.461 0.807 1.75
3 58 0.259 0.379 0.349 0.611 1.75
4 78 0.209 0.316 0.264 0.463 1.75
Unlike ratios of probabilities, ratios of odds are independent of age (or any other predictor). Recall that odds are themselves a ratio of probabilities, i.e. \(\frac{p}{1-p}\). Expressed in maths, odds ratios can be represented as
\[ \text{odds ratio} = \frac{\frac{p}{1 - p}}{\frac{q}{1 - q}} \]
This odds ratio can be obtained by the exponential of the education coefficient \(e^{\beta_{\text{educ}}}\), where \(e\) is a constant (Euler’s number, approximately 2.718282), like \(\pi\).
# Extract the model coefficients
betas <- coef(m_educ_age)
betas
(Intercept)    educu12y         age 
-0.24790348  0.56029969 -0.01387864 
# Get odds ratio via Euler’s number
e <- exp(1)
e^betas[2]
educu12y 1.751197
# or simply using the exponential function
exp(betas[2])
educu12y 1.751197
The interpretation is that the odds of being a smoker for individuals with less than 12 years of education are 1.7511972 times greater than the odds of being a smoker for individuals with more than 12 years of education.
So while betas[2] shows you the coefficient of education in log odds, exp(betas[2]) returns it as an odds ratio.
What are the odds of an 18 year old being a smoker compared to a 19 year old, regardless of education?
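One way to work it out (a sketch): the two ages differ by one year, so the odds ratio is \(e^{-\beta_{\text{age}}}\):
# odds(age 18) / odds(age 19) = exp(beta_age * 18) / exp(beta_age * 19) = exp(-beta_age)
exp(-betas[3]) # ~1.014: slightly higher odds of smoking for the 18 year old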
How do we get the confidence interval for the same coefficient as odds ratio?
(ci <- confint.default(m_educ_age, parm = "educu12y")) # log odds
             2.5 %    97.5 %
educu12y 0.2610625 0.8595369
A range of values that would not be rejected in a null hypothesis test. These are all plausible values for the parameter of interest (the difference in the log odds of being a smoker for less than vs more than 12 years of education).
exp(ci) # odds ratio
            2.5 %   97.5 %
educu12y 1.298309 2.362067
If the confidence interval for the odds ratio contains 1, we cannot reject the null hypothesis. Yes, 1, not 0! The odds are equal if the odds ratio is 1. In contrast, on the log-odds scale the null hypothesis is a zero difference, because log(1) = 0.
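A quick check of that last point:
log(1) # 0: an odds ratio of 1 is a log-odds difference of 0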
Changes in the outcome variable can be presented as probabilities, log odds and odds.
The factor by which the odds of being a smoker increase with a unit change in the predictor!
What do we mean by a factor? We mean it is MULTIPLIED.
As an aside, if you get a question asking “what is the factor by which the odds of being a smoker increase for a unit change in education”, I’m really asking you about the odds ratio.
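To make “multiplied” concrete, here is a small check using the (rounded) odds at age 18 from the table above:
# odds for o12y at age 18, multiplied by the odds ratio, gives the odds for u12y
0.608 * 1.751197 # ~1.06, matching odds_u12y in the table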
summary(m_educ_age) # see deviances at the bottom
Call:
glm(formula = is_smoker ~ educ + age, family = binomial(link = "logit"),
data = smoking_df)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.247903 0.205341 -1.207 0.227326
educu12y 0.560300 0.152675 3.670 0.000243 ***
age -0.013879 0.004392 -3.160 0.001578 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1075.0 on 806 degrees of freedom
Residual deviance: 1053.3 on 804 degrees of freedom
AIC: 1059.3
Number of Fisher Scoring iterations: 4
deviance(m_educ_age)
[1] 1053.306
This deviance is based on the log-likelihood
\[ \text{deviance} = -2 \times \text{log likelihood} \]
The calculation is based on the observed data and the predicted probabilities:
# Extract the outcome variable
y <- smoking_df$is_smoker
# Generate predicted values on the prob scale
p_hat <- predict(m_educ_age, type = "response")
# Formula for log-likelihood for binomial models
sum( y * log(p_hat) + (1 - y) * log(1 - p_hat) ) # don't worry about this formula
[1] -526.6529
and thus captures, overall, how unsurprised your model is by the data it actually saw (higher values mean the model is less surprised by the data, i.e. a better fit). In short,
logLik(m_educ_age)
'log Lik.' -526.6529 (df=3)
The negative double of the log-likelihood is the deviance, so
-2 * logLik(m_educ_age)
'log Lik.' 1053.306 (df=3)
is the same as
deviance(m_educ_age)
[1] 1053.306
and
deviance(m_educ_age) / -2
[1] -526.6529
is the same as
logLik(m_educ_age)
'log Lik.' -526.6529 (df=3)
The deviance doesn’t tell you more than what you already know from the log-likelihood.
For example, the two binary logistic models \(\text{is_smoker} \sim \text{Bernoulli}(\theta_i)\) that we have considered so far have the predictor structures
\[ \mathcal{M_1}: \text{logit}(\theta_i) = \beta_0 + \beta_1 \times \text{age}_i \]
and
\[ \mathcal{M_2}: \text{logit}(\theta_i) = \beta_0 + \beta_1 \times \text{age}_i + \beta_2 \times \text{educ}_i. \]
Model \(\mathcal{M_1}\) is nested in model \(\mathcal{M_2}\) because \(\mathcal{M_1}\) contains a subset of the predictors of \(\mathcal{M_2}\).
The deviance of model \(\mathcal{M_2}\) is
deviance(m_educ_age)
[1] 1053.306
What’s the deviance of \(\mathcal{M_1}\)?
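As a quick check, a one-line sketch; the same value appears as the residual deviance of Model 1 in the anova table further down:
deviance(m_age) # ~1067.1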
Which one of these two models is better at predicting the data?
What does it mean that one model is “better”?
The model is better at identifying a smoker on the basis of their age and education compared to just knowing their age. Education can tell us something that age alone can’t.
\(\dots\) using the difference in deviance:
deviance(m_educ_age) - deviance(m_age)
[1] -13.762
The deviance of \(\mathcal{M_2}\) is around 13.762 units lower.
We can roughly interpret these differences in deviance: a meaningful difference in deviance should be at least 2 times the difference in the number of parameters of the two models.
How many parameters are in \(\mathcal{M_2}\)?
coef(m_educ_age)
(Intercept) educu12y age -0.24790348 0.56029969 -0.01387864
It’s 3: in a binary logistic regression, the number of coefficients is the number of parameters (in normal linear models there is also the standard deviation). And it’s 2 for \(\mathcal{M_1}\):
coef(m_age)
(Intercept)         age 
 0.02396177 -0.01215431 
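A minimal sketch for counting the parameters directly:
length(coef(m_educ_age)) # 3 parameters in M2
length(coef(m_age))      # 2 parameters in M1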
So the difference in deviances should be at least 2 (our difference in deviances was 13.762).
We can test this difference formally with anova():
- Df: degrees of freedom (the difference in the number of parameters)
- Deviance: the difference in deviances
anova(m_age, m_educ_age, test = 'Chisq')
Analysis of Deviance Table

Model 1: is_smoker ~ age
Model 2: is_smoker ~ educ + age
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
1       805     1067.1                      
2       804     1053.3  1   13.762 0.0002075
The p-value is the probability of obtaining a difference in deviances of 13.76 or more in a \(\chi^2\) distribution with 1 degree of freedom if the model with both educ and age is in fact no better than the model with only age.
diff_deviance <- deviance(m_age) - deviance(m_educ_age)
pchisq(diff_deviance, df = 1, lower.tail = FALSE)
[1] 0.0002074912
So let’s say you have two models \(\mathcal{M_a}\) and \(\mathcal{M_b}\). \(\mathcal{M_a}\) is fully nested in \(\mathcal{M_b}\). \(\mathcal{M_a}\) has 2 parameters, and \(\mathcal{M_b}\) has 5. \(\mathcal{M_a}\) has a deviance of 500, and \(\mathcal{M_b}\) has a deviance of 490. What is the corresponding p-value of their difference?
pchisq(---, df = ---, lower.tail = FALSE)
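One possible worked answer: the difference in deviances is 500 − 490 = 10 and the difference in the number of parameters is 5 − 2 = 3, so:
pchisq(500 - 490, df = 5 - 2, lower.tail = FALSE) # ~0.0186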
Reload the data to select new variables:
# load smoking data (link can be found on NOW)
smoking_df <- read_csv("https://raw.githubusercontent.com/mark-andrews/ntupsychology-data/main/data-sets/smoking.csv")
# Make the cigs a binary variable
smoking_df <- mutate(smoking_df, is_smoker = cigs > 0)
# select relevant variables
smoking_df <- select(smoking_df, is_smoker, white, educ, lincome)
We will use white (binary), educ (as continuous variable), and the log of income (lincome).
Implement the following two models with \(\text{is_smoker}_i \sim \text{Bernoulli}(\theta_i)\):
\[ \mathcal{M_1}: \text{logit}(\theta_i) = \beta_0 + \beta_1 \times \log(\text{income})\\ \mathcal{M_2}: \text{logit}(\theta_i) = \beta_0 + \beta_1 \times \log(\text{income}) + \beta_2 \times \text{education} + \beta_3 \times \text{white} \]
and then perform a statistical test to evaluate whether the difference in their deviances is statistically meaningful. This test evaluates whether education and being white have a (combined) effect on being a smoker after controlling for income.
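A possible solution sketch, using hypothetical model names m1 and m2:
# M1: log income as the only predictor
m1 <- glm(is_smoker ~ lincome, data = smoking_df,
          family = binomial(link = 'logit'))
# M2: adds education and white
m2 <- glm(is_smoker ~ lincome + educ + white, data = smoking_df,
          family = binomial(link = 'logit'))
# Chi-squared test of the difference in deviances (Df = 2)
anova(m1, m2, test = 'Chisq')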