Learning Outcomes

  • Understand the difference between a linear and a generalised linear model
  • Be able to run and interpret a binary logistic regression model in \(R\)
  • Transform values between different scales (probabilities, odds, log odds)
  • Make predictions using model coefficients and \(R\) functions such as predict()

You will need to watch the lecture and read Chapter 10 in Andrews (2021) (pages 346 – 363) to fully grasp the concepts.

Recap: Linear regression

  • regression models are models of the probability distribution of one variable, known as the outcome variable, as it varies as a function of other variables, known as predictor variables.
  • the most basic type of regression model is the normal linear model.
  • in normal linear models, we assume that the outcome variable is normally distributed around a mean \(\mu\); the mean varies linearly with changes in a set of predictor variables.

Recap: Linear regression

\[ y_i \sim \mathcal{N}(\mu, \sigma^2) \]

  • \(y\): observed outcome variable.
  • Each observation \(y_i\) is a sample from such a normal distribution
  • indexed by \(i\) where \(i \in \{1, \dots, N\}\)
  • “\(\sim\)”: “is distributed as” / “is sampled from”.
  • \(\mathcal{N}\): normal distributed probability model with mean \(\mu\) and variance \(\sigma^2\) (or standard deviation \(\sigma\)).

Recap: Linear regression

where \(\mu\) is a linear function of the predictor \(x\): the intercept \(\beta_0\) plus the slope \(\beta_1\) multiplied by the value that \(x\) takes on.

A simple linear regression – with a single predictor \(x\) – has the form

\[ \mu_i = \beta_0 + \beta_1 \times x_i \]

Recap: Linear regression

For multiple linear regression with \(K\) predictors, the form is

\[ \mu_i = \beta_0 + \beta_1 \times x_{1i} + \beta_2 \times x_{2i} + \cdots + \beta_K \times x_{Ki} \]

or for short

\[ \mu_i = \beta_0 + \sum_{k=1}^K \beta_k \times x_{ki} \]

Essentially, \(\mu\) is the model’s predicted average of \(y\) for a given (set of) \(x\).

\[ y_i \sim \mathcal{N}(\mu_i, \sigma^2), \quad i \in \{1, \dots, N\} \] \(\in\) means “is an element of”
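As a quick illustration (not from the lecture; the coefficient values here are made up), we can simulate data from this model in R and recover the coefficients with lm():

# Simulate N = 100 observations from y ~ N(beta_0 + beta_1 * x, sigma^2)
set.seed(123)
N <- 100
x <- rnorm(N)
y <- 2 + 0.5 * x + rnorm(N, mean = 0, sd = 1) # beta_0 = 2, beta_1 = 0.5, sigma = 1
coef(lm(y ~ x)) # estimates should be close to 2 and 0.5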

Recap: Linear regression

  • As useful as this model is, it is clearly limited
  • What (sort of data) is the normal linear model limited to?

Recap: Linear regression

  • Data are continuous (rather than discrete, categorical).
  • Data are assumed to come from a normal distribution.
  • Cannot be applied to categorical outcomes or counts.
  • Generalised linear models allow us to use other distributions, with link functions that map the distribution’s parameter (e.g. a probability) onto a continuous numeric scale.
  • For example, binary logistic regression – a specific type of logistic regression – uses a Binomial / Bernoulli distribution with a logit link to model binary outcomes (yes / no, correct / incorrect).

Binary logistic model

# Show some example data
example_data
            x y
1 -0.56047565 0
2 -0.23017749 0
3  1.55870831 1
4  0.07050839 1
5  0.12928774 1
6  1.71506499 1
7  0.46091621 1
8 -1.26506123 0


Let’s say we have a set of \(n\) bivariate observations, pairing binary data \(y\) with a continuous variable \(x\):

\[(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\]

where

\[y_i \in \{0, 1\}, \quad \text{for all } i \in \{1, \dots, n\}\]

(\(y\) could stand for true / false, correct / incorrect, etc.)
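As a sketch of where such data might come from (the generating coefficient 2 is made up for illustration; plogis() converts log-odds to probabilities, as we will see later), we could simulate them in R:

# Simulate binary outcomes whose probability depends on x
set.seed(123)
x <- rnorm(8)                                  # continuous predictor
y <- rbinom(8, size = 1, prob = plogis(2 * x)) # P(y = 1) increases with x
example_data <- data.frame(x, y)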

Binary logistic model

Modelling these observed outcomes as samples from a normal distribution is an extreme example of model misspecification.

Bernoulli to the rescue

The normal distribution isn’t appropriate because the data are not continuous.

\[ y_i \sim \mathcal{N}(\mu, \sigma^2) \] We can use another distribution that can handle binary outcomes.

\[ y_i \sim \text{Bernoulli}(\theta) \]

Binary logistic model: in R

Normal linear model

# lm =  linear model
lm(outcome ~ predictor_1 + predictor_2, data = data)

Binary logistic regression

# glm = generalised linear model
glm(outcome ~ predictor_1 + predictor_2, data = data, family = binomial(link = "logit"))

binomial(link = "logit") models data as coming from a Binomial / Bernoulli distribution where the logit links maps probabilities to a continuous numeric space.

Binary logistic model

The Bernoulli distribution has a single parameter, \(\theta\) (theta).

\(P(y = 1) = \theta\):

  • \(\theta\) is the probability that the outcome variable takes a value of 1.

\(P(y = 0) = 1 - \theta\):

  • Example: if \(\theta = .7\), what’s \(P(y = 0)\)?
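A quick sketch to see \(\theta\) in action: rbinom() with size = 1 draws Bernoulli samples (seed and sample size are arbitrary).

set.seed(1)
y <- rbinom(n = 1000, size = 1, prob = 0.7) # 1000 Bernoulli draws with theta = 0.7
mean(y)      # proportion of 1s; should be close to theta
mean(y == 0) # proportion of 0s; should be close to 1 - theta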

Binary logistic model

\(\theta\) controls the relative heights of the two bars in a bar plot of the outcome: the proportions of 0s and 1s.

Real data analysis

# load packages
library(tidyverse)

# load smoking data (link can be found on NOW)
smoking_df <- read_csv("https://raw.githubusercontent.com/mark-andrews/ntupsychology-data/main/data-sets/smoking.csv") 

# select variables of interest
smoking_df <- select( smoking_df, cigs, age )

# look at the data
glimpse( smoking_df )
Rows: 807
Columns: 2
$ cigs <dbl> 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 10, 20, 30, 0, 0, 20, 0, 30, 0, 20,…
$ age  <dbl> 46, 40, 58, 30, 17, 86, 35, 48, 48, 31, 27, 24, 71, 44, 22, 29, 3…

Real data analysis

What’s the probability that a person smokes? How does this probability vary as a function of other factors?

Lots of cigs = 0 (non-smokers); smokers vary in terms of how much they smoke.

cigs is NOT binary (smokers, non-smokers) but a count.
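A quick check confirms this (any similar summary works):

# cigs is a count with many zeros
summary( smoking_df$cigs )
# proportion of zeros (497 / 807, as we'll see below)
mean( smoking_df$cigs == 0 )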

Real data analysis

# Make the cigs a binary variable
smoking_df <- mutate(smoking_df, is_smoker = cigs > 0)
# How many smokers and nonsmokers are in the data?
count( smoking_df, is_smoker )
# A tibble: 2 × 2
  is_smoker     n
  <lgl>     <int>
1 FALSE       497
2 TRUE        310

Real data analysis

# Probability of smokers (same as `mean(smoking_df$is_smoker)`)
310 / ( 310 + 497 )
[1] 0.3841388
# Overall probability of smokers in odds
0.3841388 / ( 1 - 0.3841388 )
[1] 0.6237425
# Overall probability of smokers in log-odds (logits)
log( 0.6237425 )
[1] -0.4720177
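And to go back the other way (the odds-to-probability step uses a formula we’ll meet again shortly; the values match the output above):

# Log-odds back to odds
exp( -0.4720177 )
[1] 0.6237425
# Odds back to probability
0.6237425 / ( 1 + 0.6237425 )
[1] 0.3841388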

Binary logistic regression

lm(outcome ~ predictor, data = data) wouldn’t return an error, but it wouldn’t do what you want it to do.

m_0 <- glm(is_smoker ~ 1, data = smoking_df, family = binomial(link = 'logit'))
  • family defines the distribution family.
  • link = 'logit' maps probability to logits (log-odds) to allow for linear modelling.

The intercept is the average log-odds of smoking (in the population that these data come from).

summary( m_0 )$coef # compare Intercept estimate to previous slide!
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -0.4720177 0.07237319 -6.521997 6.937742e-11

Binary logistic regression

Research Question: To what extent does age affect the probability of smoking?

m_smoke <- glm(is_smoker ~ age, data = smoking_df, family = binomial(link = 'logit'))
summary( m_smoke )$coef
               Estimate  Std. Error    z value    Pr(>|z|)
(Intercept)  0.02396177 0.190206271  0.1259778 0.899749472
age         -0.01215431 0.004355148 -2.7907922 0.005257921
  • The coefficient table is just like a coefficient table for a linear model.
  • It displays the linear relationship between the predictor (age) and the log-odds of being a smoker.
  • What’s the answer to our research question?

Binary logistic regression

summary(m_smoke)$coef
               Estimate  Std. Error    z value    Pr(>|z|)
(Intercept)  0.02396177 0.190206271  0.1259778 0.899749472
age         -0.01215431 0.004355148 -2.7907922 0.005257921
  • The sign of the age estimate (-0.01215431; shown in log-odds) is negative; what does this mean?
  • The slope estimate shows the (linear) change in the log-odds (logits) of smoking for each additional year of age (see the sketch below).
  • Std. Error: uncertainty in the estimate.
  • z is the test statistic used to obtain the p-value; it is similar to the t-value in linear regression, but we don’t require degrees of freedom.
  • Pr(>|z|): the probability of obtaining a z of the same absolute value, or something larger, if age really has no effect on smoking.
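To make the slope more tangible, exponentiating it converts the additive change in log-odds into a multiplicative change in odds (a sketch; exp() is the inverse of log()):

exp( coef(m_smoke)['age'] ) # approx. 0.988: the odds of smoking shrink slightly with each year of age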

Logit-link function

There and back again

Odds tell you how much more likely an event is to happen than not:

\[\text{odds}=\frac{\theta}{1-\theta}\]

Taking the natural log of the odds gives the log-odds (logit):

\[\text{logit}(\theta)=\log\!\left(\text{odds}\right)\]

Exponentiation turns log-odds into odds:

\[\text{odds}=e^{\text{logit}(\theta)}\] Converting log-odds back to probability:

\[\theta = \frac{1}{1+e^{-\text{logit}(\theta)}} \]

Function for probability to log-odds

\[ \log\!\left(\frac{\theta}{1-\theta}\right) \]

# probability to logit
logit <- function( x ) log( x / ( 1 - x ) )
logit( .6 ) # prob to logit
[1] 0.4054651
# or R function
qlogis( .6 )
[1] 0.4054651

Implement function for log-odds to probability

\[ \frac{1}{1+e^{-x}} \] where \(x\) is a value in log-odds.

# logit to probability
ilogit <- function( x ) 1 / ( 1 + exp( -x ) )
ilogit( 0.4054651 )
[1] 0.6
# or R function
plogis( 0.4054651 )
[1] 0.6

Converting probabilities into log-odds

# Here are some low, medium and high probability values:
theta_low <- 0.01  # low prob of success 
theta_mid <- 0.5   # equal prob of success 
theta_high <- 0.99 # high prob of success 
# Logits of low probability
log( theta_low / ( 1 - theta_low ) )
[1] -4.59512
logit( theta_low ) # or use qlogis()
[1] -4.59512

Work out the logit values for theta_mid and theta_high using the formula and one of the two functions.

Converting between probabilities and log-odds

If the probability of bad weather is 0.75, what are the odds that we’ll have bad weather?

\[ \text{odds} = \frac{\theta}{1 - \theta} \] Use \(R\) as a calculator or solve it by hand!

Converting between probabilities and log-odds

If the probability of bad weather is \(\theta = 0.75\), what are the odds that we’ll have bad weather?

p <- 0.75
p / ( 1 - p ) 
[1] 3

Probability of \(\theta = 0.75\) means that we are three times more likely to have bad weather than good weather.

Converting between probabilities and log-odds

If the odds of bad weather are 4 (i.e. 4:1), what is the probability that we will have bad weather?

\[ \theta = \frac{\text{odds}}{1 + \text{odds}} \] Use \(R\) as a calculator!

Converting between probabilities and log-odds

If the odds of bad weather are 4 (i.e. 4:1), what is the probability that we will have bad weather?

odds <- 4
odds / ( 1 + odds )
[1] 0.8

Converting between probabilities and log-odds

If there is a probability of \(\theta = 0.9\) that an intervention improves children’s reading, what are its log-odds?

Compute the odds:

\[ \text{odds} = \frac{\theta}{1 - \theta} \]

Take the logarithm of the odds to get the log-odds (also called logit):

\[ \text{log-odds} = \log(\text{odds}) \]

Use \(R\) as a calculator!

Converting between probabilities and log-odds

If there is a probability of \(\theta = 0.9\) that an intervention improves children’s reading, what are its log-odds?

\[ \text{odds} = \frac{\theta}{1 - \theta} = \frac{0.9}{1 - 0.9} = \frac{0.9}{0.1} = 9 \]

p <- 0.9
p / (1 - p)
[1] 9

Take the logarithm of the odds to get the log-odds (also called logit):

\[ \text{log-odds} = \log(\text{odds}) = \log(9) \approx 2.197225 \]

log(9)
[1] 2.197225

Converting between probabilities and log-odds

The log-odds on the previous slide were 2.1972246. Convert this value into

  1. Odds (the inverse of log(x) is exp(x)).
  2. Probabilities (using plogis(x)).

(replace x correctly)

Converting between probabilities and log-odds

The log-odds on the previous slide were 2.1972246. Convert this value into

Odds (the inverse of log(x) is exp(x)):

exp(2.197225)
[1] 9.000004

Probabilities (using plogis(x)):

plogis(2.197225)
[1] 0.9

Probability and log-odds

  • When probability = 0.5, log-odds are 0 (equal).
  • When probabilities are small, log-odds are negative.
  • When probabilities are large, log-odds are positive.
  • Probabilities are bound between 0 and 1; log-odds aren’t.
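A quick check of all four points with qlogis() (probability to log-odds):

qlogis( c(0.1, 0.5, 0.9) ) # small, equal, large probabilities
[1] -2.197225  0.000000  2.197225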

Predicting outcomes

What are the log-odds of being a smoker at any given age?

What are the odds of being a smoker at any given age?

What is the probability of being a smoker at any given age?

Binary logistic regression

To answer these questions, we use the linear regression equation.

\[ \log\left(\frac{\theta}{1 - \theta}\right) = \beta_0 + \beta_1 \times x_1 + \beta_2 \times x_2 + \cdots + \beta_K \times x_K \]

  • In normal linear regression, the linear function returns an average in the same unit as the outcome.
  • In binary logistic regression, the linear function returns the average of the outcome in log-odds.

For example, for the log-odds of being a smoker at 25 years of age, this simplifies to:

\[ \log\left(\frac{\theta}{1 - \theta}\right) = \beta_0 + \beta_1 \times 25 \] where \(\theta\) is the probability of being a smoker at age 25.

Predicting outcomes: model coefficients

# Extract model coefficients
b <- coef(m_smoke)
b
(Intercept)         age 
 0.02396177 -0.01215431 

Predicting outcomes: log-odds

What are the log-odds of being a smoker at 25 years of age?

b[1] + b[2] * 25 # linear function of age = 25
[1] -0.279896

Indexing the first value [1] for intercept and second value [2] for age slope.

# or without variable indexing
0.02396177 + -0.01215431 * 25
[1] -0.279896

Predicting outcomes: log-odds

What are the odds of being a smoker at 25 years of age?

exp(b[1] + b[2] * 25) 
[1] 0.7558623

Indexing the first value [1] for intercept and second value [2] for age slope.

# or without variable indexing
exp(0.02396177 + -0.01215431 * 25)
[1] 0.7558624

Predicting outcomes: probabilities

But what does a log-odds of -0.28 actually mean?

What is the probability of being a smoker at 25 years of age?

The inverse logit function allows us to convert from log-odds to probability.

\[ \theta = \frac{1}{1 + e^{-\text{logit}(\theta)}} \] We implemented ilogit(x) before, or use the base-R function plogis(x).

Predicting outcomes: probabilities

\[ \log\left(\frac{\theta}{1 - \theta}\right) = \beta_0 + \beta_1 \times 25 \]

which is

logit_smoke_age25 <- b[1] + b[2] * 25 # linear function of age = 25
plogis( logit_smoke_age25 ) # using plogis to convert log odds to prob
[1] 0.4304793
1 / ( 1 + exp( -logit_smoke_age25 ) ) # using the formula we implemented in ilogit(x)
[1] 0.4304793

Predicting outcomes: exercise

What are the log-odds of being a smoker at 50 years of age?

What are the odds of being a smoker at 50 years of age?

What is the probability of being a smoker at 50 years of age?

Predicting outcomes: exercise

What are the log-odds of being a smoker at 50 years of age?

b[1] + b[2] * 50 
[1] -0.5837539

What are the odds of being a smoker at 50 years of age?

exp(b[1] + b[2] * 50)
[1] 0.5578005

What is the probability of being a smoker at 50 years of age?

plogis(b[1] + b[2] * 50)
[1] 0.3580693

Predicting multiple outcomes: manual approach

We can make predictions across larger ranges of predictor values, for hypothetical participants (of ages that we did and did not observe!).

# Create a vector of the ages that you're interested in.
age <- seq( from = 15, to = 80, by = 10 )
age
[1] 15 25 35 45 55 65 75
# Use this age vector in the regression formula.
b[1] + b[2] * age # in log-odds
[1] -0.1583529 -0.2798960 -0.4014392 -0.5229823 -0.6445254 -0.7660686 -0.8876117

Try this for ages 20, 25, 30, 35, 40 using seq!

Predicting multiple outcomes: predict

# Step 1. Make a new data frame with range of values of age.
smoking_df2 <- tibble(age = seq(from = 15, to = 80, by = 10))
# Step 2. Use model from earlier to generate predictions.
predict(m_smoke, newdata = smoking_df2) # in log-odds
         1          2          3          4          5          6          7 
-0.1583529 -0.2798960 -0.4014392 -0.5229823 -0.6445254 -0.7660686 -0.8876117 

Predicting multiple outcomes: add_predictions

# Alternative to Step 2 (also log-odds)
library(modelr)
add_predictions(data = smoking_df2, # name of data frame 
                model = m_smoke)    # name of model

# A tibble: 7 × 2
    age   pred
  <dbl>  <dbl>
1    15 -0.158
2    25 -0.280
3    35 -0.401
4    45 -0.523
5    55 -0.645
6    65 -0.766
7    75 -0.888

Try one of these approaches – predict or add_predictions – for ages 20, 25, 30, 35, 40!

Predicting multiple outcomes: probabilities

Specify type = 'response' to get predictions as probabilities.

# Base R function for vector of predictions
predict(m_smoke, newdata = smoking_df2, type = 'response')

# Alternative using `modelr` for output as data frame
library(modelr)
add_predictions(data = smoking_df2, model = m_smoke, type = 'response')

Predicting multiple outcomes: probabilities

add_predictions(data = smoking_df2, # name of data frame 
                model = m_smoke,    # name of model
                type = 'response')  # return probabilities
# A tibble: 7 × 2
    age  pred
  <dbl> <dbl>
1    15 0.460
2    25 0.430
3    35 0.401
4    45 0.372
5    55 0.344
6    65 0.317
7    75 0.292
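As a final sketch (not part of the exercise), we could plot the predicted probabilities over the whole age range with ggplot2, which is loaded with the tidyverse:

# Predict over a fine grid of ages, then draw the curve
smoking_grid <- tibble(age = 15:80)
smoking_grid <- add_predictions(smoking_grid, model = m_smoke, type = 'response')
ggplot(smoking_grid, aes(x = age, y = pred)) +
  geom_line() +
  labs(x = 'Age', y = 'P(is_smoker)')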

Predicting multiple outcomes: probabilities

Try one or both of these approaches – predict or add_predictions – to predict the probability of smoking for ages 20, 25, 30, 35, 40. Replace "..."!

# Step 1. Make a new data frame with range of values of age.
smoking_df2 <- tibble(age = seq(from = ..., to = ..., by = ...))
# Step 2. Use model from earlier to generate predictions.
predict(object = ...,  # name of model
        newdata = ..., #  name of data frame to make predictions for
        type = '...')  # return probabilities
# Alternative to Step 2 
library(modelr)
add_predictions(data = ...,     # name of data frame to use for predictions
                model = ...,    # name of model
                type = '...')   # return probabilities

References

Andrews, M. (2021). Doing Data Science in R: An Introduction for Social Scientists. SAGE Publications Ltd.