Interpreting the results of a logistic regression model involves understanding how the predictor variables affect the probability of the outcome event. Here are some common ways to interpret logistic regression results:
Let’s say you have a logistic regression model predicting the likelihood of a customer purchasing a product based on their age and income.
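Suppose exponentiating the age coefficient gives \(e^{\beta} \approx 1.05\). This means that for every one-year increase in age, holding income constant, the odds of purchasing the product are multiplied by 1.05, an increase of about 5% (\(100 \times (e^{\beta} - 1)\)).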
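The arithmetic behind this interpretation can be checked directly in R. The following is a minimal sketch: the value `beta_age <- 0.05` is an assumed, illustrative coefficient (the example does not report the fitted value), chosen only because it reproduces a factor of roughly 1.05.

```r
# Hypothetical coefficient for age (assumed for illustration only)
beta_age <- 0.05

# Odds ratio: multiplicative change in the odds per one-year increase in age
odds_ratio <- exp(beta_age)

# Percent change in the odds: 100 * (e^beta - 1)
pct_change <- 100 * (odds_ratio - 1)

round(odds_ratio, 2)  # about 1.05
round(pct_change, 1)  # about 5.1
```

Note that the change is multiplicative in the odds, not additive in the probability.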
A logistic regression model predicts the likelihood of a student passing a course based on the number of hours studied and whether they attended a review session (yes/no).
# Set seed for reproducibility
set.seed(42)
# Number of observations
n <- 500
# Generate random data for hours studied (uniform distribution)
hours_studied <- runif(n, min = 1, max = 20)
# Generate random data for review session attendance (categorical)
review_session <- sample(c("yes", "no"), n, replace = TRUE, prob = c(0.6, 0.4))
# Create dummy variable for review session
review_attended <- ifelse(review_session == "yes", 1, 0)
# Generate probabilities for passing based on hours studied and review session
# (using coefficients similar to our previous example)
log_odds <- 0.2 * hours_studied + 1.5 * review_attended
probability <- 1 / (1 + exp(-log_odds))
# Generate pass outcome (0 or 1) based on probabilities
pass <- rbinom(n, 1, probability)
# Create a data frame
df <- data.frame(hours_studied, review_session, review_attended, pass)
df$review_attended <- as.factor(df$review_attended)
df$pass <- as.factor(df$pass)
# Fit logistic regression model
model <- glm(pass ~ hours_studied + review_attended, data = df, family = "binomial")
# Print model summary
summary(model)
##
## Call:
## glm(formula = pass ~ hours_studied + review_attended, family = "binomial",
## data = df)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.55444 0.34120 1.625 0.104168
## hours_studied 0.15572 0.03801 4.097 4.19e-05 ***
## review_attended1 1.25495 0.36097 3.477 0.000508 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 273.86 on 499 degrees of freedom
## Residual deviance: 239.14 on 497 degrees of freedom
## AIC: 245.14
##
## Number of Fisher Scoring iterations: 6
hours studied (quantitative)
This means that for every additional hour studied, holding review attendance constant, the odds of passing the course are multiplied by \(e^{0.15572} \approx 1.17\), an increase of about 17%.
review attended (qualitative)
This means that students who attended the review session have about 3.51 times the odds of passing (\(e^{1.25495} \approx 3.51\)) compared with those who did not, holding hours studied constant.
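These odds ratios come from exponentiating the coefficients in the summary. The sketch below copies the printed estimates into a named vector so that it runs on its own; with the fitted object available, `exp(coef(model))` should give the same values up to rounding. It also converts the log-odds into a predicted probability for a hypothetical student (10 hours studied, attended the review session), which is an illustrative case rather than one taken from the example.

```r
# Estimates copied from the summary(model) output
coefs <- c("(Intercept)"    = 0.55444,
           hours_studied    = 0.15572,
           review_attended1 = 1.25495)

# Odds ratios: exponentiate the slope coefficients (drop the intercept)
odds_ratios <- exp(coefs[-1])
round(odds_ratios, 2)  # hours_studied about 1.17, review_attended1 about 3.51

# Log-odds for an illustrative (hypothetical) student:
# 10 hours studied, attended the review session
log_odds <- coefs[["(Intercept)"]] +
  coefs[["hours_studied"]] * 10 +
  coefs[["review_attended1"]]

# Inverse logit, equivalent to 1 / (1 + exp(-log_odds))
p_pass <- plogis(log_odds)
round(p_pass, 2)  # about 0.97
```

Odds ratios describe relative change in the odds, while `plogis()` gives the probability for one specific combination of predictor values.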
# Call necessary libraries
library(ggeffects)
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggeffects':
##
## get_title
# Obtain predicted probabilities for each term separately
prediction_hours<-ggpredict(model, terms="hours_studied [all]")
prediction_review<-ggpredict(model, terms="review_attended")
# Plot each term individually
plot_hours<-plot(prediction_hours)
plot_review<-plot(prediction_review)
# Combine plots into a single figure
plot_grid(plot_hours, plot_review)
A bank uses a logistic regression model to predict the likelihood of a customer defaulting on a loan based on their credit score and income.
# Set seed for reproducibility
set.seed(999)
# Number of observations
n <- 6000
# Generate random data for credit score (normally distributed)
credit_score <- rnorm(n, mean = 650, sd = 50)
# Generate random data for income (log-normal distribution)
income <- exp(rnorm(n, mean = 10, sd = 1))
# Generate probabilities for default based on credit score and income
# (using adjusted coefficients and a base log-odds)
base_log_odds <- 40 # Add a base log-odds to shift the probabilities
log_odds <- base_log_odds - 0.01 * credit_score - 0.005 * income
probability <- 1 / (1 + exp(-log_odds))
# Generate default outcome (0 or 1) based on probabilities
default <- factor(rbinom(n, 1, probability))
levels(default) <- c("No", "Yes")
# Create a data frame
df <- data.frame(credit_score, income, default)
# Plot it for fun
library(ggplot2)
ggplot(df, aes(y = default, x = credit_score)) +
geom_jitter(width = 0.05, height = 0.05, alpha = 0.5) +
theme_classic(base_size = 15)
# Fit logistic regression model
model <- glm(default ~ credit_score + income, data = df, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Print model summary
summary(model)
##
## Call:
## glm(formula = default ~ credit_score + income, family = "binomial",
## data = df)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 42.082308 4.698374 8.957 < 2e-16 ***
## credit_score -0.012690 0.003809 -3.331 0.000864 ***
## income -0.005031 0.000472 -10.659 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4391.13 on 5999 degrees of freedom
## Residual deviance: 232.48 on 5997 degrees of freedom
## AIC: 238.48
##
## Number of Fisher Scoring iterations: 15
credit_score
This means that for every one-point increase in credit score, holding income constant, the odds of defaulting are multiplied by \(e^{-0.01269} \approx 0.987\), a decrease of about 1.3%.
income
This means that for every one-dollar increase in income, holding credit score constant, the odds of defaulting are multiplied by \(e^{-0.005031} \approx 0.995\), a decrease of about 0.5% per dollar.
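The same calculation applies to this model. As a rough sketch using the estimates printed in the summary, the code below exponentiates both coefficients and also rescales the income effect to a $1,000 increase, since a one-dollar change is too small a unit to read comfortably. The $1,000 step is an illustrative choice, not something taken from the example.

```r
# Estimates copied from the summary(model) output
coefs <- c(credit_score = -0.012690,
           income       = -0.005031)

# Odds ratios per one-unit increase in each predictor
odds_ratios <- exp(coefs)
round(odds_ratios, 3)  # credit_score about 0.987, income about 0.995

# Percent change in the odds per one-unit increase
pct_change <- 100 * (odds_ratios - 1)
round(pct_change, 1)   # about -1.3 and -0.5

# Odds ratio for a 1,000-dollar increase in income (illustrative rescaling):
# multiply the coefficient by 1000 before exponentiating
or_income_1000 <- exp(1000 * coefs[["income"]])
signif(or_income_1000, 2)  # about 0.0065
```

Because effects multiply on the odds scale, a per-dollar odds ratio close to 1 still compounds into a very large effect over a realistic income difference.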
# Call necessary libraries
library(ggeffects)
library(cowplot)
# Obtain predicted probabilities for each term separately
prediction_credit<-ggpredict(model, terms="credit_score [all]") # using [all] gets smooth plots.
prediction_income<-ggpredict(model, terms="income [all]")
# Plot each term individually
plot_credit<-plot(prediction_credit)
plot_income<-plot(prediction_income)
# Combine plots into a single figure
plot_grid(plot_credit, plot_income)
An online advertiser uses logistic regression to predict the likelihood of a user clicking on an ad based on the ad’s position on the page and the user’s age.
# Set seed for reproducibility
set.seed(777)
# Number of observations
n <- 2000
# Generate random data for ad position (uniformly distributed)
ad_position <- sample(1:5, n, replace = TRUE) # Assuming 5 ad positions
# Generate random data for user age (normally distributed)
user_age <- rnorm(n, mean = 30, sd = 8)
# Generate probabilities for click-through based on ad position and user age
# (using coefficients similar to our previous example)
log_odds <- -0.5 * ad_position + 0.02 * user_age
probability <- 1 / (1 + exp(-log_odds))
# Generate click-through outcome (0 or 1) based on probabilities
click <- as.factor(rbinom(n, 1, probability))
# Create a data frame
df <- data.frame(ad_position, user_age, click)
df$ad_position<-as.factor(df$ad_position)
# Fit logistic regression model
model <- glm(click ~ ad_position + user_age, data = df, family = "binomial")
# Print model summary
summary(model)
##
## Call:
## glm(formula = click ~ ad_position + user_age, family = "binomial",
## data = df)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.583076 0.215397 -2.707 0.00679 **
## ad_position2 -0.310708 0.139656 -2.225 0.02609 *
## ad_position3 -0.764950 0.147841 -5.174 2.29e-07 ***
## ad_position4 -1.351590 0.164504 -8.216 < 2e-16 ***
## ad_position5 -1.615061 0.173516 -9.308 < 2e-16 ***
## user_age 0.014858 0.006318 2.352 0.01869 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2450.2 on 1999 degrees of freedom
## Residual deviance: 2302.8 on 1994 degrees of freedom
## AIC: 2314.8
##
## Number of Fisher Scoring iterations: 4
ad position
This means that the odds of clicking on the ad in position 2 are 0.73 times the odds of clicking on an ad in position 1 (the reference level), holding user age constant. Equivalently, the odds of clicking are about 27% lower for an ad in position 2 than for one in position 1.
user age
This means that for every one-year increase in user age, holding ad position constant, the odds of clicking on the ad are multiplied by \(e^{0.014858} \approx 1.015\), an increase of about 1.5%.
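The summary reports one coefficient for each of positions 2 through 5, each relative to position 1. A short sketch, again copying the printed estimates so that it runs standalone, gives the full set of odds ratios; the ten-year age comparison at the end is an illustrative rescaling, not part of the original example.

```r
# Estimates copied from the summary(model) output
coefs <- c(ad_position2 = -0.310708,
           ad_position3 = -0.764950,
           ad_position4 = -1.351590,
           ad_position5 = -1.615061,
           user_age     =  0.014858)

# Odds ratios: positions 2-5 are each compared with position 1
odds_ratios <- exp(coefs)
round(odds_ratios, 2)
# positions 2 to 5: about 0.73, 0.47, 0.26, 0.20; user_age: about 1.01

# Odds ratio for a ten-year difference in age (illustrative rescaling)
or_age_10 <- exp(10 * coefs[["user_age"]])
round(or_age_10, 2)  # about 1.16
```

The odds ratios shrink steadily from position 2 to position 5, which matches the negative effect of position built into the simulated data.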
# Obtain predicted probabilities for each term separately
prediction_ad <- ggpredict(model, terms = "ad_position") # categorical term, so [all] is not needed
prediction_age<-ggpredict(model, terms="user_age [all]")
# Plot each term individually
plot_ad<-plot(prediction_ad)
plot_age<-plot(prediction_age)
# Combine plots into a single figure
plot_grid(plot_ad, plot_age)