When adding interaction terms to a logit regression model, interpreting their effects can be tricky because the model is nonlinear: the link between the predictors and the outcome probability is not straightforward. In a linear regression, an interaction term's coefficient directly shows how two variables combine to affect the outcome. But in a logit model, the effect of an interaction depends on the values of all the other variables, so it can change across observations. Ai and Norton (2003) warn that a significant interaction coefficient does not always mean there is a meaningful impact on the predicted probability. If researchers interpret the interaction term as they would in a linear model, they may mistakenly assume a constant effect when, in reality, it varies across observations.
A simulation-based approach, as suggested by King, Tomz, and Wittenberg (2000) and Zelner (2009), helps by computing predicted probabilities instead of relying on raw coefficients. By simulating different scenarios, researchers can visualize how an interaction affects the probability of an outcome under various conditions. This makes it easier to see whether the interaction has a strong or weak effect, or if it changes direction depending on the values of other variables. Using simulations helps avoid misinterpretation and provides a clearer, more accurate picture of how interactions work in logit models.
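To make this concrete, below is a minimal sketch of the simulation approach on toy data; the data, the fitted model fit, and the scenario vector are my own illustration and are not part of the analysis that follows.

# Minimal sketch of the simulation approach (toy data, illustrative only):
# draw coefficients from their approximate sampling distribution, then
# convert each draw into a predicted probability for a chosen scenario.
set.seed(123)
d <- data.frame(x = rbinom(200, 1, 0.5), z = rbinom(200, 1, 0.5))
d$y <- rbinom(200, 1, plogis(-1 + 0.5 * d$x + 0.3 * d$z + 0.7 * d$x * d$z))
fit <- glm(y ~ x * z, family = binomial, data = d)
beta <- MASS::mvrnorm(1000, coef(fit), vcov(fit)) # 1000 coefficient draws
X <- c(1, 1, 1, 1) # scenario: intercept, x = 1, z = 1, x:z = 1
p <- plogis(beta %*% X) # simulated predicted probabilities
quantile(p, c(0.025, 0.5, 0.975)) # median and 95% interval

Repeating the last two steps for other scenarios (e.g., x = 1, z = 0) and comparing the simulated probabilities is what reveals whether the interaction matters on the probability scale.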
The data set I will use for the logit regression is the Rossi data set, which ships with the carData package and becomes available once the AER package is loaded. It contains information on 432 individuals released from prison and tracks whether they were re-arrested within the first year after release.
- Binary dependent variable: Rearrested (1 = yes, 0 = no)
- Independent variable: Race (black, other)
- Independent variable: Marriage (not married, married)
- Independent variable: Work Experience (yes, no)
- Independent variable: Financial Assistance (yes, no)
- Does race influence the likelihood of being re-arrested?
- Does marital status affect the likelihood of being re-arrested?
- Does having work experience reduce the likelihood of re-arrest?
- Does financial assistance reduce the likelihood of re-arrest?
# Start session
rm(list = ls())
gc()
## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 544083 29.1 1204401 64.4 NA 700242 37.4
## Vcells 1004419 7.7 8388608 64.0 16384 1963155 15.0
library(AER)
## Loading required package: car
## Loading required package: carData
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
data(Rossi)
# Tidy data (subset variables and rename columns)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Rossi <- select(Rossi, arrest, race, mar, wexp, fin)
Rossi <- rename(Rossi,
                Rearrested = arrest, Race = race, Marriage = mar,
                Work.Experience = wexp, Financial.Assistance = fin)
head(Rossi)
## Rearrested Race Marriage Work.Experience Financial.Assistance
## 1 1 black not married no no
## 2 1 black not married no no
## 3 1 other not married yes no
## 4 0 black married yes yes
## 5 0 other not married yes no
## 6 0 black not married yes no
# Convert the individual-level raw data to grouped counts
Grouped <- Rossi %>%
group_by(Race, Marriage, Work.Experience, Financial.Assistance) %>%
summarise(total = n(), Yes = sum(Rearrested)) %>%
mutate(No = total - Yes)
## `summarise()` has grouped output by 'Race', 'Marriage', 'Work.Experience'. You
## can override using the `.groups` argument.
head(Grouped)
## # A tibble: 6 × 7
## # Groups: Race, Marriage, Work.Experience [3]
## Race Marriage Work.Experience Financial.Assistance total Yes No
## <fct> <fct> <fct> <fct> <int> <int> <int>
## 1 black married no no 2 2 0
## 2 black married no yes 2 1 1
## 3 black married yes no 21 2 19
## 4 black married yes yes 19 2 17
## 5 black not married no no 79 31 48
## 6 black not married no yes 82 22 60
# Start logit regression
Model_1 <- glm(formula = cbind(Yes, No) ~ Race, family = binomial, data = Grouped)
summary(Model_1)
##
## Call:
## glm(formula = cbind(Yes, No) ~ Race, family = binomial, data = Grouped)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.9990 0.1158 -8.626 <2e-16 ***
## Raceother -0.2296 0.3480 -0.660 0.509
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 31.440 on 14 degrees of freedom
## Residual deviance: 30.991 on 13 degrees of freedom
## AIC: 69.917
##
## Number of Fisher Scoring iterations: 4
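As a quick sanity check (my own addition), the grouped cbind(Yes, No) specification should reproduce the coefficients from fitting the 0/1 outcome on the individual-level data:

# Sanity check: grouped binomial and individual-level Bernoulli fits
# of the same model give identical coefficients.
Check_1 <- glm(Rearrested ~ Race, family = binomial, data = Rossi)
rbind(grouped = coef(Model_1), individual = coef(Check_1))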
# Add a second predictor (Marriage)
Model_2 <- glm(formula = cbind(Yes, No) ~ Race + Marriage, family = binomial, data = Grouped)
summary(Model_2)
##
## Call:
## glm(formula = cbind(Yes, No) ~ Race + Marriage, family = binomial,
## data = Grouped)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.6958 0.3874 -4.377 1.2e-05 ***
## Raceother -0.1963 0.3501 -0.561 0.5750
## Marriagenot married 0.7716 0.4008 1.925 0.0542 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 31.440 on 14 degrees of freedom
## Residual deviance: 26.728 on 12 degrees of freedom
## AIC: 67.654
##
## Number of Fisher Scoring iterations: 4
# Add a third predictor (Work.Experience)
Model_3 <- glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience, family = binomial, data = Grouped)
summary(Model_3)
##
## Call:
## glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience,
## family = binomial, data = Grouped)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.2077 0.4353 -2.774 0.00553 **
## Raceother -0.1760 0.3525 -0.499 0.61750
## Marriagenot married 0.5545 0.4123 1.345 0.17864
## Work.Experienceyes -0.5525 0.2268 -2.436 0.01486 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 31.44 on 14 degrees of freedom
## Residual deviance: 20.75 on 11 degrees of freedom
## AIC: 63.676
##
## Number of Fisher Scoring iterations: 4
# Add a fourth predictor (Financial.Assistance)
Model_4 <- glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience + Financial.Assistance, family = binomial, data = Grouped)
summary(Model_4)
##
## Call:
## glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience +
## Financial.Assistance, family = binomial, data = Grouped)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0114 0.4460 -2.268 0.0233 *
## Raceother -0.2205 0.3547 -0.622 0.5341
## Marriagenot married 0.5841 0.4140 1.411 0.1583
## Work.Experienceyes -0.5511 0.2280 -2.417 0.0156 *
## Financial.Assistanceyes -0.4598 0.2239 -2.054 0.0400 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 31.440 on 14 degrees of freedom
## Residual deviance: 16.487 on 10 degrees of freedom
## AIC: 61.413
##
## Number of Fisher Scoring iterations: 4
# Get AIC and BIC values for all models
library(texreg)
## Version: 1.39.4
## Date: 2024-07-23
## Author: Philip Leifeld (University of Manchester)
##
## Consider submitting praise using the praise or praise_interactive functions.
## Please cite the JSS article in your publications -- see citation("texreg").
htmlreg(list(Model_1, Model_2, Model_3, Model_4), doctype = FALSE)
| | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| (Intercept) | -1.00*** | -1.70*** | -1.21** | -1.01* |
| | (0.12) | (0.39) | (0.44) | (0.45) |
| Raceother | -0.23 | -0.20 | -0.18 | -0.22 |
| | (0.35) | (0.35) | (0.35) | (0.35) |
| Marriagenot married | | 0.77 | 0.55 | 0.58 |
| | | (0.40) | (0.41) | (0.41) |
| Work.Experienceyes | | | -0.55* | -0.55* |
| | | | (0.23) | (0.23) |
| Financial.Assistanceyes | | | | -0.46* |
| | | | | (0.22) |
| AIC | 69.92 | 67.65 | 63.68 | 61.41 |
| BIC | 71.33 | 69.78 | 66.51 | 64.95 |
| Log Likelihood | -32.96 | -30.83 | -27.84 | -25.71 |
| Deviance | 30.99 | 26.73 | 20.75 | 16.49 |
| Num. obs. | 15 | 15 | 15 | 15 |

***p < 0.001; **p < 0.01; *p < 0.05
# Perform a likelihood ratio test
anova(Model_1, Model_2, Model_3, Model_4, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: cbind(Yes, No) ~ Race
## Model 2: cbind(Yes, No) ~ Race + Marriage
## Model 3: cbind(Yes, No) ~ Race + Marriage + Work.Experience
## Model 4: cbind(Yes, No) ~ Race + Marriage + Work.Experience + Financial.Assistance
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 13 30.991
## 2 12 26.728 1 4.2632 0.03895 *
## 3 11 20.750 1 5.9773 0.01449 *
## 4 10 16.487 1 4.2637 0.03893 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Obtain the raw re-arrest probabilities for the predictors that are significant in Model 4
library(pander)
Prob_Table <- Rossi %>%
group_by(Work.Experience) %>%
summarise(Rearrested = mean(Rearrested)) %>%
mutate(Not.Rearrested = 1 - Rearrested) %>%
pandoc.table()
##
## -----------------------------------------------
## Work.Experience Rearrested Not.Rearrested
## ----------------- ------------ ----------------
## no 0.3351 0.6649
##
## yes 0.2105 0.7895
## -----------------------------------------------
Prob_Table1 <- Rossi %>%
group_by(Financial.Assistance) %>%
summarise(Rearrested = mean(Rearrested)) %>%
mutate(Not.Rearrested = 1 - Rearrested) %>%
pandoc.table()
##
## ----------------------------------------------------
## Financial.Assistance Rearrested Not.Rearrested
## ---------------------- ------------ ----------------
## no 0.3056 0.6944
##
## yes 0.2222 0.7778
## ----------------------------------------------------
The model-comparison table above shows that adding predictors improves the model's fit: both AIC and BIC decrease as predictors are added. Model 4 is therefore the best-fitting model, with the lowest AIC (61.41) and BIC (64.95) of the four.
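For reference, the same fit statistics can also be pulled directly from the fitted objects with base R's AIC() and BIC(), without texreg; a small sketch:

# Collect AIC and BIC for all four models in one data frame
models <- list(Model_1, Model_2, Model_3, Model_4)
data.frame(Model = 1:4, AIC = sapply(models, AIC), BIC = sapply(models, BIC))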
In the likelihood ratio tests above, every p-value is below 0.05, so adding Marriage, Work Experience, and Financial Assistance each significantly improves the model's fit. This again points to Model 4 as the best-fitting model.
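The lmtest package, already attached as a dependency of AER, provides the same test through lrtest(); for example, comparing Model 3 and Model 4 isolates the contribution of Financial.Assistance:

# Likelihood ratio test for the Financial.Assistance term
lrtest(Model_3, Model_4)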
Turning to the Model 4 summary above: the coefficient for Raceother (-0.2205) has a p-value of 0.5341, which is greater than 0.05. This means there is no statistically significant relationship between race and the likelihood of being re-arrested.

The coefficient for Marriagenot married (0.5841) has a p-value of 0.1583, which is greater than 0.05. This means there is no statistically significant relationship between marital status and the likelihood of being re-arrested.

The p-value for Work.Experienceyes is 0.0156, which is less than 0.05, so there is a statistically significant relationship between work experience and the likelihood of being re-arrested. Its coefficient is -0.5511, meaning that having work experience reduces the odds of being re-arrested.

The p-value for Financial.Assistanceyes is 0.0400, which is less than 0.05, so there is a statistically significant relationship between financial assistance and the likelihood of being re-arrested. Its coefficient is -0.4598, meaning that receiving financial assistance reduces the odds of being re-arrested.
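Since these estimates are on the log-odds scale, exponentiating them yields odds ratios, which are often easier to communicate. A quick sketch using Wald-type intervals from confint.default():

# Convert log-odds to odds ratios with Wald-type 95% confidence intervals
exp(cbind(OR = coef(Model_4), confint.default(Model_4)))

For example, exp(-0.5511) is roughly 0.58, so having work experience is associated with about 42% lower odds of re-arrest, holding the other predictors fixed.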
The first probability table above shows that individuals with work experience have a lower probability of being re-arrested (0.21) than those without (0.34).
The second probability table above shows that individuals who received financial assistance have a lower probability of being re-arrested (0.22) than those who did not (0.31).
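These raw tables ignore the other covariates. A model-based counterpart is to compute predicted probabilities from Model 4 for chosen scenarios; the reference values below (Race = "black", Marriage = "not married") are my own illustrative choice:

# Model-based predicted probabilities for illustrative scenarios,
# varying Work.Experience and Financial.Assistance
Scenarios <- expand.grid(
  Race = "black", Marriage = "not married",
  Work.Experience = c("no", "yes"), Financial.Assistance = c("no", "yes")
)
Scenarios$p <- predict(Model_4, newdata = Scenarios, type = "response")
Scenarios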