1. Reading Assignment

Interpreting interaction terms in a logit regression model is tricky because the model is nonlinear: the link between the predictors and the probability of the outcome is not a straight line. In a linear regression, an interaction term’s coefficient directly shows how two variables combine to affect the outcome. In a logit model, by contrast, the effect of an interaction depends on the values of all the other variables, so it can differ across observations. Ai and Norton (2003) warn that a statistically significant interaction coefficient does not necessarily imply a meaningful effect on the predicted probability; the interaction effect can even have the opposite sign of its coefficient for some observations. Researchers who interpret the interaction term as they would in a linear model may mistakenly assume a constant effect when, in reality, it varies across situations.

A simulation-based approach, as suggested by King, Tomz, and Wittenberg (2000) and Zelner (2009), addresses this by computing predicted probabilities rather than relying on raw coefficients. By simulating different scenarios, researchers can visualize how an interaction affects the probability of an outcome under various conditions. This makes it easier to see whether the interaction has a strong or weak effect, or whether it changes direction depending on the values of the other variables. Simulation therefore helps avoid misinterpretation and provides a clearer, more accurate picture of how interactions work in logit models.
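To make the simulation logic concrete, here is a minimal sketch in R. It uses made-up data and hypothetical variables x1 and x2 (not the Rossi data analyzed below): it draws coefficient vectors from the fitted model’s estimated sampling distribution and compares predicted probabilities across two scenarios, which is the core of the King, Tomz, and Wittenberg (2000) approach.

library(MASS)  # for mvrnorm()
set.seed(123)

# Made-up data with a built-in x1 * x2 interaction
n  <- 500
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
y  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 + 0.4 * x2 - 0.6 * x1 * x2))
fit <- glm(y ~ x1 * x2, family = binomial)

# Draw 1,000 coefficient vectors from the estimated sampling distribution
sims <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))

# Two scenarios: x2 = 0 versus x2 = 1, holding x1 at its mean
x_bar <- mean(x1)
X_lo  <- c(1, x_bar, 0, x_bar * 0)  # order: (Intercept), x1, x2, x1:x2
X_hi  <- c(1, x_bar, 1, x_bar * 1)

# Predicted probability under each scenario, and the distribution of the difference
p_lo <- plogis(sims %*% X_lo)
p_hi <- plogis(sims %*% X_hi)
quantile(p_hi - p_lo, c(0.025, 0.5, 0.975))

Repeating this scenario comparison over a grid of x1 values and plotting the simulated differences, as Zelner (2009) recommends, shows directly whether the interaction effect is strong, weak, or sign-changing.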

2. Logit Regression

Data Set

The data set I will use to perform a logit regression is the Rossi data set, which ships with the carData package and becomes available once the AER package is loaded. It contains information on 432 individuals released from prison and records whether each was re-arrested within one year of release.

Key Variables I Will Analyze

- Binary Dependent Variable: Rearrested (1 = Yes, 0 = No)

- Independent Variable: Race (black, other)

- Independent Variable: Marriage (not married, married)

- Independent Variable: Work Experience (yes, no)

- Independent Variable: Financial Assistance (yes, no)

Research Questions

- Does race influence the likelihood of being re-arrested?

- Does marriage status impact the likelihood of being re-arrested?

- Does having work experience reduce the likelihood of re-arrest?

- Does financial assistance reduce the likelihood of re-arrest?

# Start Session
rm(list = ls())
gc()
##           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells  544083 29.1    1204401 64.4         NA   700242 37.4
## Vcells 1004419  7.7    8388608 64.0      16384  1963155 15.0
library(AER)
## Loading required package: car
## Loading required package: carData
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
data(Rossi)
# Tidy Data (Subset Variables and Rename Columns)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
## 
##     recode
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
Rossi <- select(Rossi, arrest, race, mar, wexp, fin) 
Rossi <- rename(Rossi, Rearrested = "arrest", Race = "race", Marriage = "mar", Work.Experience = "wexp", Financial.Assistance = "fin")
head(Rossi)
##   Rearrested  Race    Marriage Work.Experience Financial.Assistance
## 1          1 black not married              no                   no
## 2          1 black not married              no                   no
## 3          1 other not married             yes                   no
## 4          0 black     married             yes                  yes
## 5          0 other not married             yes                   no
## 6          0 black not married             yes                   no
# Convert my individual-level raw data to grouped data
Grouped <- Rossi %>%
    group_by(Race, Marriage, Work.Experience, Financial.Assistance) %>%
    summarise(total = n(), Yes = sum(Rearrested)) %>%
    mutate(No = total - Yes)
## `summarise()` has grouped output by 'Race', 'Marriage', 'Work.Experience'. You
## can override using the `.groups` argument.
head(Grouped)
## # A tibble: 6 × 7
## # Groups:   Race, Marriage, Work.Experience [3]
##   Race  Marriage    Work.Experience Financial.Assistance total   Yes    No
##   <fct> <fct>       <fct>           <fct>                <int> <int> <int>
## 1 black married     no              no                       2     2     0
## 2 black married     no              yes                      2     1     1
## 3 black married     yes             no                      21     2    19
## 4 black married     yes             yes                     19     2    17
## 5 black not married no              no                      79    31    48
## 6 black not married no              yes                     82    22    60
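As a quick sanity check, the grouped cbind(Yes, No) models fitted below should reproduce the coefficients of the same logit fitted on the individual-level data; only the deviance and degrees of freedom differ between the two forms. This check is my own addition:

Check <- glm(Rearrested ~ Race, family = binomial, data = Rossi)
coef(Check)  # should match coef(Model_1) fitted below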
# Start Logit Regression
Model_1 <- glm(formula = cbind(Yes, No) ~ Race, family = binomial, data = Grouped)
summary(Model_1)
## 
## Call:
## glm(formula = cbind(Yes, No) ~ Race, family = binomial, data = Grouped)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.9990     0.1158  -8.626   <2e-16 ***
## Raceother    -0.2296     0.3480  -0.660    0.509    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 31.440  on 14  degrees of freedom
## Residual deviance: 30.991  on 13  degrees of freedom
## AIC: 69.917
## 
## Number of Fisher Scoring iterations: 4
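Since the intercept is on the log-odds scale, the baseline re-arrest probability it implies for the reference group (Race = black) can be recovered with the inverse logit. This one-liner is an optional addition:

plogis(coef(Model_1)[["(Intercept)"]])  # 1 / (1 + exp(0.999)), about 0.27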
# Add a second predictor (Marriage) to the model
Model_2 <- glm(formula = cbind(Yes, No) ~ Race + Marriage, family = binomial, data = Grouped) 
summary(Model_2)
## 
## Call:
## glm(formula = cbind(Yes, No) ~ Race + Marriage, family = binomial, 
##     data = Grouped)
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -1.6958     0.3874  -4.377  1.2e-05 ***
## Raceother            -0.1963     0.3501  -0.561   0.5750    
## Marriagenot married   0.7716     0.4008   1.925   0.0542 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 31.440  on 14  degrees of freedom
## Residual deviance: 26.728  on 12  degrees of freedom
## AIC: 67.654
## 
## Number of Fisher Scoring iterations: 4
# Add a third predictor (Work.Experience) to the model
Model_3 <- glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience, family = binomial, data = Grouped) 
summary(Model_3)
## 
## Call:
## glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience, 
##     family = binomial, data = Grouped)
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)   
## (Intercept)          -1.2077     0.4353  -2.774  0.00553 **
## Raceother            -0.1760     0.3525  -0.499  0.61750   
## Marriagenot married   0.5545     0.4123   1.345  0.17864   
## Work.Experienceyes   -0.5525     0.2268  -2.436  0.01486 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 31.44  on 14  degrees of freedom
## Residual deviance: 20.75  on 11  degrees of freedom
## AIC: 63.676
## 
## Number of Fisher Scoring iterations: 4
# Add a fourth predictor (Financial.Assistance) to the model
Model_4 <- glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience + Financial.Assistance, family = binomial, data = Grouped) 
summary(Model_4)
## 
## Call:
## glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience + 
##     Financial.Assistance, family = binomial, data = Grouped)
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)  
## (Intercept)              -1.0114     0.4460  -2.268   0.0233 *
## Raceother                -0.2205     0.3547  -0.622   0.5341  
## Marriagenot married       0.5841     0.4140   1.411   0.1583  
## Work.Experienceyes       -0.5511     0.2280  -2.417   0.0156 *
## Financial.Assistanceyes  -0.4598     0.2239  -2.054   0.0400 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 31.440  on 14  degrees of freedom
## Residual deviance: 16.487  on 10  degrees of freedom
## AIC: 61.413
## 
## Number of Fisher Scoring iterations: 4
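Because logit coefficients are log-odds, exponentiating them gives odds ratios, which are easier to read as effect sizes. The line below is an optional convenience transform (it uses Wald confidence intervals via confint.default()):

exp(cbind(Odds.Ratio = coef(Model_4), confint.default(Model_4)))

For example, exp(-0.5511) ≈ 0.58 means that having work experience is associated with roughly 42% lower odds of re-arrest, holding the other predictors constant.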
# Get AIC and BIC values for all models
library(texreg)
## Version:  1.39.4
## Date:     2024-07-23
## Author:   Philip Leifeld (University of Manchester)
## 
## Consider submitting praise using the praise or praise_interactive functions.
## Please cite the JSS article in your publications -- see citation("texreg").
htmlreg(list(Model_1, Model_2, Model_3, Model_4), doctype = FALSE)
Statistical models
==============================================================================
                          Model 1     Model 2     Model 3     Model 4
------------------------------------------------------------------------------
(Intercept)               -1.00 ***   -1.70 ***   -1.21 **    -1.01 *
                          (0.12)      (0.39)      (0.44)      (0.45)
Raceother                 -0.23       -0.20       -0.18       -0.22
                          (0.35)      (0.35)      (0.35)      (0.35)
Marriagenot married                    0.77        0.55        0.58
                                      (0.40)      (0.41)      (0.41)
Work.Experienceyes                                -0.55 *     -0.55 *
                                                  (0.23)      (0.23)
Financial.Assistanceyes                                       -0.46 *
                                                              (0.22)
------------------------------------------------------------------------------
AIC                       69.92       67.65       63.68       61.41
BIC                       71.33       69.78       66.51       64.95
Log Likelihood           -32.96      -30.83      -27.84      -25.71
Deviance                  30.99       26.73       20.75       16.49
Num. obs.                 15          15          15          15
==============================================================================
*** p < 0.001; ** p < 0.01; * p < 0.05
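For a console-friendly version of the same comparison, texreg’s screenreg() renders the table as plain text:

screenreg(list(Model_1, Model_2, Model_3, Model_4))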
# Perform a likelihood ratio test
anova(Model_1, Model_2, Model_3, Model_4, test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: cbind(Yes, No) ~ Race
## Model 2: cbind(Yes, No) ~ Race + Marriage
## Model 3: cbind(Yes, No) ~ Race + Marriage + Work.Experience
## Model 4: cbind(Yes, No) ~ Race + Marriage + Work.Experience + Financial.Assistance
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1        13     30.991                       
## 2        12     26.728  1   4.2632  0.03895 *
## 3        11     20.750  1   5.9773  0.01449 *
## 4        10     16.487  1   4.2637  0.03893 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
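The same sequence of tests can also be run with lrtest() from the lmtest package (already attached by AER); it is an equivalent alternative to the anova() call above:

lrtest(Model_1, Model_2, Model_3, Model_4)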
# Obtain the re-arrest probabilities for the predictors that are significant in Model 4

library(pander)

Rossi %>% 
    group_by(Work.Experience) %>% 
    summarise(Rearrested = mean(Rearrested)) %>% 
    mutate(Not.Rearrested = 1 - Rearrested) %>% 
    pandoc.table()
## 
## -----------------------------------------------
##  Work.Experience   Rearrested   Not.Rearrested 
## ----------------- ------------ ----------------
##        no            0.3351         0.6649     
## 
##        yes           0.2105         0.7895     
## -----------------------------------------------
Rossi %>% 
    group_by(Financial.Assistance) %>% 
    summarise(Rearrested = mean(Rearrested)) %>% 
    mutate(Not.Rearrested = 1 - Rearrested) %>% 
    pandoc.table()
## 
## ----------------------------------------------------
##  Financial.Assistance   Rearrested   Not.Rearrested 
## ---------------------- ------------ ----------------
##           no              0.3056         0.6944     
## 
##          yes              0.2222         0.7778     
## ----------------------------------------------------
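The two tables above are raw group proportions. A model-based complement is to ask Model 4 for predicted probabilities while holding the other predictors fixed; the baseline scenario below (black, not married, no financial assistance) is an illustrative choice of my own:

newdat <- data.frame(Race = "black",
                     Marriage = "not married",
                     Work.Experience = c("no", "yes"),
                     Financial.Assistance = "no")
predict(Model_4, newdata = newdat, type = "response")  # re-arrest probability without vs. with work experience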

Interpretation of Results


AIC and BIC Values


Statistical models
==============================================================================
                          Model 1     Model 2     Model 3     Model 4
------------------------------------------------------------------------------
(Intercept)               -1.00 ***   -1.70 ***   -1.21 **    -1.01 *
                          (0.12)      (0.39)      (0.44)      (0.45)
Raceother                 -0.23       -0.20       -0.18       -0.22
                          (0.35)      (0.35)      (0.35)      (0.35)
Marriagenot married                    0.77        0.55        0.58
                                      (0.40)      (0.41)      (0.41)
Work.Experienceyes                                -0.55 *     -0.55 *
                                                  (0.23)      (0.23)
Financial.Assistanceyes                                       -0.46 *
                                                              (0.22)
------------------------------------------------------------------------------
AIC                       69.92       67.65       63.68       61.41
BIC                       71.33       69.78       66.51       64.95
Log Likelihood           -32.96      -30.83      -27.84      -25.71
Deviance                  30.99       26.73       20.75       16.49
Num. obs.                 15          15          15          15
==============================================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

The table shows that adding more predictors improves the model’s fit: the AIC and BIC values decrease as the number of predictors increases. Model 4 is therefore the best-fitting model, with the lowest AIC and BIC values of the four.

Likelihood Ratio Test

## Analysis of Deviance Table
## 
## Model 1: cbind(Yes, No) ~ Race
## Model 2: cbind(Yes, No) ~ Race + Marriage
## Model 3: cbind(Yes, No) ~ Race + Marriage + Work.Experience
## Model 4: cbind(Yes, No) ~ Race + Marriage + Work.Experience + Financial.Assistance
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1        13     30.991                       
## 2        12     26.728  1   4.2632  0.03895 *
## 3        11     20.750  1   5.9773  0.01449 *
## 4        10     16.487  1   4.2637  0.03893 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since all of the p-values are below 0.05, each addition (Marriage, Work Experience, and Financial Assistance) significantly improves the model’s fit, so Model 4 is again the best-fitting model.

Interpretation of Logit Results (Model 4)

## 
## Call:
## glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience + 
##     Financial.Assistance, family = binomial, data = Grouped)
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)  
## (Intercept)              -1.0114     0.4460  -2.268   0.0233 *
## Raceother                -0.2205     0.3547  -0.622   0.5341  
## Marriagenot married       0.5841     0.4140   1.411   0.1583  
## Work.Experienceyes       -0.5511     0.2280  -2.417   0.0156 *
## Financial.Assistanceyes  -0.4598     0.2239  -2.054   0.0400 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 31.440  on 14  degrees of freedom
## Residual deviance: 16.487  on 10  degrees of freedom
## AIC: 61.413
## 
## Number of Fisher Scoring iterations: 4

The coefficient for Raceother (-0.2205) has a p-value of 0.5341, which is greater than 0.05, so there is no statistically significant relationship between Race and the likelihood of being re-arrested.

The coefficient for Marriagenot married (0.5841) has a p-value of 0.1583, which is greater than 0.05, so there is no statistically significant relationship between Marriage and the likelihood of being re-arrested.

The p-value for the Work.Experienceyes coefficient is 0.0156, which is less than 0.05, so there is a statistically significant relationship between Work Experience and the likelihood of being re-arrested. The coefficient is -0.5511 (an odds ratio of exp(-0.5511) ≈ 0.58), which means that having work experience reduces the likelihood of being re-arrested.

The p-value for the Financial.Assistanceyes coefficient is 0.0400, which is less than 0.05, so there is a statistically significant relationship between Financial Assistance and the likelihood of being re-arrested. The coefficient is -0.4598 (an odds ratio of exp(-0.4598) ≈ 0.63), which means that receiving financial assistance reduces the likelihood of being re-arrested.

Probabilities of being re-arrested based on Work Experience and Financial Assistance

## 
## -----------------------------------------------
##  Work.Experience   Rearrested   Not.Rearrested 
## ----------------- ------------ ----------------
##        no            0.3351         0.6649     
## 
##        yes           0.2105         0.7895     
## -----------------------------------------------

In this table, individuals with work experience have a lower probability of being re-arrested (0.2105) than those without work experience (0.3351).

## 
## ----------------------------------------------------
##  Financial.Assistance   Rearrested   Not.Rearrested 
## ---------------------- ------------ ----------------
##           no              0.3056         0.6944     
## 
##          yes              0.2222         0.7778     
## ----------------------------------------------------

In this table, individuals who received financial assistance have a lower probability of being re-arrested (0.2222) than those who did not (0.3056).

3. GitHub Repository

https://github.com/KevinGregov5/KevinGregov5