Homework 8

2025-04-07

Introduction to the Dataset

Overview of the Dataset

The data set I will use to perform a logit regression is the Rossi dataset, which ships with the carData package and becomes available when the AER package is loaded (AER attaches car, which in turn attaches carData). The data set contains information on 432 individuals released from prison and tracks whether they were re-arrested during the follow-up period. Mass incarceration is a serious problem in the United States, and the incarceration rate has risen sharply over recent decades. In 2018, rearrest rates reportedly reached a historic high, with 76.6% of offenders re-offending and returning to prison. Previous studies have shown that many former convicts lack education and employment, placing them at a higher risk of rearrest. I want to see how race, marital status, work experience, and financial assistance relate to the likelihood of rearrest.

Key Variables I will analyze

- Binary Dependent Variable: Rearrested (1 = Yes, 0 = No)

- Independent Variable: Race (black, other)

- Independent Variable: Marriage (not married, married)

- Independent Variable: Work Experience (yes, no)

- Independent Variable: Financial Assistance (yes, no)

Research Questions

- Does race influence the likelihood of being re-arrested?

- Does marriage status impact the likelihood of being re-arrested?

- Does having work experience impact the likelihood of being re-arrested?

- Does financial assistance impact the likelihood of being re-arrested?

Hypotheses

- 1. Does race influence the likelihood of being re-arrested?

- H₀ (Null Hypothesis): Race has no effect on the likelihood of being re-arrested.

- H₁ (Alternative Hypothesis): Race has a significant effect on the likelihood of being re-arrested.


- 2. Does marriage status impact the likelihood of being re-arrested?

- H₀: Marriage status has no effect on the likelihood of being re-arrested.

- H₁: Marriage status significantly affects the likelihood of being re-arrested.


- 3. Does having work experience impact the likelihood of being re-arrested?

- H₀: Work experience does not impact the likelihood of being re-arrested.

- H₁: Work experience does impact the likelihood of being re-arrested.


- 4. Does financial assistance impact the likelihood of being re-arrested?

- H₀: Receiving financial assistance does not impact the likelihood of being re-arrested.

- H₁: Receiving financial assistance does impact the likelihood of being re-arrested.

Statistical Analysis

- I will perform a binary logit regression to analyze the relationship between my variables.

- The binary logit regression model is appropriate for this analysis because my dependent variable is binary (rearrested: yes or no).

- I will conduct a likelihood ratio test and use the AIC and BIC to select the best model.

- I will obtain the observed probability of being re-arrested for each level of the predictors that turn out to be significant.
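
As a rough illustration of what a logit model does, the sketch below uses base R's qlogis() and plogis() to move between probabilities and log-odds. The probability 0.25 and the coefficient -0.5 are arbitrary values chosen for illustration; they are not results from the Rossi data.

```r
# Logit link: maps a probability in (0, 1) to log-odds on the real line.
p <- 0.25                      # illustrative probability (assumption)
log_odds <- qlogis(p)          # same as log(p / (1 - p))

# Inverse logit: maps log-odds back to a probability.
p_back <- plogis(log_odds)     # same as 1 / (1 + exp(-log_odds))

# A logit model is linear on the log-odds scale, so adding a coefficient b
# to the linear predictor multiplies the odds by exp(b).
b <- -0.5                      # illustrative coefficient (assumption)
p_new <- plogis(log_odds + b)
odds_ratio <- (p_new / (1 - p_new)) / (p / (1 - p))

print(c(log_odds = log_odds, p_back = p_back, p_new = p_new, odds_ratio = odds_ratio))
```

This is why the coefficients reported later are read as changes in log-odds, and why exponentiating them gives odds ratios.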

Data Preparation

# Clear workspace
rm(list = ls())
# Import data
library(AER) # Attaches car and carData; the Rossi data ships with carData
data("Rossi")
# Create a new data frame with the variables of interest
library(dplyr) # Data manipulation
Rossi.clean <- select(Rossi, arrest, race, mar, wexp, fin)
# Rename variables (new name = old name)
Rossi.clean <- rename(Rossi.clean, Rearrested = arrest, Race = race, Marriage = mar, Work.Experience = wexp, Financial.Assistance = fin)
# Make sure structure of the data is correct 
glimpse(Rossi.clean)
## Rows: 432
## Columns: 5
## $ Rearrested           <int> 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1…
## $ Race                 <fct> black, black, other, black, other, black, black, …
## $ Marriage             <fct> not married, not married, not married, married, n…
## $ Work.Experience      <fct> no, no, yes, yes, yes, yes, yes, yes, no, yes, no…
## $ Financial.Assistance <fct> no, no, no, yes, no, no, no, yes, no, no, yes, no…
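
The four predictors are stored as factors, and glm() treats the first level of each factor as the reference category. The sketch below uses small made-up factors with the same labels as the Rossi columns (the vectors themselves are illustrative, not the real data) to show how to check the reference level and how relevel() could change it.

```r
# Toy factors using the same labels as the Rossi columns (illustrative data).
race <- factor(c("black", "other", "black"))
marriage <- factor(c("not married", "married", "not married"))

# Levels are sorted alphabetically by default; the first one is the
# reference category in glm(), so coefficients describe the other level.
print(levels(race))      # first level is the reference
print(levels(marriage))  # first level is the reference

# relevel() changes the reference category without changing the data.
marriage_ref <- relevel(marriage, ref = "not married")
print(levels(marriage_ref))
```

With the default ordering, a coefficient for Race compares "other" with "black", and a coefficient for Marriage compares "not married" with "married".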

Data Table

library(DT) # Data Table
datatable(Rossi.clean)

Descriptive Statistics

Univariate Analysis

library(modelsummary) # Data Tables/Model Summaries
datasummary_skim(Rossi.clean, output = "kableExtra", type = "categorical")
                                     N      %
Race                  black        379   87.7
                      other         53   12.3
Marriage              married       53   12.3
                      not married  379   87.7
Work.Experience       no           185   42.8
                      yes          247   57.2
Financial.Assistance  no           216   50.0
                      yes          216   50.0

Key Takeaways

  • The majority of participants in the study are black (87.7%).
  • The majority of participants are not married (87.7%).
  • The majority of participants have work experience (57.2%).
  • Half of the participants receive financial assistance, while the other half do not.

Bivariate Analysis

# Cross-tabulation of categorical variables
# I want to get a better understanding of the relationship between financial assistance and work experience among participants of the study. 
datasummary_crosstab(Work.Experience ~ Financial.Assistance, data = Rossi.clean, output = "kableExtra")
                          Financial.Assistance
Work.Experience           no      yes     All
no               N        93      92      185
                 % row    50.3    49.7    100.0
yes              N        123     124     247
                 % row    49.8    50.2    100.0
All              N        216     216     432
                 % row    50.0    50.0    100.0

Key Takeaways

  • Among participants without work experience, the distribution is nearly equal: 50.3% did not receive financial assistance and 49.7% did.
  • Among participants with work experience, the distribution is nearly equal: 49.8% without and 50.2% with financial assistance.
  • Overall, the distribution of financial assistance is balanced across work experience groups.
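
The balance described above can also be checked formally. The sketch below is not part of the original analysis: it re-enters the four cell counts from the cross-tabulation by hand and runs a Pearson chi-square test of independence with base R's chisq.test().

```r
# Counts copied from the cross-tabulation above.
# Rows: Work.Experience (no, yes); columns: Financial.Assistance (no, yes).
counts <- matrix(c(93, 92,
                   123, 124),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Work.Experience = c("no", "yes"),
                                 Financial.Assistance = c("no", "yes")))

# Pearson chi-square test of independence (no continuity correction).
test <- chisq.test(counts, correct = FALSE)

print(test$expected)   # expected counts under independence
print(test$p.value)    # a large p-value is consistent with independence
```

Every observed count is within half a unit of its expected count, so the test gives no indication that financial assistance depends on work experience.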

Binary Logit Regression

First, I will convert my individual-level raw data into grouped data, with one row per combination of predictor values.

Grouped <- Rossi.clean %>%
    group_by(Race, Marriage, Work.Experience, Financial.Assistance) %>%
    summarise(total = n(), Yes = sum(Rearrested), .groups = "drop") %>%
    mutate(No = total - Yes)
datatable(Grouped)

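Grouping is mainly a convenience when every predictor in a binary logistic regression is categorical:

  • Reducing size: the 432 individual rows collapse to one row per observed combination of predictor values.
  • Same estimates: the coefficient estimates and standard errors match those from a model fitted to the individual-level data.
  • Different fit statistics: the deviance, log-likelihood, AIC, and BIC are computed on the grouped scale, so they should only be compared with other models fitted to the same grouped data.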
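
One way to see what grouping does and does not change is to fit the same model both ways. The sketch below uses simulated data with hypothetical predictor names (work, aid), not the Rossi data, and compares an individual-level fit with a grouped fit built from the same rows.

```r
set.seed(42)  # simulated data, not the Rossi data

# Simulate 300 individuals with two binary predictors (hypothetical names).
n <- 300
toy <- data.frame(
  work = factor(sample(c("no", "yes"), n, replace = TRUE)),
  aid  = factor(sample(c("no", "yes"), n, replace = TRUE))
)
eta <- -0.5 - 0.6 * (toy$work == "yes") - 0.4 * (toy$aid == "yes")
toy$y <- rbinom(n, size = 1, prob = plogis(eta))

# Individual-level fit: one row per person, 0/1 response.
fit_ind <- glm(y ~ work + aid, family = binomial, data = toy)

# Grouped fit: one row per predictor combination, (successes, failures).
yes <- aggregate(y ~ work + aid, data = toy, FUN = sum)
tot <- aggregate(y ~ work + aid, data = toy, FUN = length)
grp <- data.frame(yes[c("work", "aid")], Yes = yes$y, No = tot$y - yes$y)
fit_grp <- glm(cbind(Yes, No) ~ work + aid, family = binomial, data = grp)

# Compare coefficients, observation counts, and residual deviances.
print(cbind(individual = coef(fit_ind), grouped = coef(fit_grp)))
print(c(nobs_individual = nobs(fit_ind), nobs_grouped = nobs(fit_grp)))
print(c(dev_individual = deviance(fit_ind), dev_grouped = deviance(fit_grp)))
```

The coefficients agree between the two fits, while the number of observations and the residual deviance do not, which is why fit statistics from grouped and individual-level models should not be mixed.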

Binary Logistic Regression Models

models <- list(Model_1 = glm(formula = cbind(Yes, No) ~ Race, family = binomial, data = Grouped),
               Model_2 = glm(formula = cbind(Yes, No) ~ Race + Marriage, family = binomial, data = Grouped),
               Model_3 = glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience, family = binomial, data = Grouped), 
               Model_4 = glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience + Financial.Assistance, family = binomial, data = Grouped))
library(huxtable)
modelsummary(models, output = "huxtable", statistic = "p.value")
                          Model_1    Model_2    Model_3    Model_4
(Intercept)               -0.999     -1.696     -1.208     -1.011
                          (<0.001)   (<0.001)   (0.006)    (0.023)
Raceother                 -0.230     -0.196     -0.176     -0.221
                          (0.509)    (0.575)    (0.617)    (0.534)
Marriagenot married                  0.772      0.555      0.584
                                     (0.054)    (0.179)    (0.158)
Work.Experienceyes                              -0.552     -0.551
                                                (0.015)    (0.016)
Financial.Assistanceyes                                    -0.460
                                                           (0.040)
Num.Obs.                  15         15         15         15
AIC                       69.9       67.7       63.7       61.4
BIC                       71.3       69.8       66.5       65.0
Log.Lik.                  -32.958    -30.827    -27.838    -25.706
F                         0.435      2.055      3.362      3.505
RMSE                      0.31       0.34       0.31       0.30

Note: values in parentheses are p-values.
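
The coefficients in the table are on the log-odds scale, which is hard to read directly. As a small sketch, the Model_4 estimates are re-entered by hand below and exponentiated to give odds ratios; the vector names simply mirror the row labels of the table.

```r
# Model_4 coefficients (log-odds scale), copied from the table above.
coefs <- c("Raceother" = -0.221,
           "Marriagenot married" = 0.584,
           "Work.Experienceyes" = -0.551,
           "Financial.Assistanceyes" = -0.460)

# exp() turns a log-odds coefficient into an odds ratio.
odds_ratios <- exp(coefs)

# Percent change in the odds relative to the reference category.
pct_change <- 100 * (odds_ratios - 1)

print(round(cbind(odds_ratio = odds_ratios, pct_change = pct_change), 3))
```

Read this way, the Model_4 estimates correspond to odds of rearrest that are roughly 42% lower with work experience and roughly 37% lower with financial assistance, holding the other predictors fixed.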

Lower AIC and BIC values indicate a better trade-off between fit and model complexity. In the table, both criteria decrease with each added predictor, so Model 4, which has the lowest AIC (61.4) and BIC (65.0), is the preferred model. In Model 4, work experience (p = 0.016) and financial assistance (p = 0.040) are significantly associated with the likelihood of being rearrested because their p-values are less than .05, and both coefficients are negative, which means lower odds of rearrest. Race (p = 0.534) and marital status (p = 0.158) do not have a significant association with the likelihood of being rearrested because their p-values are greater than .05.
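
The AIC and BIC values can be reproduced from the reported log-likelihoods, which also shows how much the BIC depends on the sample size it is given. The sketch below re-enters the log-likelihoods by hand. The last step is an added sensitivity check rather than part of the original analysis: it assumes the penalty were based on the 432 individuals instead of the 15 grouped rows, and reports only differences from Model_1 because the grouped and individual-level log-likelihoods differ by a constant that is the same for every model.

```r
# Log-likelihoods copied from the model table; k counts the intercept
# plus one coefficient per predictor.
log_lik <- c(Model_1 = -32.958, Model_2 = -30.827,
             Model_3 = -27.838, Model_4 = -25.706)
k <- c(2, 3, 4, 5)

# AIC = -2 * logLik + 2 * k
aic <- -2 * log_lik + 2 * k

# BIC = -2 * logLik + k * log(n); the table uses n = 15 grouped rows.
bic_grouped <- -2 * log_lik + k * log(15)

# Sensitivity check (assumption): BIC differences from Model_1 if the
# penalty used n = 432 individuals instead of 15 grouped rows.
bic_432 <- -2 * log_lik + k * log(432)
bic_432_diff <- bic_432 - bic_432[["Model_1"]]

print(round(rbind(aic, bic_grouped, bic_432_diff), 1))
```

In this sketch the AIC ranking does not depend on n, but the BIC ranking does: with 15 grouped rows Model_4 has the lowest BIC, while under a 432-observation penalty the smaller Model_1 would be preferred. The BIC result is therefore best read together with the likelihood ratio test.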

Likelihood Ratio Test

anova(models$Model_1, models$Model_2, models$Model_3, models$Model_4, test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: cbind(Yes, No) ~ Race
## Model 2: cbind(Yes, No) ~ Race + Marriage
## Model 3: cbind(Yes, No) ~ Race + Marriage + Work.Experience
## Model 4: cbind(Yes, No) ~ Race + Marriage + Work.Experience + Financial.Assistance
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1        13     30.991                       
## 2        12     26.728  1   4.2632  0.03895 *
## 3        11     20.750  1   5.9773  0.01449 *
## 4        10     16.487  1   4.2637  0.03893 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Create a table for the likelihood ratio test

anova.result <- anova(models$Model_1, models$Model_2, models$Model_3, models$Model_4, test = "Chisq")
library(knitr)
kable(anova.result, caption = "Likelihood Ratio Test")
Likelihood Ratio Test
Resid. Df   Resid. Dev   Df   Deviance   Pr(>Chi)
13          30.99092     NA   NA         NA
12          26.72769     1    4.263235   0.0389457
11          20.75038     1    5.977306   0.0144911
10          16.48664     1    4.263740   0.0389341

Each row of the table compares a model with the one before it. Since all three p-values are < 0.05, adding Marriage, then Work Experience, and then Financial Assistance each significantly improves the model’s fit. Model 4 is therefore the best-fitting model.
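
The test statistics above can be reproduced by hand, which makes the logic of the likelihood ratio test explicit. The sketch below re-enters the log-likelihoods reported in the model table; because they are rounded to three decimals, the results match the anova() output only approximately.

```r
# Log-likelihoods copied from the model table (rounded to 3 decimals).
log_lik <- c(Model_1 = -32.958, Model_2 = -30.827,
             Model_3 = -27.838, Model_4 = -25.706)

# Likelihood ratio statistic for each step: twice the gain in log-likelihood.
lr_stat <- 2 * diff(log_lik)

# Each step adds one coefficient, so compare with a chi-square on 1 df.
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(round(cbind(lr_stat, p_value), 4))
```

Each statistic equals the drop in residual deviance between consecutive models, which is what the Deviance column of the table reports.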

Obtaining the Probabilities of being re-arrested for significant variables

# Observed proportion of participants re-arrested, by Work Experience
Prob_Table <- Rossi.clean %>% 
    group_by(Work.Experience) %>% 
    summarise(Rearrested = mean(Rearrested)) %>% 
  mutate(Not.Rearrested = 1 - Rearrested)
#create table for results
kable(Prob_Table, caption = "Probabilities of being re-arrested based on Work Experience")
Probabilities of being re-arrested based on Work Experience
Work.Experience   Rearrested   Not.Rearrested
no                0.3351351    0.6648649
yes               0.2105263    0.7894737

Key Takeaways

  • In my table, former convicts with work experience have a lower observed probability of being rearrested (21.1%) than those without work experience (33.5%).
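
The gap between the two groups can be tested directly as a difference in proportions. The sketch below is an addition to the original analysis. The counts are recovered from the tables in this report (0.3351351 × 185 = 62 and 0.2105263 × 247 = 52) and entered by hand. Unlike Model 4, this comparison does not adjust for the other predictors.

```r
# Re-arrest counts and group sizes by work experience, recovered from the
# proportions and group sizes reported earlier.
rearrested <- c(no = 62, yes = 52)
group_size <- c(no = 185, yes = 247)

# Observed proportions; these should match the table above.
observed <- rearrested / group_size

# Two-sample test for equality of proportions (base R).
test <- prop.test(rearrested, group_size)

print(round(observed, 4))
print(test$p.value)
```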


# Observed proportion of participants re-arrested, by Financial Assistance
Prob_Table1 <- Rossi.clean %>% 
    group_by(Financial.Assistance) %>% 
    summarise(Rearrested = mean(Rearrested)) %>% 
  mutate(Not.Rearrested = 1 - Rearrested) 
#create table for results

kable(Prob_Table1, caption = "Probabilities of being re-arrested based on Financial Assistance")
Probabilities of being re-arrested based on Financial Assistance
Financial.Assistance   Rearrested   Not.Rearrested
no                     0.3055556    0.6944444
yes                    0.2222222    0.7777778

Key Takeaways

  • In my table, former convicts with financial assistance have a lower observed probability of being rearrested (22.2%) than those without financial assistance (30.6%).
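
The tables above are raw group proportions. A model-based counterpart can be sketched from the Model_4 coefficients by applying the inverse logit to the linear predictor. The profile used below (black and not married, the most common categories) and the short coefficient names are choices made for illustration. The coefficients are re-entered by hand from the regression table.

```r
# Model_4 coefficients copied from the regression table (log-odds scale).
# The short names are hypothetical labels for this sketch.
b <- c(intercept = -1.011, race_other = -0.221, not_married = 0.584,
       work_yes = -0.551, fin_yes = -0.460)

# Profiles: black (reference level) and not married, varying the two
# significant predictors. 0 = no, 1 = yes.
profiles <- expand.grid(work_yes = c(0, 1), fin_yes = c(0, 1))

# Linear predictor, then inverse logit to get a predicted probability.
eta <- b[["intercept"]] + b[["not_married"]] +
  b[["work_yes"]] * profiles$work_yes +
  b[["fin_yes"]] * profiles$fin_yes
profiles$p_rearrest <- plogis(eta)

print(round(profiles, 3))
```

With the real fitted object, predict(models$Model_4, newdata = ..., type = "response") would give the same kind of result without rounding error in the coefficients.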

Conclusion

The results of my analysis indicate that race and marital status do not have a statistically significant association with the likelihood of re-arrest. However, work experience and financial assistance are each associated with a significantly lower likelihood of being re-arrested. This suggests that helping individuals gain work experience and providing financial assistance may help reduce recidivism rates and improve their chances of reintegrating into society successfully.

References

(Esparza Flores 2018)

Esparza Flores, Nayely. 2018. “Contributing Factors to Mass Incarceration and Recidivism.” Themis: Research Journal of Justice Studies and Forensic Science 6 (1): 4.