Homework 8

2025-04-07

Introduction to the Dataset

Overview of the Dataset

The dataset I will use to fit a logit regression is the Rossi dataset, which ships with the carData package and becomes available when the AER package is loaded. The dataset contains information on 432 individuals released from prison and tracks whether they were re-arrested within a specific time period. Mass incarceration is a serious problem in the United States, and the incarceration rate has grown substantially over time. In addition, re-arrest rates reportedly reached a historic high in 2018, with more than 76.6% of offenders re-offending and returning to prison. Previous studies have shown that many former convicts lack education and employment, placing them at a higher risk of re-arrest. I want to see how factors like race, marriage status, work experience, and financial assistance relate to re-arrest, and through it to mass incarceration.

Key Variables I Will Analyze

- Binary Dependent Variable: Rearrested (1 = yes, 0 = no)

- Independent Variable: Race (black, other)

- Independent Variable: Marriage (married, not married)

- Independent Variable: Work Experience (yes, no)

- Independent Variable: Financial Assistance (yes, no)

Research Questions

- Does race influence the likelihood of being re-arrested?

- Does marriage status impact the likelihood of being re-arrested?

- Does having work experience impact the likelihood of being re-arrested?

- Does financial assistance impact the likelihood of being re-arrested?

Hypotheses

- 1. Does race influence the likelihood of being re-arrested?

- H₀ (Null Hypothesis): Race has no effect on the likelihood of being re-arrested.

- H₁ (Alternative Hypothesis): Race has a significant effect on the likelihood of being re-arrested.

- 2. Does marriage status impact the likelihood of being re-arrested?

- H₀: Marriage status has no effect on the likelihood of being re-arrested.

- H₁: Marriage status significantly affects the likelihood of being re-arrested.

- 3. Does having work experience impact the likelihood of being re-arrested?

- H₀: Work experience does not impact the likelihood of being re-arrested.

- H₁: Work experience does impact the likelihood of being re-arrested.

- 4. Does financial assistance impact the likelihood of being re-arrested?

- H₀: Receiving financial assistance does not impact the likelihood of being re-arrested.

- H₁: Receiving financial assistance does impact the likelihood of being re-arrested.

Statistical Analysis

- I will perform a binary logit regression to analyze the relationship between my variables.

- The binary logit regression model is appropriate for this analysis because my dependent variable is binary (rearrested: yes or no).

- I will conduct a likelihood ratio test and use the AIC and BIC to select the best model.

- I will obtain the probability of being re-arrested for each level of the significant independent variables.
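As a reference for the analysis plan above, the full model (the one with all four predictors) can be written on the log-odds scale as follows. The symbols p_i and β_0 through β_4 are notation introduced here for illustration; each predictor enters as a 0/1 indicator for its non-reference level.

```latex
\[
\log\!\left(\frac{p_i}{1 - p_i}\right)
  = \beta_0
  + \beta_1\,\mathrm{Race}_i
  + \beta_2\,\mathrm{Marriage}_i
  + \beta_3\,\mathrm{WorkExperience}_i
  + \beta_4\,\mathrm{FinancialAssistance}_i,
\qquad
p_i = \Pr(\mathrm{Rearrested}_i = 1)
\]
```

A positive coefficient raises the log-odds of re-arrest relative to the reference level, and a negative coefficient lowers them.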

Data Preparation

# Clear workspace
rm(list = ls())
# Import Data
library(AER)
data("Rossi")

# Create new data frame with variables of interest
library(dplyr) # Data manipulation
Rossi.clean <- select(Rossi, arrest, race, mar, wexp, fin)

# Rename variables
Rossi.clean <- rename(Rossi.clean,
                      Rearrested = arrest,
                      Race = race,
                      Marriage = mar,
                      Work.Experience = wexp,
                      Financial.Assistance = fin)

# Make sure structure of the data is correct 
glimpse(Rossi.clean)

## Rows: 432
## Columns: 5
## $ Rearrested           <int> 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1…
## $ Race                 <fct> black, black, other, black, other, black, black, …
## $ Marriage             <fct> not married, not married, not married, married, n…
## $ Work.Experience      <fct> no, no, yes, yes, yes, yes, yes, yes, no, yes, no…
## $ Financial.Assistance <fct> no, no, no, yes, no, no, no, yes, no, no, yes, no…

Data Table

library(DT) # Data Table
datatable(Rossi.clean)

Descriptive Statistics

Univariate Analysis

library(modelsummary) # Data Tables/Model Summaries
datasummary_skim(Rossi.clean, output = "kableExtra", type = "categorical")

		N	%
Race	black	379	87.7
	other	53	12.3
Marriage	married	53	12.3
	not married	379	87.7
Work.Experience	no	185	42.8
	yes	247	57.2
Financial.Assistance	no	216	50.0
	yes	216	50.0

Key Takeaways

- The majority of participants in the study are black (87.7%).

- The majority of participants are not married (87.7%).

- The majority of participants have work experience (57.2%).

- Half of the participants received financial assistance, and the other half did not.

Bivariate Analysis

# Cross-tabulation of categorical variables
# I want to get a better understanding of the relationship between financial assistance and work experience among participants of the study. 
datasummary_crosstab(Work.Experience ~ Financial.Assistance, data = Rossi.clean, output = "kableExtra")

Work.Experience		no	yes	All
no	N	93	92	185
	% row	50.3	49.7	100.0
yes	N	123	124	247
	% row	49.8	50.2	100.0
All	N	216	216	432
	% row	50.0	50.0	100.0

Key Takeaways

Among participants without work experience, the distribution is nearly equal: 50.3% did not receive financial assistance and 49.7% did.
Among participants with work experience, the distribution is nearly equal: 49.8% without and 50.2% with financial assistance.
Overall, the distribution of financial assistance is balanced across work experience groups.
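One way to back up the "balanced" reading of the cross-tabulation is a chi-squared test of independence on the four cell counts. This is a minimal sketch that re-enters the counts from the table by hand; the object names `tab` and `independence_test` are hypothetical, and the test is an addition for illustration rather than part of the original analysis.

```r
# Cell counts copied from the cross-tabulation above
# (rows: Work.Experience, columns: Financial.Assistance)
tab <- matrix(c(93, 123, 92, 124), nrow = 2,
              dimnames = list(Work.Experience = c("no", "yes"),
                              Financial.Assistance = c("no", "yes")))

# Pearson chi-squared test of independence, without continuity correction
independence_test <- chisq.test(tab, correct = FALSE)
independence_test$statistic
independence_test$p.value
```

A large p-value here is consistent with financial assistance being distributed independently of work experience.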

Binary Logit Regression

First, I will convert my individual-level raw data into grouped data.

# Count individuals (total) and re-arrests (Yes) within each covariate pattern
Grouped <- Rossi.clean %>%
    group_by(Race, Marriage, Work.Experience, Financial.Assistance) %>%
    summarise(total = n(), Yes = sum(Rearrested), .groups = "drop") %>%
    mutate(No = total - Yes)
datatable(Grouped)

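Grouping does not change the estimated coefficients or their standard errors, which match those from the individual-level data. Grouped data is still convenient for a binary logistic regression for a few reasons:

- Reducing size: the 432 individual rows collapse into 15 covariate patterns.

- Assessing fit: with grouped counts, the residual deviance compares the model against the saturated model for those patterns.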
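The relationship between a grouped fit and an individual-level fit can be checked on simulated data. The sketch below uses made-up data (not the Rossi data) and hypothetical names (`sim`, `grouped_sim`, `fit_individual`, `fit_grouped`); it illustrates that both forms give the same coefficients while reporting different deviances.

```r
# Simulated data for illustration only (hypothetical, not the Rossi data)
set.seed(1)
n <- 400
sim <- data.frame(
  x1 = factor(sample(c("a", "b"), n, replace = TRUE)),
  x2 = factor(sample(c("no", "yes"), n, replace = TRUE))
)
eta <- -1 + 0.5 * (sim$x1 == "b") - 0.7 * (sim$x2 == "yes")
sim$y <- rbinom(n, size = 1, prob = plogis(eta))

# Individual-level (0/1 response) fit
fit_individual <- glm(y ~ x1 + x2, family = binomial, data = sim)

# Collapse to one row per covariate pattern: successes and failures
yes <- aggregate(y ~ x1 + x2, data = sim, FUN = sum)
tot <- aggregate(y ~ x1 + x2, data = sim, FUN = length)
grouped_sim <- data.frame(yes[, c("x1", "x2")],
                          Yes = yes$y,
                          No = tot$y - yes$y)

# Grouped (successes/failures response) fit
fit_grouped <- glm(cbind(Yes, No) ~ x1 + x2, family = binomial,
                   data = grouped_sim)

# Same coefficients (up to numerical tolerance), different deviances
all.equal(coef(fit_individual), coef(fit_grouped), tolerance = 1e-4)
c(individual = deviance(fit_individual), grouped = deviance(fit_grouped))
```

Because the deviance and log-likelihood differ between the two forms by a constant, AIC and BIC values from grouped fits should only be compared with other fits to the same grouped data.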

Binary Logistic Regression Models

models <- list(Model_1 = glm(formula = cbind(Yes, No) ~ Race, family = binomial, data = Grouped),
               Model_2 = glm(formula = cbind(Yes, No) ~ Race + Marriage, family = binomial, data = Grouped),
               Model_3 = glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience, family = binomial, data = Grouped), 
               Model_4 = glm(formula = cbind(Yes, No) ~ Race + Marriage + Work.Experience + Financial.Assistance, family = binomial, data = Grouped))
library(huxtable)
modelsummary(models, output = "huxtable", statistic = "p.value")

	Model_1	Model_2	Model_3	Model_4
(Intercept)	-0.999	-1.696	-1.208	-1.011
	(<0.001)	(<0.001)	(0.006)	(0.023)
Raceother	-0.230	-0.196	-0.176	-0.221
	(0.509)	(0.575)	(0.617)	(0.534)
Marriagenot married		0.772	0.555	0.584
		(0.054)	(0.179)	(0.158)
Work.Experienceyes			-0.552	-0.551
			(0.015)	(0.016)
Financial.Assistanceyes				-0.460
				(0.040)
Num.Obs.	15	15	15	15
AIC	69.9	67.7	63.7	61.4
BIC	71.3	69.8	66.5	65.0
Log.Lik.	-32.958	-30.827	-27.838	-25.706
F	0.435	2.055	3.362	3.505
RMSE	0.31	0.34	0.31	0.30

Lower AIC and BIC values indicate a better trade-off between fit and complexity. Both criteria penalize additional parameters, so a decrease means that the added predictor improves the fit by more than its penalty. In the table, the AIC and BIC both decrease with each added predictor, so Model 4 is the preferred model because it has the lowest AIC and BIC values. In Model 4, work experience (p = 0.016) and financial assistance (p = 0.040) are significantly associated with the likelihood of being re-arrested because their p-values are less than 0.05, while race (p = 0.534) and marriage status (p = 0.158) are not because their p-values are greater than 0.05.
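The AIC and BIC rows can be reproduced from the Log.Lik. row with the usual formulas AIC = 2k - 2logLik and BIC = k·log(n) - 2logLik. The sketch below re-enters the reported values by hand; `k` (number of estimated coefficients, including the intercept) and `n_obs` (the 15 grouped rows reported as Num.Obs.) are names introduced here for illustration.

```r
# Log-likelihoods copied from the model table above
log_lik <- c(Model_1 = -32.958, Model_2 = -30.827,
             Model_3 = -27.838, Model_4 = -25.706)
k <- c(2, 3, 4, 5)   # coefficients per model, intercept included
n_obs <- 15          # Num.Obs. in the table (grouped rows, not individuals)

aic <- 2 * k - 2 * log_lik
bic <- log(n_obs) * k - 2 * log_lik
round(rbind(AIC = aic, BIC = bic), 1)
```

Note that the BIC penalty uses the 15 grouped rows rather than the 432 individuals, which is a consequence of fitting the models to grouped data.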

Likelihood Ratio Test

anova(models$Model_1, models$Model_2, models$Model_3, models$Model_4, test = "Chisq")

## Analysis of Deviance Table
## 
## Model 1: cbind(Yes, No) ~ Race
## Model 2: cbind(Yes, No) ~ Race + Marriage
## Model 3: cbind(Yes, No) ~ Race + Marriage + Work.Experience
## Model 4: cbind(Yes, No) ~ Race + Marriage + Work.Experience + Financial.Assistance
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1        13     30.991                       
## 2        12     26.728  1   4.2632  0.03895 *
## 3        11     20.750  1   5.9773  0.01449 *
## 4        10     16.487  1   4.2637  0.03893 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Create a table for the likelihood ratio test

anova.result <- anova(models$Model_1, models$Model_2, models$Model_3, models$Model_4, test = "Chisq")
library(knitr)
kable(anova.result, caption = "Likelihood Ratio Test")

Likelihood Ratio Test
Resid. Df	Resid. Dev	Df	Deviance	Pr(>Chi)
13	30.99092	NA	NA	NA
12	26.72769	1	4.263235	0.0389457
11	20.75038	1	5.977306	0.0144911
10	16.48664	1	4.263740	0.0389341

Each p-value is below 0.05, so each added predictor (Marriage, then Work Experience, then Financial Assistance) significantly improves the fit over the previous, smaller model. Model 4 is therefore the best-fitting model.
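Each row of the likelihood ratio test can be checked by hand: the test statistic is the drop in residual deviance between two nested models, compared against a chi-squared distribution with degrees of freedom equal to the number of added parameters. The sketch below re-enters the values from the deviance table; the object names are hypothetical.

```r
# Residual deviances and degrees of freedom copied from the table above
resid_dev <- c(Model_1 = 30.991, Model_2 = 26.728,
               Model_3 = 20.750, Model_4 = 16.487)
resid_df <- c(13, 12, 11, 10)

# Drop in deviance and in residual df between consecutive models
lr_stat <- -diff(resid_dev)
df_diff <- -diff(resid_df)

# Upper-tail chi-squared probabilities
p_values <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
round(cbind(lr_stat, df_diff, p_values), 4)
```

The resulting p-values should agree with the Pr(>Chi) column up to the rounding of the deviances entered here.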

Obtaining the Probabilities of being re-arrested for significant variables

# Probability of the likelihood of being re-arrested based on Work Experience
Prob_Table <- Rossi.clean %>% 
    group_by(Work.Experience) %>% 
    summarise(Rearrested = mean(Rearrested)) %>% 
  mutate(Not.Rearrested = 1 - Rearrested)
# Create table for results
kable(Prob_Table, caption = "Probabilities of being re-arrested based on Work Experience")

Probabilities of being re-arrested based on Work Experience
Work.Experience	Rearrested	Not.Rearrested
no	0.3351351	0.6648649
yes	0.2105263	0.7894737

Key Takeaways

Former convicts with work experience have a lower observed re-arrest rate (21.1%) than those without work experience (33.5%). These are raw proportions from the sample, not predictions from the fitted model.

# Probability of the likelihood of being re-arrested based on Financial Assistance
Prob_Table1 <- Rossi.clean %>% 
    group_by(Financial.Assistance) %>% 
    summarise(Rearrested = mean(Rearrested)) %>% 
  mutate(Not.Rearrested = 1 - Rearrested) 
# Create table for results
kable(Prob_Table1, caption = "Probabilities of being re-arrested based on Financial Assistance")

Probabilities of being re-arrested based on Financial Assistance
Financial.Assistance	Rearrested	Not.Rearrested
no	0.3055556	0.6944444
yes	0.2222222	0.7777778

Key Takeaways

Former convicts with financial assistance have a lower observed re-arrest rate (22.2%) than those without financial assistance (30.6%). These are raw proportions from the sample, not predictions from the fitted model.
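The proportions above are computed directly from the sample. As a complement, model-based probabilities can be sketched from the Model 4 coefficients by applying the inverse logit (`plogis`). The coefficients below are re-entered by hand from the model table, the vector names are hypothetical, and the chosen profile (black, not married) is an assumption made for illustration because those are the most common categories in the sample.

```r
# Model 4 coefficients copied from the model table above
b <- c(intercept = -1.011, race_other = -0.221, not_married = 0.584,
       work_exp_yes = -0.551, fin_aid_yes = -0.460)

# Log-odds for a black, not-married individual (assumed profile)
eta_base <- b[["intercept"]] + b[["not_married"]]

log_odds <- c(
  no_work_no_aid = eta_base,
  work_only      = eta_base + b[["work_exp_yes"]],
  aid_only       = eta_base + b[["fin_aid_yes"]],
  work_and_aid   = eta_base + b[["work_exp_yes"]] + b[["fin_aid_yes"]]
)

# Inverse logit turns log-odds into predicted probabilities
probs <- plogis(log_odds)
round(probs, 2)

# Odds ratios for the two significant predictors
odds_ratios <- exp(b[c("work_exp_yes", "fin_aid_yes")])
round(odds_ratios, 2)
```

Under these assumptions the predicted probability of re-arrest falls from roughly 0.39 with neither work experience nor financial assistance to roughly 0.19 with both, and the odds ratios (about 0.58 and 0.63) express the same effects as multiplicative changes in the odds.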

Conclusion

The results of my analysis indicate that race and marital status are not significantly associated with the likelihood of re-arrest. However, work experience and financial assistance are each associated with a significantly lower likelihood of being re-arrested. This suggests that providing individuals with work experience and financial assistance may help reduce recidivism rates and improve their chances of reintegrating into society successfully.

References

(Esparza Flores 2018)

Esparza Flores, Nayely. 2018. “Contributing Factors to Mass Incarceration and Recidivism.” Themis: Research Journal of Justice Studies and Forensic Science 6 (1): 4.