Sungji Peter Shin 3.17.2019
The original dataset is collected by the Department of Correction and published on the official NYC Open Data website. It has been maintained on Kaggle and is updated on a daily basis. Each inmate record includes, but is not limited to, the inmate's admission date, race, gender, age, and custody level. Using this dataset and a few logistic regression models, this analysis explores which of these factors are associated with whether an inmate is under mental observation.
library(readr)
inmates <- read_csv("J:/SOC712/ny-daily-inmates-in-custody/daily-inmates-in-custody.csv", col_names = TRUE)
## Parsed with column specification:
## cols(
## INMATEID = col_double(),
## ADMITTED_DT = col_datetime(format = ""),
## DISCHARGED_DT = col_logical(),
## CUSTODY_LEVEL = col_character(),
## BRADH = col_character(),
## RACE = col_character(),
## GENDER = col_character(),
## AGE = col_double(),
## INMATE_STATUS_CODE = col_character(),
## SEALED = col_character(),
## SRG_FLG = col_character(),
## TOP_CHARGE = col_character(),
## INFRACTION = col_character()
## )
head(inmates)
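library(tidyverse)     # dplyr, forcats (fct_collapse), ggplot2 (theme_bw)
library(pander)        # pander() to render the regression table
library(sjmisc)        # rec() for recoding variables
library(knitr)
library(texreg)        # htmlreg() regression table
library(interactions)  # interact_plot()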
inmates %>%
count(GENDER, BRADH)
Since the majority of inmates in the dataset are male, this analysis is limited to male inmates, and female inmates are excluded. A total of 7,472 male inmates are used to fit the regression models.
inmatesM <- inmates %>%
filter(GENDER == 'M')
inmatesF <- inmates %>%
filter(GENDER == 'F')
inmatesM %>%
count(BRADH)
length(inmatesM$AGE)
## [1] 7472
table(inmatesM$RACE)
##
## A B I O U W
## 131 4136 8 2329 19 849
inmatesM$BLACK <- fct_collapse(inmatesM$RACE,
BLACK = 'B',
OTHERS = c('A', 'I', 'O', 'U', 'W')
)
table(inmatesM$BLACK)
##
## OTHERS BLACK
## 3336 4136
table(inmatesM$CUSTODY_LEVEL)
##
## MAX MED MIN
## 2037 2987 2359
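The dichotomous variable BRADH indicates whether an inmate is under mental observation (no missing values). The RACE variable is recoded into a new dichotomous variable, BLACK, which classifies each inmate as either Black or Others (no missing values). CUSTODY_LEVEL indicates the inmate's custody level: minimum, medium, or maximum (89 missing values, since only 2,037 + 2,987 + 2,359 = 7,383 of the 7,472 male inmates have a recorded level). In the code below, these are converted to numeric variables: MENTAL_OBS (1 = under mental observation, 0 = not), BLACK_int (1 = Black, 0 = Others), and CUSTODY_int (1 = MIN, 2 = MED, 3 = MAX).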
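The recoding pipeline that follows goes through factor levels and `sjmisc::rec()`. As a point of comparison, the sketch below produces the same three 0/1 and 1-3 codings directly with base R on a small, hypothetical toy data frame (`toy` and `custody_map` are illustrative names, and the rows are made up). It assumes BRADH is stored as `"Y"`/`"N"`; check `table(inmates$BRADH)` before relying on that.

```r
# Hypothetical toy rows standing in for inmatesM (not real records)
toy <- data.frame(
  BRADH         = c("Y", "N", "N", "Y"),
  RACE          = c("B", "W", "O", "B"),
  CUSTODY_LEVEL = c("MIN", "MAX", NA, "MED"),
  stringsAsFactors = FALSE
)

# 1 if under mental observation, 0 otherwise (ASSUMES BRADH is coded "Y"/"N")
toy$MENTAL_OBS <- as.integer(toy$BRADH == "Y")

# 1 if Black, 0 for every other race code (A, I, O, U, W)
toy$BLACK_int <- as.integer(toy$RACE == "B")

# Ordered custody score: MIN = 1, MED = 2, MAX = 3; missing levels stay NA
custody_map <- c(MIN = 1L, MED = 2L, MAX = 3L)
toy$CUSTODY_int <- unname(custody_map[toy$CUSTODY_LEVEL])

toy
```

Mapping through a named vector makes the intended order explicit, rather than relying on the alphabetical factor order (MAX, MED, MIN) that the `rec(..., '1=3; 2=2; 3=1')` step has to reverse.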
inmatesM <- inmatesM %>%
mutate(mental = as.factor(BRADH)) %>%
mutate(ment = as.integer(mental))
head(inmatesM)
inmatesM <- inmatesM %>%
mutate(MENTAL_OBS = sjmisc::rec(ment, rec = "2=1; 1=0")) %>%
select(BLACK, CUSTODY_LEVEL, AGE, MENTAL_OBS)
inmatesM <- inmatesM %>%
mutate(INTB = as.factor(BLACK)) %>%
mutate(BLACK_int = as.integer(INTB)) %>%
mutate(BLACK_int = sjmisc::rec(BLACK_int, rec = '2=1; 1=0')) %>%
select(-INTB)
inmatesM <- inmatesM %>%
mutate(cust = as.factor(CUSTODY_LEVEL)) %>%
mutate(CUSTODY_int = as.integer(cust))
inmatesM <- inmatesM %>%
mutate(CUSTODY_int = sjmisc::rec(CUSTODY_int, rec = '1=3; 2=2; 3=1')) %>%
select(-cust)
head(inmatesM)
model1 <- glm(MENTAL_OBS ~ BLACK_int, family = 'binomial', data = inmatesM)
summary(model1)
##
## Call:
## glm(formula = MENTAL_OBS ~ BLACK_int, family = "binomial", data = inmatesM)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.071 -1.071 -1.071 1.288 1.288
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.2555788 0.0349098 -7.321 2.46e-13 ***
## BLACK_int -0.0001579 0.0469221 -0.003 0.997
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10237 on 7471 degrees of freedom
## Residual deviance: 10237 on 7470 degrees of freedom
## AIC: 10241
##
## Number of Fisher Scoring iterations: 3
Model1 shows that being Black has no statistically significant association with whether an inmate is under mental observation (estimate -0.0002, p = 0.997).
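Logistic regression coefficients are on the log-odds scale, so translating them into probabilities can make the result easier to read. The sketch below plugs the model1 estimates, copied from the output above, into `plogis()` (the inverse logit). With the fitted object, `predict(model1, newdata, type = "response")` would give the same probabilities.

```r
# Coefficients copied from summary(model1) above
b0 <- -0.2555788   # intercept: log-odds for non-Black inmates
b1 <- -0.0001579   # BLACK_int: change in log-odds for Black inmates

p_others   <- plogis(b0)        # predicted probability, BLACK_int = 0
p_black    <- plogis(b0 + b1)   # predicted probability, BLACK_int = 1
odds_ratio <- exp(b1)           # multiplicative change in the odds

round(c(p_others = p_others, p_black = p_black, odds_ratio = odds_ratio), 4)
```

Both groups have a predicted probability of about 0.44 and the odds ratio is essentially 1, which is the same "no effect" conclusion expressed on the probability scale.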
model2 <- glm(MENTAL_OBS ~ CUSTODY_int + BLACK_int, family = 'binomial', data = inmatesM)
summary(model2)
##
## Call:
## glm(formula = MENTAL_OBS ~ CUSTODY_int + BLACK_int, family = "binomial",
## data = inmatesM)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2268 -1.0714 -0.9474 1.2627 1.4263
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.82581 0.06798 -12.148 <2e-16 ***
## CUSTODY_int 0.31377 0.03100 10.122 <2e-16 ***
## BLACK_int -0.05639 0.04785 -1.179 0.239
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10130 on 7382 degrees of freedom
## Residual deviance: 10026 on 7380 degrees of freedom
## (89 observations deleted due to missingness)
## AIC: 10032
##
## Number of Fisher Scoring iterations: 4
Model2 shows a negative coefficient for BLACK_int (-0.056), suggesting slightly lower odds of being under mental observation for Black inmates; however, the effect is not statistically significant (p = 0.239). The level of custody (CUSTODY_int), by contrast, has a statistically significant effect: each one-step increase in custody level (minimum to medium, or medium to maximum) raises the log-odds of being under mental observation by 0.314, which multiplies the odds by exp(0.314), or about 1.37. Note that 89 inmates with a missing custody level are dropped from this and the following models.
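To make the custody effect concrete, the sketch below converts the model2 coefficients, copied from the output above, into an odds ratio per custody step and into predicted probabilities for non-Black inmates (BLACK_int = 0, so that term drops out) at each custody level. With the fitted object, `exp(coef(model2))` gives the odds ratios directly.

```r
# Coefficients copied from summary(model2) above
b0     <- -0.82581   # intercept
b_cust <-  0.31377   # CUSTODY_int (1 = MIN, 2 = MED, 3 = MAX)

# Each one-step increase in custody multiplies the odds by exp(b_cust)
or_custody <- exp(b_cust)

# Predicted probability of mental observation for non-Black inmates
custody <- 1:3
p_obs <- plogis(b0 + b_cust * custody)
names(p_obs) <- c("MIN", "MED", "MAX")

round(or_custody, 3)
round(p_obs, 3)
```

The odds ratio is constant across steps, but the change in probability is not, because the logistic curve is nonlinear; that is why "increases by 0.314" only holds on the log-odds scale.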
model3 <- glm(MENTAL_OBS ~ CUSTODY_int + BLACK_int * AGE, family = 'binomial', data = inmatesM)
summary(model3)
##
## Call:
## glm(formula = MENTAL_OBS ~ CUSTODY_int + BLACK_int * AGE, family = "binomial",
## data = inmatesM)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3747 -1.0737 -0.9277 1.2405 1.5112
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.315799 0.152444 -8.631 < 2e-16 ***
## CUSTODY_int 0.371493 0.034682 10.711 < 2e-16 ***
## BLACK_int 0.060603 0.153376 0.395 0.692748
## AGE 0.010380 0.003154 3.292 0.000996 ***
## BLACK_int:AGE -0.003264 0.003999 -0.816 0.414363
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10130 on 7382 degrees of freedom
## Residual deviance: 10010 on 7378 degrees of freedom
## (89 observations deleted due to missingness)
## AIC: 10020
##
## Number of Fisher Scoring iterations: 4
As in the first two models, the level of custody has a statistically significant positive effect (0.371) on the log-odds of being under mental observation, and being Black does not have a statistically significant effect (0.061). AGE also has a statistically significant effect (0.010 per year, p < 0.001), while the interaction between BLACK and AGE does not (-0.003, p = 0.414). Because the model includes this interaction, the BLACK_int coefficient is the race difference at AGE = 0, and the AGE coefficient is the age slope for non-Black inmates. For Black inmates the age slope is 0.010 - 0.003 = 0.007, so the effect of AGE is smaller if the inmate is Black, but not significantly so.
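Because of the interaction term, the race effect in model3 depends on age. The sketch below, using the coefficients copied from the output above, computes the log-odds gap between Black and non-Black inmates at ages 20, 40, and 60 (the same values used in the interaction plot), and the per-year age slope for each group. Standard errors for these combinations would additionally require the covariance matrix, `vcov(model3)`, which is not shown here.

```r
# Coefficients copied from summary(model3) above
b_blk     <-  0.060603   # BLACK_int: race gap when AGE = 0
b_age     <-  0.010380   # AGE: per-year slope for non-Black inmates
b_blk_age <- -0.003264   # BLACK_int:AGE

ages <- c(20, 40, 60)

# Log-odds gap (Black minus non-Black) at each age
race_gap <- b_blk + b_blk_age * ages
names(race_gap) <- paste0("age_", ages)

# Per-year age slope (log-odds) for each group
age_slope <- c(others = b_age, black = b_age + b_blk_age)

round(race_gap, 4)
round(age_slope, 4)
```

At realistic ages the point estimate of the race gap is small and negative, which is why the positive BLACK_int coefficient (evaluated at AGE = 0) should not be read as "Black inmates are more likely to be under observation".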
model4 <- glm(MENTAL_OBS ~ BLACK_int + CUSTODY_int * AGE, family = 'binomial', data = inmatesM)
summary(model4)
##
## Call:
## glm(formula = MENTAL_OBS ~ BLACK_int + CUSTODY_int * AGE, family = "binomial",
## data = inmatesM)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2977 -1.0735 -0.9253 1.2462 1.5494
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.475114 0.227712 -6.478 9.30e-11 ***
## BLACK_int -0.061188 0.047962 -1.276 0.20204
## CUSTODY_int 0.490197 0.103757 4.724 2.31e-06 ***
## AGE 0.014751 0.005616 2.627 0.00862 **
## CUSTODY_int:AGE -0.003419 0.002843 -1.203 0.22909
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10130 on 7382 degrees of freedom
## Residual deviance: 10009 on 7378 degrees of freedom
## (89 observations deleted due to missingness)
## AIC: 10019
##
## Number of Fisher Scoring iterations: 4
Model4 produces similar results: the level of custody (0.490) and age (0.015, p = 0.009, less significant than in model3) have statistically significant effects, while being Black does not (-0.061). The interaction between the level of custody and age (-0.003) is not statistically significant, suggesting that the effect of age on the chance of being under mental observation is similar across custody levels. As in model3, the main effects are conditional: 0.490 is the custody effect at AGE = 0, and 0.015 is the age slope at CUSTODY_int = 0, a value outside the 1 to 3 custody scale.
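The conditional effects in model4 can be read off the coefficients copied from the output above. The sketch computes the odds ratio for a one-step custody increase at ages 20, 40, and 60, and the per-year age slope (log-odds) at each observed custody level.

```r
# Coefficients copied from summary(model4) above
b_cust     <-  0.490197   # CUSTODY_int: effect when AGE = 0
b_age      <-  0.014751   # AGE: slope when CUSTODY_int = 0
b_cust_age <- -0.003419   # CUSTODY_int:AGE

# Log-odds change per custody step at selected ages
ages <- c(20, 40, 60)
custody_effect <- b_cust + b_cust_age * ages
names(custody_effect) <- paste0("age_", ages)

# Per-year age slope at each observed custody level
custody <- 1:3
age_slope <- b_age + b_cust_age * custody
names(age_slope) <- c("MIN", "MED", "MAX")

round(exp(custody_effect), 3)   # odds ratio per custody step
round(age_slope, 4)
```

The point estimates show the custody effect shrinking with age and the age slope shrinking with custody level, but since the interaction is not significant, these differences are within what chance alone could produce.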
anova(model2, model3, model4, test = 'Chisq')
table1 <- htmlreg(list(model1, model2, model3, model4), digits = 3, doctype = FALSE)
pander(table1)
|  | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| (Intercept) | -0.256*** | -0.826*** | -1.316*** | -1.475*** |
|  | (0.035) | (0.068) | (0.152) | (0.228) |
| BLACK_int | -0.000 | -0.056 | 0.061 | -0.061 |
|  | (0.047) | (0.048) | (0.153) | (0.048) |
| CUSTODY_int |  | 0.314*** | 0.371*** | 0.490*** |
|  |  | (0.031) | (0.035) | (0.104) |
| AGE |  |  | 0.010*** | 0.015** |
|  |  |  | (0.003) | (0.006) |
| BLACK_int:AGE |  |  | -0.003 |  |
|  |  |  | (0.004) |  |
| CUSTODY_int:AGE |  |  |  | -0.003 |
|  |  |  |  | (0.003) |
| AIC | 10241.280 | 10031.968 | 10020.271 | 10019.491 |
| BIC | 10255.117 | 10052.688 | 10054.805 | 10054.026 |
| Log Likelihood | -5118.640 | -5012.984 | -5005.135 | -5004.746 |
| Deviance | 10237.280 | 10025.968 | 10010.271 | 10009.491 |
| Num. obs. | 7472 | 7383 | 7383 | 7383 |
| ***p < 0.001; **p < 0.01; *p < 0.05 |  |  |  |  |
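Model1 is fit on all 7,472 male inmates, while the other three models drop the 89 inmates with a missing custody level (7,383 observations), so Model1 is excluded from the ANOVA. The likelihood ratio test shows that Model3 (MENTAL_OBS ~ CUSTODY_int + BLACK_int * AGE) fits significantly better than Model2 (a deviance drop of 15.7 on 2 degrees of freedom). Model3 and Model4 have the same number of parameters and are not nested, so the likelihood ratio test cannot compare them. The AIC and BIC of Model4 are the lowest, but they differ from those of Model3 by less than 1, which is negligible. On balance, I treat Model3 as the preferred model.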
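The model comparison can be reproduced from the deviances in the table above. The sketch computes the likelihood ratio test of Model2 against Model3 (Model3 adds AGE and BLACK_int:AGE, two extra parameters) and recomputes AIC as deviance + 2k, where k is the number of estimated coefficients. For a binary-outcome GLM the residual deviance equals -2 times the log likelihood, so these AIC values should match the table.

```r
# Residual deviances from the table above and number of coefficients per model
dev <- c(model2 = 10025.968, model3 = 10010.271, model4 = 10009.491)
k   <- c(model2 = 3, model3 = 5, model4 = 5)

# Likelihood ratio test: model2 is nested in model3
lr_stat <- unname(dev["model2"] - dev["model3"])
lr_df   <- unname(k["model3"] - k["model2"])
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# model3 vs model4: equal k and not nested, so compare information criteria
aic <- dev + 2 * k

round(c(lr_stat = lr_stat, lr_df = lr_df, lr_p = lr_p), 5)
aic
```

In `anova(model2, model3, model4, test = 'Chisq')`, the Model3 to Model4 row has 0 degrees of freedom, so no p-value is available for that comparison; only AIC/BIC can speak to it.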
interact_plot(model3, pred = BLACK_int, modx = AGE, modx.values = c(20, 40, 60))
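Compared to an inmate's race, age is the stronger predictor of whether the person is under mental observation: the predicted probabilities for Black and other inmates are close at each of the three ages, while they rise noticeably from age 20 to age 60.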
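The probabilities behind this plot can be computed by hand from the model3 coefficients copied from the output above. The sketch evaluates them on a small grid of ages and race; the choice of medium custody (CUSTODY_int = 2) is an assumption for illustration, since `interact_plot()` holds other covariates at their own default values, which may differ.

```r
# Coefficients copied from summary(model3) above
b <- c(int = -1.315799, cust = 0.371493, blk = 0.060603,
       age = 0.010380, blk_age = -0.003264)

# Evaluation grid: ASSUMED medium custody, three ages, both race groups
grid <- expand.grid(AGE = c(20, 40, 60), BLACK_int = c(0, 1))

eta <- b[["int"]] + b[["cust"]] * 2 + b[["blk"]] * grid$BLACK_int +
  b[["age"]] * grid$AGE + b[["blk_age"]] * grid$BLACK_int * grid$AGE
grid$p_obs <- round(plogis(eta), 3)   # predicted probability of observation

grid
```

Within each age the two race groups are a few percentage points apart at most, while moving from age 20 to 60 changes the probability by several times that amount.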
interact_plot(model4, pred = CUSTODY_int, modx = AGE, modx.values = c(20, 40, 60))
As the level of custody and age increase, the chance of being under mental observation increases as well.
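This pattern can be checked numerically with the model4 coefficients copied from the output above. The sketch builds a table of predicted probabilities for non-Black inmates (BLACK_int = 0), with custody level in rows and age in columns.

```r
# Coefficients copied from summary(model4) above
b <- c(int = -1.475114, blk = -0.061188, cust = 0.490197,
       age = 0.014751, cust_age = -0.003419)

custody <- 1:3            # MIN, MED, MAX
ages    <- c(20, 40, 60)

# Predicted probability for non-Black inmates: rows = custody, columns = age
p_obs <- outer(custody, ages, function(cust, age) {
  plogis(b[["int"]] + b[["cust"]] * cust + b[["age"]] * age +
           b[["cust_age"]] * cust * age)
})
dimnames(p_obs) <- list(c("MIN", "MED", "MAX"), paste0("age_", ages))

round(p_obs, 3)
```

Every row increases from left to right and every column increases from top to bottom, matching the reading of the plot; the slightly smaller increments at high custody and older ages reflect the (non-significant) negative interaction.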
library(visreg)
visreg(model3, 'AGE', gg = TRUE, by = 'CUSTODY_int', scale = 'response', xlab = 'Age of Inmates', ylab = 'Under Mental Observation') + theme_bw()
visreg(model3, 'AGE', gg = TRUE, by = 'BLACK_int', scale = 'response', xlab = 'Age of Inmates', ylab = 'Under Mental Observation') + theme_bw()
As expected, older inmates and those held at higher custody levels are more likely to be under mental observation. However, neither interaction (between race and age, or between custody level and age) is statistically significant. Due to the nature of the dataset, we do not know whether inmates under mental observation had mental health issues before they were placed in the facility or developed them afterwards. Adding such information, along with the total duration of incarceration (which could have been calculated but was outside the scope of this study), could yield a more informative model.
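As a starting point for the duration variable mentioned above, time in custody could be derived from ADMITTED_DT. Since DISCHARGED_DT is parsed as an empty logical column, the file appears to list only inmates currently in custody, so the sketch below measures time in custody so far, relative to a snapshot date. Both the admission timestamps and the snapshot date (taken here as the report date, 2019-03-17) are hypothetical assumptions for illustration.

```r
# Hypothetical admission timestamps, in the datetime form read_csv() gives ADMITTED_DT
admitted <- as.POSIXct(c("2018-12-01 10:30:00", "2019-02-20 08:00:00"), tz = "UTC")

# ASSUMED snapshot date: the day the daily file was downloaded (not stored in the data)
snapshot <- as.POSIXct("2019-03-17 00:00:00", tz = "UTC")

# Elapsed time in custody, in (fractional) days
days_in_custody <- as.numeric(difftime(snapshot, admitted, units = "days"))
round(days_in_custody, 1)
```

On the real data, the same expression applied to `inmatesM$ADMITTED_DT` (before the `select()` step drops it) would add a numeric covariate that can enter the glm alongside AGE.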