Sungji Peter Shin 3.17.2019

About Dataset: NY Daily Inmates in Custody

The original dataset is collected by the Department of Correction and published on the official NYC Open Data website. A copy is maintained on Kaggle and updated daily. Each inmate record includes, among other fields, the admission date, race, gender, age, and level of custody. Using this dataset and a few logistic regression models, this analysis explores which factors are associated with an inmate being under mental observation.

Hypotheses:

  1. Inmates held at a higher level of custody are more likely to be under mental observation;
  2. Older inmates are more likely to be under mental observation.
library(readr)
inmates <- read_csv("J:/SOC712/ny-daily-inmates-in-custody/daily-inmates-in-custody.csv", col_names = TRUE)
## Parsed with column specification:
## cols(
##   INMATEID = col_double(),
##   ADMITTED_DT = col_datetime(format = ""),
##   DISCHARGED_DT = col_logical(),
##   CUSTODY_LEVEL = col_character(),
##   BRADH = col_character(),
##   RACE = col_character(),
##   GENDER = col_character(),
##   AGE = col_double(),
##   INMATE_STATUS_CODE = col_character(),
##   SEALED = col_character(),
##   SRG_FLG = col_character(),
##   TOP_CHARGE = col_character(),
##   INFRACTION = col_character()
## )
head(inmates)
library(dplyr)
library(tidyverse)
library(pander)
library(Zelig)
library(sjmisc)
library(knitr)
library(texreg)
library(interactions)
inmates %>%
  count(GENDER, BRADH)

Since the large majority of inmates in the dataset are male, this analysis is limited to male inmates and excludes female inmates. A total of 7,472 male inmates are used to fit the regression models.

inmatesM <- inmates %>%
  filter(GENDER == 'M')
inmatesF <- inmates %>%
  filter(GENDER == 'F')

inmatesM %>%
  count(BRADH)
length(inmatesM$AGE)
## [1] 7472
table(inmatesM$RACE)
## 
##    A    B    I    O    U    W 
##  131 4136    8 2329   19  849
inmatesM$BLACK <- fct_collapse(inmatesM$RACE,
  BLACK = 'B',
  OTHERS = c('A', 'I', 'O', 'U', 'W')
  )
table(inmatesM$BLACK)
## 
## OTHERS  BLACK 
##   3336   4136
table(inmatesM$CUSTODY_LEVEL)
## 
##  MAX  MED  MIN 
## 2037 2987 2359

The dichotomous variable BRADH indicates whether an inmate is under mental observation (no missing values). The RACE variable is recoded into a new dichotomous variable, BLACK, so that each inmate is classified as either Black or Others (no missing values). The CUSTODY_LEVEL variable indicates the level of custody at which each inmate is held; the possible values are minimum, medium, and maximum (89 missing values).
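The same three recodes can also be written with direct comparisons and a named lookup vector, which avoids relying on the alphabetical order of factor levels. This is a minimal sketch on a made-up data frame: the `toy` rows are invented for illustration, and the "Y"/"N" coding of BRADH is an assumption. It is not the code used for the models in this report.

```r
# Toy rows, invented for illustration only (not taken from the dataset).
toy <- data.frame(
  BRADH         = c("Y", "N", "N", "Y"),
  RACE          = c("B", "W", "O", "B"),
  CUSTODY_LEVEL = c("MAX", "MIN", NA, "MED"),
  stringsAsFactors = FALSE
)

# Assumption: BRADH is "Y" when the inmate is under mental observation.
toy$MENTAL_OBS <- as.integer(toy$BRADH == "Y")

# 1 = Black, 0 = every other race code.
toy$BLACK_int <- as.integer(toy$RACE == "B")

# Map custody labels to an ordered score: MIN = 1, MED = 2, MAX = 3.
# Indexing a named vector with NA returns NA, so missing levels stay missing.
custody_score <- c(MIN = 1L, MED = 2L, MAX = 3L)
toy$CUSTODY_int <- unname(custody_score[toy$CUSTODY_LEVEL])

print(toy)
```

Writing the mapping out explicitly makes the direction of the custody scale (higher value, higher custody) visible in the code itself.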

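# Convert BRADH to a factor, then to its integer level code
# (levels sort alphabetically, so the first level is 1 and the second is 2).
inmatesM <- inmatesM %>%
  mutate(mental = as.factor(BRADH)) %>%
  mutate(ment = as.integer(mental))
head(inmatesM)
# Recode the level codes to 0/1 and keep only the modeling variables.
inmatesM <- inmatesM %>%
  mutate(MENTAL_OBS = sjmisc::rec(ment, rec = "2=1; 1=0")) %>%
  select(BLACK, CUSTODY_LEVEL, AGE, MENTAL_OBS)

# BLACK has levels OTHERS, BLACK (in that order); recode to OTHERS = 0, BLACK = 1.
inmatesM <- inmatesM %>%
  mutate(INTB = as.factor(BLACK)) %>%
  mutate(BLACK_int = as.integer(INTB)) %>%
  mutate(BLACK_int = sjmisc::rec(BLACK_int, rec = '2=1; 1=0')) %>%
  select(-INTB)

# CUSTODY_LEVEL levels sort as MAX = 1, MED = 2, MIN = 3; reverse them so that
# MIN = 1, MED = 2, MAX = 3 and a higher value means a higher level of custody.
inmatesM <- inmatesM %>%
  mutate(cust = as.factor(CUSTODY_LEVEL)) %>%
  mutate(CUSTODY_int = as.integer(cust))
inmatesM <- inmatesM %>%
  mutate(CUSTODY_int = sjmisc::rec(CUSTODY_int, rec = '1=3; 2=2; 3=1')) %>%
  select(-cust)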

head(inmatesM)

Modeling

Model 1: BLACK as an independent variable

model1 <- glm(MENTAL_OBS ~ BLACK_int, family = 'binomial', data = inmatesM)
summary(model1)
## 
## Call:
## glm(formula = MENTAL_OBS ~ BLACK_int, family = "binomial", data = inmatesM)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.071  -1.071  -1.071   1.288   1.288  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.2555788  0.0349098  -7.321 2.46e-13 ***
## BLACK_int   -0.0001579  0.0469221  -0.003    0.997    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 10237  on 7471  degrees of freedom
## Residual deviance: 10237  on 7470  degrees of freedom
## AIC: 10241
## 
## Number of Fisher Scoring iterations: 3

Model 1 shows that being Black has no detectable effect on whether an inmate is under mental observation (coefficient -0.0002, p = 0.997).
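To see what a near-zero coefficient means on the probability scale, the Model 1 estimates can be pushed through the inverse logit. The sketch below copies the two coefficients from the summary output by hand (so it does not need the fitted `model1` object); the variable names are only illustrative.

```r
# Coefficients copied from the Model 1 summary above.
b0 <- -0.2555788   # intercept: log-odds for inmates coded OTHERS
b1 <- -0.0001579   # BLACK_int: difference in log-odds for Black inmates

# plogis() is the inverse logit: exp(x) / (1 + exp(x)).
p_others <- plogis(b0)
p_black  <- plogis(b0 + b1)

# Predicted probability of being under mental observation in each group.
print(round(c(others = p_others, black = p_black), 3))
```

The two predicted probabilities are essentially identical, which is what a coefficient this close to zero implies.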

Model 2: BLACK and CUSTODY LEVEL as independent variables

model2 <- glm(MENTAL_OBS ~ CUSTODY_int + BLACK_int, family = 'binomial', data = inmatesM)
summary(model2)
## 
## Call:
## glm(formula = MENTAL_OBS ~ CUSTODY_int + BLACK_int, family = "binomial", 
##     data = inmatesM)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2268  -1.0714  -0.9474   1.2627   1.4263  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.82581    0.06798 -12.148   <2e-16 ***
## CUSTODY_int  0.31377    0.03100  10.122   <2e-16 ***
## BLACK_int   -0.05639    0.04785  -1.179    0.239    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 10130  on 7382  degrees of freedom
## Residual deviance: 10026  on 7380  degrees of freedom
##   (89 observations deleted due to missingness)
## AIC: 10032
## 
## Number of Fisher Scoring iterations: 4

Model 2 shows that the log-odds of being under mental observation are lower for Black inmates, but the effect is not statistically significant (p = 0.239). The level of custody (CUSTODY_int), by contrast, has a statistically significant effect: each one-step increase in the level of custody (minimum to medium, or medium to maximum) raises the log-odds of being under mental observation by 0.31377, which corresponds to multiplying the odds by exp(0.31377), roughly 1.37. Because this is a logistic regression, the coefficient is a change in log-odds, not a change in probability.
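The following sketch shows one way to move from the Model 2 log-odds coefficient to an odds ratio with a Wald 95% interval, and to predicted probabilities at each custody level. The numbers are copied by hand from the summary above; the Wald interval is a simple approximation chosen here for illustration, not something computed in the original analysis.

```r
# Estimates copied from the Model 2 summary above.
b0         <- -0.82581
b_custody  <-  0.31377
se_custody <-  0.03100

# Odds ratio for a one-step increase in custody level, with a Wald 95% interval.
odds_ratio <- exp(b_custody)
wald_ci    <- exp(b_custody + c(-1, 1) * qnorm(0.975) * se_custody)
print(round(c(OR = odds_ratio, lower = wald_ci[1], upper = wald_ci[2]), 3))

# Predicted probabilities for BLACK_int = 0 at custody levels 1 (MIN) to 3 (MAX).
levels_1_to_3 <- 1:3
prob <- plogis(b0 + b_custody * levels_1_to_3)
names(prob) <- c("MIN", "MED", "MAX")
print(round(prob, 3))
```

The step in probability between adjacent custody levels is not constant, which is why the coefficient is better read as an odds ratio than as a change in "chance".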

Model 3: CUSTODY LEVEL and interaction between BLACK and AGE as independent variables

model3 <- glm(MENTAL_OBS ~ CUSTODY_int + BLACK_int * AGE, family = 'binomial', data = inmatesM)
summary(model3)
## 
## Call:
## glm(formula = MENTAL_OBS ~ CUSTODY_int + BLACK_int * AGE, family = "binomial", 
##     data = inmatesM)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3747  -1.0737  -0.9277   1.2405   1.5112  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -1.315799   0.152444  -8.631  < 2e-16 ***
## CUSTODY_int    0.371493   0.034682  10.711  < 2e-16 ***
## BLACK_int      0.060603   0.153376   0.395 0.692748    
## AGE            0.010380   0.003154   3.292 0.000996 ***
## BLACK_int:AGE -0.003264   0.003999  -0.816 0.414363    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 10130  on 7382  degrees of freedom
## Residual deviance: 10010  on 7378  degrees of freedom
##   (89 observations deleted due to missingness)
## AIC: 10020
## 
## Number of Fisher Scoring iterations: 4

As in the earlier models, being Black does not have a statistically significant effect (0.060603), while the level of custody, as in Model 2, does (0.371493). AGE itself has a statistically significant effect (0.010 per year on the log-odds scale), but the interaction between BLACK and AGE does not (-0.003). In other words, the estimated effect of AGE is slightly smaller for Black inmates, but the difference is not statistically significant.
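With an interaction term, the age effect for Black inmates is the sum of the AGE coefficient and the BLACK_int:AGE coefficient. A minimal sketch of that arithmetic, using the Model 3 estimates copied by hand from the summary above (the ten-year scaling is just a convenient choice for display):

```r
# Estimates copied from the Model 3 summary above.
b_age       <-  0.010380   # age slope when BLACK_int = 0
b_black_age <- -0.003264   # change in the age slope when BLACK_int = 1

# Age slope (log-odds per year) in each group.
slope_others <- b_age
slope_black  <- b_age + b_black_age

# Express each slope as an odds ratio per ten years of age.
or_per_decade <- exp(10 * c(others = slope_others, black = slope_black))
print(round(or_per_decade, 3))
```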

Model 4: BLACK and interaction between CUSTODY LEVEL and AGE as independent variables

model4 <- glm(MENTAL_OBS ~ BLACK_int + CUSTODY_int * AGE, family = 'binomial', data = inmatesM)
summary(model4)
## 
## Call:
## glm(formula = MENTAL_OBS ~ BLACK_int + CUSTODY_int * AGE, family = "binomial", 
##     data = inmatesM)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2977  -1.0735  -0.9253   1.2462   1.5494  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -1.475114   0.227712  -6.478 9.30e-11 ***
## BLACK_int       -0.061188   0.047962  -1.276  0.20204    
## CUSTODY_int      0.490197   0.103757   4.724 2.31e-06 ***
## AGE              0.014751   0.005616   2.627  0.00862 ** 
## CUSTODY_int:AGE -0.003419   0.002843  -1.203  0.22909    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 10130  on 7382  degrees of freedom
## Residual deviance: 10009  on 7378  degrees of freedom
##   (89 observations deleted due to missingness)
## AIC: 10019
## 
## Number of Fisher Scoring iterations: 4

Model 4 produces similar results: the level of custody (0.490197) and age (0.014751, significant at the 0.01 level rather than the 0.001 level seen in Model 3) have statistically significant effects, while being Black does not (-0.061188). The interaction between the level of custody and age (-0.003419) is not statistically significant, which suggests that the effect of age on being under mental observation is similar across levels of custody.
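The point estimates of Model 4 imply a slightly different age slope at each custody level, even though the difference is not statistically significant. The sketch below works this out from coefficients copied by hand from the summary above; it only illustrates how the interaction term is read.

```r
# Estimates copied from the Model 4 summary above.
b_age         <-  0.014751
b_custody_age <- -0.003419

# Age slope (log-odds per year) at custody levels 1 (MIN), 2 (MED), 3 (MAX).
custody_levels <- 1:3
age_slope <- b_age + b_custody_age * custody_levels
names(age_slope) <- c("MIN", "MED", "MAX")
print(round(age_slope, 6))
```

The estimated slope stays positive at every level and shrinks as custody increases, consistent with a small negative, non-significant interaction.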

Likelihood Ratio Test

anova(model2, model3, model4, test = 'Chisq')
table1 <- htmlreg(list(model1, model2, model3, model4), digits = 3, doctype = FALSE)
pander(table1)
Statistical models

                   Model 1       Model 2       Model 3       Model 4
(Intercept)        -0.256***     -0.826***     -1.316***     -1.475***
                   (0.035)       (0.068)       (0.152)       (0.228)
BLACK_int          -0.000        -0.056         0.061        -0.061
                   (0.047)       (0.048)       (0.153)       (0.048)
CUSTODY_int                       0.314***      0.371***      0.490***
                                 (0.031)       (0.035)       (0.104)
AGE                                             0.010***      0.015**
                                               (0.003)       (0.006)
BLACK_int:AGE                                  -0.003
                                               (0.004)
CUSTODY_int:AGE                                              -0.003
                                                             (0.003)
AIC                10241.280     10031.968     10020.271     10019.491
BIC                10255.117     10052.688     10054.805     10054.026
Log Likelihood     -5118.640     -5012.984     -5005.135     -5004.746
Deviance           10237.280     10025.968     10010.271     10009.491
Num. obs.          7472          7383          7383          7383

*** p < 0.001, ** p < 0.01, * p < 0.05

Model 1 was fit on a different number of observations than the other three models (7,472 rather than 7,383, because it does not drop the 89 inmates with a missing custody level), so it is excluded from the anova comparison. According to the likelihood ratio test, Model 3 (MENTAL_OBS ~ CUSTODY_int + BLACK_int * AGE) fits best. Model 4 has the lowest AIC and BIC, but its values are very close to those of Model 3. Taken together, I consider Model 3 the best-fitting model.
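Since the anova output is not reproduced here, the likelihood ratio statistics can be recomputed from the deviances and residual degrees of freedom reported in the model summaries and the table. The sketch below does this by hand; the helper name `lrt` is hypothetical. Note that Models 3 and 4 have the same number of parameters and neither is nested in the other, so a chi-square test between those two has zero degrees of freedom; the sketch therefore compares each of them against Model 2.

```r
# Deviances and residual degrees of freedom copied from the output above.
dev      <- c(model2 = 10025.968, model3 = 10010.271, model4 = 10009.491)
df_resid <- c(model2 = 7380,      model3 = 7378,      model4 = 7378)

# Likelihood ratio test of a smaller model against a larger one that nests it.
lrt <- function(small, big) {
  stat <- dev[[small]] - dev[[big]]
  df   <- df_resid[[small]] - df_resid[[big]]
  c(statistic = stat, df = df,
    p_value = pchisq(stat, df, lower.tail = FALSE))
}

lrt_2_vs_3 <- lrt("model2", "model3")
lrt_2_vs_4 <- lrt("model2", "model4")
print(rbind(model2_vs_model3 = lrt_2_vs_3, model2_vs_model4 = lrt_2_vs_4))
```

Both larger models improve on Model 2; choosing between Models 3 and 4 then rests on AIC and BIC, which are nearly identical.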

Plotting the two models with interaction effects

interact_plot(model3, pred = BLACK_int, modx = AGE, modx.values = c(20, 40, 60))

Compared to an inmate's race, age is a better predictor of whether the person is under mental observation.

interact_plot(model4, pred = CUSTODY_int, modx = AGE, modx.values = c(20, 40, 60))

As the level of custody and age increase, the chance of being under mental observation increases as well.
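The pattern in the plot can be checked numerically by computing predicted probabilities from the Model 4 coefficients on a small grid. This is a sketch under stated assumptions: the coefficients are copied by hand from the summary above, BLACK_int is fixed at 0, and the ages 20, 40, and 60 match the `modx.values` used in the plot.

```r
# Estimates copied from the Model 4 summary above.
coefs <- c(intercept = -1.475114, custody = 0.490197,
           age = 0.014751, custody_age = -0.003419)

# Every combination of custody level (1 = MIN ... 3 = MAX) and age.
grid <- expand.grid(CUSTODY_int = 1:3, AGE = c(20, 40, 60))

# Linear predictor with BLACK_int = 0, then inverse logit.
grid$prob <- with(grid, plogis(
  coefs[["intercept"]] +
    coefs[["custody"]] * CUSTODY_int +
    coefs[["age"]] * AGE +
    coefs[["custody_age"]] * CUSTODY_int * AGE
))

# Rows are custody levels, columns are ages.
prob_table <- xtabs(prob ~ CUSTODY_int + AGE, data = grid)
print(round(prob_table, 3))
```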

Visualization

library(visreg)
visreg(model3, 'AGE', gg = TRUE, by = 'CUSTODY_int', scale = 'response', xlab = 'Age of Inmates', ylab = 'Under Mental Observation') + theme_bw()

visreg(model3, 'AGE', gg = TRUE, by = 'BLACK_int', scale = 'response', xlab = 'Age of Inmates', ylab = 'Under Mental Observation') + theme_bw()

Conclusion

As expected, older inmates and inmates held at a higher level of custody are more likely to be under mental observation. However, neither interaction effect (race by age, or level of custody by age) is statistically significant. Because of the nature of the dataset, we do not know whether the inmates under mental observation had mental health issues before or only after they were placed in the facility. Adding such information, along with the total duration of incarceration (which could have been calculated but was outside the scope of this study), could make the results more informative.