Summary

This file compares our two arrest CNNs (Baseline and Minority-Sampler) with the CNNs produced by Jim (MNV-2). I compare our models' performance to that of the top three models from Jim's experiment, which are labeled as such. The outline is as follows:

  1. Distribution Comparison: To understand the differences in learning performance, I plot the distributions of p_hat_cnn from the five models in this analysis:

     a. Baseline CNN: the CNN trained by Logan and Celia, with a standard ResNet-50 structure

     b. Minority-Class-Sampler: the latest CNN, in which I implemented our data sampler to correct the imbalance in our training data. It also has a ResNet-50 structure

     c. Top, Second, Third: the MNV-2 style networks Jim trained over different experiments; these are the three best-performing models by AUC

  2. Table 01 Config 1: Here I compare the regression output (adjusted R squared and AUC) between our two ResNet-50 models and the three MNV-2 networks. I also include an ensemble model, which inputs all five models in the regression.
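
The outline above mentions a data sampler that corrects class imbalance in the training data, but this file does not describe how it works. As a rough illustration only, one common approach is to draw training examples with replacement, weighting each example by the inverse of its class frequency. The function name `balanced_sample` and the toy labels below are hypothetical and are not taken from our training code.

```python
import random
from collections import Counter


def balanced_sample(labels, n_draws, seed=0):
    """Draw indices with replacement so that each class is equally likely.

    Illustrative assumption: this is inverse-class-frequency weighting,
    not necessarily the sampler used for the Minority-Class-Sampler CNN.
    """
    counts = Counter(labels)
    # Each example gets weight 1 / (size of its class), so every class
    # carries the same total weight regardless of how many members it has.
    weights = [1.0 / counts[y] for y in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=n_draws)


# Hypothetical toy labels: 90 majority-class (0) and 10 minority-class (1) examples.
labels = [0] * 90 + [1] * 10
idx = balanced_sample(labels, n_draws=10_000)
drawn = Counter(labels[i] for i in idx)
minority_share = drawn[1] / len(idx)
print(f"minority share after resampling: {minority_share:.3f}")
```

In the raw toy data the minority class makes up 10% of the examples; after resampling it should make up roughly half of the draws.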

Definitions

As a reminder (these are the same definitions as in previous iterations), here is a list of the regression terms. I split them into definitions for our models and for our features:

Model Definitions

  1. Demographic LM: This includes sex and age_arrest to predict our arrest-outcome

  2. Charge Feature LM: This includes felony_flag, gun_crime_flag, drug_crime_flag, violent_crime_flag, property_crime_flag, and arrest_year

  3. XGBoost Risk: As the name suggests, this is our XGBoost risk predictor using time-varying arrest history features

Feature Definitions

  1. MTurk Features: These are our high-detail (minimum of 6 workers per image) MTurk features together with their median

  2. Kitchen Sink: The final row of the table includes all previous rows; it is our fully stacked model with all covariates.

Comparing P_hat_CNN distributions

Below I plot the five distributions of p_hat_cnn, one for each CNN.
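
For readers without access to the plots, the same comparison can be approximated numerically by binning each model's predictions. The sketch below is only an illustration: the helper `histogram`, the model names, and the prediction values are hypothetical placeholders, and the real p_hat_cnn values would come from each CNN's output.

```python
def histogram(p_hats, n_bins=10):
    """Return the share of predictions in each equal-width bin on [0, 1]."""
    counts = [0] * n_bins
    for p in p_hats:
        # A prediction of exactly 1.0 is placed in the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        counts[b] += 1
    total = len(p_hats)
    return [c / total for c in counts]


# Hypothetical predictions for two of the models (placeholder values).
preds = {
    "baseline": [0.05, 0.10, 0.15, 0.40, 0.95],
    "minority_sampler": [0.30, 0.45, 0.50, 0.55, 1.00],
}
shares = {name: histogram(p, n_bins=5) for name, p in preds.items()}
for name, s in shares.items():
    print(name, s)
```

Comparing the bin shares side by side shows whether one model concentrates its predictions near zero while another spreads them across the range.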

Table 01 - Configuration 01 - Comparing different CNN output

I repeat Table 01 Configuration 01 for each of our CNNs. Each column uses the p_hat_cnn from the corresponding CNN. Thus, the column titled MNV-2 Top repeats the regressions with the predictions from the top-performing MNV-2 CNN. The rightmost column, titled Ensemble Model, uses all five models.

Note: For definitions of our models and features, see the Definitions section at the beginning.
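
As a reminder of what the two fit measures capture, here is a minimal sketch for the single-variable P_hat_cnn case. It assumes the regression is a linear model of the arrest outcome on p_hat_cnn (as the "LM" labels suggest) and that AUC is computed from the fitted values. The helper names and the toy data are hypothetical and are not our analysis code.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def adjusted_r2(y, fitted, n_predictors):
    """R squared penalized for the number of predictors."""
    n = len(y)
    my = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


def roc_auc(y, score):
    """Probability that a random positive outranks a random negative.

    Ties count as one half. This pairwise form is quadratic in the
    sample size, which is fine for a small illustration.
    """
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: binary outcome and CNN predictions.
y = [0, 0, 1, 0, 1, 1]
p_hat = [0.1, 0.3, 0.35, 0.4, 0.7, 0.9]

intercept, slope = ols_fit(p_hat, y)
fitted = [intercept + slope * p for p in p_hat]
adj_r2 = adjusted_r2(y, fitted, n_predictors=1)
auc = roc_auc(y, fitted)
print(f"adjusted R squared: {adj_r2:.4f}, ROC AUC: {auc:.4f}")
```

Adjusted R squared rewards explained variance net of model size, while AUC depends only on how well the fitted values rank arrests above non-arrests, so the two measures can order models differently.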

Table 01 - Version 01 - Model Comparison
Fit measured in adjusted R squared and AUC

| Model Configuration | ResNet-50 Baseline: Adjusted R Squared | ResNet-50 Baseline: ROC AUC | ResNet-50 Minority Sampler: Adjusted R Squared | ResNet-50 Minority Sampler: ROC AUC | MNV-2 Top: Adjusted R Squared | MNV-2 Top: ROC AUC | MNV-2 Second: Adjusted R Squared | MNV-2 Second: ROC AUC | MNV-2 Third: Adjusted R Squared | MNV-2 Third: ROC AUC | Ensemble Model (ResNet + MNV2 Versions): Adjusted R Squared | Ensemble Model (ResNet + MNV2 Versions): ROC AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Single Variable Model | | | | | | | | | | | | |
| Demographic LM | 0.0101 | 0.5555 | 0.0101 | 0.5555 | 0.0101 | 0.5555 | 0.0101 | 0.5555 | 0.0101 | 0.5555 | 0.0101 | 0.5555 |
| Lower 95% C.I. | 0.0071 | 0.5415 | 0.0072 | 0.5415 | 0.0072 | 0.5415 | 0.0071 | 0.5415 | 0.0071 | 0.5415 | 0.0072 | 0.5415 |
| Upper 95% C.I. | 0.0136 | 0.5695 | 0.0137 | 0.5695 | 0.0136 | 0.5695 | 0.0138 | 0.5695 | 0.0136 | 0.5695 | 0.0135 | 0.5695 |
| Charge Feature LM | 0.0907 | 0.6909 | 0.0907 | 0.6909 | 0.0907 | 0.6909 | 0.0907 | 0.6909 | 0.0907 | 0.6909 | 0.0907 | 0.6909 |
| Lower 95% C.I. | 0.0800 | 0.6776 | 0.0799 | 0.6776 | 0.0803 | 0.6776 | 0.0802 | 0.6776 | 0.0800 | 0.6776 | 0.0796 | 0.6776 |
| Upper 95% C.I. | 0.1007 | 0.7042 | 0.1015 | 0.7042 | 0.1020 | 0.7042 | 0.1019 | 0.7042 | 0.1016 | 0.7042 | 0.1012 | 0.7042 |
| XGBoost Risk | 0.0334 | 0.6100 | 0.0334 | 0.6100 | 0.0334 | 0.6100 | 0.0334 | 0.6100 | 0.0334 | 0.6100 | 0.0334 | 0.6100 |
| Lower 95% C.I. | 0.0270 | 0.5963 | 0.0264 | 0.5963 | 0.0273 | 0.5963 | 0.0270 | 0.5963 | 0.0268 | 0.5963 | 0.0272 | 0.5963 |
| Upper 95% C.I. | 0.0409 | 0.6236 | 0.0405 | 0.6236 | 0.0410 | 0.6236 | 0.0410 | 0.6236 | 0.0411 | 0.6236 | 0.0405 | 0.6236 |
| MTurk Features (Mean + Median) | 0.0002 | 0.5344 | 0.0002 | 0.5344 | 0.0002 | 0.5344 | 0.0002 | 0.5344 | 0.0002 | 0.5344 | 0.0002 | 0.5344 |
| Lower 95% C.I. | 0.0007 | 0.5201 | 0.0007 | 0.5201 | 0.0006 | 0.5201 | 0.0006 | 0.5201 | 0.0007 | 0.5201 | 0.0006 | 0.5201 |
| Upper 95% C.I. | 0.0050 | 0.5487 | 0.0050 | 0.5487 | 0.0051 | 0.5487 | 0.0053 | 0.5487 | 0.0050 | 0.5487 | 0.0049 | 0.5487 |
| P_hat_cnn | 0.0327 | 0.6226 | 0.0333 | 0.6222 | 0.0212 | 0.6066 | 0.0226 | 0.6046 | 0.0190 | 0.5993 | 0.0424 | 0.6391 |
| Lower 95% C.I. | 0.0266 | 0.6091 | 0.0277 | 0.6086 | 0.0163 | 0.5927 | 0.0175 | 0.5906 | 0.0145 | 0.5852 | 0.0364 | 0.6256 |
| Upper 95% C.I. | 0.0396 | 0.6361 | 0.0398 | 0.6358 | 0.0270 | 0.6205 | 0.0279 | 0.6186 | 0.0239 | 0.6133 | 0.0497 | 0.6525 |
| Combined Variable Model | | | | | | | | | | | | |
| Demographics + Charge Feature | 0.0972 | 0.7010 | 0.0972 | 0.7010 | 0.0972 | 0.7010 | 0.0972 | 0.7010 | 0.0972 | 0.7010 | 0.0972 | 0.7010 |
| Lower 95% C.I. | 0.0861 | 0.6880 | 0.0856 | 0.6880 | 0.0864 | 0.6880 | 0.0865 | 0.6880 | 0.0868 | 0.6880 | 0.0860 | 0.6880 |
| Upper 95% C.I. | 0.1088 | 0.7140 | 0.1079 | 0.7140 | 0.1088 | 0.7140 | 0.1078 | 0.7140 | 0.1084 | 0.7140 | 0.1089 | 0.7140 |
| Demographics + Charge Feature + Risk | 0.1124 | 0.7177 | 0.1124 | 0.7177 | 0.1124 | 0.7177 | 0.1124 | 0.7177 | 0.1124 | 0.7177 | 0.1124 | 0.7177 |
| Lower 95% C.I. | 0.1012 | 0.7050 | 0.1009 | 0.7050 | 0.1015 | 0.7050 | 0.1010 | 0.7050 | 0.1007 | 0.7050 | 0.1015 | 0.7050 |
| Upper 95% C.I. | 0.1260 | 0.7304 | 0.1249 | 0.7304 | 0.1240 | 0.7304 | 0.1244 | 0.7304 | 0.1239 | 0.7304 | 0.1251 | 0.7304 |
| Demographics + Charge Feature + Risk + MTurk (Mean + Median) | 0.1125 | 0.7195 | 0.1125 | 0.7195 | 0.1125 | 0.7195 | 0.1125 | 0.7195 | 0.1125 | 0.7195 | 0.1125 | 0.7195 |
| Lower 95% C.I. | 0.1028 | 0.7068 | 0.1032 | 0.7068 | 0.1039 | 0.7068 | 0.1039 | 0.7068 | 0.1024 | 0.7068 | 0.1035 | 0.7068 |
| Upper 95% C.I. | 0.1269 | 0.7322 | 0.1265 | 0.7322 | 0.1271 | 0.7322 | 0.1272 | 0.7322 | 0.1271 | 0.7322 | 0.1265 | 0.7322 |
| Demographics + Charge Feature + Risk + CNN | 0.1217 | 0.7286 | 0.1231 | 0.7292 | 0.1191 | 0.7236 | 0.1187 | 0.7251 | 0.1161 | 0.7219 | 0.1259 | 0.7324 |
| Lower 95% C.I. | 0.1103 | 0.7163 | 0.1119 | 0.7169 | 0.1076 | 0.7110 | 0.1076 | 0.7126 | 0.1042 | 0.7093 | 0.1153 | 0.7202 |
| Upper 95% C.I. | 0.1339 | 0.7410 | 0.1351 | 0.7415 | 0.1314 | 0.7362 | 0.1304 | 0.7376 | 0.1274 | 0.7344 | 0.1386 | 0.7447 |
| Kitchen Sink (all RHS variables included) | 0.1221 | 0.7308 | 0.1230 | 0.7311 | 0.1194 | 0.7257 | 0.1188 | 0.7270 | 0.1166 | 0.7242 | 0.1260 | 0.7344 |
| Lower 95% C.I. | 0.1128 | 0.7185 | 0.1143 | 0.7188 | 0.1095 | 0.7131 | 0.1097 | 0.7145 | 0.1079 | 0.7116 | 0.1170 | 0.7222 |
| Upper 95% C.I. | 0.1369 | 0.7431 | 0.1377 | 0.7434 | 0.1331 | 0.7383 | 0.1339 | 0.7394 | 0.1307 | 0.7367 | 0.1412 | 0.7467 |
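
The table above reports lower and upper 95% C.I. bounds for each fit measure, but this file does not state how the intervals were computed. One common way to get such bounds is a percentile bootstrap, sketched below for AUC. This is an assumption offered for orientation, not a description of our pipeline; the function names, the number of resamples, and the simulated data are all hypothetical.

```python
import random


def roc_auc(y, score):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y, score, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (assumed method, for illustration)."""
    rng = random.Random(seed)
    n = len(y)
    stats = []
    while len(stats) < n_boot:
        # Resample rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        # AUC is undefined unless both classes appear in the resample.
        if 0 < sum(ys) < n:
            stats.append(roc_auc(ys, [score[i] for i in idx]))
    stats.sort()
    k = int(n_boot * alpha / 2)
    return stats[k], stats[n_boot - 1 - k]


# Simulated toy data: positives tend to receive higher scores.
rng = random.Random(1)
y = [1 if rng.random() < 0.3 else 0 for _ in range(100)]
score = [0.3 * yi + rng.random() for yi in y]

point = roc_auc(y, score)
lower, upper = bootstrap_auc_ci(y, score)
print(f"AUC {point:.4f}, 95% C.I. [{lower:.4f}, {upper:.4f}]")
```

The same resampling loop could wrap an adjusted R squared calculation instead of AUC. Because each column's resamples are drawn independently, small differences between columns in the bounds for an otherwise identical model would be expected under this approach.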

Visualizing R-squared Signal between model iterations