Algorithm’s predictive accuracy for judge decisions

Overall it seems the model is learning to predict the arrest outcome very well. Remember that the outcome is was trained on is the baseline CNN’s prediction. This plot indicates that we are loosing almost no signal in this passthrough.

Checking the distributions of p-hat-cnn and it’s prediction:

Have we cancelled out Well-Groomed signal ? … Kind of

Plot Outline

Have we cancelled out Skin-tone signal ? … No

Plot Outline

Note Since we matched on p-hat-cnn quantiles, I’ve compiled a plot for each. If this matching had worked (I think) we should observe horizontal lines for the skin-tone predictions in each of these plots. Since at a p-hat-cnn quartile level there should be no more skin-tone variation. This is clearly not the case ! However this plot also makes clear that the CNN is being fed well-matched labels (in red).

Regressions of skin-tone and well-groomed on residualized cnn prediction

I am regressing skin-tone and well-groomed on the residualized cnn prediction, which is supposed to contain no signal from either one of these.

Here we can see that:

  • skin-tone is significant and captures a fairly large amount of variation in the algorithms predictions
  • well-groomed is also significant, but has a super tiny r-sqrt which is making me quite hopefull that it’s working better
P-hat CNN V6 controlling for skin-tone
Dependent variable:
Residualized Baseline Prediction
Skin-tone Well-groomed
(1) (2) (3)
skin_tone_cont 0.061*** 0.062***
(0.004) (0.004)
well_groomed 0.008*** 0.009***
(0.001) (0.001)
Constant 0.690*** 0.679*** 0.646***
(0.002) (0.006) (0.007)
Observations 5,966 5,961 5,961
Adjusted R2 0.034 0.005 0.041
Note: p<0.1; p<0.05; p<0.01