Predictions from matched singlehead models

Algorithm’s predictive accuracy for judge decisions

Overall, the model seems to be learning to predict the arrest outcome very well. Remember that the outcome it was trained on is the baseline CNN’s prediction. This plot indicates that we are losing almost no signal in this passthrough.
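A minimal sketch of how the "almost no signal lost" claim could be checked numerically, alongside the plot. It compares the training target (the baseline CNN's prediction) with the model's own prediction using a Pearson correlation and a mean absolute error. The names `p_hat_cnn` and `p_hat_pass` and the toy values are hypothetical placeholders, not data from this analysis.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_abs_error(x, y):
    """Average absolute gap between target and prediction."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Hypothetical toy values: the target (baseline CNN prediction) and the
# passthrough model's prediction of that target.
p_hat_cnn = [0.10, 0.35, 0.50, 0.72, 0.90]
p_hat_pass = [0.12, 0.33, 0.52, 0.70, 0.88]

r = pearson_r(p_hat_cnn, p_hat_pass)
mae = mean_abs_error(p_hat_cnn, p_hat_pass)
print(f"correlation = {r:.4f}, mean absolute error = {mae:.4f}")
```

A correlation close to 1 together with a small mean absolute error would be consistent with little signal being lost; neither number replaces looking at the plot itself.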

Checking the distributions of `p-hat-cnn` and its prediction:

Have we cancelled out `Well-Groomed` signal ? … Kind of

Plot Outline

I have plotted the CNN labels in red
Predictions are in blue

Have we cancelled out `Skin-tone` signal ? … No

Plot Outline

I have plotted the CNN labels in red
Predictions are in blue

Note: Since we matched on `p-hat-cnn` quantiles, I’ve compiled a plot for each quantile. If this matching had worked, we should (I think) observe horizontal lines for the skin-tone predictions in each of these plots, since within a `p-hat-cnn` quantile there should be no remaining skin-tone variation. This is clearly not the case! However, this plot also makes clear that the CNN is being fed well-matched labels (in red).
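One way to put a number on "horizontal lines" is to compute, within each `p-hat-cnn` quantile bin, the slope of the prediction against skin-tone: a flat line corresponds to a slope near zero. The sketch below is only one reading of the plots. The function names, the equal-count binning, and the toy data are all assumptions made for illustration.

```python
def quantile_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 by rank (equal-count bins)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(values), n_bins - 1)
    return bins

def ols_slope(x, y):
    """Slope of a simple least-squares line of y on x (NaN if undefined)."""
    n = len(x)
    if n < 2:
        return float("nan")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return float("nan")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def within_bin_slopes(p_hat_cnn, skin_tone, prediction, n_bins=4):
    """Slope of prediction on skin-tone inside each p-hat-cnn quantile bin."""
    bins = quantile_bins(p_hat_cnn, n_bins)
    slopes = {}
    for b in range(n_bins):
        idx = [i for i, k in enumerate(bins) if k == b]
        slopes[b] = ols_slope([skin_tone[i] for i in idx],
                              [prediction[i] for i in idx])
    return slopes

# Hypothetical toy data: the lower bin is flat in skin-tone, the upper is not.
p_hat_cnn = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
skin_tone = [0, 1, 2, 3, 0, 1, 2, 3]
prediction = [0.5, 0.5, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9]

slopes = within_bin_slopes(p_hat_cnn, skin_tone, prediction, n_bins=2)
for b, s in slopes.items():
    print(f"bin {b}: slope of prediction on skin-tone = {s:.3f}")
```

If the matching had fully worked, every bin would look like the first one in this toy example; slopes clearly away from zero would correspond to the non-horizontal lines described above.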

Regressions of `residualized cnn prediction` on `skin-tone` and `well-groomed`

I am regressing the residualized CNN prediction, which is supposed to contain no signal from either skin-tone or well-groomed, on each of these two variables.

Here we can see that:

skin-tone is significant and captures a comparatively large share of the variation in the algorithm’s predictions (adjusted R² of 0.034)
well-groomed is also significant, but has a very small adjusted R² (0.005), which makes me quite hopeful that it’s working better

**P-hat CNN V6 controlling for skin-tone**

Dependent variable: Residualized Baseline Prediction

| | Skin-tone (1) | Well-groomed (2) | (3) |
|---|---|---|---|
| skin_tone_cont | 0.061*** (0.004) | | 0.062*** (0.004) |
| well_groomed | | 0.008*** (0.001) | 0.009*** (0.001) |
| Constant | 0.690*** (0.002) | 0.679*** (0.006) | 0.646*** (0.007) |
| Observations | 5,966 | 5,961 | 5,961 |
| Adjusted R² | 0.034 | 0.005 | 0.041 |

Note: * p<0.1; ** p<0.05; *** p<0.01
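For readers who want to reproduce this kind of check, here is a minimal sketch of an ordinary least squares fit with an intercept, returning the coefficients and the adjusted R². The table above looks like output from a regression-table package, so the original analysis was presumably run with other tooling; this standalone Python version is only illustrative. The function names and the toy data are hypothetical, and the toy outcome is constructed to be exactly linear so the recovered coefficients are easy to verify.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(y, regressors):
    """OLS of y on an intercept plus the given regressor columns.

    Returns (coefficients with the intercept first, adjusted R^2).
    """
    n, k = len(y), len(regressors)
    rows = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k + 1)]
           for a in range(k + 1)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    my = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return beta, adj_r2

# Hypothetical toy data; the outcome is exactly linear in both regressors.
skin = [0, 1, 2, 3, 4, 5]
groom = [1, 3, 2, 5, 4, 6]
resid_pred = [0.6 + 0.05 * s + 0.01 * g for s, g in zip(skin, groom)]

beta_full, adj_full = ols(resid_pred, [skin, groom])
beta_groom, adj_groom = ols(resid_pred, [groom])
print("full model coefficients:", [round(b, 4) for b in beta_full])
print("adjusted R^2 (full, well-groomed only):",
      round(adj_full, 4), round(adj_groom, 4))
```

If the residualization had removed all skin-tone and well-groomed signal, the slope coefficients in such a fit would be close to zero and the adjusted R² would be near zero as well.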
