Arrest - GAN Pipeline Sanity Checks - Projection CNN Predictions

One Sentence Summary

I repeat our projector regressions with the full arrest validation set. Using the projected images instead of their real counterparts lowers the adjusted R-squared of p_hat_cnn only slightly, from 0.0330 to 0.0298, which is very promising.

Overview

In this markdown I summarize the results from our updated arrest CNN prediction test. The goal is to see how much the model's release predictions differ between real images and their respective projections through the GAN. As a reminder, a projected image is the closest approximation to a real mugshot that the GAN can find in its own latent space.

Here I take our entire validation set and produce projection-target image pairs, which I then pass through the CNN to get predictions. I then produce two summary plots together with our baseline table 01:

Plot 1: Comparing their respective p_hat_cnn distributions. Ideally these would align as closely as possible.
Plot 2: Checking the correlation between p_hat_cnn_projection, the CNN prediction for the projected image, and p_hat_cnn_target, the CNN prediction for its real counterpart. Here we want the relationship to be as close as possible.
Table 01, Configuration 1: I repeat the regressions for our baseline table 01, comparing the p_hat_cnn values for projections and targets. This gives two p_hat_cnn values per image pair, one from the real mugshot and one from its projection, and I compare their predictive power in two separate columns.
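As a rough illustration of the two plot checks above, the sketch below summarizes paired predictions numerically. It is not the code behind the plots: the function name `compare_predictions`, the bin count, and the use of a two-sample Kolmogorov-Smirnov distance as a distribution summary are all assumptions made for this example.

```python
import numpy as np

def compare_predictions(p_target, p_projection, bins=20):
    """Summarize how closely projection predictions track target predictions.

    Hypothetical helper: inputs are paired arrays of CNN predictions in [0, 1],
    one entry per projection-target image pair.
    """
    p_target = np.asarray(p_target, dtype=float).ravel()
    p_projection = np.asarray(p_projection, dtype=float).ravel()
    if p_target.shape != p_projection.shape:
        raise ValueError("expected one projection prediction per target")

    # Plot 1 analogue: histogram counts on a shared grid over [0, 1].
    edges = np.linspace(0.0, 1.0, bins + 1)
    hist_target, _ = np.histogram(p_target, bins=edges)
    hist_projection, _ = np.histogram(p_projection, bins=edges)

    # Largest gap between the two empirical CDFs (two-sample KS distance).
    grid = np.sort(np.concatenate([p_target, p_projection]))
    cdf_target = np.searchsorted(np.sort(p_target), grid, side="right") / p_target.size
    cdf_projection = np.searchsorted(np.sort(p_projection), grid, side="right") / p_projection.size
    ks_distance = float(np.max(np.abs(cdf_target - cdf_projection)))

    # Plot 2 analogue: Pearson correlation across image pairs.
    pearson_r = float(np.corrcoef(p_target, p_projection)[0, 1])

    return {
        "hist_target": hist_target,
        "hist_projection": hist_projection,
        "ks_distance": ks_distance,
        "pearson_r": pearson_r,
    }
```

A KS distance near 0 and a correlation near 1 would correspond to the ideal outcome described for Plot 1 and Plot 2 respectively.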

Definitions

There are two different types of images here:

Targets: These are the real mugshots from our validation set.
Projections: These are the GAN-generated images which try to approximate the real targets as closely as possible.

An ideal projector will produce generated images that are indistinguishable from their target counterparts.
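To make the idea of projection concrete, here is a toy sketch in which the "generator" is just a linear map from latent space to pixel space and the projection is found by gradient descent on squared pixel error. This is an assumption-laden stand-in: our actual projector works with a nonlinear GAN, and its loss and optimizer are not described in this write-up. The name `project_to_latent` is hypothetical.

```python
import numpy as np

def project_to_latent(generator_matrix, target, steps=500):
    """Find the latent vector w whose generated image is closest to `target`.

    Toy assumption: the generator is linear, G(w) = generator_matrix @ w,
    and closeness is squared pixel error. A real GAN projector is nonlinear.
    """
    A = np.asarray(generator_matrix, dtype=float)
    x = np.asarray(target, dtype=float)

    # Step size 1/L, with L the squared largest singular value of A,
    # keeps gradient descent stable for this quadratic loss.
    lr = 1.0 / np.linalg.norm(A, ord=2) ** 2

    w = np.zeros(A.shape[1])
    for _ in range(steps):
        residual = A @ w - x          # generated image minus target
        w -= lr * (A.T @ residual)    # gradient of 0.5 * ||A w - x||^2
    return w, A @ w                   # latent code and the projected image


# Synthetic example: a 64-pixel "image" and an 8-dimensional latent space.
rng = np.random.default_rng(0)
toy_generator = rng.normal(size=(64, 8))
toy_target = rng.normal(size=64)
latent, projection = project_to_latent(toy_generator, toy_target)
```

In this toy setting the target generally lies outside the generator's range, so the projection only approximates it, which mirrors the gap between targets and projections that this report is trying to measure.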

Note The definitions for the terms used in table 01 are also above the table.

Table 01 - Regressions

NOTE I'm leaving out the MTurk variables here since they add nothing and bloat the table. Column 8 differs from column 7 by adding p_hat_cnn_target; column 9 differs from column 7 by adding p_hat_cnn_proj instead, which isolates the projection signal. Column 10 includes both.

Note The p_hat_demo_charge variable is the combined Demo + Charge model in table 01 below.
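The column comparison described in the note above amounts to fitting nested regressions and checking how much the fit improves when one CNN prediction is added. The sketch below shows that comparison on synthetic data only. It assumes ordinary least squares with an intercept; the helper name `adjusted_r2` and every variable in the example are hypothetical.

```python
import numpy as np

def adjusted_r2(features, outcome):
    """Fit OLS with an intercept and return the adjusted R-squared."""
    y = np.asarray(outcome, dtype=float)
    X = np.column_stack([np.ones(y.size), features])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ beta
    n, k = X.shape[0], X.shape[1] - 1          # k excludes the intercept
    r2 = 1.0 - (residual @ residual) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Synthetic stand-ins (NOT our data): two baseline predictors and one
# extra signal playing the role of a CNN prediction.
rng = np.random.default_rng(0)
n_obs = 2000
baseline = rng.normal(size=(n_obs, 2))
extra_signal = rng.normal(size=n_obs)
y_sim = baseline @ np.array([0.5, -0.3]) + 0.2 * extra_signal + rng.normal(size=n_obs)

fit_baseline = adjusted_r2(baseline, y_sim)                                  # like column 7
fit_with_extra = adjusted_r2(np.column_stack([baseline, extra_signal]), y_sim)  # like column 8 or 9
print(f"baseline: {fit_baseline:.4f}, with extra signal: {fit_with_extra:.4f}")
```

Running the same comparison once with the target prediction and once with the projection prediction is what columns 8 and 9 do relative to column 7.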

**Multihead(ResNet50)**

	Dependent variable:

	Release Outcome
	(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)	(9)	(10)

p_hat_demo	1.271^***
	(1.029, 1.513)

p_hat_charge		1.222^***
		(1.146, 1.297)

p_hat_demo_charge			1.206^***				1.105^***	1.029^***	1.038^***	1.025^***
			(1.135, 1.278)				(1.033, 1.177)	(0.955, 1.102)	(0.965, 1.112)	(0.952, 1.099)

risk_pred_prob				-1.192^***			-0.850^***	-0.771^***	-0.776^***	-0.765^***
				(-1.312, -1.073)			(-0.967, -0.734)	(-0.888, -0.654)	(-0.893, -0.659)	(-0.882, -0.648)

p_hat_cnn_target					0.569^***			0.292^***		0.164^***
					(0.507, 0.630)			(0.232, 0.353)		(0.065, 0.263)

p_hat_cnn_proj						0.592^***			0.317^***	0.176^***
						(0.525, 0.659)			(0.251, 0.383)	(0.068, 0.284)

Constant	-0.243^**	-0.204^***	-0.193^***	1.123^***	0.326^***	0.290^***	0.149^***	-0.037	-0.072	-0.078
	(-0.434, -0.053)	(-0.264, -0.144)	(-0.250, -0.136)	(1.086, 1.161)	(0.279, 0.373)	(0.236, 0.343)	(0.076, 0.222)	(-0.119, 0.046)	(-0.158, 0.014)	(-0.164, 0.008)


Observations	6,782	6,782	6,782	6,782	6,782	6,782	6,782	6,782	6,782	6,782
Adjusted R²	0.011	0.095	0.102	0.038	0.033	0.030	0.120	0.128	0.128	0.129
F Statistic	74.608^*** (df = 1; 6780)	709.641^*** (df = 1; 6780)	768.396^*** (df = 1; 6780)	269.594^*** (df = 1; 6780)	232.596^*** (df = 1; 6780)	209.470^*** (df = 1; 6780)	464.509^*** (df = 2; 6779)	333.500^*** (df = 3; 6778)	333.418^*** (df = 3; 6778)	252.150^*** (df = 4; 6777)

Note:	^*p<0.1; ^**p<0.05; ^***p<0.01
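As a consistency check on the table above, the adjusted R² in each column can be recovered from its F statistic and degrees of freedom, using the standard identities for OLS with an intercept: R² = F·k / (F·k + df_resid) and adjusted R² = 1 − (1 − R²)(n − 1)/df_resid, where n − 1 = k + df_resid. The function name below is hypothetical; the inputs are copied from columns 5, 6, 8 and 10.

```python
def adjusted_r2_from_f(f_stat, df_model, df_resid):
    """Recover adjusted R-squared from an OLS F statistic.

    Assumes a regression with an intercept, so n - 1 = df_model + df_resid.
    """
    r2 = f_stat * df_model / (f_stat * df_model + df_resid)
    n_minus_1 = df_model + df_resid
    return 1.0 - (1.0 - r2) * n_minus_1 / df_resid


# (column, F statistic, df_model, df_resid) as reported in the table.
columns = [
    ("(5) p_hat_cnn_target only", 232.596, 1, 6780),
    ("(6) p_hat_cnn_proj only", 209.470, 1, 6780),
    ("(8) baseline + target", 333.500, 3, 6778),
    ("(10) baseline + target + proj", 252.150, 4, 6777),
]
for label, f_stat, df_model, df_resid in columns:
    print(f"{label}: {adjusted_r2_from_f(f_stat, df_model, df_resid):.4f}")
```

To four decimals this gives 0.0330, 0.0298, 0.1282 and 0.1290, which agrees with the 0.033, 0.030, 0.128 and 0.129 reported in the Adjusted R² row.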

Table 01 - Configuration 01 - Comparing CNN output for Projections and Targets

I repeat our baseline table 01 and compare the predictive power of the p_hat_cnn values from our projections and targets. Below is an outline of the definitions:

Definitions for Table 01

As a reminder (these are the same definitions as in previous iterations), here is a list of the regression terms. I split them into definitions for our models and our features:

Model Definitions

Demographic LM: This uses sex and age_arrest to predict our arrest outcome
Charge Feature LM: This includes felony_flag, gun_crime_flag, drug_crime_flag, violent_crime_flag, property_crime_flag, arrest_year
XgBoost risk: As the name would suggest, this is our XgBoost risk predictor using time-varying arrest history features

Feature Definitions

MTurk Features: These are our high-detail (minimum of 6 workers per image) MTurk features together with their median
Kitchen Sink: The final row of the table includes all previous rows. It is our fully stacked model with all covariates.

NOTE The column group titled P_hat_Target uses the p_hat_cnn of the real mugshot targets. The column group titled P_hat_Projection, as the name would suggest, uses the p_hat_cnn from the corresponding GAN projections. The group titled P_hat_Target + P_hat_Projection includes both.

Table 01 - Version 01 - Model Comparison
Fit measured in adjusted R squared and AUC
Model Configuration	P_hat_Target		P_hat_Projection		P_hat_Target + P_hat_Projection
Model Configuration	Adjusted R Squared	ROC AUC	Adjusted R Squared	ROC AUC	Adjusted R Squared	ROC AUC
Single Variable Model
Demographic LM	0.0107	0.5571	0.0107	0.5571	0.0107	0.5571
Lower 95% C.I.	0.0075	0.5416	0.0073	0.5416	0.0072	0.5416
Upper 95% C.I.	0.0146	0.5726	0.0147	0.5726	0.0147	0.5726
Charge Feature LM	0.0946	0.6949	0.0946	0.6949	0.0946	0.6949
Lower 95% C.I.	0.0826	0.6802	0.0823	0.6802	0.0832	0.6802
Upper 95% C.I.	0.1070	0.7095	0.1076	0.7095	0.1065	0.7095
XgBoost Risk	0.0381	0.6167	0.0381	0.6167	0.0381	0.6167
Lower 95% C.I.	0.0301	0.6016	0.0305	0.6016	0.0307	0.6016
Upper 95% C.I.	0.0472	0.6319	0.0474	0.6319	0.0474	0.6319
MTurk Features (Mean + Median)	0.0002	0.5382	0.0002	0.5382	0.0002	0.5382
Lower 95% C.I.	0.0008	0.5223	0.0008	0.5223	0.0007	0.5223
Upper 95% C.I.	0.0063	0.5540	0.0065	0.5540	0.0063	0.5540
P_hat_cnn	0.0330	0.6240	0.0298	0.6147	0.0349	0.6258
Lower 95% C.I.	0.0262	0.6091	0.0231	0.5995	0.0284	0.6108
Upper 95% C.I.	0.0401	0.6390	0.0363	0.6299	0.0426	0.6408
Combined Variable Model
Demographics + Charge Feature	0.1017	0.7053	0.1017	0.7053	0.1017	0.7053
Lower 95% C.I.	0.0900	0.6910	0.0889	0.6910	0.0899	0.6910
Upper 95% C.I.	0.1148	0.7196	0.1148	0.7196	0.1145	0.7196
Demographics + Charge Feature + Risk	0.1203	0.7232	0.1203	0.7232	0.1203	0.7232
Lower 95% C.I.	0.1075	0.7092	0.1074	0.7092	0.1071	0.7092
Upper 95% C.I.	0.1344	0.7371	0.1333	0.7371	0.1342	0.7371
Demographics + Charge Feature + Risk + MTurk (Mean + Median)	0.1203	0.7245	0.1203	0.7245	0.1203	0.7245
Lower 95% C.I.	0.1093	0.7106	0.1089	0.7106	0.1099	0.7106
Upper 95% C.I.	0.1360	0.7385	0.1363	0.7385	0.1367	0.7385
Demographics + Charge Feature + Risk + CNN	0.1282	0.7325	0.1282	0.7326	0.1290	0.7336
Lower 95% C.I.	0.1162	0.7188	0.1153	0.7189	0.1162	0.7199
Upper 95% C.I.	0.1420	0.7461	0.1431	0.7462	0.1428	0.7472
Kitchen Sink (all RHS variables included)	0.1299	0.7357	0.1294	0.7354	0.1307	0.7369
Lower 95% C.I.	0.1200	0.7221	0.1189	0.7217	0.1210	0.7233
Upper 95% C.I.	0.1468	0.7493	0.1453	0.7490	0.1481	0.7504
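For readers who want to reproduce figures of the kind shown in the ROC AUC columns, the sketch below computes an AUC and a 95% interval around it. This write-up does not state how the intervals in the table were obtained; the percentile bootstrap used here is an assumption, and `roc_auc` and `bootstrap_ci` are hypothetical helper names.

```python
import numpy as np

def roc_auc(outcome, score):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity, averaging tied ranks."""
    outcome = np.asarray(outcome).ravel().astype(bool)
    score = np.asarray(score, dtype=float).ravel()
    n_pos, n_neg = int(outcome.sum()), int((~outcome).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative outcome")
    # Average rank of each distinct score value (1-based ranks).
    _, inverse, counts = np.unique(score, return_inverse=True, return_counts=True)
    ranks = (np.cumsum(counts) - counts + (counts + 1) / 2.0)[inverse]
    u_stat = ranks[outcome].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))


def bootstrap_ci(outcome, score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = np.random.default_rng(seed)
    outcome = np.asarray(outcome).ravel().astype(int)
    score = np.asarray(score, dtype=float).ravel()
    n = outcome.size
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample image rows with replacement
        if outcome[idx].min() == outcome[idx].max():
            continue                          # skip resamples containing one class only
        draws.append(roc_auc(outcome[idx], score[idx]))
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)
```

The same resampling loop could wrap an adjusted R² calculation instead of `roc_auc` to produce the intervals in the other columns, again assuming a bootstrap is what was used.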