This document contains a summary of our robustness checks:

  1. MTurk label regressions
  2. Non-linearity checks for p_hat_cnn
  3. Skin-tone regressions
  4. Further skin-tone sanity checks and graphs


\(\color{red}{\text{MTurk Label regressions}}\)

We now regress the release outcome on each MTurk label individually and on all four labels combined. The labels are:

  1. Attractiveness
  2. Competence
  3. Dominance
  4. Trustworthiness

We run these regressions on the combined sample and on subsamples split by gender and by race.
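As a rough illustration of the subsample setup, the sketch below fits a bivariate least-squares line of the outcome on one label separately within each group. The row layout, the column names (`gender`, `attractiveness`, `released`) and the toy values are hypothetical and are not taken from the actual data or pipeline.

```python
from collections import defaultdict

def ols_bivariate(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_by_group(rows, label, group_key, outcome="released"):
    """Fit outcome ~ label separately within each value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        g: ols_bivariate([r[label] for r in rs], [r[outcome] for r in rs])
        for g, rs in groups.items()
    }

# Hypothetical toy rows; names and values are illustrative only.
rows = [
    {"gender": "F", "attractiveness": 1, "released": 0},
    {"gender": "F", "attractiveness": 3, "released": 1},
    {"gender": "F", "attractiveness": 5, "released": 1},
    {"gender": "M", "attractiveness": 2, "released": 1},
    {"gender": "M", "attractiveness": 4, "released": 1},
    {"gender": "M", "attractiveness": 6, "released": 0},
]
fits = fit_by_group(rows, "attractiveness", "gender")
```

Each entry of `fits` holds the intercept and slope for one subsample, which corresponds to one single-regressor column in a subsample table.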

Combined Male-Female

Multihead(ResNet50)
Dependent variable: Release Outcome

| Variable | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| attractiveness | 0.005 (-0.001, 0.012) | | | | 0.003 (-0.006, 0.013) |
| competence | | 0.005 (-0.002, 0.013) | | | 0.003 (-0.009, 0.015) |
| dominance | | | -0.002 (-0.009, 0.006) | | -0.006 (-0.014, 0.003) |
| trustworthiness | | | | 0.006 (-0.001, 0.012) | 0.003 (-0.008, 0.013) |
| Constant | 0.737*** (0.706, 0.769) | 0.735*** (0.697, 0.773) | 0.770*** (0.732, 0.808) | 0.735*** (0.702, 0.768) | 0.745*** (0.700, 0.790) |
| Observations | 8,479 | 8,479 | 8,479 | 8,479 | 8,479 |
| Adjusted R² | 0.0001 | 0.00004 | -0.0001 | 0.0001 | -0.0001 |
| F Statistic | 1.654 (df = 1; 8477) | 1.351 (df = 1; 8477) | 0.141 (df = 1; 8477) | 1.893 (df = 1; 8477) | 0.838 (df = 4; 8474) |

Note: *p<0.1; **p<0.05; ***p<0.01

Subsample Female

Multihead(ResNet50)
Dependent variable: Release Outcome

| Variable | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| attractiveness | 0.019*** (0.007, 0.030) | | | | 0.016 (-0.001, 0.033) | 0.013* (0.0005, 0.026) |
| competence | | 0.014* (0.001, 0.027) | | | -0.010 (-0.032, 0.011) | |
| dominance | | | 0.021*** (0.008, 0.034) | | 0.016* (0.001, 0.032) | 0.014 (-0.0004, 0.028) |
| trustworthiness | | | | 0.015** (0.003, 0.027) | 0.004 (-0.016, 0.023) | |
| Constant | 0.753*** (0.696, 0.809) | 0.772*** (0.704, 0.839) | 0.741*** (0.676, 0.806) | 0.770*** (0.710, 0.831) | 0.720*** (0.643, 0.796) | 0.710*** (0.639, 0.782) |
| Observations | 1,833 | 1,833 | 1,833 | 1,833 | 1,833 | 1,833 |
| Adjusted R² | 0.003 | 0.001 | 0.003 | 0.002 | 0.004 | 0.004 |
| F Statistic | 7.370*** (df = 1; 1831) | 3.162* (df = 1; 1831) | 7.024*** (df = 1; 1831) | 4.116** (df = 1; 1831) | 2.639** (df = 4; 1828) | 4.969*** (df = 2; 1830) |

Note: *p<0.1; **p<0.05; ***p<0.01

Subsample Male

Multihead(ResNet50)
Dependent variable: Release Outcome

| Variable | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| attractiveness | -0.001 (-0.009, 0.007) | | | | -0.004 (-0.015, 0.007) |
| competence | | 0.002 (-0.007, 0.011) | | | 0.005 (-0.008, 0.019) |
| dominance | | | -0.001 (-0.009, 0.008) | | -0.001 (-0.011, 0.008) |
| trustworthiness | | | | 0.001 (-0.007, 0.008) | 0.0003 (-0.012, 0.013) |
| Constant | 0.745*** (0.708, 0.782) | 0.729*** (0.684, 0.774) | 0.743*** (0.697, 0.789) | 0.737*** (0.698, 0.775) | 0.737*** (0.683, 0.791) |
| Observations | 6,646 | 6,646 | 6,646 | 6,646 | 6,646 |
| Adjusted R² | -0.0001 | -0.0001 | -0.0001 | -0.0001 | -0.001 |
| F Statistic | 0.072 (df = 1; 6644) | 0.138 (df = 1; 6644) | 0.019 (df = 1; 6644) | 0.013 (df = 1; 6644) | 0.162 (df = 4; 6641) |

Note: *p<0.1; **p<0.05; ***p<0.01

Subsample Black

Multihead(ResNet50)
Dependent variable: Release Outcome

| Variable | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| attractiveness | 0.001 (-0.008, 0.010) | | | | 0.001 (-0.012, 0.015) |
| competence | | 0.003 (-0.007, 0.013) | | | 0.007 (-0.008, 0.023) |
| dominance | | | -0.004 (-0.014, 0.006) | | -0.007 (-0.018, 0.004) |
| trustworthiness | | | | 0.001 (-0.008, 0.010) | -0.003 (-0.017, 0.011) |
| Constant | 0.755*** (0.712, 0.797) | 0.746*** (0.695, 0.797) | 0.781*** (0.729, 0.833) | 0.756*** (0.712, 0.799) | 0.767*** (0.705, 0.828) |
| Observations | 4,768 | 4,768 | 4,768 | 4,768 | 4,768 |
| Adjusted R² | -0.0002 | -0.0002 | -0.0001 | -0.0002 | -0.001 |
| F Statistic | 0.044 (df = 1; 4766) | 0.222 (df = 1; 4766) | 0.450 (df = 1; 4766) | 0.025 (df = 1; 4766) | 0.327 (df = 4; 4763) |

Note: *p<0.1; **p<0.05; ***p<0.01

Subsample Not-Black

Multihead(ResNet50)
Dependent variable: Release Outcome

| Variable | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| attractiveness | 0.010 (-0.0002, 0.019) | | | | 0.006 (-0.009, 0.021) |
| competence | | 0.008 (-0.003, 0.019) | | | -0.002 (-0.020, 0.016) |
| dominance | | | 0.001 (-0.009, 0.012) | | -0.005 (-0.017, 0.008) |
| trustworthiness | | | | 0.011* (0.001, 0.022) | 0.010 (-0.006, 0.026) |
| Constant | 0.716*** (0.667, 0.766) | 0.723*** (0.666, 0.781) | 0.756*** (0.700, 0.813) | 0.708*** (0.657, 0.759) | 0.719*** (0.653, 0.786) |
| Observations | 3,711 | 3,711 | 3,711 | 3,711 | 3,711 |
| Adjusted R² | 0.0004 | 0.0001 | -0.0003 | 0.001 | 0.00004 |
| F Statistic | 2.602 (df = 1; 3709) | 1.377 (df = 1; 3709) | 0.046 (df = 1; 3709) | 3.398* (df = 1; 3709) | 1.033 (df = 4; 3706) |

Note: *p<0.1; **p<0.05; ***p<0.01


\(\color{red}{\text{Non-linearity in `p_hat_cnn`}}\)

Decile Plots

Here I provide two types of plots for each of p_hat_cnn, p_hat_covariate, and risk_pred_prob:

  1. Decile Plot A - The max value in a decile vs. the mean arrest outcome in that decile
  2. Decile Plot B - The mean arrest outcome at each decile index
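The quantities behind the two plot types can be sketched as follows. This is a minimal illustration that assumes rank-based deciles of equal size; the function names and the toy data are hypothetical, and the actual binning used for the plots may differ.

```python
def decile_index(values):
    """Assign each value a decile 1..10 by rank (ties broken by position)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles

def decile_summary(scores, outcomes):
    """Return one (decile, max score, mean outcome) tuple per non-empty decile."""
    deciles = decile_index(scores)
    summary = []
    for d in range(1, 11):
        members = [k for k, dk in enumerate(deciles) if dk == d]
        if not members:
            continue
        max_score = max(scores[k] for k in members)
        mean_outcome = sum(outcomes[k] for k in members) / len(members)
        summary.append((d, max_score, mean_outcome))
    return summary

# Hypothetical toy data: 20 scores, outcome is 1 for the upper half.
scores = [i / 20 for i in range(20)]
outcomes = [1 if s >= 0.5 else 0 for s in scores]
table = decile_summary(scores, outcomes)
```

Plot A would use the second and third elements of each tuple (max value against mean outcome), while Plot B would use the first and third (decile index against mean outcome).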

Decile Plot 1 - p_hat_cnn

Decile Plot 2 - p_hat_covariate

Decile Plot 3 - risk_pred_prob

Average decile value for p_hat_cnn

Here I use the average p_hat_cnn value within each decile (p_hat_cnn_decile_avr) as the regressor. Its coefficient is now significant (0.398 in column 4).
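One plausible reading of the decile-average variable is that each prediction is replaced by the mean prediction of its decile. The sketch below shows that construction under the assumption of rank-based deciles; `decile_average` and the toy values are hypothetical and are not the code that produced `p_hat_cnn_decile_avr`.

```python
def decile_average(values):
    """Replace each value by the mean of the values in its rank-based decile."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    sums, counts = [0.0] * 10, [0] * 10
    decile_of = [0] * n
    for rank, i in enumerate(order):
        d = rank * 10 // n          # decile bucket 0..9
        decile_of[i] = d
        sums[d] += values[i]
        counts[d] += 1
    return [sums[d] / counts[d] for d in decile_of]

# Hypothetical predicted probabilities, deliberately unsorted.
p_hat = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0,
         0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95, 0.05]
p_hat_decile_avr = decile_average(p_hat)
```

The transformed variable takes at most ten distinct values and keeps the overall mean of the original variable, so it stays on the same scale as the raw prediction.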

Multihead(ResNet50)
Dependent variable: Release Outcome

| Variable | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| risk_pred_prob | -1.105*** (-1.211, -0.998) | -1.106*** (-1.214, -0.999) | -0.778*** (-0.883, -0.673) | -0.716*** (-0.821, -0.611) |
| skin_tonenumber_f7ddc4 | | 0.002 (-0.033, 0.038) | -0.027 (-0.061, 0.007) | -0.041** (-0.075, -0.007) |
| age | | -0.0003 (-0.001, 0.001) | 0.0003 (-0.001, 0.001) | 0.001 (-0.00003, 0.002) |
| attractiveness | | -0.002 (-0.012, 0.008) | 0.001 (-0.009, 0.010) | 0.002 (-0.008, 0.011) |
| competence | | 0.003 (-0.009, 0.015) | -0.001 (-0.013, 0.010) | -0.003 (-0.014, 0.008) |
| dominance | | 0.001 (-0.008, 0.009) | 0.005 (-0.003, 0.012) | 0.006 (-0.001, 0.014) |
| trustworthiness | | 0.001 (-0.010, 0.011) | 0.001 (-0.009, 0.011) | -0.002 (-0.012, 0.009) |
| p_hat_covariates | | | 1.083*** (1.018, 1.148) | 1.007*** (0.941, 1.073) |
| p_hat_cnn_decile_avr | | | | 0.398*** (0.329, 0.466) |
| Constant | 1.102*** (1.069, 1.136) | 1.088*** (1.017, 1.158) | 0.114** (0.024, 0.203) | -0.155** (-0.256, -0.055) |
| Observations | 8,479 | 8,479 | 8,479 | 8,479 |
| Adjusted R² | 0.033 | 0.033 | 0.112 | 0.122 |
| F Statistic | 291.841*** (df = 1; 8477) | 13.490*** (df = 23; 8455) | 45.641*** (df = 24; 8454) | 47.945*** (df = 25; 8453) |

Note: *p<0.1; **p<0.05; ***p<0.01

Direct coding of deciles

Here I code each observation with an integer from 1 to 10 giving the p_hat_cnn decile it falls in (p_hat_cnn_decile).

Multihead(ResNet50)
Dependent variable: Release Outcome

| Variable | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| risk_pred_prob | -1.105*** (-1.211, -0.998) | -1.106*** (-1.214, -0.999) | -0.778*** (-0.883, -0.673) | -0.718*** (-0.823, -0.613) |
| skin_tonenumber_f7ddc4 | | 0.002 (-0.033, 0.038) | -0.027 (-0.061, 0.007) | -0.041** (-0.075, -0.007) |
| age | | -0.0003 (-0.001, 0.001) | 0.0003 (-0.001, 0.001) | 0.001 (-0.0001, 0.002) |
| attractiveness | | -0.002 (-0.012, 0.008) | 0.001 (-0.009, 0.010) | 0.002 (-0.008, 0.011) |
| competence | | 0.003 (-0.009, 0.015) | -0.001 (-0.013, 0.010) | -0.003 (-0.014, 0.009) |
| dominance | | 0.001 (-0.008, 0.009) | 0.005 (-0.003, 0.012) | 0.007 (-0.001, 0.015) |
| trustworthiness | | 0.001 (-0.010, 0.011) | 0.001 (-0.009, 0.011) | -0.002 (-0.012, 0.008) |
| p_hat_covariates | | | 1.083*** (1.018, 1.148) | 1.005*** (0.939, 1.071) |
| p_hat_cnn_decile | | | | 0.015*** (0.013, 0.018) |
| Constant | 1.102*** (1.069, 1.136) | 1.088*** (1.017, 1.158) | 0.114** (0.024, 0.203) | 0.061 (-0.028, 0.150) |
| Observations | 8,479 | 8,479 | 8,479 | 8,479 |
| Adjusted R² | 0.033 | 0.033 | 0.112 | 0.122 |
| F Statistic | 291.841*** (df = 1; 8477) | 13.490*** (df = 23; 8455) | 45.641*** (df = 24; 8454) | 47.910*** (df = 25; 8453) |

Note: *p<0.1; **p<0.05; ***p<0.01

Higher order

We now add two higher-order terms of p_hat_cnn (squared and cubed); neither is significant.
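To make the higher-order specification concrete, the sketch below fits a cubic in a single regressor by ordinary least squares using only the standard library. It is an illustration under simplifying assumptions (no other covariates, a small hand-written solver, synthetic data); the function names are hypothetical and this is not the estimation code behind the table.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_polynomial(p, y, degree):
    """OLS of y on 1, p, p^2, ..., p^degree via the normal equations."""
    k = degree + 1
    design = [[pi ** j for j in range(k)] for pi in p]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)

# Synthetic, exactly linear data: the squared and cubed terms should be ~0.
p = [i / 50 for i in range(51)]
y = [0.2 + 0.4 * pi for pi in p]
coefs = fit_polynomial(p, y, degree=3)  # [const, linear, squared, cubed]
```

When the underlying relationship is linear, the squared and cubed coefficients come out near zero. On real data the powers of a probability are strongly correlated with one another, which is one likely reason confidence intervals widen sharply once they enter together.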

Multihead(ResNet50)
Dependent variable: Release Outcome

| Variable | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| risk_pred_prob | -1.105*** (-1.211, -0.998) | -1.106*** (-1.214, -0.999) | -0.778*** (-0.883, -0.673) | -0.713*** (-0.818, -0.608) | -0.712*** (-0.817, -0.607) | -0.712*** (-0.817, -0.607) |
| skin_tonenumber_f7ddc4 | | 0.002 (-0.033, 0.038) | -0.027 (-0.061, 0.007) | -0.041** (-0.075, -0.007) | -0.040* (-0.074, -0.006) | -0.040* (-0.074, -0.006) |
| age | | -0.0003 (-0.001, 0.001) | 0.0003 (-0.001, 0.001) | 0.001* (0.00004, 0.002) | 0.001* (0.00002, 0.002) | 0.001* (0.00004, 0.002) |
| attractiveness | | -0.002 (-0.012, 0.008) | 0.001 (-0.009, 0.010) | 0.002 (-0.008, 0.012) | 0.002 (-0.008, 0.011) | 0.002 (-0.008, 0.011) |
| competence | | 0.003 (-0.009, 0.015) | -0.001 (-0.013, 0.010) | -0.003 (-0.014, 0.008) | -0.003 (-0.014, 0.008) | -0.003 (-0.014, 0.008) |
| dominance | | 0.001 (-0.008, 0.009) | 0.005 (-0.003, 0.012) | 0.006 (-0.001, 0.014) | 0.007 (-0.001, 0.015) | 0.007 (-0.001, 0.015) |
| trustworthiness | | 0.001 (-0.010, 0.011) | 0.001 (-0.009, 0.011) | -0.002 (-0.012, 0.008) | -0.002 (-0.012, 0.008) | -0.002 (-0.012, 0.008) |
| p_hat_covariates | | | 1.083*** (1.018, 1.148) | 1.005*** (0.939, 1.071) | 1.002*** (0.936, 1.068) | 1.002*** (0.936, 1.068) |
| p_hat_cnn | | | | 0.403*** (0.336, 0.470) | 0.019 (-0.642, 0.681) | 1.417 (-2.771, 5.605) |
| I(p_hat_cnn^2) | | | | | 0.265 (-0.190, 0.720) | -1.758 (-7.762, 4.245) |
| I(p_hat_cnn^3) | | | | | | 0.954 (-1.869, 3.778) |
| Constant | 1.102*** (1.069, 1.136) | 1.088*** (1.017, 1.158) | 0.114** (0.024, 0.203) | -0.161*** (-0.261, -0.061) | -0.023 (-0.280, 0.233) | -0.338 (-1.303, 0.627) |
| Observations | 8,479 | 8,479 | 8,479 | 8,479 | 8,479 | 8,479 |
| Adjusted R² | 0.033 | 0.033 | 0.112 | 0.122 | 0.122 | 0.122 |
| F Statistic | 291.841*** (df = 1; 8477) | 13.490*** (df = 23; 8455) | 45.641*** (df = 24; 8454) | 48.180*** (df = 25; 8453) | 46.362*** (df = 26; 8452) | 44.652*** (df = 27; 8451) |

Note: *p<0.1; **p<0.05; ***p<0.01


\(\color{red}{\text{Bivariate MTurk Skin-tone Regressions}}\)

We now consider a set of regressions of the release outcome on skin tone and its categories. The regressors are:

  1. All skin-tone labels
  2. Three categories based on skin-tone into dark, medium, and light skin tones
  3. The race label with categories (Asian, Black, White, Hispanic, Indian, Unsure, Other)
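Categorical regressors like these enter a regression as indicator (dummy) variables, with one level left out as the baseline. The sketch below shows that coding; the helper `dummy_code`, the toy labels, and the choice of "asian" as the omitted baseline are assumptions made for illustration (the tables report no Asian coefficient, which is consistent with it being the baseline, but that is not confirmed here).

```python
def dummy_code(values, reference, prefix):
    """Build 0/1 indicator columns for every level except the reference."""
    levels = sorted(set(values) - {reference})
    return {
        f"{prefix}{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }

# Hypothetical labels; "asian" is assumed to be the omitted baseline.
race = ["asian", "black", "caucasian", "black", "hispanic"]
columns = dummy_code(race, reference="asian", prefix="race_mturk")
```

Each coefficient on an indicator is then read as the difference in the mean outcome relative to the baseline category.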

We find that:

Combined Male-Female

Multihead(ResNet50)
Dependent variable: Release Outcome

| Variable | (1) | (2) | (3) |
|---|---|---|---|
| skin_tonenumber_623a17 | -0.002 (-0.036, 0.031) | | |
| skin_tonenumber_76441f | 0.027 (-0.015, 0.068) | | |
| skin_tonenumber_80492a | 0.034 (-0.004, 0.071) | | |
| skin_tonenumber_885633 | -0.007 (-0.051, 0.038) | | |
| skin_tonenumber_94623d | 0.061** (0.018, 0.103) | | |
| skin_tonenumber_ab8b64 | 0.028 (-0.012, 0.067) | | |
| skin_tonenumber_b26949 | 0.041 (-0.011, 0.092) | | |
| skin_tonenumber_cb9662 | 0.061** (0.018, 0.104) | | |
| skin_tonenumber_d09e7d | 0.004 (-0.048, 0.056) | | |
| skin_tonenumber_e7bc91 | 0.012 (-0.053, 0.077) | | |
| skin_tonenumber_e9cba7 | 0.041 (-0.017, 0.099) | | |
| skin_tonenumber_ecc083 | 0.053* (0.007, 0.098) | | |
| skin_tonenumber_eed0b8 | 0.039* (0.00003, 0.078) | | |
| skin_tonenumber_efc088 | 0.035 (-0.035, 0.105) | | |
| skin_tonenumber_efc794 | 0.010 (-0.040, 0.061) | | |
| skin_tonenumber_f6e1aa | 0.044* (0.005, 0.083) | | |
| skin_tonenumber_f7ddc4 | 0.023 (-0.013, 0.059) | | |
| skin_tone_cat_light_skin | | 0.004 (-0.013, 0.021) | |
| skin_tone_cat_medium_skin | | 0.021 (-0.0003, 0.042) | |
| race_mturkblack | | | 0.011 (-0.020, 0.041) |
| race_mturkcaucasian | | | 0.019 (-0.013, 0.051) |
| race_mturkhispanic | | | 0.028 (-0.012, 0.068) |
| race_mturkindian | | | -0.025 (-0.096, 0.046) |
| race_mturkother | | | -0.016 (-0.124, 0.093) |
| race_mturkunsure | | | -0.028 (-0.112, 0.056) |
| Constant | 0.736*** (0.710, 0.763) | 0.756*** (0.745, 0.767) | 0.749*** (0.720, 0.778) |
| Observations | 8,479 | 8,479 | 8,479 |
| Adjusted R² | 0.0005 | 0.0001 | -0.0003 |
| F Statistic | 1.226 (df = 17; 8461) | 1.334 (df = 2; 8476) | 0.614 (df = 6; 8472) |

Note: *p<0.1; **p<0.05; ***p<0.01

Coefficient Plot - Regression 1 - Combined

Coefficient Plot - Regression 2 - Combined

Coefficient Plot - Regression 3 - Combined



Subsample Female

Multihead(ResNet50)
Dependent variable: Release Outcome

| Variable | (1) | (2) | (3) |
|---|---|---|---|
| skin_tonenumber_623a17 | -0.029 (-0.110, 0.052) | | |
| skin_tonenumber_76441f | -0.043 (-0.137, 0.050) | | |
| skin_tonenumber_80492a | -0.029 (-0.113, 0.055) | | |
| skin_tonenumber_885633 | -0.081 (-0.176, 0.014) | | |
| skin_tonenumber_94623d | 0.013 (-0.077, 0.102) | | |
| skin_tonenumber_ab8b64 | 0.019 (-0.066, 0.104) | | |
| skin_tonenumber_b26949 | -0.032 (-0.137, 0.073) | | |
| skin_tonenumber_cb9662 | 0.031 (-0.056, 0.117) | | |
| skin_tonenumber_d09e7d | -0.042 (-0.142, 0.058) | | |
| skin_tonenumber_e7bc91 | 0.008 (-0.124, 0.140) | | |
| skin_tonenumber_e9cba7 | -0.081 (-0.191, 0.029) | | |
| skin_tonenumber_ecc083 | 0.021 (-0.069, 0.111) | | |
| skin_tonenumber_eed0b8 | -0.075 (-0.157, 0.007) | | |
| skin_tonenumber_efc088 | -0.010 (-0.136, 0.116) | | |
| skin_tonenumber_efc794 | -0.033 (-0.135, 0.068) | | |
| skin_tonenumber_f6e1aa | -0.092* (-0.170, -0.013) | | |
| skin_tonenumber_f7ddc4 | -0.080* (-0.154, -0.006) | | |
| skin_tone_cat_light_skin | | -0.045** (-0.077, -0.014) | |
| skin_tone_cat_medium_skin | | 0.016 (-0.022, 0.054) | |
| race_mturkblack | | | 0.009 (-0.044, 0.063) |
| race_mturkcaucasian | | | -0.073** (-0.128, -0.019) |
| race_mturkhispanic | | | -0.029 (-0.098, 0.040) |
| race_mturkindian | | | -0.119 (-0.276, 0.038) |
| race_mturkother | | | 0.031 (-0.164, 0.226) |
| race_mturkunsure | | | -0.069 (-0.211, 0.073) |
| Constant | 0.881*** (0.816, 0.946) | 0.859*** (0.835, 0.882) | 0.869*** (0.819, 0.918) |
| Observations | 1,833 | 1,833 | 1,833 |
| Adjusted R² | 0.004 | 0.004 | 0.008 |
| F Statistic | 1.419 (df = 17; 1815) | 4.713*** (df = 2; 1830) | 3.562*** (df = 6; 1826) |

Note: *p<0.1; **p<0.05; ***p<0.01

Coefficient Plot - Regression 1 - Female

Coefficient Plot - Regression 2 - Female

Coefficient Plot - Regression 3 - Female

Subsample Male

Multihead(ResNet50)
Dependent variable: Release Outcome

| Variable | (1) | (2) | (3) |
|---|---|---|---|
| skin_tonenumber_623a17 | -0.001 (-0.038, 0.036) | | |
| skin_tonenumber_76441f | 0.032 (-0.014, 0.078) | | |
| skin_tonenumber_80492a | 0.036 (-0.006, 0.077) | | |
| skin_tonenumber_885633 | -0.004 (-0.055, 0.046) | | |
| skin_tonenumber_94623d | 0.054* (0.006, 0.103) | | |
| skin_tonenumber_ab8b64 | 0.012 (-0.033, 0.056) | | |
| skin_tonenumber_b26949 | 0.041 (-0.018, 0.100) | | |
| skin_tonenumber_cb9662 | 0.039 (-0.011, 0.089) | | |
| skin_tonenumber_d09e7d | -0.009 (-0.069, 0.051) | | |
| skin_tonenumber_e7bc91 | -0.002 (-0.076, 0.072) | | |
| skin_tonenumber_e9cba7 | 0.053 (-0.014, 0.121) | | |
| skin_tonenumber_ecc083 | 0.033 (-0.021, 0.086) | | |
| skin_tonenumber_eed0b8 | 0.049* (0.004, 0.094) | | |
| skin_tonenumber_efc088 | 0.019 (-0.063, 0.102) | | |
| skin_tonenumber_efc794 | 0.001 (-0.058, 0.059) | | |
| skin_tonenumber_f6e1aa | 0.060** (0.014, 0.106) | | |
| skin_tonenumber_f7ddc4 | 0.021 (-0.021, 0.063) | | |
| skin_tone_cat_light_skin | | 0.008 (-0.012, 0.028) | |
| skin_tone_cat_medium_skin | | 0.010 (-0.014, 0.035) | |
| race_mturkblack | | | 0.024 (-0.012, 0.060) |
| race_mturkcaucasian | | | 0.046* (0.007, 0.084) |
| race_mturkhispanic | | | 0.045 (-0.002, 0.093) |
| race_mturkindian | | | 0.009 (-0.071, 0.089) |
| race_mturkother | | | -0.025 (-0.152, 0.102) |
| race_mturkunsure | | | -0.016 (-0.116, 0.084) |
| Constant | 0.716*** (0.687, 0.745) | 0.734*** (0.722, 0.747) | 0.711*** (0.677, 0.745) |
| Observations | 6,646 | 6,646 | 6,646 |
| Adjusted R² | -0.0001 | -0.0002 | 0.0001 |
| F Statistic | 0.975 (df = 17; 6628) | 0.337 (df = 2; 6643) | 1.126 (df = 6; 6639) |

Note: *p<0.1; **p<0.05; ***p<0.01

Coefficient Plot - Regression 1 - Male

Coefficient Plot - Regression 2 - Male

Coefficient Plot - Regression 3 - Male



\(\color{red}{\text{Skin-tone sanity checks}}\)

Skin-tone and race relation (sanity check)

Release-rate and skin-tone

We now present plots of how skin tone relates to the mean arrest rate and to each of the MTurk labels. These are:

  1. Mean arrest rate
  2. Attractiveness
  3. Competence
  4. Dominance
  5. Trustworthiness