Algorithm’s predictive accuracy for judge decisions

Prediction distributions

Regression Tables

Here I re-run Table 2 and Table 3 from the current V2 of the faces paper, for each of the baseline, matched-0.2, and matched-0.3 models.

Does the model's prediction add explanatory power (R²) beyond well-groomed?

Detain outcome and P-hat CNN V3, controlling for Well-Groomed (WG)
Dependent variable: arrest_final_outcome

| | (1) | (2) | (3) |
|---|---|---|---|
| P-hat CNN V3 | 1.037*** (0.094) | | 1.019*** (0.094) |
| Well-Groomed | | -0.021*** (0.004) | -0.019*** (0.004) |
| Constant | -0.278*** (0.046) | 0.332*** (0.021) | -0.180*** (0.052) |
| Observations | 9,495 | 9,495 | 9,495 |
| Adjusted R² | 0.013 | 0.002 | 0.014 |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
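The three-column comparison above (prediction alone, well-groomed alone, both together) can be sketched as follows. This is a minimal illustration on synthetic data, not the code behind the table: the variable names, the simulated coefficients, and the hand-rolled least-squares solver are all assumptions made so the example runs with the standard library only.

```python
import random

def ols_fit(X, y):
    """Least-squares coefficients; each row of X must start with 1.0 for the intercept."""
    k = len(X[0])
    # Build the normal equations (X'X) beta = X'y.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (b[i] - s) / A[i][i]
    return beta

def adjusted_r2(X, y):
    """Adjusted R-squared, where k counts the intercept column."""
    n, k = len(X), len(X[0])
    beta = ols_fit(X, y)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Synthetic stand-ins (hypothetical): a prediction score, a grooming rating,
# and a binary detention outcome generated from a linear probability model.
rng = random.Random(0)
n = 5000
p_hat = [rng.random() for _ in range(n)]
well_groomed = [rng.random() for _ in range(n)]
detained = [1.0 if rng.random() < 0.2 + 0.5 * p - 0.1 * w else 0.0
            for p, w in zip(p_hat, well_groomed)]

specs = {
    "p_hat_only": [[1.0, p] for p in p_hat],
    "well_groomed_only": [[1.0, w] for w in well_groomed],
    "both": [[1.0, p, w] for p, w in zip(p_hat, well_groomed)],
}
adj = {name: adjusted_r2(X, detained) for name, X in specs.items()}
# Explanatory power the prediction adds once well-groomed is already controlled for.
incremental = adj["both"] - adj["well_groomed_only"]
for name, value in adj.items():
    print(f"{name}: adjusted R2 = {value:.4f}")
print(f"incremental over well-groomed: {incremental:.4f}")
```

The quantity of interest is the gap between the "both" and "well_groomed_only" specifications, which mirrors comparing columns (3) and (2) of the table.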

Table 2 - tab:HumanGuess - “How much of existing knowledge has the algorithm rediscovered?”

How much of existing knowledge has the algorithm rediscovered?
Dependent variable: P-hat CNN Baseline (columns 1-4), P-hat CNN Matched-0.2 (columns 5-8), P-hat CNN Matched-0.3 (columns 9-12)

| | Baseline (1) | Baseline (2) | Baseline (3) | Baseline (4) | Matched-0.2 (5) | Matched-0.2 (6) | Matched-0.2 (7) | Matched-0.2 (8) | Matched-0.3 (9) | Matched-0.3 (10) | Matched-0.3 (11) | Matched-0.3 (12) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Male | -0.119*** (0.002) | -0.117*** (0.002) | -0.115*** (0.003) | -0.115*** (0.003) | 0.008*** (0.002) | 0.010*** (0.002) | 0.009*** (0.002) | 0.009*** (0.002) | -0.009*** (0.001) | -0.007*** (0.001) | -0.007*** (0.001) | -0.007*** (0.001) |
| Unknown-Gender | -0.057 (0.099) | -0.060 (0.099) | -0.057 (0.098) | -0.055 (0.098) | 0.030 (0.070) | 0.043 (0.070) | 0.041 (0.070) | 0.041 (0.070) | -0.015 (0.046) | -0.005 (0.045) | -0.005 (0.045) | -0.005 (0.045) |
| Age | | 0.001*** (0.0001) | 0.0003*** (0.0001) | 0.0003*** (0.0001) | | -0.0003*** (0.0001) | -0.0002*** (0.0001) | -0.0002*** (0.0001) | | 0.0001*** (0.00004) | 0.0001** (0.00004) | 0.0001** (0.00004) |
| Black | | 0.004 (0.003) | 0.003 (0.003) | 0.003 (0.003) | | -0.012*** (0.002) | -0.012*** (0.002) | -0.012*** (0.002) | | -0.011*** (0.001) | -0.011*** (0.001) | -0.011*** (0.001) |
| Unknown-Race | | 0.013* (0.007) | 0.017** (0.007) | 0.017** (0.007) | | -0.003 (0.005) | -0.004 (0.005) | -0.004 (0.005) | | -0.012*** (0.003) | -0.012*** (0.003) | -0.012*** (0.003) |
| Asian | | -0.014 (0.012) | -0.013 (0.012) | -0.013 (0.012) | | -0.014* (0.008) | -0.014* (0.008) | -0.014* (0.008) | | -0.015*** (0.005) | -0.015*** (0.005) | -0.015*** (0.005) |
| Indian | | 0.012 (0.024) | 0.019 (0.024) | 0.019 (0.024) | | -0.018 (0.017) | -0.019 (0.017) | -0.019 (0.017) | | -0.010 (0.011) | -0.010 (0.011) | -0.010 (0.011) |
| Skin-tone | | -0.201*** (0.044) | -0.172*** (0.043) | -0.169*** (0.043) | | -0.059* (0.031) | -0.061** (0.031) | -0.062** (0.031) | | -0.121*** (0.020) | -0.119*** (0.020) | -0.119*** (0.020) |
| Attractiveness | | | -0.006*** (0.002) | -0.006*** (0.002) | | | 0.003** (0.001) | 0.003** (0.001) | | | -0.001 (0.001) | -0.001 (0.001) |
| Competence | | | -0.009*** (0.002) | -0.009*** (0.002) | | | -0.001 (0.001) | -0.001 (0.001) | | | -0.001 (0.001) | -0.001 (0.001) |
| Dominance | | | 0.004*** (0.001) | 0.004*** (0.001) | | | -0.001 (0.001) | -0.001 (0.001) | | | 0.0004 (0.001) | 0.0004 (0.001) |
| Trustworthiness | | | -0.005*** (0.002) | -0.004*** (0.002) | | | 0.001 (0.001) | 0.001 (0.001) | | | 0.0003 (0.001) | 0.0003 (0.001) |
| Human Detention Guess | | | | 0.007** (0.003) | | | | -0.0004 (0.002) | | | | -0.0001 (0.002) |
| Constant | 0.278*** (0.001) | 0.264*** (0.003) | 0.327*** (0.007) | 0.323*** (0.008) | 0.488*** (0.001) | 0.502*** (0.002) | 0.496*** (0.005) | 0.496*** (0.005) | 0.495*** (0.001) | 0.497*** (0.001) | 0.502*** (0.003) | 0.502*** (0.003) |
| Observations | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 |
| Adjusted R² | 0.196 | 0.201 | 0.217 | 0.218 | 0.002 | 0.012 | 0.013 | 0.013 | 0.006 | 0.033 | 0.033 | 0.033 |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01

Table 2b - tab:HumanGuess - With Well-Groomed - “How much of existing knowledge has the algorithm rediscovered?”

How much of existing knowledge has the algorithm rediscovered? - Including Well-Groomed
Dependent variable: P-hat CNN Baseline (columns 1-4), P-hat CNN Matched-0.2 (columns 5-8), P-hat CNN Matched-0.3 (columns 9-12)

| | Baseline (1) | Baseline (2) | Baseline (3) | Baseline (4) | Matched-0.2 (5) | Matched-0.2 (6) | Matched-0.2 (7) | Matched-0.2 (8) | Matched-0.3 (9) | Matched-0.3 (10) | Matched-0.3 (11) | Matched-0.3 (12) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Male | -0.119*** (0.002) | -0.118*** (0.002) | -0.117*** (0.002) | -0.117*** (0.002) | 0.008*** (0.002) | 0.010*** (0.002) | 0.009*** (0.002) | 0.009*** (0.002) | -0.009*** (0.001) | -0.007*** (0.001) | -0.007*** (0.001) | -0.007*** (0.001) |
| Unknown-Gender | -0.057 (0.099) | -0.073 (0.097) | -0.070 (0.097) | -0.069 (0.097) | 0.030 (0.070) | 0.044 (0.070) | 0.041 (0.070) | 0.041 (0.070) | -0.015 (0.046) | -0.007 (0.045) | -0.006 (0.045) | -0.007 (0.045) |
| Age | | 0.0003*** (0.0001) | 0.0002** (0.0001) | 0.0002** (0.0001) | | -0.0003*** (0.0001) | -0.0002*** (0.0001) | -0.0002*** (0.0001) | | 0.0001** (0.00004) | 0.0001** (0.00004) | 0.0001** (0.00004) |
| Black | | 0.002 (0.003) | 0.002 (0.003) | 0.002 (0.003) | | -0.012*** (0.002) | -0.012*** (0.002) | -0.012*** (0.002) | | -0.011*** (0.001) | -0.011*** (0.001) | -0.011*** (0.001) |
| Unknown-Race | | 0.020*** (0.007) | 0.020*** (0.007) | 0.020*** (0.007) | | -0.004 (0.005) | -0.004 (0.005) | -0.004 (0.005) | | -0.011*** (0.003) | -0.011*** (0.003) | -0.011*** (0.003) |
| Asian | | -0.009 (0.012) | -0.008 (0.012) | -0.008 (0.012) | | -0.014* (0.008) | -0.014* (0.008) | -0.014* (0.008) | | -0.014*** (0.005) | -0.014*** (0.005) | -0.014*** (0.005) |
| Indian | | 0.021 (0.024) | 0.022 (0.024) | 0.022 (0.024) | | -0.018 (0.017) | -0.019 (0.017) | -0.019 (0.017) | | -0.009 (0.011) | -0.009 (0.011) | -0.009 (0.011) |
| Skin-tone | | -0.195*** (0.043) | -0.185*** (0.043) | -0.183*** (0.043) | | -0.059* (0.031) | -0.061** (0.031) | -0.061** (0.031) | | -0.121*** (0.020) | -0.121*** (0.020) | -0.121*** (0.020) |
| Well-Groomed | | -0.018*** (0.001) | -0.016*** (0.001) | -0.016*** (0.001) | | 0.001 (0.001) | 0.001 (0.001) | 0.0005 (0.001) | | -0.002*** (0.0005) | -0.002*** (0.001) | -0.002*** (0.001) |
| Attractiveness | | | 0.0002 (0.002) | 0.0002 (0.002) | | | 0.002** (0.001) | 0.002** (0.001) | | | -0.0002 (0.001) | -0.0002 (0.001) |
| Competence | | | -0.006*** (0.002) | -0.006*** (0.002) | | | -0.001 (0.001) | -0.001 (0.001) | | | -0.0005 (0.001) | -0.0005 (0.001) |
| Dominance | | | 0.004*** (0.001) | 0.004*** (0.001) | | | -0.001 (0.001) | -0.001 (0.001) | | | 0.0004 (0.001) | 0.0004 (0.001) |
| Trustworthiness | | | -0.003 (0.002) | -0.003 (0.002) | | | 0.001 (0.001) | 0.001 (0.001) | | | 0.001 (0.001) | 0.001 (0.001) |
| Human Detention Guess | | | | 0.006* (0.003) | | | | -0.0004 (0.002) | | | | -0.0003 (0.002) |
| Constant | 0.278*** (0.001) | 0.363*** (0.006) | 0.366*** (0.008) | 0.362*** (0.008) | 0.488*** (0.001) | 0.496*** (0.004) | 0.495*** (0.006) | 0.495*** (0.006) | 0.495*** (0.001) | 0.508*** (0.003) | 0.507*** (0.004) | 0.507*** (0.004) |
| Observations | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 | 9,495 |
| Adjusted R² | 0.196 | 0.228 | 0.231 | 0.231 | 0.002 | 0.013 | 0.013 | 0.013 | 0.006 | 0.035 | 0.035 | 0.035 |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01

Table 3 - tab:KnownBiases - “Is the additional variation the algorithm captures beyond known hypotheses reflecting signal or noise?”

Is the algorithm simply rediscovering known biases?
Dependent variable: Detention outcome

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) | (13) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Male | -0.099*** (0.011) | | | | -0.027** (0.012) | -0.097*** (0.010) | -0.109*** (0.010) | -0.026** (0.012) | -0.096*** (0.010) | -0.107*** (0.011) | -0.027** (0.012) | -0.093*** (0.011) | -0.104*** (0.011) |
| Unknown Gender | -0.266 (0.420) | | | | -0.227 (0.416) | -0.260 (0.419) | -0.286 (0.420) | -0.227 (0.416) | -0.259 (0.418) | -0.284 (0.420) | -0.231 (0.416) | -0.262 (0.418) | -0.286 (0.419) |
| Age | -0.001 (0.0005) | | | | -0.001* (0.0004) | -0.0003 (0.0004) | 0.00001 (0.0004) | -0.001* (0.0004) | -0.0003 (0.0004) | -0.00000 (0.0004) | -0.001** (0.0005) | -0.001 (0.0005) | -0.001 (0.0005) |
| White | 0.022* (0.011) | | | | 0.014 (0.010) | 0.028*** (0.010) | 0.020** (0.010) | 0.022* (0.011) | 0.036*** (0.011) | 0.031*** (0.011) | 0.020* (0.011) | 0.033*** (0.011) | 0.028** (0.011) |
| Unknown Race | 0.038 (0.031) | | | | 0.016 (0.030) | 0.038 (0.030) | 0.024 (0.030) | 0.023 (0.030) | 0.045 (0.030) | 0.034 (0.030) | 0.027 (0.030) | 0.050* (0.030) | 0.040 (0.030) |
| Asian | -0.044 (0.049) | | | | -0.047 (0.048) | -0.043 (0.048) | -0.054 (0.048) | -0.038 (0.049) | -0.033 (0.049) | -0.041 (0.049) | -0.036 (0.049) | -0.030 (0.049) | -0.038 (0.049) |
| Indian | 0.096 (0.102) | | | | 0.071 (0.101) | 0.091 (0.102) | 0.088 (0.102) | 0.077 (0.101) | 0.097 (0.102) | 0.096 (0.102) | 0.084 (0.101) | 0.106 (0.102) | 0.106 (0.102) |
| Skin-tone | -0.315* (0.185) | | | | | | | -0.235 (0.183) | -0.244 (0.184) | -0.338* (0.185) | -0.204 (0.183) | -0.194 (0.184) | -0.282 (0.184) |
| Attractiveness | -0.002 (0.007) | | | | | | | | | | 0.001 (0.007) | -0.001 (0.007) | -0.003 (0.007) |
| Competence | -0.022*** (0.007) | | | | | | | | | | -0.016** (0.007) | -0.021*** (0.007) | -0.021*** (0.007) |
| Dominance | 0.009* (0.005) | | | | | | | | | | 0.007 (0.005) | 0.009* (0.005) | 0.010* (0.005) |
| Trustworthiness | -0.014* (0.007) | | | | | | | | | | -0.011 (0.007) | -0.014** (0.007) | -0.014** (0.007) |
| P-hat baseline | | 0.695*** (0.038) | | | 0.660*** (0.043) | | | 0.657*** (0.043) | | | 0.631*** (0.043) | | |
| P-hat matched-0.3 | | | 1.043*** (0.093) | | | 1.019*** (0.094) | | | 1.011*** (0.095) | | | 0.997*** (0.095) | |
| P-hat matched-0.2 | | | | 0.465*** (0.062) | | | 0.508*** (0.062) | | | 0.506*** (0.062) | | | 0.511*** (0.062) |
| Constant | 0.375*** (0.034) | 0.058*** (0.011) | -0.280*** (0.046) | 0.005 (0.030) | 0.094*** (0.019) | -0.246*** (0.049) | 0.002 (0.035) | 0.099*** (0.020) | -0.237*** (0.050) | 0.010 (0.035) | 0.176*** (0.036) | -0.127** (0.058) | 0.121*** (0.045) |
| Naive-AUC | 0.579 | 0.624 | 0.58 | 0.554 | 0.625 | 0.602 | 0.59 | 0.625 | 0.603 | 0.591 | 0.63 | 0.61 | 0.601 |
| Observations | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 |
| Adjusted R² | 0.014 | 0.033 | 0.013 | 0.006 | 0.033 | 0.021 | 0.017 | 0.034 | 0.022 | 0.017 | 0.035 | 0.025 | 0.021 |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
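The table does not say how the Naive-AUC row is computed. One common way to obtain an AUC for a regression like these is to rank observations by fitted value and measure how often a detained case outranks a released one. The sketch below shows that rank-based calculation on made-up scores; the function name and the reading of "naive" as a plain AUC of fitted values are assumptions, not something the source confirms.

```python
def rank_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs where the positive scores higher.

    Tied scores count as half a win, via average ranks (Mann-Whitney U).
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for _, label in pairs[i:j] if label == 1)
        i = j
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical fitted values and binary detention outcomes.
fitted = [0.10, 0.40, 0.35, 0.80]
detained = [0, 0, 1, 1]
print(rank_auc(fitted, detained))
```

A value of 0.5 means the fitted values rank cases no better than chance, which is a useful yardstick when reading a row of values between roughly 0.55 and 0.63.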

Table 3b - tab:KnownBiases - Including Well-Groomed - “Is the additional variation the algorithm captures beyond known hypotheses reflecting signal or noise?”

Is the algorithm simply rediscovering known biases?
Dependent variable: Detention outcome

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) | (13) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Male | -0.101*** (0.011) | | | | -0.030*** (0.012) | -0.099*** (0.010) | -0.111*** (0.010) | -0.030** (0.012) | -0.098*** (0.010) | -0.110*** (0.011) | -0.028** (0.012) | -0.094*** (0.011) | -0.105*** (0.011) |
| Unknown Gender | -0.278 (0.420) | | | | -0.237 (0.416) | -0.276 (0.418) | -0.304 (0.419) | -0.237 (0.416) | -0.275 (0.418) | -0.303 (0.419) | -0.235 (0.416) | -0.272 (0.418) | -0.298 (0.419) |
| Age | -0.001 (0.0005) | | | | -0.001** (0.0004) | -0.001* (0.0004) | -0.001 (0.0004) | -0.001** (0.0004) | -0.001* (0.0004) | -0.001 (0.0004) | -0.001** (0.0005) | -0.001* (0.0005) | -0.001 (0.0005) |
| White | 0.021* (0.011) | | | | 0.013 (0.010) | 0.025** (0.010) | 0.017* (0.010) | 0.021* (0.011) | 0.033*** (0.011) | 0.028** (0.011) | 0.020* (0.011) | 0.032*** (0.011) | 0.027** (0.011) |
| Unknown Race | 0.041 (0.031) | | | | 0.020 (0.030) | 0.045 (0.030) | 0.032 (0.030) | 0.027 (0.030) | 0.051* (0.030) | 0.042 (0.030) | 0.028 (0.030) | 0.052* (0.030) | 0.043 (0.030) |
| Asian | -0.040 (0.049) | | | | -0.045 (0.048) | -0.038 (0.048) | -0.049 (0.048) | -0.036 (0.049) | -0.029 (0.049) | -0.036 (0.049) | -0.034 (0.049) | -0.027 (0.049) | -0.034 (0.049) |
| Indian | 0.099 (0.102) | | | | 0.076 (0.101) | 0.099 (0.102) | 0.098 (0.102) | 0.082 (0.101) | 0.105 (0.102) | 0.105 (0.102) | 0.085 (0.101) | 0.108 (0.102) | 0.109 (0.102) |
| Well-Groomed | -0.014*** (0.005) | | | | -0.011*** (0.004) | -0.021*** (0.004) | -0.023*** (0.004) | -0.011*** (0.004) | -0.021*** (0.004) | -0.023*** (0.004) | -0.004 (0.005) | -0.012** (0.005) | -0.014*** (0.005) |
| Skin-tone | -0.326* (0.185) | | | | | | | -0.236 (0.183) | -0.240 (0.184) | -0.331* (0.184) | -0.208 (0.183) | -0.205 (0.184) | -0.294 (0.184) |
| Attractiveness | 0.003 (0.007) | | | | | | | | | | 0.002 (0.007) | 0.003 (0.007) | 0.002 (0.007) |
| Competence | -0.019*** (0.007) | | | | | | | | | | -0.015** (0.007) | -0.019*** (0.007) | -0.019** (0.007) |
| Dominance | 0.009* (0.005) | | | | | | | | | | 0.007 (0.005) | 0.009* (0.005) | 0.010* (0.005) |
| Trustworthiness | -0.012* (0.007) | | | | | | | | | | -0.010 (0.007) | -0.013* (0.007) | -0.012* (0.007) |
| P-hat baseline | | 0.695*** (0.038) | | | 0.640*** (0.044) | | | 0.637*** (0.044) | | | 0.627*** (0.044) | | |
| P-hat matched-0.3 | | | 1.043*** (0.093) | | | 0.997*** (0.094) | | | 0.989*** (0.095) | | | 0.988*** (0.095) | |
| P-hat matched-0.2 | | | | 0.465*** (0.062) | | | 0.513*** (0.062) | | | 0.510*** (0.062) | | | 0.512*** (0.061) |
| Constant | 0.411*** (0.036) | 0.058*** (0.011) | -0.280*** (0.046) | 0.005 (0.030) | 0.164*** (0.033) | -0.118** (0.056) | 0.133*** (0.042) | 0.169*** (0.033) | -0.110* (0.057) | 0.141*** (0.043) | 0.189*** (0.039) | -0.092 (0.060) | 0.157*** (0.047) |
| Naive-AUC | 0.581 | 0.624 | 0.58 | 0.554 | 0.626 | 0.603 | 0.594 | 0.626 | 0.604 | 0.596 | 0.63 | 0.61 | 0.602 |
| Observations | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 | 9,603 |
| Adjusted R² | 0.015 | 0.033 | 0.013 | 0.006 | 0.034 | 0.024 | 0.019 | 0.034 | 0.024 | 0.020 | 0.035 | 0.026 | 0.022 |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01