Background

Colorism is a widely claimed but poorly supported model of racial inequality in social outcomes. The model holds that people are racist and rely on skin tone (color) as a proxy for race. The discrimination faced by people with darker skin then results in a variety of worse social outcomes. For instance, Marira, T. D., & Mitra, P. (2013), Colorism: Ubiquitous yet understudied, Industrial and Organizational Psychology, 6(1), 103-107, write:

Why Should I–O Psychology Be Concerned With Colorism? I–O psychologists should be concerned with the issue of colorism because the phenomenon has implications that are capable of cutting across categories such as race, religion, gender, age, sexuality, nationality, and occupation. That is to say, extant psychological research contains evidence for the preference and undue favoritism of lighter skin complexions among Black, White, Latino, and Asian populations from around the world (Glenn, 2009). … However, as stated previously, colorism does not simply affect African Americans and Latinos; rather, it is a global phenomenon that consistently privileges lighter skin tones over darker ones (Glenn, 2008). Thus, the pervasiveness of this form of discrimination and its impact on workplace and labor market related outcomes, both in the United States and abroad, dictate that I–O psychologists become more acquainted with this form of discrimination.

We use data from a recent study (de Franca et al. 2017) to test this model.

Start

library(pacman)
p_load(kirkegaard, haven, rms, sjstats)
options(contrasts = rep("contr.treatment", 2))
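
The `options(contrasts = ...)` line matters for reading the output further down: it tells R to use treatment (dummy) contrasts for ordered factors as well, instead of the default polynomial contrasts. This is presumably why the models below report one coefficient per factor level (e.g. `skin_tone=Mixed`). A minimal base R sketch, using a toy factor rather than the study data, shows the difference:

```r
# Toy ordered factor (illustrative only, not the study data)
skin <- ordered(c("White", "Mixed", "Black"),
                levels = c("White", "Mixed", "Black"))

# R's default: treatment contrasts for unordered, polynomial for ordered factors
old <- options(contrasts = c("contr.treatment", "contr.poly"))
default_contr <- contrasts(skin)     # polynomial columns (.L, .Q)

# Setting used in this analysis: treatment contrasts for both kinds of factor
options(contrasts = rep("contr.treatment", 2))
treatment_contr <- contrasts(skin)   # one dummy column per non-reference level

options(old)  # restore the previous setting

print(colnames(default_contr))
print(colnames(treatment_contr))
```

With treatment contrasts the first level ("White" here) is the reference category, so each coefficient is a contrast against that level.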

Data and recode

d = read_stata("data/PONE-D-17-02552R.dta")

#recode
d$sex = d$sex %>% plyr::mapvalues(1:2, c("Male", "Female")) %>% factor() %>% fct_relevel("Male")

#ordinals
quintile_labels = c("0-20%", "20-40%", "40-60%", "60-80%", "80-100%")
quintile_recode = function(x) {
  plyr::mapvalues(x, 1:5, quintile_labels) %>% ordered()
}
d$african = quintile_recode(d$afr_q5)
d$amerindian = quintile_recode(d$nat_q5)
d$european = quintile_recode(d$eur_q5)
d$education = ordered(d$education4)
d$skin_tone = d$skin_colour %>% plyr::mapvalues(0:2, c("White", "Mixed", "Black")) %>% ordered(levels = c("White", "Mixed", "Black"))

#intervals
d$african_num = d$afr_q5
d$amerindian_num = d$nat_q5
d$european_num = d$eur_q5
d$skin_tone_num = d$skin_colour
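
As far as we can tell, `quintile_recode()` gets the correct level order because `ordered()` sorts the labels alphabetically, which happens to coincide with the intended order, and a quintile that never occurs in the input would not become a level. A sketch of a base R alternative that fixes the levels explicitly is given below; `quintile_recode_base` is a hypothetical name and the input vector is made up.

```r
quintile_labels <- c("0-20%", "20-40%", "40-60%", "60-80%", "80-100%")

# Hypothetical helper: same mapping as quintile_recode(), but with the
# level order and the full set of levels stated explicitly
quintile_recode_base <- function(x) {
  factor(as.integer(x), levels = 1:5, labels = quintile_labels, ordered = TRUE)
}

toy <- c(3, 1, 5, NA, 1)  # made-up codes; quintiles 2 and 4 are absent
res <- quintile_recode_base(toy)

print(res)
print(levels(res))  # all five levels are kept, in quintile order
```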

Simple

The simplest approach is to plot the bivariate associations.

GG_group_means(d, "education4", "african") +
  xlab("African ancestry") +
  ylab("Education attainment (average ordinal)")
## Missing values were removed.

GG_save("figs/edu_afri.png")

GG_group_means(d, "education4", "amerindian") +
  xlab("Amerindian ancestry") +
  ylab("Education attainment (average ordinal)")
## Missing values were removed.

GG_save("figs/edu_amer.png")

GG_group_means(d, "education4", "european") +
  xlab("European ancestry") +
  ylab("Education attainment (average ordinal)")
## Missing values were removed.

GG_save("figs/edu_euro.png")
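
`GG_group_means()` comes from the kirkegaard package. The quantity on the y-axis is the mean of the numeric education code (0-3) within each ancestry quintile. A minimal base R sketch of that calculation, on made-up data rather than the study data, looks like this:

```r
# Made-up toy data, not the study data
toy <- data.frame(
  education4 = c(3, 2, 2, 1, 1, 0, NA, 2),
  african    = ordered(c("0-20%", "0-20%", "20-40%", "20-40%",
                         "40-60%", "40-60%", "0-20%", NA))
)

# Drop rows with a missing value in either variable, then average the
# education code within each ancestry group
ok <- complete.cases(toy)
group_means <- tapply(toy$education4[ok], toy$african[ok], mean)

print(group_means)
```

Averaging an ordinal code treats the distances between education levels as equal, which is why the axis label reads "average ordinal".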

Modeling

Since both the predictors and the outcome are ordinal, the most appropriate analytic approach is ordinal regression with the predictors coded as ordinal factors as well.

#full ordinal model
lrm(education ~ sex + skin_tone, data = d)
## Frequencies of Missing Values Due to Each Variable
## education       sex skin_tone 
##        38         0       295 
## 
## Logistic Regression Model
##  
##  lrm(formula = education ~ sex + skin_tone, data = d)
##  
##  
##  Frequencies of Responses
##  
##     0    1    2    3 
##   586 1202  785  595 
##  
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs          3168    LR chi2     156.49    R2       0.052    C       0.591    
##  max |deriv| 4e-12    d.f.             3    g        0.427    Dxy     0.182    
##                       Pr(> chi2) <0.0001    gr       1.532    gamma   0.259    
##                                             gp       0.100    tau-a   0.132    
##                                             Brier    0.236                     
##  
##                  Coef    S.E.   Wald Z Pr(>|Z|)
##  y>=1             1.5794 0.0604  26.16 <0.0001 
##  y>=2            -0.2233 0.0518  -4.31 <0.0001 
##  y>=3            -1.4691 0.0592 -24.83 <0.0001 
##  sex=Female       0.3455 0.0647   5.34 <0.0001 
##  skin_tone=Mixed -0.8106 0.0887  -9.14 <0.0001 
##  skin_tone=Black -0.9028 0.1157  -7.81 <0.0001 
## 
lrm(education ~ sex + african + amerindian, data = d)
## Frequencies of Missing Values Due to Each Variable
##  education        sex    african amerindian 
##         38          0        603        603 
## 
## Logistic Regression Model
##  
##  lrm(formula = education ~ sex + african + amerindian, data = d)
##  
##  
##  Frequencies of Responses
##  
##     0    1    2    3 
##   532 1088  729  518 
##  
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs          2867    LR chi2     254.81    R2       0.091    C       0.629    
##  max |deriv| 1e-10    d.f.             9    g        0.637    Dxy     0.258    
##                       Pr(> chi2) <0.0001    gr       1.891    gamma   0.264    
##                                             gp       0.150    tau-a   0.187    
##                                             Brier    0.231                     
##  
##                     Coef    S.E.   Wald Z Pr(>|Z|)
##  y>=1                2.2727 0.1074  21.16 <0.0001 
##  y>=2                0.4367 0.0994   4.39 <0.0001 
##  y>=3               -0.8959 0.1008  -8.89 <0.0001 
##  sex=Female          0.4014 0.0686   5.85 <0.0001 
##  african=20-40%     -0.2694 0.1118  -2.41 0.0160  
##  african=40-60%     -0.4354 0.1155  -3.77 0.0002  
##  african=60-80%     -0.8871 0.1182  -7.51 <0.0001 
##  african=80-100%    -1.1595 0.1150 -10.08 <0.0001 
##  amerindian=20-40%  -0.2834 0.1100  -2.58 0.0100  
##  amerindian=40-60%  -0.3545 0.1130  -3.14 0.0017  
##  amerindian=60-80%  -0.5087 0.1148  -4.43 <0.0001 
##  amerindian=80-100% -0.5877 0.1184  -4.96 <0.0001 
## 
lrm(education ~ sex + african + amerindian + skin_tone, data = d)
## Frequencies of Missing Values Due to Each Variable
##  education        sex    african amerindian  skin_tone 
##         38          0        603        603        295 
## 
## Logistic Regression Model
##  
##  lrm(formula = education ~ sex + african + amerindian + skin_tone, 
##      data = d)
##  
##  
##  Frequencies of Responses
##  
##     0    1    2    3 
##   532 1088  729  518 
##  
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs          2867    LR chi2     261.88    R2       0.094    C       0.631    
##  max |deriv| 2e-10    d.f.            11    g        0.646    Dxy     0.262    
##                       Pr(> chi2) <0.0001    gr       1.908    gamma   0.267    
##                                             gp       0.151    tau-a   0.189    
##                                             Brier    0.231                     
##  
##                     Coef    S.E.   Wald Z Pr(>|Z|)
##  y>=1                2.2892 0.1077 21.26  <0.0001 
##  y>=2                0.4492 0.0996  4.51  <0.0001 
##  y>=3               -0.8852 0.1009 -8.77  <0.0001 
##  sex=Female          0.3970 0.0686  5.78  <0.0001 
##  african=20-40%     -0.2687 0.1119 -2.40  0.0163  
##  african=40-60%     -0.4314 0.1156 -3.73  0.0002  
##  african=60-80%     -0.8203 0.1210 -6.78  <0.0001 
##  african=80-100%    -0.9202 0.1829 -5.03  <0.0001 
##  amerindian=20-40%  -0.2848 0.1100 -2.59  0.0097  
##  amerindian=40-60%  -0.3546 0.1130 -3.14  0.0017  
##  amerindian=60-80%  -0.5185 0.1149 -4.51  <0.0001 
##  amerindian=80-100% -0.5755 0.1186 -4.85  <0.0001 
##  skin_tone=Mixed    -0.2450 0.1730 -1.42  0.1567  
##  skin_tone=Black    -0.3751 0.1416 -2.65  0.0081  
## 
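
`lrm()` fits a proportional odds model here, so each coefficient is a difference in the log odds of reaching any given education level or higher. Converting the skin tone coefficients to odds ratios makes the attenuation after adjusting for ancestry easier to read. The sketch below only re-uses numbers copied by hand from the output above.

```r
# Coefficients for skin_tone=Black, copied from the printed output above
b_black_alone    <- -0.9028  # education ~ sex + skin_tone
b_black_adjusted <- -0.3751  # education ~ sex + african + amerindian + skin_tone

# Odds ratios (Black vs. White) for reaching a given education level or higher
or_alone    <- exp(b_black_alone)
or_adjusted <- exp(b_black_adjusted)

# Share of the log-odds coefficient that remains after adjusting for ancestry
share_remaining <- b_black_adjusted / b_black_alone

print(round(c(or_alone = or_alone, or_adjusted = or_adjusted,
              share_remaining = share_remaining), 2))
```

Note that the two models are fitted to different samples (3168 vs. 2867 observations), so this comparison is approximate.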
#detectable validity from skin tone?
lrtest(
  lrm(education ~ sex + african + amerindian, data = d),
  lrm(education ~ sex + african + amerindian + skin_tone, data = d)
  )
## 
## Model 1: education ~ sex + african + amerindian
## Model 2: education ~ sex + african + amerindian + skin_tone
## 
## L.R. Chisq       d.f.          P 
## 7.06326421 2.00000000 0.02925713
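
The likelihood ratio test above can be checked by hand: the statistic is the difference between the two models' LR chi-square values, and the degrees of freedom are the two extra skin tone coefficients. The sketch below uses only numbers copied from the printed output.

```r
# Values copied from the printed model output above
lr_reduced <- 254.81  # LR chi2, education ~ sex + african + amerindian (9 d.f.)
lr_full    <- 261.88  # LR chi2, same model plus skin_tone (11 d.f.)

lr_stat <- lr_full - lr_reduced  # difference in model chi-squares
df_diff <- 11 - 9                # two extra skin tone coefficients
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

# Same calculation with the unrounded statistic reported by lrtest()
p_exact <- pchisq(7.06326421, df = 2, lower.tail = FALSE)

print(c(lr_stat = lr_stat, p_value = p_value, p_exact = p_exact))
```

The subtraction is valid here because both models are fitted to the same 2867 observations.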
#estimate out of sample R2
set.seed(1)
validate(lrm(education ~ sex + african + amerindian, data = d, x = TRUE, y = TRUE), B = 1000)
##           index.orig training    test optimism index.corrected    n
## Dxy           0.2582   0.2624  0.2541   0.0083          0.2498 1000
## R2            0.0913   0.0953  0.0887   0.0066          0.0847 1000
## Intercept     0.0000   0.0000 -0.0095   0.0095         -0.0095 1000
## Slope         1.0000   1.0000  0.9647   0.0353          0.9647 1000
## Emax          0.0000   0.0000  0.0095   0.0095          0.0095 1000
## D             0.0885   0.0926  0.0859   0.0068          0.0817 1000
## U            -0.0007  -0.0007 -1.2773   1.2766         -1.2773 1000
## Q             0.0892   0.0933  1.3631  -1.2698          1.3590 1000
## B             0.2313   0.2306  0.2319  -0.0014          0.2327 1000
## g             0.6371   0.6504  0.6256   0.0249          0.6122 1000
## gp            0.1496   0.1523  0.1471   0.0052          0.1444 1000
validate(lrm(education ~ sex + african + amerindian + skin_tone, data = d, x = TRUE, y = TRUE), B = 1000)
##           index.orig training    test optimism index.corrected    n
## Dxy           0.2615   0.2664  0.2575   0.0089          0.2526 1000
## R2            0.0938   0.0975  0.0904   0.0071          0.0867 1000
## Intercept     0.0000   0.0000 -0.0073   0.0073         -0.0073 1000
## Slope         1.0000   1.0000  0.9628   0.0372          0.9628 1000
## Emax          0.0000   0.0000  0.0096   0.0096          0.0096 1000
## D             0.0910   0.0949  0.0876   0.0073          0.0837 1000
## U            -0.0007  -0.0007 -1.2772   1.2765         -1.2772 1000
## Q             0.0917   0.0956  1.3648  -1.2692          1.3609 1000
## B             0.2308   0.2300  0.2315  -0.0015          0.2323 1000
## g             0.6462   0.6586  0.6319   0.0268          0.6195 1000
## gp            0.1514   0.1537  0.1482   0.0055          0.1459 1000
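
To make the comparison between the two `validate()` tables explicit, the sketch below recomputes the optimism-corrected R2 values (original index minus bootstrap optimism) and their difference. All numbers are copied by hand from the `R2` rows above.

```r
# "R2" rows of the two validate() tables above, copied by hand
r2 <- data.frame(
  model      = c("ancestry", "ancestry + skin_tone"),
  index_orig = c(0.0913, 0.0938),
  optimism   = c(0.0066, 0.0071)
)

# validate() reports index.corrected = index.orig - optimism
r2$corrected <- r2$index_orig - r2$optimism

# Incremental validity of skin tone, before and after the optimism correction
gain_apparent  <- diff(r2$index_orig)
gain_corrected <- diff(r2$corrected)

print(r2)
print(round(c(apparent = gain_apparent, corrected = gain_corrected), 4))
```

The corrected gain from adding skin tone is about 0.002, i.e. 0.2 percentage points of R2.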

For comparison, we also report OLS results, as well as results with interval (numeric) coding of the predictors.

#ordinal outcome, interval predictors
#numeric predictors
lrm(education ~ sex + skin_tone_num, data = d)
## Frequencies of Missing Values Due to Each Variable
##     education           sex skin_tone_num 
##            38             0           295 
## 
## Logistic Regression Model
##  
##  lrm(formula = education ~ sex + skin_tone_num, data = d)
##  
##  
##  Frequencies of Responses
##  
##     0    1    2    3 
##   586 1202  785  595 
##  
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs          3168    LR chi2     143.05    R2       0.047    C       0.591    
##  max |deriv| 3e-12    d.f.             2    g        0.399    Dxy     0.182    
##                       Pr(> chi2) <0.0001    gr       1.490    gamma   0.259    
##                                             gp       0.093    tau-a   0.132    
##                                             Brier    0.237                     
##  
##                Coef    S.E.   Wald Z Pr(>|Z|)
##  y>=1           1.5504 0.0597  25.95 <0.0001 
##  y>=2          -0.2479 0.0514  -4.83 <0.0001 
##  y>=3          -1.4889 0.0589 -25.28 <0.0001 
##  sex=Female     0.3419 0.0647   5.28 <0.0001 
##  skin_tone_num -0.5481 0.0518 -10.57 <0.0001 
## 
lrm(education ~ sex + african_num + amerindian_num, data = d)
## Frequencies of Missing Values Due to Each Variable
##      education            sex    african_num amerindian_num 
##             38              0            603            603 
## 
## Logistic Regression Model
##  
##  lrm(formula = education ~ sex + african_num + amerindian_num, 
##      data = d)
##  
##  
##  Frequencies of Responses
##  
##     0    1    2    3 
##   532 1088  729  518 
##  
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs          2867    LR chi2     250.05    R2       0.090    C       0.627    
##  max |deriv| 9e-11    d.f.             3    g        0.630    Dxy     0.253    
##                       Pr(> chi2) <0.0001    gr       1.878    gamma   0.260    
##                                             gp       0.148    tau-a   0.184    
##                                             Brier    0.231                     
##  
##                 Coef    S.E.   Wald Z Pr(>|Z|)
##  y>=1            2.6663 0.1129  23.62 <0.0001 
##  y>=2            0.8325 0.1026   8.11 <0.0001 
##  y>=3           -0.4990 0.1027  -4.86 <0.0001 
##  sex=Female      0.4009 0.0685   5.85 <0.0001 
##  african_num    -0.2975 0.0261 -11.40 <0.0001 
##  amerindian_num -0.1334 0.0259  -5.15 <0.0001 
## 
lrm(education ~ sex + african_num + amerindian_num + skin_tone_num, data = d)
## Frequencies of Missing Values Due to Each Variable
##      education            sex    african_num amerindian_num  skin_tone_num 
##             38              0            603            603            295 
## 
## Logistic Regression Model
##  
##  lrm(formula = education ~ sex + african_num + amerindian_num + 
##      skin_tone_num, data = d)
##  
##  
##  Frequencies of Responses
##  
##     0    1    2    3 
##   532 1088  729  518 
##  
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs          2867    LR chi2     257.38    R2       0.092    C       0.631    
##  max |deriv| 1e-10    d.f.             4    g        0.640    Dxy     0.262    
##                       Pr(> chi2) <0.0001    gr       1.896    gamma   0.267    
##                                             gp       0.150    tau-a   0.190    
##                                             Brier    0.231                     
##  
##                 Coef    S.E.   Wald Z Pr(>|Z|)
##  y>=1            2.6128 0.1146 22.80  <0.0001 
##  y>=2            0.7737 0.1049  7.38  <0.0001 
##  y>=3           -0.5588 0.1051 -5.32  <0.0001 
##  sex=Female      0.3962 0.0685  5.78  <0.0001 
##  african_num    -0.2525 0.0309 -8.17  <0.0001 
##  amerindian_num -0.1386 0.0260 -5.34  <0.0001 
##  skin_tone_num  -0.1752 0.0648 -2.70  0.0068  
## 
#OLS, ordinal predictors
ols(education4 ~ sex + skin_tone, data = d)
## Frequencies of Missing Values Due to Each Variable
## education4        sex  skin_tone 
##         38          0        295 
## 
## Linear Regression Model
##  
##  ols(formula = education4 ~ sex + skin_tone, data = d)
##  
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs    3168    LR chi2    163.64    R2       0.050    
##  sigma0.9711    d.f.            3    R2 adj   0.049    
##  d.f.   3164    Pr(> chi2) 0.0000    g        0.235    
##  
##  Residuals
##  
##      Min      1Q  Median      3Q     Max 
##  -1.6495 -0.6495 -0.1707  0.5426  2.0213 
##  
##  
##                  Coef    S.E.   t     Pr(>|t|)
##  Intercept        1.4574 0.0265 55.01 <0.0001 
##  sex=Female       0.1921 0.0345  5.57 <0.0001 
##  skin_tone=Mixed -0.4565 0.0477 -9.58 <0.0001 
##  skin_tone=Black -0.4788 0.0607 -7.89 <0.0001 
## 
ols(education4 ~ sex + african + amerindian, data = d)
## Frequencies of Missing Values Due to Each Variable
## education4        sex    african amerindian 
##         38          0        603        603 
## 
## Linear Regression Model
##  
##  ols(formula = education4 ~ sex + african + amerindian, data = d)
##  
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs    2867    LR chi2    271.88    R2       0.090    
##  sigma0.9446    d.f.            9    R2 adj   0.088    
##  d.f.   2857    Pr(> chi2) 0.0000    g        0.342    
##  
##  Residuals
##  
##       Min       1Q   Median       3Q      Max 
##  -2.01242 -0.69253 -0.08034  0.75900  2.13759 
##  
##  
##                     Coef    S.E.   t      Pr(>|t|)
##  Intercept           1.7945 0.0497  36.11 <0.0001 
##  sex=Female          0.2179 0.0354   6.16 <0.0001 
##  african=20-40%     -0.1376 0.0574  -2.40 0.0165  
##  african=40-60%     -0.2316 0.0594  -3.90 <0.0001 
##  african=60-80%     -0.4752 0.0608  -7.81 <0.0001 
##  african=80-100%    -0.6211 0.0590 -10.53 <0.0001 
##  amerindian=20-40%  -0.1503 0.0568  -2.65 0.0081  
##  amerindian=40-60%  -0.1822 0.0583  -3.13 0.0018  
##  amerindian=60-80%  -0.2692 0.0592  -4.55 <0.0001 
##  amerindian=80-100% -0.3110 0.0610  -5.10 <0.0001 
## 
ols(education4 ~ sex + african + amerindian + skin_tone, data = d)
## Frequencies of Missing Values Due to Each Variable
## education4        sex    african amerindian  skin_tone 
##         38          0        603        603        295 
## 
## Linear Regression Model
##  
##  ols(formula = education4 ~ sex + african + amerindian + skin_tone, 
##      data = d)
##  
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs    2867    LR chi2    278.63    R2       0.093    
##  sigma0.9438    d.f.           11    R2 adj   0.089    
##  d.f.   2855    Pr(> chi2) 0.0000    g        0.346    
##  
##  Residuals
##  
##       Min       1Q   Median       3Q      Max 
##  -2.01559 -0.69852 -0.08474  0.73297  2.17835 
##  
##  
##                     Coef    S.E.   t     Pr(>|t|)
##  Intercept           1.8001 0.0497 36.21 <0.0001 
##  sex=Female          0.2155 0.0354  6.09 <0.0001 
##  african=20-40%     -0.1357 0.0574 -2.37 0.0181  
##  african=40-60%     -0.2287 0.0594 -3.85 0.0001  
##  african=60-80%     -0.4408 0.0623 -7.08 <0.0001 
##  african=80-100%    -0.4874 0.0944 -5.16 <0.0001 
##  amerindian=20-40%  -0.1501 0.0567 -2.65 0.0082  
##  amerindian=40-60%  -0.1814 0.0583 -3.11 0.0019  
##  amerindian=60-80%  -0.2741 0.0592 -4.63 <0.0001 
##  amerindian=80-100% -0.3044 0.0610 -4.99 <0.0001 
##  skin_tone=Mixed    -0.1391 0.0898 -1.55 0.1215  
##  skin_tone=Black    -0.1867 0.0727 -2.57 0.0103  
## 
#OLS, numeric predictors
ols(education4 ~ sex + skin_tone_num, data = d)
## Frequencies of Missing Values Due to Each Variable
##    education4           sex skin_tone_num 
##            38             0           295 
## 
## Linear Regression Model
##  
##  ols(formula = education4 ~ sex + skin_tone_num, data = d)
##  
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs    3168    LR chi2    146.80    R2       0.045    
##  sigma0.9735    d.f.            2    R2 adj   0.045    
##  d.f.   3165    Pr(> chi2) 0.0000    g        0.217    
##  
##  Residuals
##  
##      Min      1Q  Median      3Q     Max 
##  -1.6331 -0.6331 -0.1476  0.5569  2.1479 
##  
##  
##                Coef    S.E.   t      Pr(>|t|)
##  Intercept      1.4431 0.0263  54.81 <0.0001 
##  sex=Female     0.1900 0.0346   5.49 <0.0001 
##  skin_tone_num -0.2955 0.0272 -10.88 <0.0001 
## 
ols(education4 ~ sex + african_num + amerindian_num, data = d)
## Frequencies of Missing Values Due to Each Variable
##     education4            sex    african_num amerindian_num 
##             38              0            603            603 
## 
## Linear Regression Model
##  
##  ols(formula = education4 ~ sex + african_num + amerindian_num, 
##      data = d)
##  
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs    2867    LR chi2    266.76    R2       0.089    
##  sigma0.9444    d.f.            3    R2 adj   0.088    
##  d.f.   2863    Pr(> chi2) 0.0000    g        0.339    
##  
##  Residuals
##  
##       Min       1Q   Median       3Q      Max 
##  -1.99861 -0.70960 -0.08785  0.76484  2.14302 
##  
##  
##                 Coef    S.E.   t      Pr(>|t|)
##  Intercept       2.0113 0.0514  39.10 <0.0001 
##  sex=Female      0.2182 0.0353   6.18 <0.0001 
##  african_num    -0.1600 0.0133 -12.00 <0.0001 
##  amerindian_num -0.0709 0.0134  -5.31 <0.0001 
## 
ols(education4 ~ sex + african_num + amerindian_num + skin_tone_num, data = d)
## Frequencies of Missing Values Due to Each Variable
##     education4            sex    african_num amerindian_num  skin_tone_num 
##             38              0            603            603            295 
## 
## Linear Regression Model
##  
##  ols(formula = education4 ~ sex + african_num + amerindian_num + 
##      skin_tone_num, data = d)
##  
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs    2867    LR chi2    273.73    R2       0.091    
##  sigma0.9435    d.f.            4    R2 adj   0.090    
##  d.f.   2862    Pr(> chi2) 0.0000    g        0.343    
##  
##  Residuals
##  
##      Min      1Q  Median      3Q     Max 
##  -1.9868 -0.6980 -0.0664  0.7234  2.2463 
##  
##  
##                 Coef    S.E.   t     Pr(>|t|)
##  Intercept       1.9817 0.0526 37.68 <0.0001 
##  sex=Female      0.2156 0.0353  6.11 <0.0001 
##  african_num    -0.1373 0.0159 -8.66 <0.0001 
##  amerindian_num -0.0732 0.0134 -5.48 <0.0001 
##  skin_tone_num  -0.0877 0.0332 -2.64 0.0083  
## 

Conclusion

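Skin tone has quite limited incremental validity beyond ancestry (about 0.2 percentage points of R2 after optimism correction, 0.0867 vs. 0.0847), but considerably larger validity when used alone (R2 = 0.052 in the model with sex only). This is the pattern expected of a non-causal correlate, not a cause. Skin tone is not an important cause of differences in educational attainment in this dataset from Brazil.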
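
The reasoning behind "non-causal correlate" can be illustrated with a toy simulation. The data-generating process below is an assumption made purely for illustration (it is not estimated from the study data): ancestry affects the outcome, while skin tone is only a noisy marker of ancestry with no effect of its own.

```r
set.seed(1)
n <- 5000

# Assumed data-generating process (illustrative only):
# ancestry affects the outcome; skin tone is a noisy marker of ancestry
# and has no effect of its own
ancestry  <- runif(n)
skin_tone <- ancestry + rnorm(n, sd = 0.3)
outcome   <- -1 * ancestry + rnorm(n)

# Small helper: R2 of an OLS fit for a given formula
r2 <- function(formula) summary(lm(formula))$r.squared

r2_skin_alone <- r2(outcome ~ skin_tone)
r2_ancestry   <- r2(outcome ~ ancestry)
r2_both       <- r2(outcome ~ ancestry + skin_tone)

print(round(c(skin_alone  = r2_skin_alone,
              ancestry    = r2_ancestry,
              incremental = r2_both - r2_ancestry), 4))
```

Under these assumptions skin tone predicts the outcome when used alone but adds almost nothing once ancestry is in the model. The sketch shows only that the observed pattern is what such a process would produce, not that it is the only process that could produce it.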