Colorism is a widely claimed but poorly supported model of racial inequality in social outcomes. The model holds that people are racist and use skin tone (color) as a proxy for race. People with darker skin therefore face more discrimination and, as a result, worse outcomes across a variety of domains. For instance, Marira and Mitra (2013, "Colorism: Ubiquitous yet understudied", Industrial and Organizational Psychology, 6(1), 103-107) write:
Why Should I–O Psychology Be Concerned With Colorism? I–O psychologists should be concerned with the issue of colorism because the phenomenon has implications that are capable of cutting across categories such as race, religion, gender, age, sexuality, nationality, and occupation. That is to say, extant psychological research contains evidence for the preference and undue favoritism of lighter skin complexions among Black, White, Latino, and Asian populations from around the world (Glenn, 2009). … However, as stated previously, colorism does not simply affect African Americans and Latinos; rather, it is a global phenomenon that consistently privileges lighter skin tones over darker ones (Glenn, 2008). Thus, the pervasiveness of this form of discrimination and its impact on workplace and labor market related outcomes, both in the United States and abroad, dictate that I–O psychologists become more acquainted with this form of discrimination.
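We use data from a recent Brazilian study (de Franca et al. 2017) to test this model. If darker skin causes worse outcomes through discrimination, skin tone should predict educational attainment even after controlling for ancestry. If skin tone is merely a correlate of ancestry, most of its predictive validity should disappear once ancestry is controlled.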
library(pacman)
p_load(kirkegaard, haven, rms, sjstats, magrittr, forcats)
#use dummy (treatment) coding for both unordered and ordered factors
options(contrasts = rep("contr.treatment", 2))
d = read_stata("data/PONE-D-17-02552R.dta")
#recode
d$sex = d$sex %>% plyr::mapvalues(1:2, c("Male", "Female")) %>% factor() %>% fct_relevel("Male")
#ordinals
quintile_labels = c("0-20%", "20-40%", "40-60%", "60-80%", "80-100%")
quintile_recode = function(x) {
  #map codes 1-5 to quintile labels, keeping the levels in quintile order
  plyr::mapvalues(x, 1:5, quintile_labels) %>% ordered(levels = quintile_labels)
}
d$african = quintile_recode(d$afr_q5)
d$amerindian = quintile_recode(d$nat_q5)
d$european = quintile_recode(d$eur_q5)
d$education = ordered(d$education4)
d$skin_tone = d$skin_colour %>% plyr::mapvalues(0:2, c("White", "Mixed", "Black")) %>% ordered(levels = c("White", "Mixed", "Black"))
#intervals
d$african_num = d$afr_q5
d$amerindian_num = d$nat_q5
d$european_num = d$eur_q5
d$skin_tone_num = d$skin_colour
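The `options(contrasts = ...)` line at the top matters for the ordinal predictors. By default R codes ordered factors with orthogonal polynomial contrasts (`.L`, `.Q`, ...), so the quintile variables would not get one coefficient per level. A small base R check on a hypothetical three-level ordered factor (not part of the original analysis) shows the columns each setting produces:

```r
# Hypothetical three-level ordered factor, shaped like the quintile variables
x <- ordered(c("0-20%", "20-40%", "40-60%"),
             levels = c("0-20%", "20-40%", "40-60%"))

# R's default: treatment coding for unordered, polynomial for ordered factors
old <- options(contrasts = c("contr.treatment", "contr.poly"))
poly_names <- colnames(model.matrix(~ x))

# Setting used in this analysis: treatment (dummy) coding for both
options(contrasts = rep("contr.treatment", 2))
dummy_names <- colnames(model.matrix(~ x))

options(old)  # restore whatever was set before

poly_names   # "(Intercept)" "x.L" "x.Q"
dummy_names  # "(Intercept)" "x20-40%" "x40-60%"
```

With dummy coding, each coefficient reads directly as the difference from the lowest (reference) category, which is how the model output in this post is labelled.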
The simplest approach is to plot the bivariate associations between each ancestry component and educational attainment.
GG_group_means(d, "education4", "african") +
xlab("African ancestry") +
ylab("Education attainment (average ordinal)")
## Missing values were removed.
GG_save("figs/edu_afri.png")
GG_group_means(d, "education4", "amerindian") +
xlab("Amerindian ancestry") +
ylab("Education attainment (average ordinal)")
## Missing values were removed.
GG_save("figs/edu_amer.png")
GG_group_means(d, "education4", "european") +
xlab("European ancestry") +
ylab("Education attainment (average ordinal)")
## Missing values were removed.
GG_save("figs/edu_euro.png")
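`GG_group_means` and `GG_save` are helpers from the author's kirkegaard package. Assuming the plot shows each group's mean education with a standard error (an assumption, not checked against the package source), the summaries behind it can be computed in base R roughly like this. The data here are hypothetical toy values, not the study data:

```r
set.seed(1)
quintiles <- c("0-20%", "20-40%", "40-60%", "60-80%", "80-100%")

# Hypothetical toy data standing in for d: ancestry quintile and a 0-3 education score
toy <- data.frame(
  african = factor(sample(quintiles, 500, replace = TRUE), levels = quintiles),
  education4 = sample(0:3, 500, replace = TRUE)
)
toy$education4[1:10] <- NA  # some missing outcomes, as in the real data

# Drop missing values before summarising
ok <- !is.na(toy$education4)
g <- toy$african[ok]
y <- toy$education4[ok]

# Per-group sample size, mean and standard error of the mean
summ <- data.frame(
  n = as.vector(tapply(y, g, length)),
  mean = as.vector(tapply(y, g, mean)),
  se = as.vector(tapply(y, g, function(v) sd(v) / sqrt(length(v)))),
  row.names = quintiles
)
summ
```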
Since both the predictors and the outcome are ordinal, the most appropriate approach is ordinal (proportional odds) logistic regression, with the predictors also coded as ordinal categories, i.e. one dummy variable per level above the reference level.
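For reference, the proportional odds model that `lrm` fits to a four-level outcome (education coded 0-3) can be written as follows. The $\alpha_j$ are the `y>=j` intercepts in the output, and the coefficient vector $\boldsymbol\beta$ (sex, skin tone and/or ancestry dummies) is shared across all cut-points, which is the proportional odds assumption:

```latex
\Pr(Y \ge j \mid \mathbf{x}) = \frac{1}{1 + \exp\!\left[-(\alpha_j + \mathbf{x}^\top \boldsymbol\beta)\right]}, \qquad j = 1, 2, 3

\Pr(Y = j \mid \mathbf{x}) = \Pr(Y \ge j \mid \mathbf{x}) - \Pr(Y \ge j + 1 \mid \mathbf{x}), \qquad \Pr(Y \ge 0 \mid \mathbf{x}) = 1, \quad \Pr(Y \ge 4 \mid \mathbf{x}) = 0
```

A negative coefficient therefore lowers the probability of reaching every higher education level at once.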
#full ordinal model
lrm(education ~ sex + skin_tone, data = d)
## Frequencies of Missing Values Due to Each Variable
## education sex skin_tone
## 38 0 295
##
## Logistic Regression Model
##
## lrm(formula = education ~ sex + skin_tone, data = d)
##
##
## Frequencies of Responses
##
## 0 1 2 3
## 586 1202 785 595
##
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 3168 LR chi2 156.49 R2 0.052 C 0.591
## max |deriv| 4e-12 d.f. 3 g 0.427 Dxy 0.182
## Pr(> chi2) <0.0001 gr 1.532 gamma 0.259
## gp 0.100 tau-a 0.132
## Brier 0.236
##
## Coef S.E. Wald Z Pr(>|Z|)
## y>=1 1.5794 0.0604 26.16 <0.0001
## y>=2 -0.2233 0.0518 -4.31 <0.0001
## y>=3 -1.4691 0.0592 -24.83 <0.0001
## sex=Female 0.3455 0.0647 5.34 <0.0001
## skin_tone=Mixed -0.8106 0.0887 -9.14 <0.0001
## skin_tone=Black -0.9028 0.1157 -7.81 <0.0001
##
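To make the log-odds more concrete, the sketch below plugs the printed estimates from this model into the proportional odds formula, $P(Y \ge j) = \text{logistic}(\alpha_j + x\beta)$. It returns predicted education-category probabilities for men with White vs. Black skin tone. `cat_prob` is a small helper defined here for illustration only:

```r
# Intercepts (y>=1, y>=2, y>=3) and the skin_tone=Black coefficient, copied from the output above
alpha <- c(1.5794, -0.2233, -1.4691)
b_black <- -0.9028  # reference category: White

# Cumulative probabilities P(Y >= j) for men (sex=Female dummy = 0)
cum_white <- plogis(alpha)            # all skin tone dummies = 0
cum_black <- plogis(alpha + b_black)  # skin_tone=Black dummy = 1

# Convert cumulative probabilities to P(Y = 0), ..., P(Y = 3)
cat_prob <- function(cum) -diff(c(1, cum, 0))
p_white <- cat_prob(cum_white)
p_black <- cat_prob(cum_black)

round(rbind(White = p_white, Black = p_black), 3)
exp(b_black)  # odds ratio for being in a higher education category, Black vs. White
```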
lrm(education ~ sex + african + amerindian, data = d)
## Frequencies of Missing Values Due to Each Variable
## education sex african amerindian
## 38 0 603 603
##
## Logistic Regression Model
##
## lrm(formula = education ~ sex + african + amerindian, data = d)
##
##
## Frequencies of Responses
##
## 0 1 2 3
## 532 1088 729 518
##
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 2867 LR chi2 254.81 R2 0.091 C 0.629
## max |deriv| 1e-10 d.f. 9 g 0.637 Dxy 0.258
## Pr(> chi2) <0.0001 gr 1.891 gamma 0.264
## gp 0.150 tau-a 0.187
## Brier 0.231
##
## Coef S.E. Wald Z Pr(>|Z|)
## y>=1 2.2727 0.1074 21.16 <0.0001
## y>=2 0.4367 0.0994 4.39 <0.0001
## y>=3 -0.8959 0.1008 -8.89 <0.0001
## sex=Female 0.4014 0.0686 5.85 <0.0001
## african=20-40% -0.2694 0.1118 -2.41 0.0160
## african=40-60% -0.4354 0.1155 -3.77 0.0002
## african=60-80% -0.8871 0.1182 -7.51 <0.0001
## african=80-100% -1.1595 0.1150 -10.08 <0.0001
## amerindian=20-40% -0.2834 0.1100 -2.58 0.0100
## amerindian=40-60% -0.3545 0.1130 -3.14 0.0017
## amerindian=60-80% -0.5087 0.1148 -4.43 <0.0001
## amerindian=80-100% -0.5877 0.1184 -4.96 <0.0001
##
lrm(education ~ sex + african + amerindian + skin_tone, data = d)
## Frequencies of Missing Values Due to Each Variable
## education sex african amerindian skin_tone
## 38 0 603 603 295
##
## Logistic Regression Model
##
## lrm(formula = education ~ sex + african + amerindian + skin_tone,
## data = d)
##
##
## Frequencies of Responses
##
## 0 1 2 3
## 532 1088 729 518
##
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 2867 LR chi2 261.88 R2 0.094 C 0.631
## max |deriv| 2e-10 d.f. 11 g 0.646 Dxy 0.262
## Pr(> chi2) <0.0001 gr 1.908 gamma 0.267
## gp 0.151 tau-a 0.189
## Brier 0.231
##
## Coef S.E. Wald Z Pr(>|Z|)
## y>=1 2.2892 0.1077 21.26 <0.0001
## y>=2 0.4492 0.0996 4.51 <0.0001
## y>=3 -0.8852 0.1009 -8.77 <0.0001
## sex=Female 0.3970 0.0686 5.78 <0.0001
## african=20-40% -0.2687 0.1119 -2.40 0.0163
## african=40-60% -0.4314 0.1156 -3.73 0.0002
## african=60-80% -0.8203 0.1210 -6.78 <0.0001
## african=80-100% -0.9202 0.1829 -5.03 <0.0001
## amerindian=20-40% -0.2848 0.1100 -2.59 0.0097
## amerindian=40-60% -0.3546 0.1130 -3.14 0.0017
## amerindian=60-80% -0.5185 0.1149 -4.51 <0.0001
## amerindian=80-100% -0.5755 0.1186 -4.85 <0.0001
## skin_tone=Mixed -0.2450 0.1730 -1.42 0.1567
## skin_tone=Black -0.3751 0.1416 -2.65 0.0081
##
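One way to read the last two models is to ask how much of the skin tone association disappears once ancestry is controlled. The arithmetic below uses the printed coefficients. One caveat: the skin-tone-only model was fit on 3168 observations and the combined model on 2867 (ancestry is missing for some participants), so this comparison is approximate:

```r
# Skin tone coefficients (log-odds vs. White), from the lrm output above
without_anc <- c(Mixed = -0.8106, Black = -0.9028)  # education ~ sex + skin_tone
with_anc    <- c(Mixed = -0.2450, Black = -0.3751)  # ... + african + amerindian

# Proportion of the skin tone coefficient removed by adding ancestry
shrink <- 1 - with_anc / without_anc
round(shrink, 2)  # Mixed about 0.70, Black about 0.58
```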
#does skin tone add detectable validity beyond ancestry?
lrtest(
lrm(education ~ sex + african + amerindian, data = d),
lrm(education ~ sex + african + amerindian + skin_tone, data = d)
)
##
## Model 1: education ~ sex + african + amerindian
## Model 2: education ~ sex + african + amerindian + skin_tone
##
## L.R. Chisq d.f. P
## 7.06326421 2.00000000 0.02925713
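The likelihood ratio test above can be reproduced by hand. Both models use the same 2867 observations, so the LR statistic for adding skin tone is the difference between the two models' LR chi-square values. That difference is tested on 11 - 9 = 2 degrees of freedom:

```r
# Model LR chi-square values printed by the two lrm fits above (same 2867 observations)
lr_base <- 254.81  # sex + african + amerindian, 9 d.f.
lr_full <- 261.88  # ... + skin_tone, 11 d.f.

lr_diff <- lr_full - lr_base  # 7.07 (7.063 before rounding)
df_diff <- 11 - 9

# Upper-tail chi-square probability, using the unrounded statistic from lrtest()
p <- pchisq(7.06326421, df = df_diff, lower.tail = FALSE)
p  # about 0.0293, matching the lrtest() output
```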
#estimate out-of-sample R2 (bootstrap optimism correction; x and y must be stored for validate)
set.seed(1)
validate(lrm(education ~ sex + african + amerindian, data = d, x = TRUE, y = TRUE), B = 1000)
## index.orig training test optimism index.corrected n
## Dxy 0.2582 0.2624 0.2541 0.0083 0.2498 1000
## R2 0.0913 0.0953 0.0887 0.0066 0.0847 1000
## Intercept 0.0000 0.0000 -0.0095 0.0095 -0.0095 1000
## Slope 1.0000 1.0000 0.9647 0.0353 0.9647 1000
## Emax 0.0000 0.0000 0.0095 0.0095 0.0095 1000
## D 0.0885 0.0926 0.0859 0.0068 0.0817 1000
## U -0.0007 -0.0007 -1.2773 1.2766 -1.2773 1000
## Q 0.0892 0.0933 1.3631 -1.2698 1.3590 1000
## B 0.2313 0.2306 0.2319 -0.0014 0.2327 1000
## g 0.6371 0.6504 0.6256 0.0249 0.6122 1000
## gp 0.1496 0.1523 0.1471 0.0052 0.1444 1000
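The columns of this table follow the optimism bootstrap: `training` is the average apparent performance of models refit on bootstrap samples, and `test` is the performance of those same models on the original data. `optimism = training - test`, and `index.corrected = index.orig - optimism`. Below is a minimal sketch of that idea for the R² of a plain linear model on simulated data. It assumes squared correlation between predictions and outcome as the R² measure; `lrm`'s R2 is a different (Nagelkerke-type) index, and `validate` handles many indexes at once:

```r
set.seed(1)
n <- 200

# Hypothetical data: outcome depends on x1 only; x2-x10 are pure noise
dat <- as.data.frame(matrix(rnorm(n * 11), ncol = 11))
names(dat) <- c("y", paste0("x", 1:10))
dat$y <- 0.5 * dat$x1 + dat$y

# R^2 measured as squared correlation of predictions with the outcome
r2 <- function(fit, newdata) cor(predict(fit, newdata = newdata), newdata$y)^2

fit <- lm(y ~ ., data = dat)
apparent <- r2(fit, dat)

B <- 200
optimism <- replicate(B, {
  idx <- sample(n, replace = TRUE)
  boot <- dat[idx, ]
  bfit <- lm(y ~ ., data = boot)
  r2(bfit, boot) - r2(bfit, dat)  # training minus test performance
})

corrected <- apparent - mean(optimism)
round(c(apparent = apparent, optimism = mean(optimism), corrected = corrected), 3)
```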
validate(lrm(education ~ sex + african + amerindian + skin_tone, data = d, x = TRUE, y = TRUE), B = 1000)
## index.orig training test optimism index.corrected n
## Dxy 0.2615 0.2664 0.2575 0.0089 0.2526 1000
## R2 0.0938 0.0975 0.0904 0.0071 0.0867 1000
## Intercept 0.0000 0.0000 -0.0073 0.0073 -0.0073 1000
## Slope 1.0000 1.0000 0.9628 0.0372 0.9628 1000
## Emax 0.0000 0.0000 0.0096 0.0096 0.0096 1000
## D 0.0910 0.0949 0.0876 0.0073 0.0837 1000
## U -0.0007 -0.0007 -1.2772 1.2765 -1.2772 1000
## Q 0.0917 0.0956 1.3648 -1.2692 1.3609 1000
## B 0.2308 0.2300 0.2315 -0.0015 0.2323 1000
## g 0.6462 0.6586 0.6319 0.0268 0.6195 1000
## gp 0.1514 0.1537 0.1482 0.0055 0.1459 1000
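The incremental out-of-sample validity of skin tone follows directly from the two `R2` rows. Each corrected value is `index.orig - optimism`, and their difference is the roughly 0.2 percentage points of R² that skin tone adds beyond ancestry:

```r
# Optimism-corrected R2 from the two validate() tables above
r2_ancestry  <- 0.0913 - 0.0066  # sex + african + amerindian: 0.0847
r2_with_skin <- 0.0938 - 0.0071  # ... + skin_tone: 0.0867

incremental <- r2_with_skin - r2_ancestry
incremental  # 0.002, i.e. about 0.2 percentage points of R2
```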
For comparison, we also report results with interval (numeric) coding of the predictors, as well as OLS regressions that treat education as an interval-scale outcome.
#ordinal outcome, interval (numeric) predictors
lrm(education ~ sex + skin_tone_num, data = d)
## Frequencies of Missing Values Due to Each Variable
## education sex skin_tone_num
## 38 0 295
##
## Logistic Regression Model
##
## lrm(formula = education ~ sex + skin_tone_num, data = d)
##
##
## Frequencies of Responses
##
## 0 1 2 3
## 586 1202 785 595
##
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 3168 LR chi2 143.05 R2 0.047 C 0.591
## max |deriv| 3e-12 d.f. 2 g 0.399 Dxy 0.182
## Pr(> chi2) <0.0001 gr 1.490 gamma 0.259
## gp 0.093 tau-a 0.132
## Brier 0.237
##
## Coef S.E. Wald Z Pr(>|Z|)
## y>=1 1.5504 0.0597 25.95 <0.0001
## y>=2 -0.2479 0.0514 -4.83 <0.0001
## y>=3 -1.4889 0.0589 -25.28 <0.0001
## sex=Female 0.3419 0.0647 5.28 <0.0001
## skin_tone_num -0.5481 0.0518 -10.57 <0.0001
##
lrm(education ~ sex + african_num + amerindian_num, data = d)
## Frequencies of Missing Values Due to Each Variable
## education sex african_num amerindian_num
## 38 0 603 603
##
## Logistic Regression Model
##
## lrm(formula = education ~ sex + african_num + amerindian_num,
## data = d)
##
##
## Frequencies of Responses
##
## 0 1 2 3
## 532 1088 729 518
##
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 2867 LR chi2 250.05 R2 0.090 C 0.627
## max |deriv| 9e-11 d.f. 3 g 0.630 Dxy 0.253
## Pr(> chi2) <0.0001 gr 1.878 gamma 0.260
## gp 0.148 tau-a 0.184
## Brier 0.231
##
## Coef S.E. Wald Z Pr(>|Z|)
## y>=1 2.6663 0.1129 23.62 <0.0001
## y>=2 0.8325 0.1026 8.11 <0.0001
## y>=3 -0.4990 0.1027 -4.86 <0.0001
## sex=Female 0.4009 0.0685 5.85 <0.0001
## african_num -0.2975 0.0261 -11.40 <0.0001
## amerindian_num -0.1334 0.0259 -5.15 <0.0001
##
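Interval coding assumes equally spaced effects across quintiles. The printed estimates allow a rough check of that assumption. The sketch compares the dummy-coded African ancestry steps with the steps implied by the single linear slope. It then tests the restriction with a nested-model LR test: both fits use the same 2867 observations and the same intercept-only baseline, so the difference in their LR chi-square values is the LR statistic for the 9 - 3 = 6 constrained d.f.:

```r
# African ancestry estimates (log-odds vs. 0-20%) from the dummy-coded lrm fit
dummy <- c(-0.2694, -0.4354, -0.8871, -1.1595)
# Steps implied by the linear slope from the interval-coded lrm fit
linear <- -0.2975 * (1:4)
round(rbind(dummy, linear), 2)

# Does the linear restriction (for both ancestry variables) fit worse?
lr_ordinal <- 254.81  # 9 d.f.
lr_interval <- 250.05 # 3 d.f.
p <- pchisq(lr_ordinal - lr_interval, df = 9 - 3, lower.tail = FALSE)
p  # about 0.57: no detectable loss of fit from interval coding
```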
lrm(education ~ sex + african_num + amerindian_num + skin_tone_num, data = d)
## Frequencies of Missing Values Due to Each Variable
## education sex african_num amerindian_num skin_tone_num
## 38 0 603 603 295
##
## Logistic Regression Model
##
## lrm(formula = education ~ sex + african_num + amerindian_num +
## skin_tone_num, data = d)
##
##
## Frequencies of Responses
##
## 0 1 2 3
## 532 1088 729 518
##
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 2867 LR chi2 257.38 R2 0.092 C 0.631
## max |deriv| 1e-10 d.f. 4 g 0.640 Dxy 0.262
## Pr(> chi2) <0.0001 gr 1.896 gamma 0.267
## gp 0.150 tau-a 0.190
## Brier 0.231
##
## Coef S.E. Wald Z Pr(>|Z|)
## y>=1 2.6128 0.1146 22.80 <0.0001
## y>=2 0.7737 0.1049 7.38 <0.0001
## y>=3 -0.5588 0.1051 -5.32 <0.0001
## sex=Female 0.3962 0.0685 5.78 <0.0001
## african_num -0.2525 0.0309 -8.17 <0.0001
## amerindian_num -0.1386 0.0260 -5.34 <0.0001
## skin_tone_num -0.1752 0.0648 -2.70 0.0068
##
#OLS, ordinal (dummy-coded) predictors
ols(education4 ~ sex + skin_tone, data = d)
## Frequencies of Missing Values Due to Each Variable
## education4 sex skin_tone
## 38 0 295
##
## Linear Regression Model
##
## ols(formula = education4 ~ sex + skin_tone, data = d)
##
##
## Model Likelihood Discrimination
## Ratio Test Indexes
## Obs 3168 LR chi2 163.64 R2 0.050
## sigma0.9711 d.f. 3 R2 adj 0.049
## d.f. 3164 Pr(> chi2) 0.0000 g 0.235
##
## Residuals
##
## Min 1Q Median 3Q Max
## -1.6495 -0.6495 -0.1707 0.5426 2.0213
##
##
## Coef S.E. t Pr(>|t|)
## Intercept 1.4574 0.0265 55.01 <0.0001
## sex=Female 0.1921 0.0345 5.57 <0.0001
## skin_tone=Mixed -0.4565 0.0477 -9.58 <0.0001
## skin_tone=Black -0.4788 0.0607 -7.89 <0.0001
##
ols(education4 ~ sex + african + amerindian, data = d)
## Frequencies of Missing Values Due to Each Variable
## education4 sex african amerindian
## 38 0 603 603
##
## Linear Regression Model
##
## ols(formula = education4 ~ sex + african + amerindian, data = d)
##
##
## Model Likelihood Discrimination
## Ratio Test Indexes
## Obs 2867 LR chi2 271.88 R2 0.090
## sigma0.9446 d.f. 9 R2 adj 0.088
## d.f. 2857 Pr(> chi2) 0.0000 g 0.342
##
## Residuals
##
## Min 1Q Median 3Q Max
## -2.01242 -0.69253 -0.08034 0.75900 2.13759
##
##
## Coef S.E. t Pr(>|t|)
## Intercept 1.7945 0.0497 36.11 <0.0001
## sex=Female 0.2179 0.0354 6.16 <0.0001
## african=20-40% -0.1376 0.0574 -2.40 0.0165
## african=40-60% -0.2316 0.0594 -3.90 <0.0001
## african=60-80% -0.4752 0.0608 -7.81 <0.0001
## african=80-100% -0.6211 0.0590 -10.53 <0.0001
## amerindian=20-40% -0.1503 0.0568 -2.65 0.0081
## amerindian=40-60% -0.1822 0.0583 -3.13 0.0018
## amerindian=60-80% -0.2692 0.0592 -4.55 <0.0001
## amerindian=80-100% -0.3110 0.0610 -5.10 <0.0001
##
ols(education4 ~ sex + african + amerindian + skin_tone, data = d)
## Frequencies of Missing Values Due to Each Variable
## education4 sex african amerindian skin_tone
## 38 0 603 603 295
##
## Linear Regression Model
##
## ols(formula = education4 ~ sex + african + amerindian + skin_tone,
## data = d)
##
##
## Model Likelihood Discrimination
## Ratio Test Indexes
## Obs 2867 LR chi2 278.63 R2 0.093
## sigma0.9438 d.f. 11 R2 adj 0.089
## d.f. 2855 Pr(> chi2) 0.0000 g 0.346
##
## Residuals
##
## Min 1Q Median 3Q Max
## -2.01559 -0.69852 -0.08474 0.73297 2.17835
##
##
## Coef S.E. t Pr(>|t|)
## Intercept 1.8001 0.0497 36.21 <0.0001
## sex=Female 0.2155 0.0354 6.09 <0.0001
## african=20-40% -0.1357 0.0574 -2.37 0.0181
## african=40-60% -0.2287 0.0594 -3.85 0.0001
## african=60-80% -0.4408 0.0623 -7.08 <0.0001
## african=80-100% -0.4874 0.0944 -5.16 <0.0001
## amerindian=20-40% -0.1501 0.0567 -2.65 0.0082
## amerindian=40-60% -0.1814 0.0583 -3.11 0.0019
## amerindian=60-80% -0.2741 0.0592 -4.63 <0.0001
## amerindian=80-100% -0.3044 0.0610 -4.99 <0.0001
## skin_tone=Mixed -0.1391 0.0898 -1.55 0.1215
## skin_tone=Black -0.1867 0.0727 -2.57 0.0103
##
#OLS, numeric predictors
ols(education4 ~ sex + skin_tone_num, data = d)
## Frequencies of Missing Values Due to Each Variable
## education4 sex skin_tone_num
## 38 0 295
##
## Linear Regression Model
##
## ols(formula = education4 ~ sex + skin_tone_num, data = d)
##
##
## Model Likelihood Discrimination
## Ratio Test Indexes
## Obs 3168 LR chi2 146.80 R2 0.045
## sigma0.9735 d.f. 2 R2 adj 0.045
## d.f. 3165 Pr(> chi2) 0.0000 g 0.217
##
## Residuals
##
## Min 1Q Median 3Q Max
## -1.6331 -0.6331 -0.1476 0.5569 2.1479
##
##
## Coef S.E. t Pr(>|t|)
## Intercept 1.4431 0.0263 54.81 <0.0001
## sex=Female 0.1900 0.0346 5.49 <0.0001
## skin_tone_num -0.2955 0.0272 -10.88 <0.0001
##
ols(education4 ~ sex + african_num + amerindian_num, data = d)
## Frequencies of Missing Values Due to Each Variable
## education4 sex african_num amerindian_num
## 38 0 603 603
##
## Linear Regression Model
##
## ols(formula = education4 ~ sex + african_num + amerindian_num,
## data = d)
##
##
## Model Likelihood Discrimination
## Ratio Test Indexes
## Obs 2867 LR chi2 266.76 R2 0.089
## sigma0.9444 d.f. 3 R2 adj 0.088
## d.f. 2863 Pr(> chi2) 0.0000 g 0.339
##
## Residuals
##
## Min 1Q Median 3Q Max
## -1.99861 -0.70960 -0.08785 0.76484 2.14302
##
##
## Coef S.E. t Pr(>|t|)
## Intercept 2.0113 0.0514 39.10 <0.0001
## sex=Female 0.2182 0.0353 6.18 <0.0001
## african_num -0.1600 0.0133 -12.00 <0.0001
## amerindian_num -0.0709 0.0134 -5.31 <0.0001
##
ols(education4 ~ sex + african_num + amerindian_num + skin_tone_num, data = d)
## Frequencies of Missing Values Due to Each Variable
## education4 sex african_num amerindian_num skin_tone_num
## 38 0 603 603 295
##
## Linear Regression Model
##
## ols(formula = education4 ~ sex + african_num + amerindian_num +
## skin_tone_num, data = d)
##
##
## Model Likelihood Discrimination
## Ratio Test Indexes
## Obs 2867 LR chi2 273.73 R2 0.091
## sigma0.9435 d.f. 4 R2 adj 0.090
## d.f. 2862 Pr(> chi2) 0.0000 g 0.343
##
## Residuals
##
## Min 1Q Median 3Q Max
## -1.9868 -0.6980 -0.0664 0.7234 2.2463
##
##
## Coef S.E. t Pr(>|t|)
## Intercept 1.9817 0.0526 37.68 <0.0001
## sex=Female 0.2156 0.0353 6.11 <0.0001
## african_num -0.1373 0.0159 -8.66 <0.0001
## amerindian_num -0.0732 0.0134 -5.48 <0.0001
## skin_tone_num -0.0877 0.0332 -2.64 0.0083
##
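Skin tone has quite limited incremental validity beyond ancestry (about 0.2% of variance in optimism-corrected R2) but substantial validity on its own. This is the pattern expected of a non-causal correlate, not a cause: once ancestry is controlled, the skin tone coefficients shrink by more than half (e.g. Black vs. White from -0.90 to -0.38 log-odds in the ordinal models). Skin tone is not an important cause of differences in educational attainment in this Brazilian dataset.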
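Why this pattern points to a non-causal correlate can be illustrated with a small simulation. It uses hypothetical, deliberately simple linear-Gaussian data, not the Brazilian sample. In scenario A the outcome depends on ancestry and skin tone merely correlates with ancestry. In scenario B the outcome depends on skin tone itself, as the colorism model assumes:

```r
set.seed(42)
n <- 5000
ancestry <- rnorm(n)                   # hypothetical ancestry score
skin <- ancestry + rnorm(n, sd = 0.7)  # skin tone correlates with ancestry

r2 <- function(formula, data) summary(lm(formula, data = data))$r.squared

# Scenario A: outcome depends on ancestry; skin tone has no effect of its own
a <- data.frame(ancestry, skin, y = -0.5 * ancestry + rnorm(n))
alone_A <- r2(y ~ skin, a)
incr_A <- r2(y ~ ancestry + skin, a) - r2(y ~ ancestry, a)

# Scenario B: outcome depends on skin tone itself (colorism-style causation)
b <- data.frame(ancestry, skin, y = -0.5 * skin + rnorm(n))
alone_B <- r2(y ~ skin, b)
incr_B <- r2(y ~ ancestry + skin, b) - r2(y ~ ancestry, b)

round(c(alone_A = alone_A, incr_A = incr_A, alone_B = alone_B, incr_B = incr_B), 3)
```

In both scenarios skin tone predicts the outcome on its own. Only when skin tone is causal does it retain clear incremental validity beyond ancestry. The observed data look like scenario A.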