Exploring a Replication of a Field Experiment on Labor Market Discrimination: "Are Emily and Greg More Employable Than Lakisha and Jamal?" by Marianne Bertrand and Sendhil Mullainathan
QUESTION 1
bm_balance <- select(bm, race, education, jobs)
bm_balance %>%
group_by(race) %>%
summarise_all(mean)
## # A tibble: 2 x 3
## race education jobs
## <chr> <dbl> <dbl>
## 1 b 3.62 3.66
## 2 w 3.62 3.66
Yes, education and jobs are balanced across race. Mean education is 3.62 for both white-sounding and black-sounding names, and the mean number of jobs is 3.66 for both groups. This is what we should expect: names were randomly assigned to resumes, so resume characteristics should not differ systematically by race. Balance matters because it rules out selection bias, which arises when the groups being compared differ in characteristics other than the treatment, usually because randomization was not properly achieved. Balance on education and jobs is evidence that the randomization worked, so we can estimate the causal effect of race on the callback rate. Because education and jobs are balanced across race, we can attribute the difference in callback rates between resumes with white-sounding and black-sounding names to the racially distinctive names themselves. With random assignment, the two sets of resumes are the same on average in both observable and unobservable characteristics, which eliminates selection bias and ensures that any difference in outcomes can be attributed to the difference in names.
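The balance check described above can also be expressed as a difference in group means, which should be close to zero for every covariate. The following is a minimal sketch in base R, not part of the original analysis. The helper name `balance_diff` is hypothetical, and it assumes a data frame with a `race` column coded "b"/"w" and numeric covariate columns, as in `bm`.

```r
# Hypothetical helper (not from the original analysis): for each covariate,
# returns the mean for white-sounding names minus the mean for black-sounding names.
# Assumes `df[[group]]` takes the values "b" and "w".
balance_diff <- function(df, covariates, group = "race") {
  sapply(covariates, function(v) {
    means <- tapply(df[[v]], df[[group]], mean)  # group means of covariate v
    unname(means["w"] - means["b"])              # difference; near 0 means balanced
  })
}

# Intended usage with the data set from this report (assumed to be loaded as `bm`):
# balance_diff(bm, c("education", "jobs"))
```

Differences near zero for both covariates would agree with the grouped means shown in the table above.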
QUESTION 2
summarize(bm, avg_callback = mean(call, na.rm = TRUE))
## # A tibble: 1 x 1
## avg_callback
## <dbl>
## 1 0.0805
QUESTION 3
bm_call1 <- transmute(bm, race, call)
bm_call1 %>%
group_by(race) %>%
summarise_all(mean)
## # A tibble: 2 x 2
## race call
## <chr> <dbl>
## 1 b 0.0645
## 2 w 0.0965
diff_per_call1 <- 0.0965 - 0.0645
diff_per_call1
## [1] 0.032
On average, the results show a difference in callback rates of 3.2 percentage points, or about 50% relative to the callback rate for black-sounding names (0.032 / 0.0645 ≈ 0.50). Because the other variables are balanced, this difference can be attributed to the manipulation of racial-sounding names; the original study also reports it as statistically significant. Put differently, applicants with white-sounding names can expect roughly one callback for every 10 applications, whereas applicants with black-sounding names can expect roughly one callback for every 15 applications. This matters because job openings may not be posted frequently. In summary, the results suggest a substantial gap in callback rates based on the racial sound of the applicant's name.
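The arithmetic behind the relative gap and the "applications per callback" figures can be reproduced from the two callback rates in the table above. This is a small sketch using only those two reported numbers; the variable names are illustrative and do not come from the original code.

```r
# Callback rates taken from the grouped table above
white_rate <- 0.0965
black_rate <- 0.0645

# Absolute gap in callback rates (as a proportion; multiply by 100 for percentage points)
gap <- white_rate - black_rate

# Gap relative to the callback rate for black-sounding names
relative_gap <- gap / black_rate

# Expected number of applications per callback in each group
apps_per_callback_white <- 1 / white_rate
apps_per_callback_black <- 1 / black_rate

round(c(gap = gap,
        relative_gap = relative_gap,
        apps_white = apps_per_callback_white,
        apps_black = apps_per_callback_black), 3)
```

The relative gap comes out just under 0.50, and the expected applications per callback are about 10.4 and 15.5, which is where the rounded figures of 10 and 15 come from.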
QUESTION 4
bm_call2 <- select(bm, race, call, gender)
bm_call2 %>%
group_by(race, gender) %>%
summarise_all(mean)
## # A tibble: 4 x 3
## # Groups: race [2]
## race gender call
## <chr> <chr> <dbl>
## 1 b f 0.0663
## 2 b m 0.0583
## 3 w f 0.0989
## 4 w m 0.0887
diff_per_call2f <- 0.0989 - 0.0663
diff_per_call2f
## [1] 0.0326
diff_per_call2m <- 0.0887 - 0.0583
diff_per_call2m
## [1] 0.0304
On average, these results suggest a sizable racial gap in callbacks for both male and female applicants: resumes with black-sounding names receive fewer callbacks than resumes with white-sounding names. The difference between white-sounding and black-sounding names is 3.26 percentage points for females and 3.04 percentage points for males. According to this data, the gap is slightly larger for females than for males, which is interesting. Another interesting observation is that females appear to receive more callbacks than males. However, this may be because female names were sent to both sales and administrative ads, whereas male names were used almost exclusively for sales openings, which may also explain why the gender difference is statistically insignificant. More importantly, these results suggest racial discrimination against racially distinctive names for both females and males. This is important because the data is balanced, so we can say with some confidence that the difference in callback rates is a result of the difference in names.
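The gaps above are computed by hand from the printed table. As a sketch of how the same comparison could be made directly from the data, together with a two-sample test of proportions within each gender, the helper below uses base R's `prop.test`. The function name `gap_by_group` is hypothetical, and it assumes `call` is coded 0/1 with no missing values and `race` is coded "b"/"w", as the output above suggests.

```r
# Hypothetical helper (not from the original analysis): racial callback gap and
# a two-sample proportion test within each level of `by` (for example, gender).
# Assumes `call` is 0/1 without missing values and `race` takes the values "b" and "w".
gap_by_group <- function(df, by = "gender") {
  pieces <- split(df, df[[by]])                      # one data frame per level of `by`
  rows <- lapply(names(pieces), function(level) {
    d <- pieces[[level]]
    calls <- tapply(d$call, d$race, sum)             # callbacks per race
    n <- tapply(d$call, d$race, length)              # resumes per race
    x <- unname(c(calls["w"], calls["b"]))
    size <- unname(c(n["w"], n["b"]))
    test <- prop.test(x = x, n = size)               # H0: equal callback rates
    data.frame(group = level,
               rate_w = x[1] / size[1],
               rate_b = x[2] / size[2],
               gap = x[1] / size[1] - x[2] / size[2],
               p_value = test$p.value)
  })
  do.call(rbind, rows)
}

# Intended usage with the data set from this report (assumed to be loaded as `bm`):
# gap_by_group(bm, by = "gender")
```

Each row of the result gives the two callback rates, their difference, and a p-value for one gender, so the claim about a racial gap within both genders can be checked without hard-coding the rates.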