Problem Set 3
In this problem set, we will experiment with a dataset by Bertrand and Mullainathan (BM for short).
- Are education and jobs balanced across race? Why is this important?
Solution Education and jobs must be balanced across race so that any difference in callbacks between the two groups of resumes reflects only the race signaled by the name, not differences in education or work history. Both variables are balanced: on average, education is about 3.62 and jobs about 3.66.
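# Mean education and jobs within each race group; bare column names
# (not bm$education) are evaluated separately for each group
by_race <- bm %>% group_by(race)
by_race %>% summarise(
  edu = mean(education),
  jobs = mean(jobs)
)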
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 3
## race edu jobs
## <chr> <dbl> <dbl>
## 1 b 3.62 3.66
## 2 w 3.62 3.66
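The balance check only works if the means are computed within each race group. Inside a grouped `summarise()`, a reference such as `bm$education` pulls in the whole column and ignores the grouping. A bare column name such as `education` is instead evaluated separately for each group. The minimal sketch below shows the difference on a small hypothetical tibble. The name `toy` and its values are made up and are not BM data, and the sketch assumes dplyr 1.0 or later for the `.groups` argument.

```r
library(dplyr)

# Hypothetical toy data, not from the BM dataset
toy <- tibble(
  race = c("b", "b", "w", "w"),
  education = c(2, 4, 3, 5)
)

# `toy$education` is the full column, so every group gets the pooled mean (3.5)
pooled <- toy %>%
  group_by(race) %>%
  summarise(edu = mean(toy$education), .groups = "drop")

# The bare name `education` is evaluated within each group: 3 for b, 4 for w
by_group <- toy %>%
  group_by(race) %>%
  summarise(edu = mean(education), .groups = "drop")

print(pooled)
print(by_group)
```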
- Calculate the average callback rate for all resumes.
Solution The average callback rate across all resumes is 0.0805, or about 8%.
summarise(bm, avg_callback = mean(call))
## # A tibble: 1 x 1
## avg_callback
## <dbl>
## 1 0.0805
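`call` is a 0/1 indicator, equal to 1 if the resume received a callback. Its mean is therefore simply the share of resumes that were called back. A quick check on a hypothetical vector (not BM data):

```r
# Hypothetical callback indicators for five resumes: 1 = callback, 0 = none
call <- c(0, 1, 0, 0, 1)

avg_callback <- mean(call)                # mean of a 0/1 variable
share_called <- sum(call) / length(call)  # callbacks divided by resumes sent

print(avg_callback)  # 0.4
```

Read the same way, 0.0805 means that roughly 8 of every 100 resumes received a callback.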
- Calculate the average callback rates separately for resumes with “white-sounding” and “black-sounding” names (first row from Table 1 in paper). What do your results suggest?
Solution The results suggest that resumes with African-American-sounding names receive fewer callbacks than resumes with white-sounding names. The callback rate for white-sounding names (0.0965) is above the average for all resumes (0.0805). The rate for African-American-sounding names is only 0.0645.
by_race <- group_by(bm, race)
summarise(by_race, avg_callback = mean(call))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 2
## race avg_callback
## <chr> <dbl>
## 1 b 0.0645
## 2 w 0.0965
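The size of the racial gap can be expressed either as a difference in callback rates or as a ratio. The sketch below uses the rounded rates printed above, so its results are approximate.

```r
# Average callback rates by race, copied (rounded) from the summary output
rate_b <- 0.0645  # African-American-sounding names
rate_w <- 0.0965  # white-sounding names

gap <- rate_w - rate_b    # difference in callback rates (a proportion)
ratio <- rate_w / rate_b  # relative callback rate

round(gap * 100, 1)  # gap in percentage points: 3.2
round(ratio, 2)      # 1.5
```

A ratio of about 1.5 means that resumes with white-sounding names received roughly 50 percent more callbacks.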
- Repeat question 3, but calculate the average rates for each combination of race and sex (rows 4 and 7 from Table 1 in paper). What do your results suggest?
Solution Splitting the callback rates by sex as well as race shows a clear ranking. Resumes with white-sounding female names have the highest callback rate (0.0989), and those with African-American-sounding male names have the lowest (0.0583). For both women and men, white-sounding names receive more callbacks than African-American-sounding names. This is consistent with the paper's finding that the racial gap holds for both sexes.
by_racegender <- group_by(bm, race, gender)
summarise(by_racegender, avg_callback = mean(call))
## `summarise()` regrouping output by 'race' (override with `.groups` argument)
## # A tibble: 4 x 3
## # Groups: race [2]
## race gender avg_callback
## <chr> <chr> <dbl>
## 1 b f 0.0663
## 2 b m 0.0583
## 3 w f 0.0989
## 4 w m 0.0887
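To check that the racial gap appears for both sexes, the four rates can be compared within each gender. The sketch below copies the rounded rates from the output above into a small tibble; the name `rates` is hypothetical. It assumes dplyr 1.0 or later.

```r
library(dplyr)

# Rounded callback rates copied from the race-by-gender summary
rates <- tibble(
  race = c("b", "b", "w", "w"),
  gender = c("f", "m", "f", "m"),
  avg_callback = c(0.0663, 0.0583, 0.0989, 0.0887)
)

# Within each gender, compare white- and black-sounding names
gaps <- rates %>%
  group_by(gender) %>%
  summarise(
    white = avg_callback[race == "w"],
    black = avg_callback[race == "b"],
    gap = white - black,    # difference in callback rates
    ratio = white / black,  # relative callback rate
    .groups = "drop"
  )

print(gaps)
```

Both gaps are about 3 percentage points (0.0326 for women, 0.0304 for men). Both ratios are roughly 1.5, so the disadvantage for African-American-sounding names is similar for men and women.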