This is the third challenge question for the course. Remember that it’s meant to be challenging and to stretch your skills, but you can always ask for help or collaborate with your classmates. The articles listed at the bottom provide broader context for the questions we’re studying here.
The Gender Wage Gap is the average difference between the wages of men and women who are working. The “unadjusted” GWG is simply the raw difference. Various types of “adjusted” GWGs account for different factors which may explain the differences in pay. However, conducting the adjustments is conceptually subtle: it is not always obvious what ought to be adjusted for and what ought not be adjusted for.
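One way to write the two notions down, as a sketch only: the symbols below are notational assumptions, not definitions from the course materials. Here $w$ is a worker's wage, $\bar{w}_M$ and $\bar{w}_F$ are the average wages of working men and women, and $X$ stands for whichever factors one chooses to adjust for.

```latex
% Unadjusted gap: raw difference in average wages among workers
\[
\mathrm{GWG}_{\text{unadj}} = \bar{w}_{M} - \bar{w}_{F}
\]

% One possible adjusted gap: compare men and women who share the same
% value x of the adjustment factors X (for example, age or type of work)
\[
\mathrm{GWG}_{\text{adj}}(x) = \mathbb{E}[\, w \mid \text{Male},\, X = x \,] - \mathbb{E}[\, w \mid \text{Female},\, X = x \,]
\]
```

The conceptual difficulty described above lives entirely in the choice of $X$: the formula is the same whatever is adjusted for, but its interpretation is not.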
To illustrate this issue, consider the following thought exercise. Suppose the law (or, if you prefer, very strong cultural norms) dictated that, barring extreme circumstances, women leave work once they marry and may only re-enter if widowed or divorced. In this scenario the unadjusted GWG might be dramatically large, particularly as the average age of men and women being compared increased and larger shares of women got married. Adjusting for marital status and age by only comparing women and men of similar ages who have never been married might shrink the gap substantially. If one were to do this, one would not be able to cleanly argue that “the ‘true’ GWG is smaller once you adjust for marital status and age, thus the extent of discrimination against women is overstated”. Why not? In this scenario, marital status and age are part of the mechanism through which discrimination operates! While one could certainly make a case that the marital-status-and-age-adjusted GWG is the right measure to consider, it’s a bit disingenuous to argue that, once you’ve controlled for one of the major channels through which discrimination operates, there’s no more discrimination.
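The arithmetic behind this thought experiment can be sketched with a toy example. The eight workers below are invented for illustration and do not come from the ACS data; the age bands, the wages, and the `wage_gap` helper are all assumptions. Only people who are working appear, since in this scenario married women have left work.

```python
from statistics import mean

# Hypothetical workers: (sex, age band, ever married, hourly wage).
# All values are made up to mirror the thought experiment.
WORKERS = [
    ("Male", "25-34", False, 20.0),
    ("Male", "25-34", False, 20.0),
    ("Male", "45-54", True, 40.0),
    ("Male", "45-54", True, 40.0),
    ("Female", "25-34", False, 19.0),
    ("Female", "25-34", False, 19.0),
    ("Female", "25-34", False, 19.0),
    ("Female", "45-54", True, 22.0),  # widowed, re-entered after a long break
]


def wage_gap(workers):
    """Mean male wage minus mean female wage among the given workers."""
    men = [wage for sex, _, _, wage in workers if sex == "Male"]
    women = [wage for sex, _, _, wage in workers if sex == "Female"]
    return mean(men) - mean(women)


# Unadjusted: compare all working men with all working women.
unadjusted_gap = wage_gap(WORKERS)

# "Adjusted": compare only never-married workers in the same age band.
never_married_young = [
    row for row in WORKERS if row[1] == "25-34" and not row[2]
]
adjusted_gap = wage_gap(never_married_young)

print(f"Unadjusted gap: {unadjusted_gap:.2f}")
print(f"Adjusted gap:   {adjusted_gap:.2f}")
```

With these made-up numbers the gap falls from 10.25 to 1.00 after the adjustment. The smaller figure comes from dropping exactly the workers whose careers the marriage rule shaped, which is why it cannot be read as evidence that discrimination is small.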
You work for a think tank in Washington DC, and are trying to assess the severity of the GWG to inform the design of new model legislation.
The dataset for this exercise is acs_2018_small_gwg.csv, available here. It is very similar to the acs_2018_small.csv dataset we’ve worked with in the past (in fact, it comes from the same ACS 2018 release), though instead of racblk and racwht there are three other variables:
sex, which takes the values Male or Female;
labforce, which describes whether the individual is in the labor force (working or actively looking for a job) or not;
classwkrd, which describes the type of work the individual does (e.g. work for a private company, some level of government, unpaid housework, etc.).

facet_wrap where appropriate (you should have three sets of plots: one at the national level, one across states, and one across type of work).

What is the unit of observation in this dataset?

The following articles offer additional context. They are all linked or available on Canvas. Read them, evaluate them critically, and use the details to inform your answers above.