Spring 2026
(updated 15 Apr 26)

Academic honesty statement

I have been academically honest in all of my work and will not tolerate academic dishonesty of others, consistent with UGA’s Academic Honesty Policy.

Sign the academic honesty statement by typing your name on the Signature line.

Dorian Cushing:

We will not accept submissions that omit a signed Academic Honesty statement.

Introduction

Overview

In this project, I examine earnings differences between men and women who work full time, using data from the March 2024 CPS. The data include earnings, education, demographic, and household characteristics for these workers. I find a clear gender wage gap that widens over the course of a career. The gap persists after controlling for education, demographics, and household characteristics.

Data

March 2024 CPS

The Annual Social and Economic Supplement (ASEC) to the March 2024 Current Population Survey (CPS) surveys roughly 100,000 households. The ASEC collects data on annual earnings, total income, work experience, and health insurance coverage for the previous calendar year (here, 2023). This makes it especially useful for analyzing income differences such as the gender wage gap.

March 2024 CPS Extract

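library(tidyverse)  # read_csv(), dplyr verbs, ggplot2
library(here)       # project-relative file paths

# Load the CPS extract built by cpsmar_e.R
cpsmar_e <- read_csv(here("data", "cpsmar_e.csv"))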

The extract was created with the script cpsmar_e.R. It selects key variables from the person file, including age, earnings, hours, weeks, race, gender, marital status, and education, and creates indicator variables for gender, Hispanic origin, full-time status, union membership, and education level. The extract is then restricted to full-time workers with filter(). It contains 50,388 observations and 20 variables.
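The indicator construction described above can be sketched as follows. This is a hypothetical reconstruction, not the contents of cpsmar_e.R: it assumes a raw data frame with columns named education, weeks, and hours, and applies the cutoffs listed in the appendix (education code 39, codes 40 to 42, and 43 or higher; at least 48 weeks and at least 36 hours per week for full-time status).

```r
library(dplyr)

# Hypothetical raw person records; column names and values are assumptions
raw <- tibble(
  education = c(38, 39, 41, 43, 46),
  weeks     = c(52, 50, 40, 52, 48),
  hours     = c(40, 35, 40, 45, 36)
)

extract <- raw %>%
  mutate(
    # Education indicators following the appendix definitions
    HSGrad   = as.integer(education == 39),
    SomeColl = as.integer(education >= 40 & education <= 42),
    CollDeg  = as.integer(education >= 43),
    # Full-time: at least 48 weeks and at least 36 usual hours per week
    fulltime = as.integer(weeks >= 48 & hours >= 36)
  ) %>%
  filter(fulltime == 1)  # keep full-time workers only

extract
```

Coding the indicators as integers (0/1) lets them enter regressions directly and makes their group means equal to shares, as in Tables 3 and 4.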

Analysis sample

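# Restrict to workers aged 23-62 with positive earnings and build analysis variables
cpsmar_a <- cpsmar_e %>%
  filter(
    age >= 23,
    age <= 62,
    earnings > 0
  ) %>%
  mutate(
    gender = case_when(female == 1 ~ "Female", TRUE ~ "Male"),
    # Hourly wage: annual earnings divided by annual hours worked
    wage = earnings / (weeks * hours),
    lwage = log(wage),
    # Indicators: Black only (race == 2), South (region == 3), and
    # married (marital codes 1-3: civilian, Armed Forces, spouse absent)
    Black = case_when(race == 2 ~ 1, TRUE ~ 0),
    south = case_when(region == 3 ~ 1, TRUE ~ 0),
    married = case_when(marital %in% c(1, 2, 3) ~ 1, TRUE ~ 0),
    # Years since age 23 (0 at age 23, 39 at age 62)
    age_centered = age - 23
  )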

The CPS extract is narrowed to men and women aged 23 to 62 with positive earnings. The final analysis sample contains 44,614 observations (19,396 women and 25,218 men).
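For reference, the wage and age variables constructed in the code above correspond to the following definitions, where i indexes workers:

```latex
\[
w_i = \frac{\text{earnings}_i}{\text{weeks}_i \times \text{hours}_i},
\qquad
\text{lwage}_i = \ln w_i,
\qquad
\text{age\_centered}_i = \text{age}_i - 23 .
\]
```

For example, a worker with annual earnings of $75,000 who worked 52 weeks at 40 hours per week has an hourly wage of 75,000 / 2,080 ≈ $36.06 and a log wage of about 3.59.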

Baseline earnings distributions

Plotting earnings distributions

figure1 <- ggplot(cpsmar_a, aes(x = earnings, group = gender, fill = gender)) +
  geom_density(alpha = 0.4) +
  labs(
    title="Figure 1. Distribution of earnings by gender",
    x="Earnings",
    y="Density"
    )+
  theme_minimal()

earnings_fvm <- cpsmar_a %>%
  group_by(gender) %>%
  summarize(avg_earnings = round(mean(earnings, na.rm = TRUE), 0))

avg_earnings_f <- earnings_fvm %>% 
  filter(gender == "Female") %>% 
  pull(avg_earnings)

avg_earnings_m <- earnings_fvm %>% 
  filter(gender == "Male") %>% 
  pull(avg_earnings)

Distribution of earnings by gender

Baseline comparisons

Figure 1 shows that the earnings distribution for men is shifted to the right of the distribution for women, indicating higher earnings on average. Average earnings are $75,026 for women and $94,269 for men, a difference of $19,243. Men's average earnings are therefore about 25.6% higher than women's. This is clear evidence of a gender earnings gap in the sample.
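The percentage gap can be computed from the group means rather than typed by hand. A minimal sketch, assuming avg_earnings_f and avg_earnings_m hold the rounded means reported above (in the report they are pulled from earnings_fvm):

```r
# Rounded group means from the text
avg_earnings_f <- 75026
avg_earnings_m <- 94269

gap_dollars <- avg_earnings_m - avg_earnings_f     # dollar difference
gap_pct     <- 100 * gap_dollars / avg_earnings_f  # men's earnings relative to women's

round(gap_pct, 1)
```

Dividing by women's mean expresses the gap as how much more men earn; dividing by men's mean instead would give the share by which women's earnings fall short, a smaller number.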

The career gender gap

Wages and hours differences

Table 1. Wages and hours by gender
Female Male
Mean SD N Mean SD N
lwage 3.29 0.69 19396 3.46 0.73 25218
hours 42.17 6.06 19396 43.81 7.52 25218

Documenting the differences

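Table 1 shows that men's mean log wage (3.46) exceeds women's (3.29) by about 0.17 log points, or roughly 17%. Men also work more hours per week on average (43.81 versus 42.17). Because the wage is measured per hour worked, the log wage difference is a gap in hourly pay; the difference in hours adds further to the gap in annual earnings.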
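Summary statistics of the kind shown in Table 1 can be produced with a grouped summarize(). This sketch runs on a small made-up data frame standing in for cpsmar_a; the report's actual table may have been built with a different function, which the text does not show.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for cpsmar_a (values are invented)
toy <- tibble(
  gender = c("Female", "Female", "Male", "Male", "Male"),
  lwage  = c(3.1, 3.5, 3.3, 3.6, 3.5),
  hours  = c(40, 44, 40, 45, 50)
)

table1_sketch <- toy %>%
  # One row per person-variable pair, so both variables share one summary
  pivot_longer(c(lwage, hours), names_to = "variable") %>%
  group_by(variable, gender) %>%
  summarize(
    Mean = mean(value),
    SD   = sd(value),
    N    = n(),
    .groups = "drop"
  )

table1_sketch
```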

Plotting career log wage profiles

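# Average log wage by gender and years since age 23; dropping the grouping
# keeps gender from being re-added by select(!gender) further down
cef_fvm_w <- cpsmar_a %>%
  group_by(gender, age_centered) %>%
  summarize(avg_lwage = mean(lwage, na.rm = TRUE), .groups = "drop")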

figure2 <- ggplot(cef_fvm_w, aes(x = age_centered, y = avg_lwage, color = gender, linetype = gender, linewidth = gender)) +
  geom_point() +
  geom_line() +
  scale_linetype_manual(values = c("Female" = "longdash", "Male" = "solid")) + 
  scale_linewidth_manual(values = c("Female" = 0.7, "Male" = 0.5)) + 
  guides(linewidth = "none") +
  labs(
    title="Figure 2. Career log-wage profiles for women and men",
    x="Years since age 23",
    y="Average log wage"
    )+
  theme_minimal()

Career log wage profiles

Estimating wage differences over a career

males <- cef_fvm_w %>%
  filter(gender == "Male") %>%
  rename(avg_lwage_male = avg_lwage) %>%
  select(!gender) 

females <- cef_fvm_w %>%
  filter(gender == "Female") %>%
  rename(avg_lwage_female = avg_lwage) %>%
  select(!gender)

diff_fvm <- inner_join(males, females, by = "age_centered") %>%
  filter(age_centered <= 30) %>%
  mutate(
    diff = avg_lwage_male - avg_lwage_female,
    age_group = cut(
      age_centered, 
      breaks = c(-1, 10, 20, 30), 
      labels = c("1-10", "11-20", "21-30"))
    ) %>%
  group_by(age_group) %>%
  summarize(mean_diff = mean(diff) * 100) 
  
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

table2 <- kable(
  diff_fvm,
  digits = 2,
  col.names = c("Year Range", "Avg Pct Difference"),
  align = "cc",
  caption = "Table 2. Percent wage differences, first 30 years"
  ) %>%
  kable_styling(position = "center")
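The split-and-join step above can also be written with tidyr's pivot_wider(), which places the male and female averages for each year side by side in one pass. The sketch below uses a small invented table in the same long format as cef_fvm_w and the same cut() breaks; it is an alternative formulation, not the code used to produce Table 2.

```r
library(dplyr)
library(tidyr)

# Invented age-by-gender averages in the same long format as cef_fvm_w
cef_toy <- tibble(
  gender       = rep(c("Female", "Male"), each = 4),
  age_centered = rep(c(0, 10, 11, 25), times = 2),
  avg_lwage    = c(3.00, 3.20, 3.25, 3.30,
                   3.05, 3.30, 3.40, 3.50)
)

gap_toy <- cef_toy %>%
  # One row per year, with Female and Male columns
  pivot_wider(names_from = gender, values_from = avg_lwage) %>%
  mutate(
    diff = Male - Female,
    # Right-closed bins (-1, 10], (10, 20], (20, 30], as in the report
    age_group = cut(age_centered, breaks = c(-1, 10, 20, 30),
                    labels = c("1-10", "11-20", "21-30"))
  ) %>%
  group_by(age_group) %>%
  summarize(mean_diff = mean(diff) * 100, .groups = "drop")

gap_toy
```

Years missing for one gender come out as NA after pivot_wider(), whereas inner_join() drops them, so adding filter(!is.na(diff)) would reproduce the join's behavior exactly.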

Evolution of the gender wage gap

Table 2. Percent wage differences, first 30 years
Year Range Avg Pct Difference
1-10 7.96
11-20 15.28
21-30 20.75

Discussing the gender wage gap evolution

The log wage gap grows over the first 30 years of a career. The average gap is about 8 log points in years 1-10, 15 in years 11-20, and 21 in years 21-30, or roughly 8%, 15%, and 21%. The gender wage gap therefore reflects differences in career progression, not only differences in starting pay.
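Written out, each entry in Table 2 averages the year-by-year gap in mean log wages over the years in a bin and multiplies it by 100, so the entries are differences in log points that approximate percentage differences:

```latex
\[
\overline{D}_g \;=\; \frac{100}{|A_g|}\sum_{a \in A_g}
\left(\overline{\text{lwage}}_{M,a}-\overline{\text{lwage}}_{F,a}\right),
\qquad
A_{1\text{-}10}=\{0,\dots,10\},\quad
A_{11\text{-}20}=\{11,\dots,20\},\quad
A_{21\text{-}30}=\{21,\dots,30\}.
\]
```

Here a is age_centered (years since age 23), and A_g is the set of years in bin g observed for both genders. Because cut() uses right-closed intervals with breaks c(-1, 10, 20, 30), the bin labelled 1-10 covers age_centered 0 through 10 (ages 23 to 33), eleven years, while the other two bins cover ten years each.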

Explaining the gender wage gap

Fitting the log wage profiles

library(ggpmisc)  # stat_poly_eq()

# Quadratic career profile in years since age 23; geom_smooth() and
# stat_poly_eq() below fit the same form, written as y ~ x + I(x^2)
formula <- avg_lwage ~ age_centered + I(age_centered^2)

figure3 <- figure2 +  
  geom_smooth(
    method = "lm", 
    formula = y ~ x + I(x^2), 
    aes(group = gender), 
    se = FALSE
  ) +
  stat_poly_eq(
    aes(label = after_stat(eq.label)),
    formula = y ~ x + I(x^2),
    parse = TRUE
  ) +
  labs(
    title="Figure 3. Career log-wage profiles with quadratic fits",
    x="Years since age 23",
    y="Average log wage"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

Log wage profiles with quadratic fits
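A quadratic fit also summarizes where a profile peaks: for a fitted curve b0 + b1*x + b2*x^2 with b2 < 0, the maximum is at x* = -b1 / (2*b2). The sketch below fits the same quadratic specification with lm() to invented data that peak at year 20 and recovers that turning point. Applying it to each gender's rows of cef_fvm_w is a possible extension, not something the report does.

```r
# Invented profile that is exactly quadratic, with its peak at year 20
toy_profile <- data.frame(age_centered = 0:39)
toy_profile$avg_lwage <- 3 + 0.04 * toy_profile$age_centered -
  0.001 * toy_profile$age_centered^2

# Same specification as `formula` in the report
fit <- lm(avg_lwage ~ age_centered + I(age_centered^2), data = toy_profile)

# Turning point of the fitted parabola: -b1 / (2 * b2)
b <- coef(fit)
peak_year <- -b[["age_centered"]] / (2 * b[["I(age_centered^2)"]])
peak_year
```

Adding 23 to peak_year converts it from years since age 23 back to age.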

Gender differences in education

Table 3. Educational attainment by gender
Female Male
Mean SD N Mean SD N
HSGrad 0.20 0.40 19396 0.28 0.45 25218
SomeColl 0.25 0.43 19396 0.24 0.43 25218
CollDeg 0.51 0.50 19396 0.41 0.49 25218

Gender differences in demographics

Table 4. Demographic characteristics by gender
Female Male
Mean SD N Mean SD N
Black 0.13 0.33 19396 0.09 0.29 25218
hisp 0.19 0.40 19396 0.22 0.41 25218
south 0.38 0.49 19396 0.37 0.48 25218
city 0.68 0.47 19396 0.68 0.47 25218
union 0.02 0.13 19396 0.02 0.12 25218

Documenting differences in characteristics

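Table 3 shows that women in the sample are more likely than men to have a college degree (51% versus 41%) and less likely to have only a high school diploma (20% versus 28%). Table 4 shows that women are somewhat more likely than men to be Black (13% versus 9%), while men are slightly more likely to be Hispanic (22% versus 19%); the shares living in the South, living in cities, and belonging to a union are nearly identical for the two groups. These differences in observable characteristics could affect the measured gap, but they do not appear large enough to account for it. Because women are more educated on average, controlling for education may even widen the gap.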
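One way to check whether a difference in Table 3 or Table 4 is larger than sampling noise is a two-sample (Welch) t statistic built from the reported means, standard deviations, and sample sizes. The sketch uses the rounded college-degree figures from Table 3 and ignores the CPS survey design and weights, so the result is only approximate.

```r
# Rounded values from Table 3 (CollDeg)
mean_f <- 0.51; sd_f <- 0.50; n_f <- 19396
mean_m <- 0.41; sd_m <- 0.49; n_m <- 25218

# Welch standard error of the difference in means
se_diff <- sqrt(sd_f^2 / n_f + sd_m^2 / n_m)
t_stat  <- (mean_f - mean_m) / se_diff

round(t_stat, 1)
```

A t statistic of roughly 21 means the 10 percentage point difference in college completion is far too large to attribute to sampling variation.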

Controlling for education and demographic characteristics

singles <- cpsmar_a %>%
  filter(
    married == 0,
    child_u6 == 0
    )

models <- list(
  "Baseline"      = lm(lwage ~ female +
                       age_centered + I(age_centered^2),
                       cpsmar_a),
  "Add Education" = lm(lwage ~ female +
                       age_centered + I(age_centered^2) + HSGrad + SomeColl + CollDeg,
                       cpsmar_a),
  "Add Person"    = lm(lwage ~ female +
                       age_centered + I(age_centered^2) + HSGrad + SomeColl + CollDeg +
                       Black + hisp + south + city,
                       cpsmar_a),
  "Add Household" = lm(lwage ~ female +
                       age_centered + I(age_centered^2) + HSGrad + SomeColl + CollDeg +
                       Black + hisp + south + city +
                       married + child_u6,
                       cpsmar_a),
  "Only Singles"  = lm(lwage ~ female +
                       age_centered + I(age_centered^2) + HSGrad + SomeColl + CollDeg +
                       Black + hisp + south + city, 
                       singles)
)
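The five specifications above share a common form. Writing a_i for age_centered, the Add Household model is:

```latex
\[
\begin{aligned}
\text{lwage}_i ={}& \beta_0 + \delta\,\text{female}_i + \beta_1\,a_i + \beta_2\,a_i^2 \\
&+ \gamma_1\,\text{HSGrad}_i + \gamma_2\,\text{SomeColl}_i + \gamma_3\,\text{CollDeg}_i \\
&+ \theta_1\,\text{Black}_i + \theta_2\,\text{hisp}_i + \theta_3\,\text{south}_i + \theta_4\,\text{city}_i \\
&+ \lambda_1\,\text{married}_i + \lambda_2\,\text{child\_u6}_i + \varepsilon_i
\end{aligned}
\]
```

Baseline keeps only the female and age terms, Add Education adds the gamma terms, and Add Person adds the theta terms. Only Singles re-estimates the Add Person specification on the subsample with married = 0 and child_u6 = 0. The omitted education category is workers without a high school diploma, so each gamma is measured relative to that group, and delta is the gender wage gap in log points.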

Reporting the results

cm <- c(
  'female'            = 'Female',
  'age_centered'      = 'Age',
  'I(age_centered^2)' = 'Age$^2$',
  '(Intercept)'       = 'Constant'
)
gm <- tibble::tribble(
  ~raw, ~clean, ~fmt,
  "nobs", "$N$", 0,
  "r.squared", "$R^2$", 2
)
rows <- tribble(~term, ~Baseline, ~Add_Education, ~Add_Person, ~Add_Household, ~Only_Singles,
  'Education controls',   ' ',         'X',           'X',          'X',           'X',
  'Demographic controls', ' ',         ' ',           'X',          'X',           'X',
  'Household controls',   ' ',         ' ',           ' ',          'X',           'X'
)
attr(rows, 'position') <- c(9, 10, 11)

library(modelsummary)  # modelsummary()

table5 <- modelsummary(
  models,
  add_rows = rows,
  coef_map = cm,
  gof_map = gm,
  vcov = c("HC1", "HC1", "HC1", "HC1", "HC1"),
  title = "Table 5. OLS estimates of the gender wage gap",
  notes = "Heteroskedasticity-robust standard errors in parentheses.",
  escape = FALSE
  )
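The vcov = "HC1" setting requests heteroskedasticity-robust standard errors with the small-sample scaling n/(n - k). As a sketch of what that estimator computes, the base-R code below builds the HC1 covariance matrix, (n/(n - k)) (X'X)^-1 X' diag(e^2) X (X'X)^-1, by hand for a simple regression on simulated data. The data and names are invented; the report leaves this calculation to modelsummary, which typically delegates it to the sandwich package.

```r
set.seed(1)

# Simulated data whose error spread grows with x (heteroskedastic)
n <- 500
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.2 + 0.1 * x)

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- residuals(fit)
k <- ncol(X)

# Sandwich estimator: bread = (X'X)^-1, meat = X' diag(e^2) X
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)   # row i of X scaled by e_i, so this sums e_i^2 x_i x_i'
V_hc1 <- (n / (n - k)) * bread %*% meat %*% bread

robust_se <- sqrt(diag(V_hc1))
robust_se
```

Unlike the default OLS formula, this estimator does not assume a constant error variance, which matters for log wages whose dispersion differs across groups.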

Explaining the gender wage gap

Table 5. OLS estimates of the gender wage gap
Baseline Add Education Add Person Add Household Only Singles
Female -0.165 -0.239 -0.230 -0.218 -0.137
(0.007) (0.006) (0.006) (0.006) (0.010)
Age 0.037 0.029 0.028 0.022 0.020
(0.001) (0.001) (0.001) (0.001) (0.002)
Age² -0.001 -0.001 -0.001 -0.000 -0.000
(0.000) (0.000) (0.000) (0.000) (0.000)
Constant 3.114 2.630 2.683 2.642 2.637
(0.010) (0.016) (0.017) (0.017) (0.029)
Education controls – X X X X
Demographic controls – – X X X
Household controls – – – X X
N 44614 44614 44614 44614 15163
R² 0.04 0.21 0.23 0.24 0.18
Heteroskedasticity-robust standard errors in parentheses.

Documenting the findings

In the baseline model, the female coefficient is -0.165, implying that women earn about 16.5% less per hour than men of the same age. Adding education controls widens the gap to 23.9%, because women in the sample are more educated than men on average. Adding demographic controls narrows it slightly to 23.0%, and adding household characteristics narrows it to 21.8%. Restricting the sample to unmarried workers without children under age 6 reduces the gap to 13.7%. Observable characteristics do not account for the gap: a substantial difference persists in every specification.
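Because the dependent variable is a log wage, a female coefficient delta corresponds to an exact proportional difference of exp(delta) - 1. The percentages above use the approximation 100 * delta, which overstates the gap somewhat when delta is this large in magnitude. A short sketch converting the Table 5 estimates:

```r
# Female coefficients from Table 5 (log points)
delta <- c(
  "Baseline"      = -0.165,
  "Add Education" = -0.239,
  "Add Person"    = -0.230,
  "Add Household" = -0.218,
  "Only Singles"  = -0.137
)

approx_pct <- 100 * delta             # log-point approximation used in the text
exact_pct  <- 100 * (exp(delta) - 1)  # exact percentage difference

round(cbind(approx_pct, exact_pct), 1)
```

The exact conversion gives about -15.2% for the baseline model and -21.3% with education controls, so the ranking of the specifications is unchanged.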

Conclusion

Summary

The CPS data on full-time male and female workers aged 23 to 62 show a clear gender wage gap, with men earning more on average. The gap grows from about 8% early in careers to over 20% after two decades. Controlling for education widens the estimated gap, while demographic and household controls narrow it only modestly; no specification eliminates it. Regression estimates imply that women earn between 13% and 24% less than comparable men, even after accounting for observable differences. These findings suggest that the gender wage gap is driven by both observable and unobservable factors.

Appendix

Data documentation

# Define the variables and their descriptions
variables <- data.frame(
  Variable = c(
    "age", 
    "earnings", 
    "hours", 
    "race", 
    "marital", 
    "HSGrad", 
    "SomeColl", 
    "CollDeg", 
    "region", 
    "female", 
    "hisp", 
    "fulltime"
    ),
  Definition = c(
    "years; capped at 85",
    "annual pre-tax wage and salary earnings",
    "usual hours worked per week",
    "respondent's race (1 = White only, 2 = Black only, 3 = AI only, 4 = Asian only, 5 = Hawaiian/Pacific Islander only (HP), 6 = White-Black, 7 = White-AI, 8 = White-Asian, 9 = White-HP, 11 = Black-Asian, 12 = Black-HP, 13 = AI-Asian, 14 = AI-HP, 15 = Asian-HP, 16 = White-Black-AI, 17 = White-Black-Asian, 18 = White-Black-HP, 19 = White-AI-Asian, 20 = White-AI-HP, 21 = White-Asian-HP, 22 = Black-AI-Asian, 23 = White-Black-AI-Asian, 24 = White-AI-Asian-HP, 25 = White-Black-AI-Asian-HP, 25 = Other 3 race comb., 26 = Other 4 or 5 race comb.)",
    "marital status (1 = Married civilian, 2 = Married AF, 3 = Married absent, 4 = Widowed, 5 = Divorced, 6 = Separated, 7 = Never married)",
    "= 1, if highest education is a high school diploma (education == 39)",
    "= 1, if some college but no four-year degree (education 40-42)",
    "= 1, if holds a four-year college degree or more (education >= 43)",
    "household region (1 = Northeast, 2 = Midwest, 3 = South, 4 = West)",
    "= 1 if female",
    "= 1 if Hispanic, Spanish, or Latino",
    "= 1 if worked at least 48 weeks and at least 36 hours per week"
  )
)

List of main variables with definitions

This is a list of the main variables used in this project with their definitions.

Variable Definition
age years; capped at 85
earnings annual pre-tax wage and salary earnings
hours usual hours worked per week
race respondent’s race (1 = White only, 2 = Black only, 3 = AI only, 4 = Asian only, 5 = Hawaiian/Pacific Islander only (HP), 6 = White-Black, 7 = White-AI, 8 = White-Asian, 9 = White-HP, 11 = Black-Asian, 12 = Black-HP, 13 = AI-Asian, 14 = AI-HP, 15 = Asian-HP, 16 = White-Black-AI, 17 = White-Black-Asian, 18 = White-Black-HP, 19 = White-AI-Asian, 20 = White-AI-HP, 21 = White-Asian-HP, 22 = Black-AI-Asian, 23 = White-Black-AI-Asian, 24 = White-AI-Asian-HP, 25 = White-Black-AI-Asian-HP, 25 = Other 3 race comb., 26 = Other 4 or 5 race comb.)
marital marital status (1 = Married civilian, 2 = Married AF, 3 = Married absent, 4 = Widowed, 5 = Divorced, 6 = Separated, 7 = Never married)
HSGrad = 1, if highest education is a high school diploma (education == 39)
SomeColl = 1, if some college but no four-year degree (education 40-42)
CollDeg = 1, if holds a four-year college degree or more (education >= 43)
region household region (1 = Northeast, 2 = Midwest, 3 = South, 4 = West)
female = 1 if female
hisp = 1 if Hispanic, Spanish, or Latino
fulltime = 1 if worked at least 48 weeks and at least 36 hours per week