Fall 2025
(updated 02 Dec 25)

Academic honesty statement

I have been academically honest in all of my work and will not tolerate academic dishonesty of others, consistent with UGA’s Academic Honesty Policy.

Sign the academic honesty statement by typing your name on the Signature line.

Signature: Sabiha Jamal

We will not accept submissions that omit a signed Academic Honesty statement.

Introduction

Overview

This report documents the gender wage gap, showing a plausible relationship between wages and gender, and also explores other factors that may contribute to the gap. The data contain information on individuals living in the United States and on characteristics of their lives, which are used to examine plausible explanations for the gender wage gap.

Data

March 2024 CPS

The data come from the March 2024 Current Population Survey (CPS), a monthly household survey conducted by the US Census Bureau, and its Annual Social and Economic Supplement (ASEC), which collects demographic information and labor force data from 54,000 households.

March 2024 CPS Extract

library(tidyverse) # read_csv(), dplyr verbs, and ggplot2
cpsmar_e <- read_csv(here::here("data", "cpsmar_e.csv"))

The data set includes 50,388 observations and 20 variables. The personal characteristics used in this report are an individual's race, region, and age. The household characteristics are marital status and whether any children under 6 are present in the household. Geographic detail is limited to broad regions and specific principal cities.

Analysis sample

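cpsmar_a <- cpsmar_e %>%
  # Analysis sample: ages 23 to 62 with positive earnings
  filter(
    age >= 23, 
    age <= 62,
    earnings > 0
  ) %>%
  mutate(
    gender = ifelse(female == 1, "Female", "Male"),
    wage = earnings/(hours*weeks),            # hourly wage: earnings / (weekly hours * weeks worked)
    lwage = log(wage),
    Black = case_when(race==2~1, TRUE ~ 0),   # 1 = Black only
    south = case_when(region==3~1, TRUE ~ 0), # 1 = South region
    married = case_when((marital==1 | marital==2 | marital==3)~1, TRUE ~ 0), # codes 1-3 are married
    age_centered = age - 23                   # years since age 23, the youngest age in the sample
  )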
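To make the construction of `wage`, `lwage`, and `married` concrete, here is a minimal base-R sketch that applies the same formulas to three hypothetical workers. The numbers are invented for illustration, and the sketch assumes, as the wage formula implies, that `earnings` is annual earnings in dollars, `hours` is hours worked per week, and `weeks` is weeks worked in the year.

```r
# Three hypothetical workers (illustrative values, not CPS data)
toy <- data.frame(
  earnings = c(52000, 30000, 78000),  # annual earnings, dollars
  hours    = c(40, 25, 50),           # hours worked per week (assumed unit)
  weeks    = c(52, 50, 52),           # weeks worked in the year (assumed unit)
  marital  = c(1, 7, 3)               # CPS marital status code
)

# Hourly wage = annual earnings / (weekly hours * weeks worked)
toy$wage  <- toy$earnings / (toy$hours * toy$weeks)
toy$lwage <- log(toy$wage)

# Codes 1-3 are the three "married" categories; every other code maps to 0
toy$married <- as.integer(toy$marital %in% 1:3)

print(toy)
```

The first worker, for example, earns 52,000 / (40 * 52) = 25 dollars per hour.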

Baseline earnings distributions

Plotting earnings distributions

figure1 <- ggplot(data = cpsmar_a, aes(x = earnings, group = gender, fill = gender)) +
  geom_density(alpha = 0.4) +
  labs(
    title="Figure 1. Distribution of earnings by gender",
    x="Earnings",
    y="Density"
    )+
  theme_minimal()
earnings_fvm <- cpsmar_a %>%
  group_by(gender) %>%
  summarize(avg_earnings = round(mean(earnings, na.rm = TRUE),0))

avg_earnings_f <- earnings_fvm %>% 
  filter(gender == "Female") %>% 
  pull(avg_earnings) # `pull` extracts the "avg_earnings" value for "Female" from earnings_fvm, a single value since the data only record two genders.

avg_earnings_m <- earnings_fvm %>% 
  filter(gender == "Male") %>% 
  pull(avg_earnings) # `pull` extracts the "avg_earnings" value for "Male" from earnings_fvm, a single value since the data only record two genders. 

Distribution of earnings by gender

Baseline comparisons

Figure 1 shows that the female earnings distribution has a higher peak than the male distribution, and that this peak sits at lower earnings. Men's earnings are also more spread out than women's.

The career gender gap

Wages and hours differences

Table 1. Wages and hours by gender
                    Female                    Male
                    N       Mean    SD        N       Mean    SD
wage ($/hour)       19396   34.17   33.13     25218   41.43   42.34
hours (per week)    19396   42.17   6.06      25218   43.81   7.52

Documenting the differences

Table 1 shows a clear difference in average wages: men earn $41.43 per hour on average and women $34.17, a gap of $7.26 per hour, or about 21 percent of the female mean. The standard deviation of men's wages is $9.21 higher than women's, so men's wages are more dispersed. Hours differ much less: men work 1.64 more hours per week on average (about 4 percent more), and the standard deviation of their hours is 1.46 hours higher.
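The gaps quoted above can be checked directly from the Table 1 means. The short sketch below hard-codes those means (it does not read the data) and, assuming wages are in dollars per hour and hours are per week, computes the absolute wage gap, the same gap relative to the female mean, and the hours gap.

```r
# Means from Table 1 (wages in dollars per hour, hours per week; units assumed)
wage_f  <- 34.17
wage_m  <- 41.43
hours_f <- 42.17
hours_m <- 43.81

gap_dollars <- wage_m - wage_f              # absolute wage gap, dollars per hour
gap_pct     <- 100 * gap_dollars / wage_f   # gap as a percent of the female mean
hours_gap   <- hours_m - hours_f            # absolute hours gap, hours per week

round(c(gap_dollars = gap_dollars, gap_pct = gap_pct, hours_gap = hours_gap), 2)
```

Reporting the gap relative to the female mean, rather than as "percentage points", keeps the units consistent with a difference of two dollar amounts.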

Plotting career log wage profiles

cef_fvm_w <- cpsmar_a %>%
  group_by(gender, age) %>%
  summarise(avg_lwage = mean(lwage, na.rm = TRUE), .groups = "drop") # ungroup so select(-gender) below really drops gender

figure2 <- ggplot(data = cef_fvm_w, aes(x = age, y = avg_lwage, color = gender, linetype = gender, linewidth = gender)) +
  geom_point() +
  geom_line() +
  scale_linetype_manual(values = c("Female" = "longdash", "Male" = "solid")) + 
  scale_linewidth_manual(values = c("Female" = 0.7, "Male" = 0.5)) + 
  guides(linewidth = "none") +
  labs(
    title="Figure 2. Career log-wage profiles for women and men",
    x="Age",
    y="Average log wage"
    )+
  theme_minimal()

Career log wage profiles

Estimating wage differences over a career

males <- cef_fvm_w %>%
  filter(gender == "Male") %>%
  rename(avg_lwage_male = avg_lwage) %>%
  select(-gender) 
females <- cef_fvm_w %>%
  filter(gender == "Female") %>%
  rename(avg_lwage_female = avg_lwage) %>%
  select(-gender)

diff_fvm <- inner_join(males, females, by = "age") %>%
  filter(age <= 30) %>%
  mutate(
    diff = avg_lwage_male - avg_lwage_female,
    # NOTE: the sample starts at age 23, so binning age puts every row in "21-30"
    age_group = cut(
      age, 
      breaks = c(-1, 10, 20, 30), 
      labels = c("1-10", "11-20", "21-30"))
    ) %>%
  group_by(age_group) %>%
  summarise(mean_diff = mean(diff)*100) # mean log-wage gap x 100, in log points
  
table2 <- kable(
  diff_fvm,
  digits = 2,
  col.names = c("Year Range", "Avg Pct Difference"),
  align = "cc",
  caption = "Table 2. Percent wage differences, first 30 years"
  ) %>%
  kable_styling(position = "center")
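The sketch below illustrates why Table 2 has only one row: the bins are applied to `age`, and after `filter(age <= 30)` the sample holds only ages 23 to 30. It contrasts that with binning a hypothetical career-year variable, here taken to be `age - 22` so that age 23 is year 1; that definition is an assumption about what "first 30 years" is meant to capture, not something the report defines.

```r
# Ages left after filter(age <= 30), given that the sample starts at 23
ages   <- 23:30
breaks <- c(-1, 10, 20, 30)
labels <- c("1-10", "11-20", "21-30")

# Binning raw age: every observation lands in the top bin
table(cut(ages, breaks = breaks, labels = labels))

# Hypothetical alternative: career year (age 23 = year 1), first 30 years only
career_year <- 23:62 - 22
first30     <- career_year[career_year <= 30]
table(cut(first30, breaks = breaks, labels = labels))
```

Using such a career-year variable in both the `filter()` and the `cut()` call in `diff_fvm` would populate all three rows of Table 2.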

Evolution of the gender wage gap

Table 2. Percent wage differences, first 30 years
Year Range    Avg Pct Difference
21-30         7.7

Discussing the gender wage gap evolution

Figure 2 shows that women's average log wages plateau at a lower level than men's. Table 2 summarizes the gap as the average difference in log wages multiplied by 100: 7.7 log points, or roughly 8 percent, in the 21-30 bin. Because the analysis sample begins at age 23, that bin covers ages 23 to 30. The figure is consistent with this gap: men start their careers at higher wages, and women's wages decline substantially more over the last 10 years of the profile.

Explaining the gender wage gap

Fitting the log wage profiles

library(ggpmisc) # provides stat_poly_eq()

# Quadratic in age, written in ggplot's x/y terms so that geom_smooth() and
# stat_poly_eq() fit and label the same curve
formula <- y ~ poly(x, 2, raw = TRUE)
figure3 <- figure2 +  
  geom_smooth(
    method = "lm", 
    formula = formula, 
    aes(group = gender), 
    se = FALSE
    ) +
  stat_poly_eq(
    aes(label =  after_stat(eq.label)),
    formula = formula,
    parse = TRUE
    ) +
  labs(
    title="Figure 3. Career log-wage profiles with quadratic fits",
    x="Age",
    y="Average log wage"
    )+
  theme_minimal()+
  theme(legend.position = "bottom")
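Because each fitted curve in Figure 3 is a quadratic in age, the age at which predicted log wages peak follows from setting the derivative to zero: for lwage = b0 + b1·age + b2·age² with b2 < 0, the peak is at age* = -b1 / (2·b2). The sketch below checks this on an exact, made-up quadratic profile (the coefficients are illustrative, not estimates from the CPS); with the real data one would take b1 and b2 from an `lm()` fit of average log wage on age and age squared for each gender.

```r
# Made-up quadratic profile (illustrative coefficients, not CPS estimates)
age   <- 23:62
lwage <- 2 + 0.06 * age - 0.0006 * age^2

# Same functional form as the quadratic fits in Figure 3
fit <- lm(lwage ~ age + I(age^2))
b   <- coef(fit)

# Peak of the fitted profile: d(lwage)/d(age) = b1 + 2 * b2 * age = 0
peak_age <- -b[["age"]] / (2 * b[["I(age^2)"]])
peak_age
```

Here the peak is at 0.06 / (2 * 0.0006) = 50; a profile that flattens earlier for one gender would show up as a smaller peak age.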

Log wage profiles with quadratic fits

Gender differences in education

Table 3. Educational attainment by gender
            Female                    Male
            N       Mean    SD        N       Mean    SD
HSGrad      19396   0.20    0.40      25218   0.28    0.45
SomeColl    19396   0.25    0.43      25218   0.24    0.43
CollDeg     19396   0.51    0.50      25218   0.41    0.49

Gender differences in demographics

Table 4. Demographic characteristics by gender
            Female                    Male
            N       Mean    SD        N       Mean    SD
age         19396   42.36   10.67     25218   42.16   10.67
race        19396   1.60    1.56      25218   1.54    1.46
region      19396   2.76    1.02      25218   2.80    1.02
marital     19396   3.25    2.65      25218   2.93    2.65

Documenting differences in characteristics

Table 3 compares educational attainment by gender. The share of women with a college degree (0.51) exceeds the share of men (0.41) by 10 percentage points, while men are more likely to stop at a high school diploma (0.28 versus 0.20). Even so, women earn less on average, so differences in education do not explain the raw wage gap. Table 4 turns to demographic characteristics: age, race, region, and marital status. The two groups differ little on these measures. Women have a higher average marital-status code (3.25 versus 2.93); because codes 1 to 3 denote married and higher codes denote widowed, divorced, separated, or never married, this suggests that men in the sample are more likely to be married than women.
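Since race, region, and marital status are categorical codes, their means in Table 4 are only rough summaries. For marriage, a more direct comparison is the share married, the mean of the `married` indicator built in the analysis sample. The toy example below, using six made-up marital codes, shows how the two summaries differ.

```r
# Made-up marital status codes for six respondents (CPS coding: 1-3 = married)
marital <- c(1, 1, 7, 4, 2, 7)

mean(marital)            # average code: mixes several unmarried categories
mean(marital %in% 1:3)   # share married: the quantity of interest
```

On the actual sample, `cpsmar_a %>% group_by(gender) %>% summarize(share_married = mean(married))` would give the share married for each gender.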

Controlling for education and demographic characteristics

# Singles: unmarried workers with no child under 6 in the household
singles <- cpsmar_a %>%
  filter(
    married== 0,
    child_u6== 0
    )

# The first four models add blocks of controls in turn; the last re-estimates
# the full model on singles. race and region enter as numeric codes here.
models <- list(
  "Baseline"      = lm(lwage ~ female +
                       age_centered + I(age_centered^2),
                       data = cpsmar_a),
  "Add Education" = lm(lwage ~ female +
                       age_centered + I(age_centered^2) + HSGrad + SomeColl + CollDeg,
                       data = cpsmar_a),
  "Add Person"    = lm(lwage ~ female +
                       age_centered + I(age_centered^2) + HSGrad + SomeColl + CollDeg +
                       hisp + race + region + married,
                       data = cpsmar_a),
  "Add Household" = lm(lwage ~ female +
                       age_centered + I(age_centered^2) + HSGrad + SomeColl + CollDeg +
                       hisp + race + region + married +
                       child_u6,
                       data = cpsmar_a),
  # married is always 0 among singles, so marital (codes 4-7) enters instead
  "Only Singles"  = lm(lwage ~ female +
                       age_centered + I(age_centered^2) + HSGrad + SomeColl + CollDeg +
                       hisp + race + region + marital, 
                       data = singles)
)

Reporting the results

cm <- c(
  'female'            = 'Female',
  'age_centered'      = 'Age',
  'I(age_centered^2)' = 'Age$^2$',
  '(Intercept)'       = 'Constant'
)
gm <-  tibble::tribble(
  ~raw, ~clean, ~fmt,
  "nobs", "$N$", 0,
  "r.squared", "$R^2$", 2
)
rows <- tribble(~term, ~Baseline, ~Add_Education, ~Add_Person, ~Add_Household, ~Only_Singles,
  'Education controls',   ' ',         'X',           'X',          'X',           'X',
  'Demographic controls', ' ',         ' ',           'X',          'X',           'X',
  'Household controls',   ' ',         ' ',           ' ',          'X',           'X'
)
attr(rows, 'position') <- c(9, 10, 11) # after the 8 coefficient and standard-error rows

table5 <- modelsummary(
  models,
  add_rows = rows,
  coef_map = cm,
  gof_map = gm,
  vcov = "HC1", # one robust estimator for all models; a length-5 vector would assign a different one to each model
  title = "Table 5. OLS estimates of the gender wage gap",
  notes = "Robust standard errors in parentheses.",
  escape = FALSE
  )

Explaining the gender wage gap

Table 5. OLS estimates of the gender wage gap
                      Baseline       Add Education  Add Person     Add Household  Only Singles
Female                -0.165         -0.239         -0.225         -0.224         -0.147
                      (0.007)        (0.006)        (0.006)        (0.006)        (0.011)
Age                   0.037          0.029          0.022          0.022          0.017
                      (0.001)        (0.001)        (0.001)        (0.001)        (0.002)
Age²                  -0.001         -0.001         -0.000         -0.000         -0.000
                      (0.000)        (0.000)        (0.000)        (0.000)        (0.000)
Constant              3.114          2.630          2.677          2.669          2.821
                      (0.010)        (0.016)        (0.018)        (0.018)        (0.056)
Education controls                   X              X              X              X
Demographic controls                                X              X              X
Household controls                                                 X              X
N                     44614          44614          44614          44614          15163
R²                    0.04           0.21           0.23           0.23           0.17
Robust standard errors in parentheses.

Documenting the findings

In the baseline model, the female coefficient is -0.165: holding age constant, women earn about 16.5 log points less than men. Adding education controls makes the coefficient more negative (-0.239), so the gap widens; because women in the sample are more educated than men (Table 3), comparing men and women with the same education reveals a larger conditional disadvantage for women. Adding demographic controls shrinks the gap slightly, to -0.225, suggesting that these characteristics account for a small part of it. Adding the household control for children under 6 leaves the estimate essentially unchanged (-0.224). Among singles, unmarried workers with no children under 6, the gap is much smaller (-0.147), possibly because household structure varies less in this group. This difference may reflect the greater demands that marriage and childcare place on women's careers, although these estimates cannot establish that mechanism.
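The Female coefficients in Table 5 are differences in log wages, which only approximate percentage gaps when they are small. A short sketch converting them with the exact transformation 100 * (exp(b) - 1), using the coefficients as reported in Table 5:

```r
# Female coefficients from Table 5 (differences in log wages)
b_female <- c(Baseline = -0.165, `Add Education` = -0.239, `Add Person` = -0.225,
              `Add Household` = -0.224, `Only Singles` = -0.147)

# Exact percentage gap implied by a log-wage coefficient b: 100 * (exp(b) - 1)
pct_gap <- 100 * (exp(b_female) - 1)
round(pct_gap, 1)
```

At these magnitudes the exact percentages are somewhat smaller in absolute value than the log points; the baseline gap of 16.5 log points, for example, corresponds to women earning about 15.2 percent less.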

Conclusion

Summary

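These results document a gender wage gap of 16.5 log points conditional on age, which widens to between 22 and 24 log points once education and demographic controls are added and narrows to 14.7 log points among singles. They describe factors that may be correlated with the gender wage gap rather than establishing its causes.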

Appendix

Data documentation

# Define the variables and their descriptions
variables <- data.frame(
  Variable = c(
    "age", 
    "earnings", 
    "hours", 
    "race", 
    "marital", 
    "HSGrad", 
    "SomeColl", 
    "CollDeg", 
    "region", 
    "female", 
    "hisp", 
    "fulltime"
    ),
  Definition = c(
    "years; capped at 85",
    "annual earnings (dollars)",
    "hours worked per week",
    "respondent’s race (1 = White only, 2 = Black only, 3 = AI only, 4 = Asian only, 5 = Hawaiian/Pacific Islander only (HP), 6 = White-Black, 7 = White-AI, 8 = White-Asian, 9 = White-HP, 11 = Black-Asian, 12 = Black-HP, 13 = AI-Asian, 14 = AI-HP, 15 = Asian-HP, 16 = White-Black-AI, 17 = White-Black-Asian, 18 = White-Black-HP, 19 = White-AI-Asian, 20 = White-AI-HP, 21 = White-Asian-HP, 22 = Black-AI-Asian, 23 = White-Black-AI-Asian, 24 = White-AI-Asian-HP, 25 = White-Black-AI-Asian-HP, 25 = Other 3 race comb., 26 = Other 4 or 5 race comb.)",
    "marital status (1 = Married civilian, 2 = Married AF, 3 = Married absent, 4 = Widowed, 5 = Divorced, 6 = Separated, 7 = Never married)",
    "= 1, if High School Graduate",
    "= 1, if Some College Education",
    "= 1, if College Graduate",
    "household region (1 = Northeast, 2 = Midwest, 3 = South, 4 = West)",
    "= 1 if female",
    "= 1 if Hispanic, Spanish, or Latino",
    "= 1 if Fulltime worker"
  )
)

List of main variables with definitions

This is a list of the main variables used in this project with their definitions.

Variable Definition
age years; capped at 85
earnings annual earnings (dollars)
hours hours worked per week
race respondent’s race (1 = White only, 2 = Black only, 3 = AI only, 4 = Asian only, 5 = Hawaiian/Pacific Islander only (HP), 6 = White-Black, 7 = White-AI, 8 = White-Asian, 9 = White-HP, 11 = Black-Asian, 12 = Black-HP, 13 = AI-Asian, 14 = AI-HP, 15 = Asian-HP, 16 = White-Black-AI, 17 = White-Black-Asian, 18 = White-Black-HP, 19 = White-AI-Asian, 20 = White-AI-HP, 21 = White-Asian-HP, 22 = Black-AI-Asian, 23 = White-Black-AI-Asian, 24 = White-AI-Asian-HP, 25 = White-Black-AI-Asian-HP, 25 = Other 3 race comb., 26 = Other 4 or 5 race comb.)
marital marital status (1 = Married civilian, 2 = Married AF, 3 = Married absent, 4 = Widowed, 5 = Divorced, 6 = Separated, 7 = Never married)
HSGrad = 1, if High School Graduate
SomeColl = 1, if Some College Education
CollDeg = 1, if College Graduate
region household region (1 = Northeast, 2 = Midwest, 3 = South, 4 = West)
female = 1 if female
hisp = 1 if Hispanic, Spanish, or Latino
fulltime = 1 if Fulltime worker