Introduction

As a woman about to enter the workforce, I consider pay parity to be of the utmost importance. Using data from the U.S. Department of Labor and its Women's Bureau, we can analyze gender participation in the workforce and the pay gap. We used to live in a more patriarchal world, in which men went out to work and women tended to the children and the home. That is no longer the case.

Over the past several decades, female representation in the workforce has steadily increased. According to the Bureau of Labor Statistics, the labor force participation rate for women now stands at 56.8%, and women make up nearly half of the labor force. Despite this increase in participation, the wage gap between men and women remains an ongoing issue.

One would expect that when women first entered the workforce, their median salaries would be lower than men's, because at the time men had more work experience and could apply for and hold more senior positions. But now that women are nearly fully integrated into the workforce, the question remains: has the pay differential leveled out?

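# Set the working directory for all chunks (user-specific path)
knitr::opts_knit$set(root.dir = "/Users/ruthiemaurer/Desktop/DATA 710/DATA 710 Assignment 2")
library(readxl)       # read the Excel data files
library(dplyr)        # data manipulation
library(ggplot2)      # plotting
library(MASS)         # note: loaded after dplyr, so it masks dplyr::select()
library(clarify)      # simulation-based predictions
library(texreg)
library(tidyr)        # reshaping helpers
library(purrr)
library(betareg)      # beta regression
library(modelsummary) # regression and summary tables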

Data Preparation and Descriptive Analysis

labor_force <- read_excel("/Users/ruthiemaurer/Desktop/DATA 710/DATA 710 Assignment 2/Number in Civilian Labor Force.xlsx")
earnings <- read_excel("/Users/ruthiemaurer/Desktop/DATA 710/DATA 710 Assignment 2/Earnings Disparity Sex Data 2.xlsx")

# Percentage by which the male labor force exceeds the female labor force
labor_force$Percentage_Difference <- ((labor_force$Men - labor_force$Women) / labor_force$Women) * 100

print(labor_force)

## # A tibble: 75 × 4
##    Year  Women   Men Percentage_Difference
##    <chr> <dbl> <dbl>                 <dbl>
##  1 1948  17335 43286                  150.
##  2 1949  17788 43498                  145.
##  3 1950  18389 43819                  138.
##  4 1951  19016 43001                  126.
##  5 1952  19269 42869                  122.
##  6 1953  19382 43633                  125.
##  7 1954  19678 43965                  123.
##  8 1955  20548 44475                  116.
##  9 1956  21461 45091                  110.
## 10 1957  21732 45197                  108.
## # ℹ 65 more rows

# Keep one snapshot every five years (1950, 1955, ..., 2020)
labor_force_every_5_years <- labor_force %>%
  mutate(Year = as.numeric(Year)) %>%
  filter(Year %% 5 == 0)

print(labor_force_every_5_years)

## # A tibble: 15 × 4
##     Year Women   Men Percentage_Difference
##    <dbl> <dbl> <dbl>                 <dbl>
##  1  1950 18389 43819                 138. 
##  2  1955 20548 44475                 116. 
##  3  1960 23240 46388                  99.6
##  4  1965 26200 48255                  84.2
##  5  1970 31543 51228                  62.4
##  6  1975 37475 56299                  50.2
##  7  1980 45487 61453                  35.1
##  8  1985 51050 64411                  26.2
##  9  1990 56829 69011                  21.4
## 10  1995 60944 71360                  17.1
## 11  2000 66303 76280                  15.0
## 12  2005 69288 80033                  15.5
## 13  2010 71904 81985                  14.0
## 14  2015 73510 83620                  13.8
## 15  2020 75538 85204                  12.8

ggplot(labor_force_every_5_years, aes(x = Year, y = Women)) +
  geom_point(color = "pink", size = 2) +
  labs(
    title = "Women in the Workforce",
    subtitle = "Trends over the years",
    x = "Year", y = "Women in labor force (thousands)"
  ) +
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 50, hjust = 1, size = 5.5)
  )

The labor force data show a clear and consistent rise in the number of women in the workforce since the 1950s. The counts are reported in thousands, so across the five-year snapshots the female labor force averaged about 48.55 million (48,550 thousand), ranging from about 18.4 million in 1950 to about 75.5 million in 2020.

Despite this growth, men still make up the larger share of the labor force. Averaged over all 75 years in the table, the male labor force was 48.08% larger than the female labor force. That average is pulled up by the early decades, however: the gap shrank from about 150% in 1948 to 12.8% in 2020.
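As a quick check on these figures, the sketch below re-enters the 15 five-year rows printed above (values in thousands) as a plain base R data frame and recomputes the summary statistics, the percentage difference, and women's share of the combined labor force. It uses only base R so it runs without the Excel files; the data frame name lf5 and the Women_Share column are illustrative additions, not part of the original analysis.

```r
# Five-year snapshot of the civilian labor force, copied from the printed
# labor_force_every_5_years table (values are in thousands of people)
lf5 <- data.frame(
  Year  = seq(1950, 2020, by = 5),
  Women = c(18389, 20548, 23240, 26200, 31543, 37475, 45487, 51050,
            56829, 60944, 66303, 69288, 71904, 73510, 75538),
  Men   = c(43819, 44475, 46388, 48255, 51228, 56299, 61453, 64411,
            69011, 71360, 76280, 80033, 81985, 83620, 85204)
)

# How much larger the male labor force is, as a percentage of the female one
lf5$Percentage_Difference <- (lf5$Men - lf5$Women) / lf5$Women * 100

# Women's share of the combined labor force, in percent
lf5$Women_Share <- lf5$Women / (lf5$Men + lf5$Women) * 100

# Summary of the female labor force across the 15 snapshots (thousands)
women_stats <- c(mean = mean(lf5$Women), min = min(lf5$Women), max = max(lf5$Women))
print(round(women_stats))

# First and last snapshots: the gap narrows from about 138% to about 13%
print(round(lf5[c(1, 15), c("Year", "Percentage_Difference", "Women_Share")], 1))
```

The share column makes the narrowing easier to read: women go from under 30% of the labor force in 1950 to roughly 47% in 2020.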

Summary Statistics: Labor and Earnings

colnames(earnings)

## [1] "State"                   "Data Type"              
## [3] "Average Weekly Earnings" "Number of Workers"      
## [5] "Earnings Disparity"      "Employed Percent"

earnings <- earnings %>%
  rename(
    "Gender" = "Data Type",   
    "Disparity Per Dollar" = "Earnings Disparity"  
  )

print(earnings)

## # A tibble: 104 × 6
##    State    Gender `Average Weekly Earnings` `Number of Workers`
##    <chr>    <chr>                      <dbl>               <dbl>
##  1 NATIONAL Male                       1094.           82519194.
##  2 NATIONAL Female                      836.           73023354.
##  3 AK       Male                       1130.             175260.
##  4 AK       Female                      900.             156792.
##  5 AL       Male                       1011.            1129482.
##  6 AL       Female                      739.             995575.
##  7 AR       Male                        941.             677494.
##  8 AR       Female                      728.             629512.
##  9 AZ       Male                       1039.            1747963.
## 10 AZ       Female                      782.            1504803.
## # ℹ 94 more rows
## # ℹ 2 more variables: `Disparity Per Dollar` <dbl>, `Employed Percent` <dbl>

# Round every numeric column to two decimals
earnings <- earnings %>%
  mutate(across(where(is.numeric), ~ round(.x, 2)))

print(earnings)

## # A tibble: 104 × 6
##    State    Gender `Average Weekly Earnings` `Number of Workers`
##    <chr>    <chr>                      <dbl>               <dbl>
##  1 NATIONAL Male                       1094.           82519194.
##  2 NATIONAL Female                      836.           73023354.
##  3 AK       Male                       1130.             175260.
##  4 AK       Female                      900.             156792.
##  5 AL       Male                       1011.            1129482.
##  6 AL       Female                      739.             995575.
##  7 AR       Male                        941.             677494.
##  8 AR       Female                      728.             629512.
##  9 AZ       Male                       1039.            1747963.
## 10 AZ       Female                      782.            1504803.
## # ℹ 94 more rows
## # ℹ 2 more variables: `Disparity Per Dollar` <dbl>, `Employed Percent` <dbl>

On workforce parity, the number of women has increased substantially, but the male labor force is still larger. This finding is noteworthy, but it does not explain pay inequity.

For earnings disparity, the table below summarizes the disparity per dollar across all 104 rows (a male and a female row for the national total and each state listed) using the mean, median, minimum, maximum, and standard deviation. Because the male rows serve as the reference and sit at essentially 1.00, these pooled statistics mix that reference value with the female ratios; the female rows alone average about 0.755. Even so, the picture is troubling: despite much closer parity in workforce numbers, women continue to earn noticeably less per dollar than men in average weekly earnings.

earnings_summary <- earnings %>%
  summarize(
    Mean_Disparity = round(mean(`Disparity Per Dollar`, na.rm = TRUE), 2),
    Median_Disparity = round(median(`Disparity Per Dollar`, na.rm = TRUE), 2),
    Min_Disparity = round(min(`Disparity Per Dollar`, na.rm = TRUE), 2),
    Max_Disparity = round(max(`Disparity Per Dollar`, na.rm = TRUE), 2),
    SD_Disparity = round(sd(`Disparity Per Dollar`, na.rm = TRUE), 2)
  )

datasummary_df(earnings_summary, title = "Earnings Disparity Summary")

Earnings Disparity Summary
Mean_Disparity	Median_Disparity	Min_Disparity	Max_Disparity	SD_Disparity
0.88	0.94	0.64	1.00	0.13

women_workforce_summary <- labor_force_every_5_years %>%
  summarize(
    Avg_Women_in_Workforce = round(mean(Women, na.rm = TRUE), 0),
    Min_Women_in_Workforce = min(Women, na.rm = TRUE),
    Max_Women_in_Workforce = max(Women, na.rm = TRUE)
  )

earnings_summary <- earnings %>%
  summarize(
    Avg_Disparity_Per_Dollar = round(mean(`Disparity Per Dollar`, na.rm = TRUE), 2),
    Min_Disparity_Per_Dollar = min(`Disparity Per Dollar`, na.rm = TRUE),
    Max_Disparity_Per_Dollar = max(`Disparity Per Dollar`, na.rm = TRUE)
  )

comparison_table <- cbind(women_workforce_summary, earnings_summary)

colnames(comparison_table) <- c(
  "Avg Women in Workforce (thousands)", 
  "Min Women in Workforce (thousands)", 
  "Max Women in Workforce (thousands)",
  "Avg Disparity (per $1)", 
  "Min Disparity (per $1)", 
  "Max Disparity (per $1)"
)

datasummary_df(comparison_table, title = "Comparison of Women's Workforce Participation and Pay Disparity")

Comparison of Women's Workforce Participation and Pay Disparity
Avg Women in Workforce (thousands)	Min Women in Workforce (thousands)	Max Women in Workforce (thousands)	Avg Disparity (per $1)	Min Disparity (per $1)	Max Disparity (per $1)
48550.00	18389.00	75538.00	0.88	0.64	1.00

The comparison table places the growth in women's labor force (in thousands) next to the earnings disparity per dollar, highlighting that higher participation alone has not resolved pay disparities.
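Because the male rows all sit at (essentially) 1.00, the pooled mean of 0.88 is mostly a blend of that reference value with the female ratios. Assuming the 104 rows split evenly between the genders, as the paired rows printed earlier suggest, the pooled mean is simply the midpoint of the two group means reported later in the analysis (0.999 for men after clamping, 0.755 for women). The base R sketch below shows that arithmetic; group_means and n_per_group are illustrative names.

```r
# Group means of Disparity Per Dollar, as printed later in raw_gender_gap
group_means <- c(Female = 0.755, Male = 0.999)

# ASSUMPTION: the 104 rows split evenly, 52 per gender
n_per_group <- c(Female = 52, Male = 52)

# Pooled mean = weighted average of group means; equal weights give the midpoint
pooled_mean <- sum(group_means * n_per_group) / sum(n_per_group)
print(round(pooled_mean, 2))

# The more informative number is the female ratio itself:
# how far women's earnings fall short of a man's $1
print(1 - group_means[["Female"]])
```

This is why the female-only mean, rather than the pooled 0.88, is the better summary of the gap.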

Visualizing Workforce and Earnings Disparity

library(gridExtra)

# Reshape to long format so sex maps to one colour aesthetic (and one legend)
workforce_long <- labor_force_every_5_years %>%
  pivot_longer(c(Men, Women), names_to = "Sex", values_to = "Labor_Force")

# The plotted values are labor force counts, not earnings
plot_workforce <- ggplot(workforce_long, aes(x = Year, y = Labor_Force, color = Sex)) +
  geom_line(linewidth = 1.2) +   # linewidth requires ggplot2 >= 3.4
  geom_point(size = 3) +
  labs(
    title = "Labor Force by Sex",
    subtitle = "Five-year increments",
    x = "Year",
    y = "Civilian labor force (thousands)"
  ) +
  scale_color_manual(values = c(Men = "blue", Women = "hotpink")) +
  theme_minimal(base_size = 8) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.title = element_blank(),
    axis.text.x = element_text(angle = 50, hjust = 1)
  )

# Female rows only: the male rows are the 1.00 reference and would form a spike
plot_disparity <- ggplot(filter(earnings, Gender == "Female"), aes(x = `Disparity Per Dollar`)) + 
  geom_histogram(binwidth = 0.05, fill = "lightblue", color = "black", alpha = 0.7) +
  labs(
    title = "Women's Earnings per Dollar Earned by Men",
    x = "Disparity per dollar (ratio)",
    y = "Number of rows (states and national)"
  ) +
  theme_minimal(base_size = 8)

grid.arrange(plot_workforce, plot_disparity, ncol = 2)

So far, the analysis supports the claim that although women have achieved greater participation in the workforce, a wage gap remains. States are trying to address this by placing requirements on employers; for example, some states now require job postings to include salary ranges (salary transparency). The Institute for Women's Policy Research (2020) projects that, at the current pace of change, pay parity will not be reached until 2059. Addressing this inequity requires targeted policies aimed at wage transparency, pay equity, and sector-specific interventions, so that women not only participate in the workforce but are also compensated equitably. The persistent gap shown in the analysis below indicates that, without deliberate action, increased participation alone will not close the wage gap.

labor_force_summary <- labor_force %>%
  summarize(Avg_Percentage_Difference = mean(Percentage_Difference, na.rm = TRUE))

earnings_summary <- earnings %>%
  summarize(Avg_Disparity_Per_Dollar = mean(`Disparity Per Dollar`, na.rm = TRUE))

summary_data <- data.frame(
  Metric = c("Avg Percentage Difference in Workforce", "Avg Pay Disparity Per Dollar"),
  Value = c(labor_force_summary$Avg_Percentage_Difference, earnings_summary$Avg_Disparity_Per_Dollar)
)

datasummary_df(
  summary_data,
  title = "Summary of Average Workforce and Pay Disparity Metrics"
)

Summary of Average Workforce and Pay Disparity Metrics
Metric	Value
Avg Percentage Difference in Workforce	48.08
Avg Pay Disparity Per Dollar	0.88

# Beta regression needs outcomes strictly between 0 and 1, so nudge the
# male reference values of 1.00 to 0.999 (and any zeros to 0.001)
earnings_clean <- earnings %>%
  filter(Gender %in% c("Male", "Female")) %>%
  mutate(
    `Disparity Per Dollar` = ifelse(`Disparity Per Dollar` >= 1, 0.999, `Disparity Per Dollar`),
    `Disparity Per Dollar` = ifelse(`Disparity Per Dollar` <= 0, 0.001, `Disparity Per Dollar`),
    Gender = factor(Gender)   # levels: Female (baseline), Male
  ) %>%
  rename(Number_of_Workers = `Number of Workers`)
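The two ifelse() calls above keep the outcome inside the open interval (0, 1), which beta regression requires. The same squeeze can be written more compactly with pmin() and pmax(); the sketch below applies it to the minimum, median, and maximum disparity values from the summary table. The function name squeeze_unit is hypothetical, and it matches the ifelse() version only because the data were rounded to two decimals (pmin() would also cap a value such as 0.9995, which ifelse() would leave alone).

```r
# Keep a proportion strictly inside (0, 1): values above `upper` become `upper`,
# values below `lower` become `lower`. For data rounded to two decimals this
# matches the ifelse() calls used to build earnings_clean.
squeeze_unit <- function(x, lower = 0.001, upper = 0.999) {
  pmax(pmin(x, upper), lower)
}

disparity <- c(0.64, 0.94, 1.00)   # min, median, and max from the summary table
squeezed <- squeeze_unit(disparity)
print(squeezed)
```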

raw_gender_gap <- earnings_clean %>%
  group_by(Gender) %>%
  summarize(
    Mean_Disparity = round(mean(`Disparity Per Dollar`, na.rm = TRUE), 3)
  )

print(raw_gender_gap)

## # A tibble: 2 × 2
##   Gender Mean_Disparity
##   <fct>           <dbl>
## 1 Female          0.755
## 2 Male            0.999

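# Raw gap: male mean minus female mean (positive = women earn less per dollar)
raw_diff <- raw_gender_gap$Mean_Disparity[raw_gender_gap$Gender == "Male"] -
  raw_gender_gap$Mean_Disparity[raw_gender_gap$Gender == "Female"]
print(paste("Women earn", round(raw_diff, 3), "less per dollar than men on average."))

## [1] "Women earn 0.244 less per dollar than men on average."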

Beta Regression: Modeling Gender Pay Disparity


# Model 1: disparity per dollar on gender only (betareg uses a logit link by default)
beta_model1 <- betareg(`Disparity Per Dollar` ~ Gender, data = earnings_clean)
modelsummary(beta_model1, title = "Beta Regression Results", gof_omit = NULL)

Beta Regression Results
	(1)
(Intercept)	1.132
	(0.021)
GenderMale	4.802
	(0.146)
Num.Obs.	104
R2	0.998
AIC	-741.7
BIC	-733.8
RMSE	0.03

# Model 2: add workforce size (raw headcount) as a predictor
beta_model2 <- betareg(`Disparity Per Dollar` ~ Gender + Number_of_Workers, data = earnings_clean)
modelsummary(beta_model2, title = "Beta Regression Results", gof_omit = NULL)

Beta Regression Results
	(1)
(Intercept)	1.130
	(0.022)
GenderMale	4.803
	(0.146)
Number_of_Workers	0.000
	(0.000)
Num.Obs.	104
R2	0.998
AIC	-739.8
BIC	-729.3
RMSE	0.03

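To model the relationship between gender and pay disparity, I used a beta regression, which is designed for outcomes that are proportions between 0 and 1. Its coefficients are on the logit (log-odds) scale: the intercept of 1.132 corresponds to a predicted ratio of about 0.756 for women, and adding the GenderMale coefficient of 4.802 gives about 0.997 for men. Being male is therefore strongly associated with a higher predicted earnings ratio, and, without accounting for other variables, women are predicted to earn less than men on average. Because the male rows sit at essentially 1.00 by construction, the model largely reproduces the raw group means, which also explains the very high R2 of 0.998.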
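A minimal base R check of the logit-to-ratio conversion, using the coefficients reported for the first model. It assumes the default logit link of betareg and that Female is the baseline level (which is why the coefficient is labelled GenderMale).

```r
# Coefficients reported for the first beta regression (logit link;
# Female is the baseline level of Gender)
b_intercept <- 1.132
b_male      <- 4.802

# plogis() is the inverse logit, exp(x) / (1 + exp(x))
pred_female <- plogis(b_intercept)
pred_male   <- plogis(b_intercept + b_male)

print(round(c(Female = pred_female, Male = pred_male,
              Difference = pred_female - pred_male), 3))
```

This gives about 0.756 for women, 0.997 for men, and a difference of about -0.241, close to the raw gap of 0.244.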

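I fit an additional beta regression that adds Number_of_Workers to see whether workforce size influenced the earnings gap. Its coefficient displays as 0.000 partly because the variable is a raw headcount (up to 82.5 million for the national row), so it measures the change per individual worker. More tellingly, adding it did not improve the model: AIC rose from -741.7 to -739.8 and BIC from -733.8 to -729.3, while the GenderMale coefficient barely moved (4.802 to 4.803). Workforce size therefore does not appear to explain the disparity; gender remains the only informative predictor in these models.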
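The 0.000 display is largely a matter of units: rescaling a predictor multiplies its coefficient by the same factor without changing the fit. The sketch below illustrates this with an ordinary linear model (lm, swapped in for betareg purely so it runs in base R) on the ten rows of average weekly earnings and worker counts printed earlier, with values as displayed (rounded). The model itself is not meaningful; it only demonstrates the unit effect.

```r
# Ten rows printed from the earnings table (NATIONAL, AK, AL, AR, AZ; male then female)
weekly_earnings <- c(1094, 836, 1130, 900, 1011, 739, 941, 728, 1039, 782)
workers <- c(82519194, 73023354, 175260, 156792, 1129482,
             995575, 677494, 629512, 1747963, 1504803)

# Same model with the predictor in individual workers vs. millions of workers
fit_raw      <- lm(weekly_earnings ~ workers)
fit_millions <- lm(weekly_earnings ~ I(workers / 1e6))

# The slope per million workers is 1e6 times the slope per worker,
# while the fitted values (and therefore fit statistics) are unchanged
print(c(per_worker  = unname(coef(fit_raw)[2]),
        per_million = unname(coef(fit_millions)[2])))
```

Rescaling Number_of_Workers to millions before fitting would make its coefficient readable without altering AIC or BIC.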

# Draw 1,000 sets of coefficients from the sampling distribution of model 1
set.seed(712)
sim_beta <- clarify::sim(beta_model1, n = 1000)

# Counterfactual data: one row per gender (model 1 uses Gender only)
cf <- data.frame(
  Gender = factor(c("Male", "Female"), levels = levels(earnings_clean$Gender))
)

# For each draw, predict the mean disparity for both rows of cf.
# Result: one row per draw, one column per counterfactual (Male, then Female)
cf_pred <- clarify::sim_apply(sim_beta, newdata = cf, FUN = predict)
pred_mat <- unclass(cf_pred)

# Summarize down the columns (MARGIN = 2) so each gender gets its own
# estimate and 95% interval; MARGIN = 1 would instead average the Male
# and Female predictions within each draw
pred_summary_df <- data.frame(
  Gender = cf$Gender,
  estimate = apply(pred_mat, 2, mean),
  conf.low = apply(pred_mat, 2, quantile, probs = 0.025),
  conf.high = apply(pred_mat, 2, quantile, probs = 0.975)
)

datasummary_df(pred_summary_df, title = "Predicted Earnings Disparity by Gender (with 95% CI)")

# Predicted difference (Female - Male) within each of the 1,000 draws
pred_diff <- pred_mat[, 2] - pred_mat[, 1]
pred_diff_distribution <- data.frame(diff = as.numeric(pred_diff))

sim_summary <- pred_diff_distribution %>%
  summarize(
    Mean_Diff = round(mean(diff), 3),
    Lower_95_CI = round(quantile(diff, 0.025), 3),
    Upper_95_CI = round(quantile(diff, 0.975), 3)
  )

print(sim_summary)

ggplot(pred_diff_distribution, aes(x = diff)) +
  geom_histogram(fill = "steelblue", color = "white", bins = 30) +
  geom_vline(xintercept = mean(pred_diff_distribution$diff), linetype = "dashed", color = "black") +
  labs(
    title = "Distribution of Simulated Predicted Differences (Female - Male)",
    x = "Predicted Difference",
    y = "Count"
  ) +
  theme_minimal()

Because each column of the simulation corresponds to one gender, the simulated predictions should center near the values implied by the model's coefficients (about 0.756 for women and 0.997 for men), so the female minus male difference should be roughly -0.24, in line with the raw gap of 0.244. The 95% interval in sim_summary shows how precisely that difference is estimated: if it lies entirely below zero, the model indicates a statistically significant gap in favor of men at the 95% level.
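The logic of a clarify-style simulation can be sketched in base R: draw many coefficient vectors around the estimates, convert each draw into one predicted ratio per gender with the inverse logit, then summarize each gender's column and the per-draw difference across draws. This sketch assumes the two coefficients are independent normals with the reported standard errors (0.021 and 0.146); clarify instead draws from the model's full covariance matrix, so its interval will differ somewhat. The seed and object names are illustrative.

```r
set.seed(712)          # illustrative seed
n_draws <- 1000

# ASSUMPTION: coefficients drawn as independent normals with the reported
# standard errors; clarify uses the model's full covariance matrix instead
coef_draws <- cbind(
  intercept = rnorm(n_draws, mean = 1.132, sd = 0.021),
  male      = rnorm(n_draws, mean = 4.802, sd = 0.146)
)

# One row per draw, one column per counterfactual gender (inverse logit)
pred_by_gender <- cbind(
  Male   = plogis(coef_draws[, "intercept"] + coef_draws[, "male"]),
  Female = plogis(coef_draws[, "intercept"])
)

# Summarize down the columns (MARGIN = 2): one interval per gender
print(round(t(apply(pred_by_gender, 2, quantile, probs = c(0.025, 0.5, 0.975))), 3))

# Difference within each draw (Female - Male), then its 95% interval
diff_draws <- pred_by_gender[, "Female"] - pred_by_gender[, "Male"]
print(round(c(mean = mean(diff_draws), quantile(diff_draws, c(0.025, 0.975))), 3))
```

Summarizing across rows (MARGIN = 1) instead would average the male and female predictions within each draw, which blurs the two genders together; the margin choice is what keeps the comparison meaningful.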

Conclusion

Before running the beta regression model, I examined the raw data and found that, on average, women earned $0.244 less per dollar than men. This simple comparison shows a substantial gap before adjusting for any other factors. To model the gap, I used a beta regression of the disparity per dollar on gender alone. The model reproduces the raw gap: its coefficients imply predicted ratios of about 0.756 for women and 0.997 for men. I also ran a version that added Number_of_Workers to test whether workforce size influenced the disparity; it did not improve the fit, so within this data set gender is the only variable associated with the difference in earnings.

I used the clarify package to simulate 1,000 sets of model coefficients and compare predicted earnings disparities by gender. These simulations attach a measure of uncertainty to the roughly 24-cent gap implied by the model.

These findings suggest that gender-based pay inequality persists, although these comparisons do not control for other variables. While the wage gap has narrowed, it has not fully closed. Simply increasing the number of women in the workforce is not enough; pay equity requires deliberate and sustained action through policy, transparency, and cultural change.

References

(Institute for Women’s Policy Research 2020)

(“Labor Force Participation Rate by Sex, State and County,” n.d.)

(“Median Annual Earnings by Sex, Race and Hispanic Ethnicity,” n.d.)

Institute for Women’s Policy Research. 2020. “The Gender Wage Gap: 2019: Earnings Differences by Race and Ethnicity.” https://www.jstor.org/stable/resrep27242.

“Labor Force Participation Rate by Sex, State and County.” n.d. https://www.dol.gov/agencies/wb/data/lfp-rate-sex-state-county.

“Median Annual Earnings by Sex, Race and Hispanic Ethnicity.” n.d. https://www.dol.gov/agencies/wb/data/earnings/median-annual-sex-race-hispanic-ethnicity.

DATA 712 HW#8