DATA 712 HW#7

Introduction

Being a woman who is about to enter the workforce, pay parity is of the utmost importance. Using the U.S. Department of Labor and the U.S. Department of Labor Women’s Bureau data, we can analyze gender participation in the workforce and pay disparity. We used to live in a more patriarchal world, where the men were at work, and the women were tending to the children and the home. Now, that is not the case. Over the past recent decades, female representation in the workforce has steadily increased, with women now comprising 56.8% of the labor force, according to the Bureau of Labor Statistics. Despite this significant increase in participation, wage disparity between men and women remains an ongoing issue. It would be expected that when women entered the workforce, their median salaries would be lower than men. This is due to the fact that at the time women entered the workforce, men had more work experience and were therefore able to apply for and hold more senior positions than their female counterparts. Now that women are nearly fully integrated into the workforce has the differential leveled out?

knitr::opts_knit$set(root.dir = "/Users/ruthiemaurer/Desktop/DATA 710/DATA 710 Assignment 2")
library(readxl)
library(dplyr)
library(ggplot2)
library(MASS)       
library(clarify)    
library(texreg)     
library(tidyr)
library(purrr)
library(betareg)

Trends in Workforce Participation

labor_force <- read_excel("/Users/ruthiemaurer/Desktop/DATA 710/DATA 710 Assignment 2/Number in Civilian Labor Force.xlsx")
earnings <- read_excel("/Users/ruthiemaurer/Desktop/DATA 710/DATA 710 Assignment 2/Earnings Disparity Sex Data 2.xlsx")

labor_force$Percentage_Difference <- ((labor_force$Men - labor_force$Women) / labor_force$Women) * 100

print(labor_force)

## # A tibble: 75 × 4
##    Year  Women   Men Percentage_Difference
##    <chr> <dbl> <dbl>                 <dbl>
##  1 1948  17335 43286                  150.
##  2 1949  17788 43498                  145.
##  3 1950  18389 43819                  138.
##  4 1951  19016 43001                  126.
##  5 1952  19269 42869                  122.
##  6 1953  19382 43633                  125.
##  7 1954  19678 43965                  123.
##  8 1955  20548 44475                  116.
##  9 1956  21461 45091                  110.
## 10 1957  21732 45197                  108.
## # ℹ 65 more rows

labor_force_every_5_years <- labor_force %>%
mutate(Year = as.numeric(Year)) %>%
filter(Year %% 5 == 0)

print(labor_force_every_5_years)

## # A tibble: 15 × 4
##     Year Women   Men Percentage_Difference
##    <dbl> <dbl> <dbl>                 <dbl>
##  1  1950 18389 43819                 138. 
##  2  1955 20548 44475                 116. 
##  3  1960 23240 46388                  99.6
##  4  1965 26200 48255                  84.2
##  5  1970 31543 51228                  62.4
##  6  1975 37475 56299                  50.2
##  7  1980 45487 61453                  35.1
##  8  1985 51050 64411                  26.2
##  9  1990 56829 69011                  21.4
## 10  1995 60944 71360                  17.1
## 11  2000 66303 76280                  15.0
## 12  2005 69288 80033                  15.5
## 13  2010 71904 81985                  14.0
## 14  2015 73510 83620                  13.8
## 15  2020 75538 85204                  12.8

ggplot(labor_force_every_5_years, aes(x = Year, y = Women, group = 1)) + 
  
geom_point(color = "pink", size = 2) +
labs(title = "Women in the Workforce", 
subtitle = "Trends over the years",
x = "Year", y = "Number of Women in Workforce") +
theme_minimal(base_size = 15) +
theme(plot.title = element_text(face = "bold"),
axis.text.x = element_text(angle = 50, hjust = 1, size = 5.5))

The results from the labor force dataset highlight the upward trend in female workforce participation, evidenced by the clear and consistent rise in the number of women employed over time.

Summary Statistics: Labor and Earnigns

colnames(earnings)

## [1] "State"                   "Data Type"              
## [3] "Average Weekly Earnings" "Number of Workers"      
## [5] "Earnings Disparity"      "Employed Percent"

earnings <- earnings %>%
  rename(
    "Gender" = "Data Type",   
    "Disparity Per Dollar" = "Earnings Disparity"  
  )

print(earnings)

## # A tibble: 104 × 6
##    State    Gender `Average Weekly Earnings` `Number of Workers`
##    <chr>    <chr>                      <dbl>               <dbl>
##  1 NATIONAL Male                       1094.           82519194.
##  2 NATIONAL Female                      836.           73023354.
##  3 AK       Male                       1130.             175260.
##  4 AK       Female                      900.             156792.
##  5 AL       Male                       1011.            1129482.
##  6 AL       Female                      739.             995575.
##  7 AR       Male                        941.             677494.
##  8 AR       Female                      728.             629512.
##  9 AZ       Male                       1039.            1747963.
## 10 AZ       Female                      782.            1504803.
## # ℹ 94 more rows
## # ℹ 2 more variables: `Disparity Per Dollar` <dbl>, `Employed Percent` <dbl>

earnings <- earnings %>%
  mutate_if(is.numeric, ~ round(., 2))

print(earnings)

## # A tibble: 104 × 6
##    State    Gender `Average Weekly Earnings` `Number of Workers`
##    <chr>    <chr>                      <dbl>               <dbl>
##  1 NATIONAL Male                       1094.           82519194.
##  2 NATIONAL Female                      836.           73023354.
##  3 AK       Male                       1130.             175260.
##  4 AK       Female                      900.             156792.
##  5 AL       Male                       1011.            1129482.
##  6 AL       Female                      739.             995575.
##  7 AR       Male                        941.             677494.
##  8 AR       Female                      728.             629512.
##  9 AZ       Male                       1039.            1747963.
## 10 AZ       Female                      782.            1504803.
## # ℹ 94 more rows
## # ℹ 2 more variables: `Disparity Per Dollar` <dbl>, `Employed Percent` <dbl>

When examining workforce parity, the number of women has increased, but a gap still exists between male and female participation rates. This finding may be noteworthy, but it does not explain the pay inequity.

In terms of earnings disparity, the data below provides a summary of the differences in pay by calculating the average, median, minimum, and maximum disparity per dollar. These statistics paint a troubling picture: despite near parity in workforce numbers, the average pay disparity suggests that women continue to earn less per dollar earned than men for comparable work.

earnings_summary <- earnings %>%
summarize(
    Mean_Disparity = round(mean(`Disparity Per Dollar`, na.rm = TRUE), 2),
    Median_Disparity = round(median(`Disparity Per Dollar`, na.rm = TRUE), 2),
    Min_Disparity = round(min(`Disparity Per Dollar`, na.rm = TRUE), 2),
    Max_Disparity = round(max(`Disparity Per Dollar`, na.rm = TRUE), 2),
    SD_Disparity = round(sd(`Disparity Per Dollar`, na.rm = TRUE), 2)
  )

library(kableExtra)

kable(earnings_summary, caption = "Summary of Earnings Disparity Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

Summary of Earnings Disparity Metrics
Mean_Disparity	Median_Disparity	Min_Disparity	Max_Disparity	SD_Disparity
0.88	0.94	0.64	1	0.13

women_workforce_summary <- labor_force_every_5_years %>%
summarize(
Avg_Women_in_Workforce = round(mean(Women, na.rm = TRUE), 0),
Min_Women_in_Workforce = min(Women, na.rm = TRUE),
Max_Women_in_Workforce = max(Women, na.rm = TRUE)
)

earnings_summary <- earnings %>%
summarize(
Avg_Disparity_Per_Dollar = round(mean(`Disparity Per Dollar`, na.rm = TRUE), 2),
Min_Disparity_Per_Dollar = min(`Disparity Per Dollar`, na.rm = TRUE),
Max_Disparity_Per_Dollar = max(`Disparity Per Dollar`, na.rm = TRUE)
)

comparison_table <- cbind(women_workforce_summary, earnings_summary)

kable(comparison_table, 
      caption = "Comparison of Women's Workforce Participation and Pay Disparity") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE, 
                position = "center")

Comparison of Women’s Workforce Participation and Pay Disparity
Avg_Women_in_Workforce	Min_Women_in_Workforce	Max_Women_in_Workforce	Avg_Disparity_Per_Dollar	Min_Disparity_Per_Dollar	Max_Disparity_Per_Dollar
48550	18389	75538	0.88	0.64	1

A comparative summary table further emphasizes this by contrasting average female and male workforce participation with the average earnings gap, highlighting the fact that higher participation alone does not resolve pay disparities.

Visualizing Workforce and Earnings Disparity

library(gridExtra)

plot_workforce <- ggplot(labor_force_every_5_years, aes(x = Year, y = Women, Men)) + 
 geom_line(aes(x = Year, y = Men, color = "Men's Earnings"), color = "lightblue", size = 1.2) +
  geom_point(aes(x = Year, y = Men), color = "blue", size = 3) +
  geom_line(aes(x = Year, y = Women, color = "Women’s Earnings"), color = "pink", size = 1.2) +
  geom_point(aes(x = Year, y = Women), color = "hotpink", size = 3) +
  labs(
    title = "Earnings Over the Years",
    subtitle = "Earnings trend at five-year increments",
    x = "Year",
    y = "Earnings"
  ) +
  scale_color_manual(values = c("Men's Earnings" = "blue", "Women’s Earnings" = "pink")) +
  theme_minimal(base_size = 8) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.title = element_blank(),
    axis.text.x = element_text(angle = 50, hjust = 1)
  )

plot_disparity <- ggplot(earnings, aes(x = `Disparity Per Dollar`)) + 
geom_histogram(binwidth = 0.05, fill = "lightblue", color = "black", alpha = 0.7) +
labs(title = "Earnings Disparity Across States", x = "Earnings Disparity (Dollars)", y = "Number of States") +
theme_minimal(base_size = 8)

grid.arrange(plot_workforce, plot_disparity, ncol = 2)

So far, this analysis supports the claim that while women have achieved greater participation in the workforce, a wage gap remains. States are trying to combat this; they have put into effect certain requirements for employers to follow. For example, certain states now require wage ranges to be shown on job offerings – salary transparency. The Women’s Policy Research also adds that by the year 2058, they predict we will have pay parity due to many of these efforts. Addressing this inequity requires targeted policies aimed at wage transparency, pay equity, and sector-specific interventions to ensure that women not only participate in the workforce but are also compensated equitably. The persistent gap, shown in the analysis below, indicates that without deliberate action, increased participation alone will not close the wage gap.

labor_force_summary <- labor_force %>%
summarize(Avg_Percentage_Difference = mean(Percentage_Difference, na.rm = TRUE))

earnings_summary <- earnings %>%
summarize(Avg_Disparity_Per_Dollar = mean(`Disparity Per Dollar`, na.rm = TRUE))


summary_data <- data.frame(
Metric = c("Avg Percentage Difference in Workforce", "Avg Pay Disparity Per Dollar"),
Value = c(labor_force_summary$Avg_Percentage_Difference, earnings_summary$Avg_Disparity_Per_Dollar)
)

kable(summary_data, 
      caption = "Summary of Average Workforce and Pay Disparity Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                position = "center")

Summary of Average Workforce and Pay Disparity Metrics
Metric	Value
Avg Percentage Difference in Workforce	48.0804671
Avg Pay Disparity Per Dollar	0.8776923

Beta Regression: Modeling Gender Pay Disparity

earnings_clean <- earnings %>%
  filter(Gender %in% c("Male", "Female")) %>%
  mutate(
    `Disparity Per Dollar` = ifelse(`Disparity Per Dollar` >= 1, 0.999, `Disparity Per Dollar`),
    `Disparity Per Dollar` = ifelse(`Disparity Per Dollar` <= 0, 0.001, `Disparity Per Dollar`),
    Gender = factor(Gender)
  ) %>%
  rename(Number_of_Workers = `Number of Workers`)

beta_model <- betareg(`Disparity Per Dollar` ~ Gender + Number_of_Workers, data = earnings_clean)
summary(beta_model)

## 
## Call:
## betareg(formula = `Disparity Per Dollar` ~ Gender + Number_of_Workers, 
##     data = earnings_clean)
## 
## Quantile residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8683  0.0249  0.2220  0.2231  4.7335 
## 
## Coefficients (mean model with logit link):
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       1.130e+00  2.163e-02  52.249   <2e-16 ***
## GenderMale        4.803e+00  1.455e-01  33.003   <2e-16 ***
## Number_of_Workers 7.410e-10  2.080e-09   0.356    0.722    
## 
## Phi coefficients (precision model with identity link):
##       Estimate Std. Error z value Pr(>|z|)    
## (phi)   238.76      36.22   6.592 4.35e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 373.9 on 4 Df
## Pseudo R-squared: 0.9975
## Number of iterations: 449 (BFGS) + 3 (Fisher scoring)

beta_sim <- clarify::sim(beta_model, n = 1000)
betas_mat <- beta_sim$sim.coefs[, c("(Intercept)", "GenderMale", "Number_of_Workers")]

# --- Create Counterfactuals ---
cf <- data.frame(
  Gender = factor(c("Male", "Female"), levels = levels(earnings_clean$Gender)),
  Number_of_Workers = mean(earnings_clean$Number_of_Workers, na.rm = TRUE)
)

# --- Design Matrix and Prediction ---
X <- model.matrix(~ Gender + Number_of_Workers, data = cf)
linpred <- X %*% t(betas_mat)

# --- Inverse Logit to Convert to Mean Disparity Scale ---
inv_logit <- function(x) exp(x) / (1 + exp(x))
predicted_vals <- apply(linpred, 2, inv_logit)

# --- Summarize Simulation Results ---
pred_summary <- apply(predicted_vals, 1, function(x) c(
  mean = mean(x),
  lower = quantile(x, 0.025),
  upper = quantile(x, 0.975)
))

pred_summary_df <- as.data.frame(t(pred_summary))
pred_summary_df$Gender <- cf$Gender
colnames(pred_summary_df) <- c("Mean", "Lower", "Upper", "Gender")

kable(pred_summary_df, 
      caption = "Predicted Earnings Disparity by Gender (with 95% CI)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                position = "center")

Predicted Earnings Disparity by Gender (with 95% CI)
Mean	Lower	Upper	Gender
0.9973376	0.9965318	0.9979850	Male
0.7561399	0.7487745	0.7631532	Female

ggplot(pred_summary_df, aes(x = Gender, y = Mean, fill = Gender)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.6) +
  geom_errorbar(aes(ymin = Lower, ymax = Upper), width = 0.2, position = position_dodge(0.6)) +
  scale_fill_manual(values = c("Male" = "skyblue", "Female" = "hotpink")) +
  labs(
    title = "Predicted Earnings Disparity by Gender",
    y = "Predicted Earnings Disparity (Mean)",
    x = "Gender"
  ) +
  scale_y_continuous(limits = c(0, 1)) +
  theme_minimal() +
  theme(
    text = element_text(size = 12),
    legend.position = "none"
  )

pred_summary_df <- pred_summary_df %>%
  arrange(Gender)

# Calculate difference (Female - Male)
pred_diff <- pred_summary_df$Mean[pred_summary_df$Gender == "Female"] - 
             pred_summary_df$Mean[pred_summary_df$Gender == "Male"]

print(paste("Predicted difference (Female - Male):", round(pred_diff, 3)))

## [1] "Predicted difference (Female - Male): -0.241"

Conclusion

To better understand the earnings gap between men and women, I used a beta regression model. This type of model is ideal when the outcome is a proportion between 0 and 1, like the earnings disparity per dollar in my data. The model helped me estimate how gender and the size of the workforce are related to pay disparity. I used the clarify package to simulate predictions from the model and compare expected earnings disparities for men and women, while keeping the number of workers the same. The results show that women are still predicted to earn less than men, with the model estimating an average disparity difference of approximately 0.241 dollars less per dollar earned. These findings suggest that pay inequality remains a problem, even though women’s participation in the workforce has increased. Simply having more women in the labor force is not enough to close the wage gap. Policies that focus on equal pay, transparency, and accountability are still needed.