DATA 712 HW#7

Introduction

Being a woman who is about to enter the workforce, pay parity is of the utmost importance. Using the U.S. Department of Labor and the U.S. Department of Labor Women’s Bureau data, we can analyze gender participation in the workforce and pay disparity. We used to live in a more patriarchal world, where the men were at work, and the women were tending to the children and the home. Now, that is not the case. Over the past recent decades, female representation in the workforce has steadily increased, with women now comprising 56.8% of the labor force, according to the Bureau of Labor Statistics. Despite this significant increase in participation, wage disparity between men and women remains an ongoing issue. It would be expected that when women entered the workforce, their median salaries would be lower than men. This is due to the fact that at the time women entered the workforce, men had more work experience and were therefore able to apply for and hold more senior positions than their female counterparts. Now that women are nearly fully integrated into the workforce has the differential leveled out?

knitr::opts_knit$set(root.dir = "/Users/ruthiemaurer/Desktop/DATA 710/DATA 710 Assignment 2")
library(readxl)
library(dplyr)
library(ggplot2)
library(MASS)       
library(clarify)    
library(texreg)     
library(tidyr)
library(purrr)
library(betareg)

Summary Statistics: Labor and Earnigns

colnames(earnings)
## [1] "State"                   "Data Type"              
## [3] "Average Weekly Earnings" "Number of Workers"      
## [5] "Earnings Disparity"      "Employed Percent"
earnings <- earnings %>%
  rename(
    "Gender" = "Data Type",   
    "Disparity Per Dollar" = "Earnings Disparity"  
  )

print(earnings)
## # A tibble: 104 × 6
##    State    Gender `Average Weekly Earnings` `Number of Workers`
##    <chr>    <chr>                      <dbl>               <dbl>
##  1 NATIONAL Male                       1094.           82519194.
##  2 NATIONAL Female                      836.           73023354.
##  3 AK       Male                       1130.             175260.
##  4 AK       Female                      900.             156792.
##  5 AL       Male                       1011.            1129482.
##  6 AL       Female                      739.             995575.
##  7 AR       Male                        941.             677494.
##  8 AR       Female                      728.             629512.
##  9 AZ       Male                       1039.            1747963.
## 10 AZ       Female                      782.            1504803.
## # ℹ 94 more rows
## # ℹ 2 more variables: `Disparity Per Dollar` <dbl>, `Employed Percent` <dbl>
earnings <- earnings %>%
  mutate_if(is.numeric, ~ round(., 2))

print(earnings)
## # A tibble: 104 × 6
##    State    Gender `Average Weekly Earnings` `Number of Workers`
##    <chr>    <chr>                      <dbl>               <dbl>
##  1 NATIONAL Male                       1094.           82519194.
##  2 NATIONAL Female                      836.           73023354.
##  3 AK       Male                       1130.             175260.
##  4 AK       Female                      900.             156792.
##  5 AL       Male                       1011.            1129482.
##  6 AL       Female                      739.             995575.
##  7 AR       Male                        941.             677494.
##  8 AR       Female                      728.             629512.
##  9 AZ       Male                       1039.            1747963.
## 10 AZ       Female                      782.            1504803.
## # ℹ 94 more rows
## # ℹ 2 more variables: `Disparity Per Dollar` <dbl>, `Employed Percent` <dbl>

When examining workforce parity, the number of women has increased, but a gap still exists between male and female participation rates. This finding may be noteworthy, but it does not explain the pay inequity.

In terms of earnings disparity, the data below provides a summary of the differences in pay by calculating the average, median, minimum, and maximum disparity per dollar. These statistics paint a troubling picture: despite near parity in workforce numbers, the average pay disparity suggests that women continue to earn less per dollar earned than men for comparable work.

earnings_summary <- earnings %>%
summarize(
    Mean_Disparity = round(mean(`Disparity Per Dollar`, na.rm = TRUE), 2),
    Median_Disparity = round(median(`Disparity Per Dollar`, na.rm = TRUE), 2),
    Min_Disparity = round(min(`Disparity Per Dollar`, na.rm = TRUE), 2),
    Max_Disparity = round(max(`Disparity Per Dollar`, na.rm = TRUE), 2),
    SD_Disparity = round(sd(`Disparity Per Dollar`, na.rm = TRUE), 2)
  )

library(kableExtra)

kable(earnings_summary, caption = "Summary of Earnings Disparity Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
Summary of Earnings Disparity Metrics
Mean_Disparity Median_Disparity Min_Disparity Max_Disparity SD_Disparity
0.88 0.94 0.64 1 0.13
women_workforce_summary <- labor_force_every_5_years %>%
summarize(
Avg_Women_in_Workforce = round(mean(Women, na.rm = TRUE), 0),
Min_Women_in_Workforce = min(Women, na.rm = TRUE),
Max_Women_in_Workforce = max(Women, na.rm = TRUE)
)

earnings_summary <- earnings %>%
summarize(
Avg_Disparity_Per_Dollar = round(mean(`Disparity Per Dollar`, na.rm = TRUE), 2),
Min_Disparity_Per_Dollar = min(`Disparity Per Dollar`, na.rm = TRUE),
Max_Disparity_Per_Dollar = max(`Disparity Per Dollar`, na.rm = TRUE)
)

comparison_table <- cbind(women_workforce_summary, earnings_summary)

kable(comparison_table, 
      caption = "Comparison of Women's Workforce Participation and Pay Disparity") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = FALSE, 
                position = "center")
Comparison of Women’s Workforce Participation and Pay Disparity
Avg_Women_in_Workforce Min_Women_in_Workforce Max_Women_in_Workforce Avg_Disparity_Per_Dollar Min_Disparity_Per_Dollar Max_Disparity_Per_Dollar
48550 18389 75538 0.88 0.64 1

A comparative summary table further emphasizes this by contrasting average female and male workforce participation with the average earnings gap, highlighting the fact that higher participation alone does not resolve pay disparities.

Visualizing Workforce and Earnings Disparity

library(gridExtra)

plot_workforce <- ggplot(labor_force_every_5_years, aes(x = Year, y = Women, Men)) + 
 geom_line(aes(x = Year, y = Men, color = "Men's Earnings"), color = "lightblue", size = 1.2) +
  geom_point(aes(x = Year, y = Men), color = "blue", size = 3) +
  geom_line(aes(x = Year, y = Women, color = "Women’s Earnings"), color = "pink", size = 1.2) +
  geom_point(aes(x = Year, y = Women), color = "hotpink", size = 3) +
  labs(
    title = "Earnings Over the Years",
    subtitle = "Earnings trend at five-year increments",
    x = "Year",
    y = "Earnings"
  ) +
  scale_color_manual(values = c("Men's Earnings" = "blue", "Women’s Earnings" = "pink")) +
  theme_minimal(base_size = 8) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.title = element_blank(),
    axis.text.x = element_text(angle = 50, hjust = 1)
  )

plot_disparity <- ggplot(earnings, aes(x = `Disparity Per Dollar`)) + 
geom_histogram(binwidth = 0.05, fill = "lightblue", color = "black", alpha = 0.7) +
labs(title = "Earnings Disparity Across States", x = "Earnings Disparity (Dollars)", y = "Number of States") +
theme_minimal(base_size = 8)

grid.arrange(plot_workforce, plot_disparity, ncol = 2)

So far, this analysis supports the claim that while women have achieved greater participation in the workforce, a wage gap remains. States are trying to combat this; they have put into effect certain requirements for employers to follow. For example, certain states now require wage ranges to be shown on job offerings – salary transparency. The Women’s Policy Research also adds that by the year 2058, they predict we will have pay parity due to many of these efforts. Addressing this inequity requires targeted policies aimed at wage transparency, pay equity, and sector-specific interventions to ensure that women not only participate in the workforce but are also compensated equitably. The persistent gap, shown in the analysis below, indicates that without deliberate action, increased participation alone will not close the wage gap.

labor_force_summary <- labor_force %>%
summarize(Avg_Percentage_Difference = mean(Percentage_Difference, na.rm = TRUE))

earnings_summary <- earnings %>%
summarize(Avg_Disparity_Per_Dollar = mean(`Disparity Per Dollar`, na.rm = TRUE))


summary_data <- data.frame(
Metric = c("Avg Percentage Difference in Workforce", "Avg Pay Disparity Per Dollar"),
Value = c(labor_force_summary$Avg_Percentage_Difference, earnings_summary$Avg_Disparity_Per_Dollar)
)

kable(summary_data, 
      caption = "Summary of Average Workforce and Pay Disparity Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                position = "center")
Summary of Average Workforce and Pay Disparity Metrics
Metric Value
Avg Percentage Difference in Workforce 48.0804671
Avg Pay Disparity Per Dollar 0.8776923

Beta Regression: Modeling Gender Pay Disparity

earnings_clean <- earnings %>%
  filter(Gender %in% c("Male", "Female")) %>%
  mutate(
    `Disparity Per Dollar` = ifelse(`Disparity Per Dollar` >= 1, 0.999, `Disparity Per Dollar`),
    `Disparity Per Dollar` = ifelse(`Disparity Per Dollar` <= 0, 0.001, `Disparity Per Dollar`),
    Gender = factor(Gender)
  ) %>%
  rename(Number_of_Workers = `Number of Workers`)

beta_model <- betareg(`Disparity Per Dollar` ~ Gender + Number_of_Workers, data = earnings_clean)
summary(beta_model)
## 
## Call:
## betareg(formula = `Disparity Per Dollar` ~ Gender + Number_of_Workers, 
##     data = earnings_clean)
## 
## Quantile residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8683  0.0249  0.2220  0.2231  4.7335 
## 
## Coefficients (mean model with logit link):
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       1.130e+00  2.163e-02  52.249   <2e-16 ***
## GenderMale        4.803e+00  1.455e-01  33.003   <2e-16 ***
## Number_of_Workers 7.410e-10  2.080e-09   0.356    0.722    
## 
## Phi coefficients (precision model with identity link):
##       Estimate Std. Error z value Pr(>|z|)    
## (phi)   238.76      36.22   6.592 4.35e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 373.9 on 4 Df
## Pseudo R-squared: 0.9975
## Number of iterations: 449 (BFGS) + 3 (Fisher scoring)
beta_sim <- clarify::sim(beta_model, n = 1000)
betas_mat <- beta_sim$sim.coefs[, c("(Intercept)", "GenderMale", "Number_of_Workers")]

# --- Create Counterfactuals ---
cf <- data.frame(
  Gender = factor(c("Male", "Female"), levels = levels(earnings_clean$Gender)),
  Number_of_Workers = mean(earnings_clean$Number_of_Workers, na.rm = TRUE)
)

# --- Design Matrix and Prediction ---
X <- model.matrix(~ Gender + Number_of_Workers, data = cf)
linpred <- X %*% t(betas_mat)

# --- Inverse Logit to Convert to Mean Disparity Scale ---
inv_logit <- function(x) exp(x) / (1 + exp(x))
predicted_vals <- apply(linpred, 2, inv_logit)

# --- Summarize Simulation Results ---
pred_summary <- apply(predicted_vals, 1, function(x) c(
  mean = mean(x),
  lower = quantile(x, 0.025),
  upper = quantile(x, 0.975)
))

pred_summary_df <- as.data.frame(t(pred_summary))
pred_summary_df$Gender <- cf$Gender
colnames(pred_summary_df) <- c("Mean", "Lower", "Upper", "Gender")

kable(pred_summary_df, 
      caption = "Predicted Earnings Disparity by Gender (with 95% CI)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                position = "center")
Predicted Earnings Disparity by Gender (with 95% CI)
Mean Lower Upper Gender
0.9973376 0.9965318 0.9979850 Male
0.7561399 0.7487745 0.7631532 Female
ggplot(pred_summary_df, aes(x = Gender, y = Mean, fill = Gender)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.6) +
  geom_errorbar(aes(ymin = Lower, ymax = Upper), width = 0.2, position = position_dodge(0.6)) +
  scale_fill_manual(values = c("Male" = "skyblue", "Female" = "hotpink")) +
  labs(
    title = "Predicted Earnings Disparity by Gender",
    y = "Predicted Earnings Disparity (Mean)",
    x = "Gender"
  ) +
  scale_y_continuous(limits = c(0, 1)) +
  theme_minimal() +
  theme(
    text = element_text(size = 12),
    legend.position = "none"
  )

pred_summary_df <- pred_summary_df %>%
  arrange(Gender)

# Calculate difference (Female - Male)
pred_diff <- pred_summary_df$Mean[pred_summary_df$Gender == "Female"] - 
             pred_summary_df$Mean[pred_summary_df$Gender == "Male"]

print(paste("Predicted difference (Female - Male):", round(pred_diff, 3)))
## [1] "Predicted difference (Female - Male): -0.241"

Conclusion

To better understand the earnings gap between men and women, I used a beta regression model. This type of model is ideal when the outcome is a proportion between 0 and 1, like the earnings disparity per dollar in my data. The model helped me estimate how gender and the size of the workforce are related to pay disparity. I used the clarify package to simulate predictions from the model and compare expected earnings disparities for men and women, while keeping the number of workers the same. The results show that women are still predicted to earn less than men, with the model estimating an average disparity difference of approximately 0.241 dollars less per dollar earned. These findings suggest that pay inequality remains a problem, even though women’s participation in the workforce has increased. Simply having more women in the labor force is not enough to close the wage gap. Policies that focus on equal pay, transparency, and accountability are still needed.