DATA 712 HW#7
Introduction
Being a woman who is about to enter the workforce, pay parity is of the utmost importance. Using the U.S. Department of Labor and the U.S. Department of Labor Women’s Bureau data, we can analyze gender participation in the workforce and pay disparity. We used to live in a more patriarchal world, where the men were at work, and the women were tending to the children and the home. Now, that is not the case. Over the past recent decades, female representation in the workforce has steadily increased, with women now comprising 56.8% of the labor force, according to the Bureau of Labor Statistics. Despite this significant increase in participation, wage disparity between men and women remains an ongoing issue. It would be expected that when women entered the workforce, their median salaries would be lower than men. This is due to the fact that at the time women entered the workforce, men had more work experience and were therefore able to apply for and hold more senior positions than their female counterparts. Now that women are nearly fully integrated into the workforce has the differential leveled out?
Trends in Workforce Participation
labor_force <- read_excel("/Users/ruthiemaurer/Desktop/DATA 710/DATA 710 Assignment 2/Number in Civilian Labor Force.xlsx")
earnings <- read_excel("/Users/ruthiemaurer/Desktop/DATA 710/DATA 710 Assignment 2/Earnings Disparity Sex Data 2.xlsx")labor_force$Percentage_Difference <- ((labor_force$Men - labor_force$Women) / labor_force$Women) * 100
print(labor_force)## # A tibble: 75 × 4
## Year Women Men Percentage_Difference
## <chr> <dbl> <dbl> <dbl>
## 1 1948 17335 43286 150.
## 2 1949 17788 43498 145.
## 3 1950 18389 43819 138.
## 4 1951 19016 43001 126.
## 5 1952 19269 42869 122.
## 6 1953 19382 43633 125.
## 7 1954 19678 43965 123.
## 8 1955 20548 44475 116.
## 9 1956 21461 45091 110.
## 10 1957 21732 45197 108.
## # ℹ 65 more rows
labor_force_every_5_years <- labor_force %>%
mutate(Year = as.numeric(Year)) %>%
filter(Year %% 5 == 0)
print(labor_force_every_5_years)## # A tibble: 15 × 4
## Year Women Men Percentage_Difference
## <dbl> <dbl> <dbl> <dbl>
## 1 1950 18389 43819 138.
## 2 1955 20548 44475 116.
## 3 1960 23240 46388 99.6
## 4 1965 26200 48255 84.2
## 5 1970 31543 51228 62.4
## 6 1975 37475 56299 50.2
## 7 1980 45487 61453 35.1
## 8 1985 51050 64411 26.2
## 9 1990 56829 69011 21.4
## 10 1995 60944 71360 17.1
## 11 2000 66303 76280 15.0
## 12 2005 69288 80033 15.5
## 13 2010 71904 81985 14.0
## 14 2015 73510 83620 13.8
## 15 2020 75538 85204 12.8
ggplot(labor_force_every_5_years, aes(x = Year, y = Women, group = 1)) +
geom_point(color = "pink", size = 2) +
labs(title = "Women in the Workforce",
subtitle = "Trends over the years",
x = "Year", y = "Number of Women in Workforce") +
theme_minimal(base_size = 15) +
theme(plot.title = element_text(face = "bold"),
axis.text.x = element_text(angle = 50, hjust = 1, size = 5.5))
The results from the labor force dataset highlight the upward trend in
female workforce participation, evidenced by the clear and consistent
rise in the number of women employed over time.
Summary Statistics: Labor and Earnigns
## [1] "State" "Data Type"
## [3] "Average Weekly Earnings" "Number of Workers"
## [5] "Earnings Disparity" "Employed Percent"
earnings <- earnings %>%
rename(
"Gender" = "Data Type",
"Disparity Per Dollar" = "Earnings Disparity"
)
print(earnings)## # A tibble: 104 × 6
## State Gender `Average Weekly Earnings` `Number of Workers`
## <chr> <chr> <dbl> <dbl>
## 1 NATIONAL Male 1094. 82519194.
## 2 NATIONAL Female 836. 73023354.
## 3 AK Male 1130. 175260.
## 4 AK Female 900. 156792.
## 5 AL Male 1011. 1129482.
## 6 AL Female 739. 995575.
## 7 AR Male 941. 677494.
## 8 AR Female 728. 629512.
## 9 AZ Male 1039. 1747963.
## 10 AZ Female 782. 1504803.
## # ℹ 94 more rows
## # ℹ 2 more variables: `Disparity Per Dollar` <dbl>, `Employed Percent` <dbl>
## # A tibble: 104 × 6
## State Gender `Average Weekly Earnings` `Number of Workers`
## <chr> <chr> <dbl> <dbl>
## 1 NATIONAL Male 1094. 82519194.
## 2 NATIONAL Female 836. 73023354.
## 3 AK Male 1130. 175260.
## 4 AK Female 900. 156792.
## 5 AL Male 1011. 1129482.
## 6 AL Female 739. 995575.
## 7 AR Male 941. 677494.
## 8 AR Female 728. 629512.
## 9 AZ Male 1039. 1747963.
## 10 AZ Female 782. 1504803.
## # ℹ 94 more rows
## # ℹ 2 more variables: `Disparity Per Dollar` <dbl>, `Employed Percent` <dbl>
When examining workforce parity, the number of women has increased, but a gap still exists between male and female participation rates. This finding may be noteworthy, but it does not explain the pay inequity.
In terms of earnings disparity, the data below provides a summary of the differences in pay by calculating the average, median, minimum, and maximum disparity per dollar. These statistics paint a troubling picture: despite near parity in workforce numbers, the average pay disparity suggests that women continue to earn less per dollar earned than men for comparable work.
earnings_summary <- earnings %>%
summarize(
Mean_Disparity = round(mean(`Disparity Per Dollar`, na.rm = TRUE), 2),
Median_Disparity = round(median(`Disparity Per Dollar`, na.rm = TRUE), 2),
Min_Disparity = round(min(`Disparity Per Dollar`, na.rm = TRUE), 2),
Max_Disparity = round(max(`Disparity Per Dollar`, na.rm = TRUE), 2),
SD_Disparity = round(sd(`Disparity Per Dollar`, na.rm = TRUE), 2)
)
library(kableExtra)
kable(earnings_summary, caption = "Summary of Earnings Disparity Metrics") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)| Mean_Disparity | Median_Disparity | Min_Disparity | Max_Disparity | SD_Disparity |
|---|---|---|---|---|
| 0.88 | 0.94 | 0.64 | 1 | 0.13 |
women_workforce_summary <- labor_force_every_5_years %>%
summarize(
Avg_Women_in_Workforce = round(mean(Women, na.rm = TRUE), 0),
Min_Women_in_Workforce = min(Women, na.rm = TRUE),
Max_Women_in_Workforce = max(Women, na.rm = TRUE)
)
earnings_summary <- earnings %>%
summarize(
Avg_Disparity_Per_Dollar = round(mean(`Disparity Per Dollar`, na.rm = TRUE), 2),
Min_Disparity_Per_Dollar = min(`Disparity Per Dollar`, na.rm = TRUE),
Max_Disparity_Per_Dollar = max(`Disparity Per Dollar`, na.rm = TRUE)
)
comparison_table <- cbind(women_workforce_summary, earnings_summary)
kable(comparison_table,
caption = "Comparison of Women's Workforce Participation and Pay Disparity") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
position = "center")| Avg_Women_in_Workforce | Min_Women_in_Workforce | Max_Women_in_Workforce | Avg_Disparity_Per_Dollar | Min_Disparity_Per_Dollar | Max_Disparity_Per_Dollar |
|---|---|---|---|---|---|
| 48550 | 18389 | 75538 | 0.88 | 0.64 | 1 |
A comparative summary table further emphasizes this by contrasting average female and male workforce participation with the average earnings gap, highlighting the fact that higher participation alone does not resolve pay disparities.
Visualizing Workforce and Earnings Disparity
library(gridExtra)
plot_workforce <- ggplot(labor_force_every_5_years, aes(x = Year, y = Women, Men)) +
geom_line(aes(x = Year, y = Men, color = "Men's Earnings"), color = "lightblue", size = 1.2) +
geom_point(aes(x = Year, y = Men), color = "blue", size = 3) +
geom_line(aes(x = Year, y = Women, color = "Women’s Earnings"), color = "pink", size = 1.2) +
geom_point(aes(x = Year, y = Women), color = "hotpink", size = 3) +
labs(
title = "Earnings Over the Years",
subtitle = "Earnings trend at five-year increments",
x = "Year",
y = "Earnings"
) +
scale_color_manual(values = c("Men's Earnings" = "blue", "Women’s Earnings" = "pink")) +
theme_minimal(base_size = 8) +
theme(
plot.title = element_text(face = "bold"),
legend.title = element_blank(),
axis.text.x = element_text(angle = 50, hjust = 1)
)
plot_disparity <- ggplot(earnings, aes(x = `Disparity Per Dollar`)) +
geom_histogram(binwidth = 0.05, fill = "lightblue", color = "black", alpha = 0.7) +
labs(title = "Earnings Disparity Across States", x = "Earnings Disparity (Dollars)", y = "Number of States") +
theme_minimal(base_size = 8)
grid.arrange(plot_workforce, plot_disparity, ncol = 2)So far, this analysis supports the claim that while women have achieved greater participation in the workforce, a wage gap remains. States are trying to combat this; they have put into effect certain requirements for employers to follow. For example, certain states now require wage ranges to be shown on job offerings – salary transparency. The Women’s Policy Research also adds that by the year 2058, they predict we will have pay parity due to many of these efforts. Addressing this inequity requires targeted policies aimed at wage transparency, pay equity, and sector-specific interventions to ensure that women not only participate in the workforce but are also compensated equitably. The persistent gap, shown in the analysis below, indicates that without deliberate action, increased participation alone will not close the wage gap.
labor_force_summary <- labor_force %>%
summarize(Avg_Percentage_Difference = mean(Percentage_Difference, na.rm = TRUE))
earnings_summary <- earnings %>%
summarize(Avg_Disparity_Per_Dollar = mean(`Disparity Per Dollar`, na.rm = TRUE))
summary_data <- data.frame(
Metric = c("Avg Percentage Difference in Workforce", "Avg Pay Disparity Per Dollar"),
Value = c(labor_force_summary$Avg_Percentage_Difference, earnings_summary$Avg_Disparity_Per_Dollar)
)
kable(summary_data,
caption = "Summary of Average Workforce and Pay Disparity Metrics") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
position = "center")| Metric | Value |
|---|---|
| Avg Percentage Difference in Workforce | 48.0804671 |
| Avg Pay Disparity Per Dollar | 0.8776923 |
Beta Regression: Modeling Gender Pay Disparity
earnings_clean <- earnings %>%
filter(Gender %in% c("Male", "Female")) %>%
mutate(
`Disparity Per Dollar` = ifelse(`Disparity Per Dollar` >= 1, 0.999, `Disparity Per Dollar`),
`Disparity Per Dollar` = ifelse(`Disparity Per Dollar` <= 0, 0.001, `Disparity Per Dollar`),
Gender = factor(Gender)
) %>%
rename(Number_of_Workers = `Number of Workers`)
beta_model <- betareg(`Disparity Per Dollar` ~ Gender + Number_of_Workers, data = earnings_clean)
summary(beta_model)##
## Call:
## betareg(formula = `Disparity Per Dollar` ~ Gender + Number_of_Workers,
## data = earnings_clean)
##
## Quantile residuals:
## Min 1Q Median 3Q Max
## -3.8683 0.0249 0.2220 0.2231 4.7335
##
## Coefficients (mean model with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.130e+00 2.163e-02 52.249 <2e-16 ***
## GenderMale 4.803e+00 1.455e-01 33.003 <2e-16 ***
## Number_of_Workers 7.410e-10 2.080e-09 0.356 0.722
##
## Phi coefficients (precision model with identity link):
## Estimate Std. Error z value Pr(>|z|)
## (phi) 238.76 36.22 6.592 4.35e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 373.9 on 4 Df
## Pseudo R-squared: 0.9975
## Number of iterations: 449 (BFGS) + 3 (Fisher scoring)
beta_sim <- clarify::sim(beta_model, n = 1000)
betas_mat <- beta_sim$sim.coefs[, c("(Intercept)", "GenderMale", "Number_of_Workers")]
# --- Create Counterfactuals ---
cf <- data.frame(
Gender = factor(c("Male", "Female"), levels = levels(earnings_clean$Gender)),
Number_of_Workers = mean(earnings_clean$Number_of_Workers, na.rm = TRUE)
)
# --- Design Matrix and Prediction ---
X <- model.matrix(~ Gender + Number_of_Workers, data = cf)
linpred <- X %*% t(betas_mat)
# --- Inverse Logit to Convert to Mean Disparity Scale ---
inv_logit <- function(x) exp(x) / (1 + exp(x))
predicted_vals <- apply(linpred, 2, inv_logit)
# --- Summarize Simulation Results ---
pred_summary <- apply(predicted_vals, 1, function(x) c(
mean = mean(x),
lower = quantile(x, 0.025),
upper = quantile(x, 0.975)
))
pred_summary_df <- as.data.frame(t(pred_summary))
pred_summary_df$Gender <- cf$Gender
colnames(pred_summary_df) <- c("Mean", "Lower", "Upper", "Gender")
kable(pred_summary_df,
caption = "Predicted Earnings Disparity by Gender (with 95% CI)") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
position = "center")| Mean | Lower | Upper | Gender |
|---|---|---|---|
| 0.9973376 | 0.9965318 | 0.9979850 | Male |
| 0.7561399 | 0.7487745 | 0.7631532 | Female |
ggplot(pred_summary_df, aes(x = Gender, y = Mean, fill = Gender)) +
geom_bar(stat = "identity", position = "dodge", width = 0.6) +
geom_errorbar(aes(ymin = Lower, ymax = Upper), width = 0.2, position = position_dodge(0.6)) +
scale_fill_manual(values = c("Male" = "skyblue", "Female" = "hotpink")) +
labs(
title = "Predicted Earnings Disparity by Gender",
y = "Predicted Earnings Disparity (Mean)",
x = "Gender"
) +
scale_y_continuous(limits = c(0, 1)) +
theme_minimal() +
theme(
text = element_text(size = 12),
legend.position = "none"
)pred_summary_df <- pred_summary_df %>%
arrange(Gender)
# Calculate difference (Female - Male)
pred_diff <- pred_summary_df$Mean[pred_summary_df$Gender == "Female"] -
pred_summary_df$Mean[pred_summary_df$Gender == "Male"]
print(paste("Predicted difference (Female - Male):", round(pred_diff, 3)))## [1] "Predicted difference (Female - Male): -0.241"
Conclusion
To better understand the earnings gap between men and women, I used a
beta regression model. This type of model is ideal when the outcome is a
proportion between 0 and 1, like the earnings disparity per dollar in my
data. The model helped me estimate how gender and the size of the
workforce are related to pay disparity. I used the clarify
package to simulate predictions from the model and compare expected
earnings disparities for men and women, while keeping the number of
workers the same. The results show that women are still predicted to
earn less than men, with the model estimating an average disparity
difference of approximately 0.241 dollars less per dollar earned. These
findings suggest that pay inequality remains a problem, even though
women’s participation in the workforce has increased. Simply having more
women in the labor force is not enough to close the wage gap. Policies
that focus on equal pay, transparency, and accountability are still
needed.