Introduction
As a woman about to enter the workforce, I consider pay parity to be of the utmost importance. Using data from the U.S. Department of Labor and its Women’s Bureau, we can analyze women’s participation in the workforce and the gender pay gap. We used to live in a more patriarchal world, where men went out to work and women tended to the children and the home. That is no longer the case.
Over the past several decades, women’s participation in the workforce has steadily increased. According to the Bureau of Labor Statistics, 56.8% of women are now in the labor force. Despite this increase in participation, the wage gap between men and women remains an ongoing issue.
When women first entered the workforce in large numbers, we would expect their median earnings to be lower than men’s, because men at the time had more work experience and could apply for and hold more senior positions. Now that women are nearly fully integrated into the workforce, the question remains: has the gap closed?
knitr::opts_knit$set(root.dir = "/Users/ruthiemaurer/Desktop/DATA 710/DATA 710 Assignment 2")
library(readxl)
library(dplyr)
library(ggplot2)
library(MASS)
library(clarify)
library(texreg)
library(tidyr)
library(purrr)
library(betareg)
library(modelsummary)
Data Preparation and Descriptive Analysis
labor_force <- read_excel("/Users/ruthiemaurer/Desktop/DATA 710/DATA 710 Assignment 2/Number in Civilian Labor Force.xlsx")
earnings <- read_excel("/Users/ruthiemaurer/Desktop/DATA 710/DATA 710 Assignment 2/Earnings Disparity Sex Data 2.xlsx")
labor_force$Percentage_Difference <- ((labor_force$Men - labor_force$Women) / labor_force$Women) * 100
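The Percentage_Difference column measures how many more men than women are in the labor force, expressed as a percentage of the number of women. It is not the same as women’s share of the labor force. As a quick check, this sketch applies the same formula by hand to the 1948 and 2020 rows printed in this section (counts in thousands) and computes women’s share alongside it for contrast; the variable names are illustrative.

```r
# Counts (thousands) copied from the printed labor force tables
women <- c(`1948` = 17335, `2020` = 75538)
men   <- c(`1948` = 43286, `2020` = 85204)

# Same formula as Percentage_Difference: excess of men relative to women, in percent
pct_difference <- (men - women) / women * 100

# Women's share of the combined labor force, in percent, for comparison
women_share <- women / (women + men) * 100

print(round(pct_difference, 1))
print(round(women_share, 1))
```

In 2020 men outnumber women by about 12.8%, while women make up about 47% of the labor force; the two measures answer different questions.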
print(labor_force)
## # A tibble: 75 × 4
## Year Women Men Percentage_Difference
## <chr> <dbl> <dbl> <dbl>
## 1 1948 17335 43286 150.
## 2 1949 17788 43498 145.
## 3 1950 18389 43819 138.
## 4 1951 19016 43001 126.
## 5 1952 19269 42869 122.
## 6 1953 19382 43633 125.
## 7 1954 19678 43965 123.
## 8 1955 20548 44475 116.
## 9 1956 21461 45091 110.
## 10 1957 21732 45197 108.
## # ℹ 65 more rows
labor_force_every_5_years <- labor_force %>%
mutate(Year = as.numeric(Year)) %>%
filter(Year %% 5 == 0)
print(labor_force_every_5_years)
## # A tibble: 15 × 4
## Year Women Men Percentage_Difference
## <dbl> <dbl> <dbl> <dbl>
## 1 1950 18389 43819 138.
## 2 1955 20548 44475 116.
## 3 1960 23240 46388 99.6
## 4 1965 26200 48255 84.2
## 5 1970 31543 51228 62.4
## 6 1975 37475 56299 50.2
## 7 1980 45487 61453 35.1
## 8 1985 51050 64411 26.2
## 9 1990 56829 69011 21.4
## 10 1995 60944 71360 17.1
## 11 2000 66303 76280 15.0
## 12 2005 69288 80033 15.5
## 13 2010 71904 81985 14.0
## 14 2015 73510 83620 13.8
## 15 2020 75538 85204 12.8
# Women in the civilian labor force at five-year intervals (counts are in thousands)
ggplot(labor_force_every_5_years, aes(x = Year, y = Women, group = 1)) +
  geom_line(color = "pink") +
  geom_point(color = "pink", size = 2) +
  labs(title = "Women in the Workforce",
       subtitle = "Trends over the years",
       x = "Year", y = "Women in labor force (thousands)") +
  theme_minimal(base_size = 15) +
  theme(plot.title = element_text(face = "bold"),
        axis.text.x = element_text(angle = 50, hjust = 1, size = 5.5))
The labor force data show a clear and consistent rise in the number of women in the workforce since the 1950s. The counts are in thousands: across the five-year snapshots, an average of about 48.6 million women (48,550 thousand) were in the labor force, ranging from 18.4 million in 1950 to 75.5 million in 2020.
The participation gap has narrowed sharply as well. Men outnumbered women by about 138% in 1950 but by only 12.8% in 2020. Averaged over every year in the data, the percentage difference is 48.08%, but that average is pulled up by the early decades and overstates today’s gap. Men still make up a larger share of the labor force, but the difference is now far smaller than it once was.
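The figures quoted above can be reproduced from the fifteen five-year rows printed earlier. This base-R sketch (vectors copied from that table, counts in thousands) computes the mean number of women and contrasts the first and last five-year percentage differences with their mean, showing how much the early years pull the long-run average up.

```r
# Five-year snapshots, 1950-2020, copied from the printed table (thousands)
years <- seq(1950, 2020, by = 5)
women <- c(18389, 20548, 23240, 26200, 31543, 37475, 45487, 51050,
           56829, 60944, 66303, 69288, 71904, 73510, 75538)
men   <- c(43819, 44475, 46388, 48255, 51228, 56299, 61453, 64411,
           69011, 71360, 76280, 80033, 81985, 83620, 85204)

avg_women_millions <- mean(women) / 1000   # about 48.55 million
pct_diff <- (men - women) / women * 100    # excess of men over women, in percent

summary_row <- c(
  avg_women_millions = avg_women_millions,
  first_pct_diff = pct_diff[1],                # 1950
  last_pct_diff = pct_diff[length(pct_diff)],  # 2020
  mean_pct_diff = mean(pct_diff)
)
print(round(summary_row, 2))
```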
Summary Statistics: Labor and Earnings
colnames(earnings)
## [1] "State" "Data Type"
## [3] "Average Weekly Earnings" "Number of Workers"
## [5] "Earnings Disparity" "Employed Percent"
earnings <- earnings %>%
rename(
"Gender" = "Data Type",
"Disparity Per Dollar" = "Earnings Disparity"
)
print(earnings)
## # A tibble: 104 × 6
## State Gender `Average Weekly Earnings` `Number of Workers`
## <chr> <chr> <dbl> <dbl>
## 1 NATIONAL Male 1094. 82519194.
## 2 NATIONAL Female 836. 73023354.
## 3 AK Male 1130. 175260.
## 4 AK Female 900. 156792.
## 5 AL Male 1011. 1129482.
## 6 AL Female 739. 995575.
## 7 AR Male 941. 677494.
## 8 AR Female 728. 629512.
## 9 AZ Male 1039. 1747963.
## 10 AZ Female 782. 1504803.
## # ℹ 94 more rows
## # ℹ 2 more variables: `Disparity Per Dollar` <dbl>, `Employed Percent` <dbl>
earnings <- earnings %>%
  mutate(across(where(is.numeric), ~ round(.x, 2)))
Turning to workforce parity, the number of women has increased, but men still outnumber women in the labor force. This finding is noteworthy, but it does not explain the pay gap.
The table below summarizes the earnings disparity per dollar: its mean, median, minimum, maximum, and standard deviation. These statistics pool the male and female rows. The male rows are the reference and all equal 1.00, so the pooled mean of 0.88 understates the gap; among the female rows alone, the mean is about 0.755 (see the group comparison below), meaning women earn roughly 76 cents for every dollar men earn. Even with women making up a large share of the labor force, this points to a persistent raw pay gap. Because the data compare average weekly earnings, they do not by themselves show that women are paid less for comparable work.
earnings_summary <- earnings %>%
summarize(
Mean_Disparity = round(mean(`Disparity Per Dollar`, na.rm = TRUE), 2),
Median_Disparity = round(median(`Disparity Per Dollar`, na.rm = TRUE), 2),
Min_Disparity = round(min(`Disparity Per Dollar`, na.rm = TRUE), 2),
Max_Disparity = round(max(`Disparity Per Dollar`, na.rm = TRUE), 2),
SD_Disparity = round(sd(`Disparity Per Dollar`, na.rm = TRUE), 2)
)
datasummary_df(earnings_summary, title = "Earnings Disparity Summary")
| Mean_Disparity | Median_Disparity | Min_Disparity | Max_Disparity | SD_Disparity |
|---|---|---|---|---|
| 0.88 | 0.94 | 0.64 | 1.00 | 0.13 |
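Because the summary pools both genders, the male rows pull the mean toward 1. The sketch below illustrates the effect with the five pairs of rows printed earlier (NATIONAL, AK, AL, AR, AZ). It assumes, for illustration only, that Disparity Per Dollar equals women’s average weekly earnings divided by men’s in the same state; the weekly earnings are the rounded values from the printed tibble, so the ratios are approximate.

```r
# Rounded average weekly earnings from the printed earnings tibble
toy <- data.frame(
  State  = rep(c("NATIONAL", "AK", "AL", "AR", "AZ"), each = 2),
  Gender = rep(c("Male", "Female"), times = 5),
  Weekly = c(1094, 836, 1130, 900, 1011, 739, 941, 728, 1039, 782)
)

# Assumption: ratio = earnings / male earnings in the same state,
# so every male row is exactly the 1.00 reference
male_by_state <- setNames(toy$Weekly[toy$Gender == "Male"], toy$State[toy$Gender == "Male"])
toy$Ratio <- toy$Weekly / male_by_state[toy$State]

pooled_mean <- mean(toy$Ratio)                     # mixes in the 1.00 male rows
by_gender   <- tapply(toy$Ratio, toy$Gender, mean) # separate mean per group

print(round(pooled_mean, 3))
print(round(by_gender, 3))
```

The pooled mean lands near 0.88 while the female mean is about 0.76, the same pattern seen in the full data.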
women_workforce_summary <- labor_force_every_5_years %>%
summarize(
Avg_Women_in_Workforce = round(mean(Women, na.rm = TRUE), 0),
Min_Women_in_Workforce = min(Women, na.rm = TRUE),
Max_Women_in_Workforce = max(Women, na.rm = TRUE)
)
earnings_summary <- earnings %>%
summarize(
Avg_Disparity_Per_Dollar = round(mean(`Disparity Per Dollar`, na.rm = TRUE), 2),
Min_Disparity_Per_Dollar = min(`Disparity Per Dollar`, na.rm = TRUE),
Max_Disparity_Per_Dollar = max(`Disparity Per Dollar`, na.rm = TRUE)
)
comparison_table <- cbind(women_workforce_summary, earnings_summary)
colnames(comparison_table) <- c(
  "Avg Women in Workforce (thousands)",
  "Min Women in Workforce (thousands)",
  "Max Women in Workforce (thousands)",
  "Avg Disparity (per $1)",
  "Min Disparity (per $1)",
  "Max Disparity (per $1)"
)
datasummary_df(comparison_table, title = "Comparison of Women's Workforce Participation and Pay Disparity")
| Avg Women in Workforce (thousands) | Min Women in Workforce (thousands) | Max Women in Workforce (thousands) | Avg Disparity (per $1) | Min Disparity (per $1) | Max Disparity (per $1) |
|---|---|---|---|---|---|
| 48550.00 | 18389.00 | 75538.00 | 0.88 | 0.64 | 1.00 |
The comparison table sets women’s labor force numbers beside the earnings disparity. Women’s participation has grown roughly fourfold since 1950, yet the disparity persists, which is consistent with the view that higher participation alone does not resolve pay disparities.
Visualizing Workforce and Earnings Disparity
library(gridExtra)

# Left panel: men and women in the labor force (thousands) at five-year intervals.
# Colors are mapped inside aes() so that scale_color_manual() can build a legend.
plot_workforce <- ggplot(labor_force_every_5_years, aes(x = Year)) +
  geom_line(aes(y = Men, color = "Men"), linewidth = 1.2) +
  geom_point(aes(y = Men, color = "Men"), size = 3) +
  geom_line(aes(y = Women, color = "Women"), linewidth = 1.2) +
  geom_point(aes(y = Women, color = "Women"), size = 3) +
  labs(
    title = "Labor Force by Sex",
    subtitle = "Five-year increments",
    x = "Year",
    y = "Number in labor force (thousands)"
  ) +
  scale_color_manual(values = c("Men" = "blue", "Women" = "hotpink")) +
  theme_minimal(base_size = 8) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.title = element_blank(),
    axis.text.x = element_text(angle = 50, hjust = 1)
  )

# Right panel: women's earnings per dollar earned by men, counted by state.
# Male rows (the 1.00 reference) and the NATIONAL row are excluded.
plot_disparity <- earnings %>%
  filter(Gender == "Female", State != "NATIONAL") %>%
  ggplot(aes(x = `Disparity Per Dollar`)) +
  geom_histogram(binwidth = 0.05, fill = "lightblue", color = "black", alpha = 0.7) +
  labs(title = "Earnings Disparity Across States",
       x = "Women's earnings per $1 earned by men", y = "Number of States") +
  theme_minimal(base_size = 8)

grid.arrange(plot_workforce, plot_disparity, ncol = 2)
So far, this analysis supports the claim that while women have achieved greater participation in the workforce, a wage gap remains. States are trying to combat this by placing new requirements on employers; for example, some states now require salary ranges to be included in job postings (salary transparency). The Institute for Women’s Policy Research projects that, at the current pace of change, women will not reach pay parity until 2059. Addressing this inequity requires targeted policies aimed at wage transparency, pay equity, and sector-specific interventions so that women not only participate in the workforce but are also compensated equitably. The persistent gap shown in the analysis below indicates that, without deliberate action, increased participation alone will not close it.
labor_force_summary <- labor_force %>%
summarize(Avg_Percentage_Difference = mean(Percentage_Difference, na.rm = TRUE))
earnings_summary <- earnings %>%
summarize(Avg_Disparity_Per_Dollar = mean(`Disparity Per Dollar`, na.rm = TRUE))
summary_data <- data.frame(
Metric = c("Avg Percentage Difference in Workforce", "Avg Pay Disparity Per Dollar"),
Value = c(labor_force_summary$Avg_Percentage_Difference, earnings_summary$Avg_Disparity_Per_Dollar)
)
datasummary_df(
summary_data,
title = "Summary of Average Workforce and Pay Disparity Metrics"
)
| Metric | Value |
|---|---|
| Avg Percentage Difference in Workforce | 48.08 |
| Avg Pay Disparity Per Dollar | 0.88 |
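# Keep the two gender groups and prepare the outcome for beta regression,
# which requires values strictly between 0 and 1: values of 1 or more (the
# male reference rows) become 0.999, and values of 0 or less become 0.001.
earnings_clean <- earnings %>%
  filter(Gender %in% c("Male", "Female")) %>%
  mutate(
    `Disparity Per Dollar` = ifelse(`Disparity Per Dollar` >= 1, 0.999, `Disparity Per Dollar`),
    `Disparity Per Dollar` = ifelse(`Disparity Per Dollar` <= 0, 0.001, `Disparity Per Dollar`),
    Gender = factor(Gender)
  ) %>%
  rename(Number_of_Workers = `Number of Workers`)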
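Beta regression models outcomes strictly inside (0, 1), so exact 0s and 1s must be moved inward. Clamping, as done here, changes only the boundary values. A common alternative is the Smithson and Verkuilen (2006) transformation, (y(n - 1) + 0.5) / n, which shrinks every value slightly toward 0.5. The sketch below compares the two on a small hypothetical vector; it is an illustration only, not what this analysis uses.

```r
# Hypothetical ratios that include the boundary values 0 and 1
y <- c(1.00, 0.76, 0.64, 0.00)

# Clamping, as in earnings_clean: only the boundary values change
clamped <- pmin(pmax(y, 0.001), 0.999)

# Smithson-Verkuilen squeeze: every value moves toward 0.5
n <- length(y)
squeezed <- (y * (n - 1) + 0.5) / n

print(clamped)
print(squeezed)
```

With only four values the squeeze is large; with the 104 rows in this data it would move each value far less.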
raw_gender_gap <- earnings_clean %>%
group_by(Gender) %>%
summarize(
Mean_Disparity = round(mean(`Disparity Per Dollar`, na.rm = TRUE), 3)
)
print(raw_gender_gap)
## # A tibble: 2 × 2
## Gender Mean_Disparity
## <fct> <dbl>
## 1 Female 0.755
## 2 Male 0.999
# Raw gap (Male - Female), matched by name so the row order does not matter
raw_diff <- raw_gender_gap$Mean_Disparity[raw_gender_gap$Gender == "Male"] -
  raw_gender_gap$Mean_Disparity[raw_gender_gap$Gender == "Female"]
print(paste("Women earn", round(raw_diff, 3), "less per dollar than men on average."))
## [1] "Women earn 0.244 less per dollar than men on average."
Beta Regression: Modeling Gender Pay Disparity
beta_model1 <- betareg(`Disparity Per Dollar` ~ Gender, data = earnings_clean)
modelsummary(beta_model1, title = "Beta Regression Results", gof_omit = NULL
)
|  | (1) |
|---|---|
| (Intercept) | 1.132 |
|  | (0.021) |
| GenderMale | 4.802 |
|  | (0.146) |
| Num.Obs. | 104 |
| R2 | 0.998 |
| AIC | -741.7 |
| BIC | -733.8 |
| RMSE | 0.03 |
beta_model2 <- betareg(`Disparity Per Dollar` ~ Gender + Number_of_Workers, data = earnings_clean)
modelsummary(beta_model2, title = "Beta Regression Results", gof_omit = NULL
)
|  | (1) |
|---|---|
| (Intercept) | 1.130 |
|  | (0.022) |
| GenderMale | 4.803 |
|  | (0.146) |
| Number_of_Workers | 0.000 |
|  | (0.000) |
| Num.Obs. | 104 |
| R2 | 0.998 |
| AIC | -739.8 |
| BIC | -729.3 |
| RMSE | 0.03 |
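To model the relationship between gender and pay disparity, I used a beta regression, which suits an outcome that lies between 0 and 1. Its coefficients are on the logit scale. In the first model, the intercept of 1.132 is the log-odds of the predicted ratio for women (the reference group), which converts to about 0.756, closely matching the raw female mean of 0.755. The GenderMale coefficient of 4.802 raises the predicted ratio for men to about 0.997. In other words, without accounting for any other variables, women are predicted to earn about 24 cents less per dollar than men. Because the male rows are fixed at the 1.00 reference, the model largely restates the raw comparison, and the very high R2 (0.998) reflects that design rather than strong explanatory power.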
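Because betareg uses a logit link for the mean by default, the coefficients can be turned into predicted ratios with the inverse logit, which is plogis() in base R. This sketch plugs in the rounded Model 1 estimates from the table above.

```r
# Rounded Model 1 coefficients (logit scale) from the beta regression table
intercept   <- 1.132  # log-odds of the predicted ratio for women (reference group)
gender_male <- 4.802  # shift in log-odds for men

pred_female <- plogis(intercept)                # about 0.756
pred_male   <- plogis(intercept + gender_male)  # about 0.997
gap         <- pred_male - pred_female          # about 0.241 per dollar

print(round(c(female = pred_female, male = pred_male, gap = gap), 3))
```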
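I then fit a second beta regression that adds Number_of_Workers to see whether workforce size influences the earnings gap. It does not appear to. The Number_of_Workers coefficient rounds to 0.000, but that says little on its own, because the variable is measured in individual workers (state totals run into the millions), so even a meaningful effect would round to zero per worker. The clearer evidence is that the GenderMale coefficient barely moves (4.802 to 4.803) and the fit gets slightly worse by both AIC (-741.7 to -739.8) and BIC (-733.8 to -729.3). Gender alone does most of the work in predicting earnings per dollar.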
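The AIC and BIC values reported for the two models make the same point. Assuming each model counts its mean coefficients plus the single precision parameter (3 parameters for Model 1, 4 for Model 2), the log-likelihood can be recovered from AIC = 2k - 2 logLik. The sketch below does that arithmetic with the rounded values from the tables and checks it against the reported BIC.

```r
# Rounded fit statistics from the two beta regression tables
aic <- c(model1 = -741.7, model2 = -739.8)
bic <- c(model1 = -733.8, model2 = -729.3)
k   <- c(model1 = 3, model2 = 4)  # assumed: mean coefficients + precision (phi)
n   <- 104

# Invert AIC = 2k - 2 logLik
loglik <- (2 * k - aic) / 2

# BIC implied by that log-likelihood, k log(n) - 2 logLik, as a consistency check
bic_check <- k * log(n) - 2 * loglik

print(round(loglik, 2))                                   # log-likelihood of each model
print(round(loglik[["model2"]] - loglik[["model1"]], 2))  # gain from adding workers
print(round(bic_check, 1))                                # close to the reported BIC
```

Adding Number_of_Workers improves the log-likelihood by only about 0.05, far less than the penalty for an extra parameter.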
sim_beta <- clarify::sim(beta_model1, n = 1000)
cf <- data.frame(
Gender = factor(c("Male", "Female"), levels = levels(earnings_clean$Gender)),
Number_of_Workers = mean(earnings_clean$Number_of_Workers, na.rm = TRUE)
)
# sim_apply() returns one row per simulated coefficient set and one column per
# row of cf (column 1 = Male, column 2 = Female), so summarize down columns.
cf_pred <- clarify::sim_apply(sim_beta, newdata = cf, FUN = predict)
pred_summary_df <- data.frame(
  Gender = cf$Gender,
  estimate = apply(cf_pred, 2, mean),
  conf.low = apply(cf_pred, 2, quantile, probs = 0.025),
  conf.high = apply(cf_pred, 2, quantile, probs = 0.975)
)
datasummary_df(pred_summary_df, title = "Predicted Earnings Disparity by Gender (with 95% CI)")
| estimate | conf.low | conf.high | Gender |
|---|---|---|---|
| 0.87 | 0.76 | 0.99 | Male |
| 0.88 | 0.77 | 0.99 | Female |
| 0.88 | 0.76 | 0.99 | Male |
| 0.87 | 0.76 | 0.99 | Female |
| 0.88 | 0.77 | 0.99 | Male |
| 0.87 | 0.76 | 0.99 | Female |
# Female - Male difference within each simulation: column 2 of cf_pred is
# the Female counterfactual and column 1 the Male one (the row order of cf)
pred_diff <- apply(cf_pred, 1, function(p) p[2] - p[1])
# Round and extract head and tail
head_diff <- head(round(pred_diff, 3))
tail_diff <- tail(round(pred_diff, 3))
print(paste("Predicted difference (Female - Male):", head_diff))
## [1] "Predicted difference (Female - Male): 0.005"
## [2] "Predicted difference (Female - Male): -0.004"
## [3] "Predicted difference (Female - Male): -0.003"
## [4] "Predicted difference (Female - Male): -0.007"
## [5] "Predicted difference (Female - Male): 0"
## [6] "Predicted difference (Female - Male): 0.003"
print(paste("Predicted difference (Female - Male):", tail_diff))
## [1] "Predicted difference (Female - Male): 0.002"
## [2] "Predicted difference (Female - Male): 0.008"
## [3] "Predicted difference (Female - Male): 0.005"
## [4] "Predicted difference (Female - Male): -0.004"
## [5] "Predicted difference (Female - Male): 0"
## [6] "Predicted difference (Female - Male): -0.003"
# Summarize the simulated differences
pred_diff_distribution <- data.frame(diff = pred_diff)
sim_summary <- pred_diff_distribution %>%
summarize(
Mean_Diff = round(mean(diff), 3),
Lower_95_CI = round(quantile(diff, 0.025), 3),
Upper_95_CI = round(quantile(diff, 0.975), 3)
)
print(sim_summary)
## Mean_Diff Lower_95_CI Upper_95_CI
## 1 0 -0.005 0.005
ggplot(pred_diff_distribution, aes(x = diff)) +
geom_histogram(fill = "steelblue", color = "white", bins = 30) +
geom_vline(aes(xintercept = mean(diff)), linetype = "dashed", color = "black") +
labs(
title = "Distribution of Simulated Predicted Differences (Female - Male)",
x = "Predicted Difference",
y = "Count"
) +
theme_minimal()
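The simulated difference (female minus male) printed above is centered at zero, with a 95% interval of roughly -0.005 to 0.005, but this does not mean the gap disappears. The sim_apply() function returns one row per simulation and one column per counterfactual (male, female), so the predictions need to be summarized down each column and differenced within each row. Summarizing across rows instead averages the male and female predictions together, which is why every estimate in the prediction table sits near 0.88, roughly halfway between the two groups, and why differencing those averages produces only noise around zero. The fitted coefficients themselves imply a predicted ratio of about 0.756 for women and about 0.997 for men, a gap of roughly 24 cents per dollar that matches the raw comparison.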
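The difference between summarizing down columns and across rows is easy to see with a toy simulation. This sketch draws hypothetical coefficients around the rounded Model 1 estimates, treating the intercept and GenderMale draws as independent (an assumption made only for illustration; clarify draws from the model’s full covariance matrix), builds a matrix laid out like cf_pred, and summarizes it both ways.

```r
set.seed(710)
n_sim <- 1000

# Hypothetical coefficient draws around the rounded Model 1 estimates and SEs
b0 <- rnorm(n_sim, mean = 1.132, sd = 0.021)
b1 <- rnorm(n_sim, mean = 4.802, sd = 0.146)

# One row per simulation, one column per counterfactual (same layout as cf_pred)
pred <- cbind(Male = plogis(b0 + b1), Female = plogis(b0))

# Correct: summarize each counterfactual down its column,
# and difference Female - Male within each row
col_means    <- colMeans(pred)
diff_per_sim <- pred[, "Female"] - pred[, "Male"]
diff_ci      <- quantile(diff_per_sim, c(0.025, 0.975))

# Incorrect: averaging across each row mixes the Male and Female predictions
row_means <- rowMeans(pred)

print(round(col_means, 3))
print(round(diff_ci, 3))
print(round(range(row_means), 3))
```

The column means land near 0.76 and 1.00 with a difference interval well below zero, while every row mean sits near 0.88.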
Conclusion
Before running the beta regression model, I examined the raw data and found that, on average, women earned $0.244 less per dollar than men. This simple comparison shows a substantial gap in earnings before adjusting for any other factors. To better understand the gap, I used a beta regression model of earnings disparity per dollar with gender as the only predictor. The model reproduces the raw gap: the predicted ratio for women is about $0.76 for every dollar men earn. I also fit a version of the model that included Number_of_Workers to test whether workforce size influenced the disparity. It did not, and the gender coefficient was essentially unchanged, which suggests that gender, rather than the size of a state’s workforce, accounts for the difference in this data.
I used the clarify package to simulate 1,000 sets of model coefficients and compare predicted earnings disparities by gender. The near-zero simulated difference (about one cent) came from averaging the male and female predictions together within each simulation, not from a small gap. Summarized by gender, the simulated predictions should center on the model’s point estimates of about $0.76 per dollar for women and about $1.00 for men.
These findings suggest that gender-based pay inequality persists in this unadjusted comparison. While the gap in labor force participation has narrowed, the earnings gap has not closed. Simply increasing the number of women in the workforce is not enough; pay equity requires deliberate and sustained action through policy, transparency, and cultural change.
References
(Research 2020)
(“Labor Force Participation Rate by Sex, State and County,” n.d.)
(“Median Annual Earnings by Sex, Race and Hispanic Ethnicity,” n.d.)