TidyVerse Create - Analyzing Gender Pay Gaps by Job Title

This vignette explores gender-based wage differences using a dataset sourced from Kaggle. It demonstrates how to use readr, dplyr, tidyr, and ggplot2 to group, summarize, reshape, and visualize salary trends. Specifically, it analyzes average BasePay and Bonus by Gender and highlights the job titles with the largest wage gaps.

library(tidyverse)  # loads readr, dplyr, tidyr, and ggplot2

url <- "https://raw.githubusercontent.com/tcgraham-data/data-607-tidyverse-project/refs/heads/main/Glassdoor%20Gender%20Pay%20Gap.csv"
df <- read_csv(url)

The dataset was collected from Glassdoor as of 2020 and records income for various job titles by gender. Because many studies report that women are paid less than men for the same job title, this dataset is useful for gauging the depth of the gender-based pay gap.

Quick data overview:

glimpse(df)
## Rows: 1,000
## Columns: 9
## $ JobTitle  <chr> "Graphic Designer", "Software Engineer", "Warehouse Associat…
## $ Gender    <chr> "Female", "Male", "Female", "Male", "Male", "Female", "Femal…
## $ Age       <dbl> 18, 21, 19, 20, 26, 20, 20, 18, 33, 35, 24, 18, 19, 30, 35, …
## $ PerfEval  <dbl> 5, 5, 4, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
## $ Education <chr> "College", "College", "PhD", "Masters", "Masters", "PhD", "C…
## $ Dept      <chr> "Operations", "Management", "Administration", "Sales", "Engi…
## $ Seniority <dbl> 2, 5, 5, 4, 5, 4, 4, 5, 5, 5, 5, 3, 3, 5, 4, 3, 5, 5, 5, 5, …
## $ BasePay   <dbl> 42363, 108476, 90208, 108080, 99464, 70890, 67585, 97523, 11…
## $ Bonus     <dbl> 9938, 11128, 9268, 10154, 9319, 10126, 10541, 10240, 9836, 9…
summary(df)
##    JobTitle            Gender               Age           PerfEval    
##  Length:1000        Length:1000        Min.   :18.00   Min.   :1.000  
##  Class :character   Class :character   1st Qu.:29.00   1st Qu.:2.000  
##  Mode  :character   Mode  :character   Median :41.00   Median :3.000  
##                                        Mean   :41.39   Mean   :3.037  
##                                        3rd Qu.:54.25   3rd Qu.:4.000  
##                                        Max.   :65.00   Max.   :5.000  
##   Education             Dept             Seniority        BasePay      
##  Length:1000        Length:1000        Min.   :1.000   Min.   : 34208  
##  Class :character   Class :character   1st Qu.:2.000   1st Qu.: 76850  
##  Mode  :character   Mode  :character   Median :3.000   Median : 93328  
##                                        Mean   :2.971   Mean   : 94473  
##                                        3rd Qu.:4.000   3rd Qu.:111558  
##                                        Max.   :5.000   Max.   :179726  
##      Bonus      
##  Min.   : 1703  
##  1st Qu.: 4850  
##  Median : 6507  
##  Mean   : 6467  
##  3rd Qu.: 8026  
##  Max.   :11293

Average Base Pay and Bonus by Gender:

df %>%
  group_by(Gender) %>%
  summarise(
    avg_base = mean(BasePay, na.rm = TRUE),
    avg_bonus = mean(Bonus, na.rm = TRUE),
    count = n()
  )
## # A tibble: 2 × 4
##   Gender avg_base avg_bonus count
##   <chr>     <dbl>     <dbl> <int>
## 1 Female   89943.     6474.   468
## 2 Male     98458.     6461.   532

This tells us that, on average, men earn about $8,500 more than women in base pay (roughly $98,458 versus $89,943), while average bonuses are nearly identical.
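To make the size of the overall difference concrete, here is a minimal sketch that expresses the gap both in dollars and as a percentage of the female average. The two averages are copied (rounded) from the summary output above rather than recomputed from the CSV, and the names `gender_summary` and `overall_gap` are illustrative, not part of the original analysis.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Averages copied (rounded) from the summary table above; not recomputed from the CSV
gender_summary <- tibble(
  Gender   = c("Female", "Male"),
  avg_base = c(89943, 98458)
)

overall_gap <- gender_summary %>%
  # One column per gender so the two averages can be subtracted directly
  pivot_wider(names_from = Gender, values_from = avg_base) %>%
  mutate(
    gap_dollars = Male - Female,
    gap_pct     = 100 * gap_dollars / Female  # gap relative to the female average
  )

overall_gap
```

Using the female average as the denominator is one convention among several; dividing by the male average instead would give a slightly smaller percentage.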

Wage Gap by Job Title:

wage_gap <- df %>%
  group_by(JobTitle, Gender) %>%
  summarise(
    avg_base = mean(BasePay, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  pivot_wider(names_from = Gender, values_from = avg_base) %>%
  mutate(gap = Male - Female) %>%
  arrange(desc(gap))

head(wage_gap, 10)
## # A tibble: 10 × 4
##    JobTitle             Female    Male    gap
##    <chr>                 <dbl>   <dbl>  <dbl>
##  1 Software Engineer    94701  106371. 11670.
##  2 Marketing Associate  76119.  81882.  5763.
##  3 Driver               86868.  91953.  5085.
##  4 Sales Associate      91894.  94663.  2769.
##  5 IT                   90476.  91022.   546.
##  6 Financial Analyst    95458.  94607.  -851.
##  7 Manager             127252. 124849. -2403.
##  8 Graphic Designer     92243.  89596. -2647.
##  9 Warehouse Associate  92428.  86553. -5875.
## 10 Data Scientist       95705.  89223. -6482.

Sorting by gap puts the extremes at either end of the table: the job titles where men earn substantially more on average and those where women do. Notably, the largest gap favoring men (about $11,670 for Software Engineer) is nearly twice the largest gap favoring women (about $6,482 for Data Scientist).
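The group, summarise, and reshape steps used for the per-title table can be easier to follow on a tiny example. The sketch below runs the same pipeline on a hypothetical six-row table; the job titles and salaries in `toy` are made up for illustration and are not rows from the Glassdoor data.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical toy data: made-up titles and salaries, not rows from the CSV
toy <- tribble(
  ~JobTitle, ~Gender,  ~BasePay,
  "Analyst", "Female", 50000,
  "Analyst", "Male",   54000,
  "Analyst", "Male",   56000,
  "Clerk",   "Female", 42000,
  "Clerk",   "Female", 44000,
  "Clerk",   "Male",   40000
)

toy_gap <- toy %>%
  group_by(JobTitle, Gender) %>%
  # Long format: one row per (job title, gender) pair
  summarise(avg_base = mean(BasePay), .groups = "drop") %>%
  # Wide format: one row per job title, with Female and Male columns
  pivot_wider(names_from = Gender, values_from = avg_base) %>%
  mutate(gap = Male - Female) %>%
  arrange(desc(gap))

toy_gap
```

Here the Analyst row ends up with a gap of 5000 (men higher) and the Clerk row with a gap of -3000 (women higher), which mirrors how positive and negative gaps should be read in the full table.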

Visualization of Largest Wage Gaps:

wage_gap %>%
  filter(!is.na(gap)) %>%
  slice_max(abs(gap), n = 10) %>%
  ggplot(aes(x = reorder(JobTitle, gap), y = gap)) +
  geom_col(fill = "tomato") +
  coord_flip() +
  labs(
    title = "Top 10 Job Titles by Gender Wage Gap",
    x = "Job Title",
    y = "Base Pay Gap (Male - Female)"
  )
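Because the gap can be negative, a single fill color makes it hard to see at a glance which direction each bar points. One possible variant, sketched below, colors bars by which gender has the higher average. The data are retyped (rounded) from the `head(wage_gap, 10)` output above so the block runs on its own; the names `wage_gap_printed` and `favors` and the color choices are assumptions for illustration.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Values copied (rounded) from the head(wage_gap, 10) output; not recomputed from the CSV
wage_gap_printed <- tribble(
  ~JobTitle,             ~Female, ~Male,
  "Software Engineer",     94701, 106371,
  "Marketing Associate",   76119,  81882,
  "Driver",                86868,  91953,
  "Sales Associate",       91894,  94663,
  "IT",                    90476,  91022,
  "Financial Analyst",     95458,  94607,
  "Manager",              127252, 124849,
  "Graphic Designer",      92243,  89596,
  "Warehouse Associate",   92428,  86553,
  "Data Scientist",        95705,  89223
) %>%
  mutate(
    gap    = Male - Female,
    favors = if_else(gap > 0, "Men", "Women")  # which gender has the higher average
  )

p <- ggplot(wage_gap_printed, aes(x = reorder(JobTitle, gap), y = gap, fill = favors)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = c(Men = "steelblue", Women = "tomato")) +
  labs(
    title = "Gender Wage Gap by Job Title",
    x = "Job Title",
    y = "Base Pay Gap (Male - Female)",
    fill = "Higher average"
  )

p
```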

These extremes are interesting to look at, but they do not tell the whole story. Let us look at the percentage of job titles where men earn more than women:

gap_summary <- wage_gap %>%
  filter(!is.na(Male), !is.na(Female)) %>%
  mutate(gap = Male - Female)

total_jobs <- nrow(gap_summary)
jobs_favor_men <- gap_summary %>%
  filter(gap > 0) %>%
  nrow()

percent_favor_men <- round((jobs_favor_men / total_jobs) * 100, 1)

cat("Percentage of job titles where men earn more:", percent_favor_men, "%\n")
## Percentage of job titles where men earn more: 50 %

And how does this look if we count only the job titles where men earn at least 5% more than women?

gap_summary_pct <- gap_summary %>%
  mutate(pct_gap = (Male - Female) / Female)

jobs_favor_men_5pct <- gap_summary_pct %>%
  filter(pct_gap >= 0.05) %>%
  nrow()

percent_favor_men_5pct <- round((jobs_favor_men_5pct / total_jobs) * 100, 1)

cat("Percentage of job titles where men earn at least 5% more than women:", percent_favor_men_5pct, "%\n")
## Percentage of job titles where men earn at least 5% more than women: 30 %
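The 5% threshold can also be applied in both directions, which shows how the remaining job titles split. The following is a sketch under stated assumptions, not part of the original analysis: the averages are retyped (rounded) from the per-title table printed earlier, the band labels are my own, and `pct_gap` keeps the same definition as above, with the female average as the denominator.

```r
library(tibble)
library(dplyr)

# Averages copied (rounded) from the per-title table printed earlier
printed_gap <- tribble(
  ~JobTitle,             ~Female, ~Male,
  "Software Engineer",     94701, 106371,
  "Marketing Associate",   76119,  81882,
  "Driver",                86868,  91953,
  "Sales Associate",       91894,  94663,
  "IT",                    90476,  91022,
  "Financial Analyst",     95458,  94607,
  "Manager",              127252, 124849,
  "Graphic Designer",      92243,  89596,
  "Warehouse Associate",   92428,  86553,
  "Data Scientist",        95705,  89223
)

threshold_summary <- printed_gap %>%
  mutate(
    pct_gap = (Male - Female) / Female,  # same definition as in the text
    band = case_when(
      pct_gap >=  0.05 ~ "Men earn at least 5% more",
      pct_gap <= -0.05 ~ "Men earn at least 5% less",
      TRUE             ~ "Within 5%"
    )
  ) %>%
  count(band) %>%                          # number of job titles per band
  mutate(share_pct = 100 * n / sum(n))     # share of all job titles

threshold_summary
```

With these rounded values, three job titles fall in the "at least 5% more" band, two in the "at least 5% less" band, and five within 5% either way, which is consistent with the 30% figure reported above.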

Conclusion

This analysis indicates that gender-based wage disparities persist across a range of job titles, though not uniformly in one direction. Men have the higher average base pay in 50% of the job titles in this dataset, and women in the other 50%. The larger gaps lean toward men, however: 30% of job titles show men earning at least 5% more than women in the same role, and across the whole dataset men's average base pay is about $8,500 higher.

Although visualizations of extreme gaps offer helpful context, these summary statistics suggest that the issue is not isolated to a handful of high-paying positions. The findings support ongoing concerns about equitable compensation and highlight the need for continued transparency and organizational accountability in addressing wage inequality.