Data 110 Final Project

Author

Ameer Adegun

Data 110 Final Project

By: Ameer Adegun

#1 Introduction

For my final project I used the Employee Salaries 2020 dataset from the Montgomery County Open Data Portal. The dataset was collected by Montgomery County Government and contains the salary information for county employees in 2020. Each row is meant to represent a single employee and includes information about their department, division, gender, base salary, overtime pay, longevity pay, and pay grade. The variables I will use are: Department Name, Division, Gender, Base Salary, 2020 Overtime Pay, 2020 Longevity Pay, and Grade. I chose this dataset because I wanted to explore how salaries change across different departments and genders in Montgomery County. Since I live in Montgomery County if I were to get an understanding for how employees are paid would help me know what deparetment I want to try and get into out of college if I stay in the area. The data was collected by Montgomery County Government as part of their open data transparency initiative. There is no ReadMe file available with the dataset, so the exact methodology for data collection is not described.

Source: Montgomery County Open Data Portal: Employee Salaries 2020.

https://data.montgomerycountymd.gov

#2 Background Research

According to the American Association of University Women, more than half a century after pay discrimination became illegal in the United States, the pay gap between men and women continues to hurt workers and the national economy. The AAUW report that this gap affects women across industries and job types, including government employment. This is directly relevant to my analysis because the Montgomery County salary data gives a real local example of the broader national the AAUW is talking about.

Source: American Association of University Women: The Simple Truth About the Gender Pay Gap. https://www.aauw.org/resources/research/simple-truth/

#3 Data Cleaning

I will first import my libraries, set my directory, and load the dataset

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
setwd("C:/Users/SwagD/Downloads/Data 110")

salaries <- read_csv("Employee_Salaries_-_2020_20260507.csv")
Rows: 9958 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Department, Department Name, Division, Gender, Base Salary, 2020 Ov...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
salaries
# A tibble: 9,958 × 8
   Department `Department Name`         Division            Gender `Base Salary`
   <chr>      <chr>                     <chr>               <chr>  <chr>        
 1 ABS        Alcohol Beverage Services Wholesale Administ… F      $78,902      
 2 ABS        Alcohol Beverage Services Administrative Ser… F      $35,926      
 3 ABS        Alcohol Beverage Services Administration      M      $167,345     
 4 ABS        Alcohol Beverage Services Wholesale Operatio… F      $90,848      
 5 ABS        Alcohol Beverage Services Administration      F      $78,902      
 6 ABS        Alcohol Beverage Services Marketing           F      $109,761     
 7 ABS        Alcohol Beverage Services Beer Delivery Oper… M      $68,575      
 8 ABS        Alcohol Beverage Services Liquor and Wine De… M      $50,604.28   
 9 ABS        Alcohol Beverage Services Beer Delivery Oper… M      $50,865.5    
10 ABS        Alcohol Beverage Services Liquor and Wine De… M      $49,359.65   
# ℹ 9,948 more rows
# ℹ 3 more variables: `2020 Overtime Pay` <chr>, `2020 Longevity Pay` <chr>,
#   Grade <chr>

Now I will clean the dataset

# Clean column names
names(salaries) <- tolower(names(salaries))
# Remove spaces from the column names
names(salaries) <- gsub(" ", "_", names(salaries))
# Fix all the columns that start with numbers 
names(salaries)[names(salaries) == "2020_overtime_pay"]  <- "overtime_pay"
names(salaries)[names(salaries) == "2020_longevity_pay"] <- "longevity_pay"

# Remove dollar signs and commas from the salary columns then convert to numeric
salaries_clean <- salaries |>
  mutate(
    base_salary     = as.numeric(gsub("[$,]", "", base_salary)),
    overtime_pay    = as.numeric(gsub("[$,]", "", overtime_pay)),
    longevity_pay   = as.numeric(gsub("[$,]", "", longevity_pay)),
    department_name = as.factor(department_name),
    division        = as.factor(division),
    gender          = as.factor(gender),
    grade           = as.factor(grade)
  ) |>
  filter(!is.na(base_salary), !is.na(gender)) |>
  filter(base_salary > 0)

nrow(salaries_clean)
[1] 9958

#4 Exploring the Dataset

summary(salaries_clean)
  department                                       department_name
 Length:9958        Department of Police                   :1940  
 Class :character   Department of Health and Human Services:1734  
 Mode  :character   Fire and Rescue Services               :1358  
                    Department of Transportation           :1323  
                    Correction and Rehabilitation          : 525  
                    Alcohol Beverage Services              : 438  
                    (Other)                                :2640  
                                           division    gender  
 School Health Services                        : 361   F:4091  
 Transit Gaithersburg Ride On                  : 343   M:5867  
 Transit Silver Spring Ride On                 : 299           
 Office of Eligibility and Support Services    : 243           
 Highway Services                              : 228           
 Behavioral Health and Crisis Services Division: 220           
 (Other)                                       :8264           
  base_salary      overtime_pay    longevity_pay         grade     
 Min.   : 11147   Min.   :     0   Min.   :    0.0   15     :1125  
 1st Qu.: 56994   1st Qu.:     0   1st Qu.:    0.0   P4     : 758  
 Median : 75290   Median :   415   Median :    0.0   24     : 578  
 Mean   : 78771   Mean   :  5182   Mean   :  923.6   21     : 531  
 3rd Qu.: 94668   3rd Qu.:  5394   3rd Qu.:    0.0   23     : 478  
 Max.   :280000   Max.   :141998   Max.   :12471.8   16     : 455  
                                                     (Other):6033  
# Count employees by gender
salaries_clean |>
  count(gender)
# A tibble: 2 × 2
  gender     n
  <fct>  <int>
1 F       4091
2 M       5867
# Average base salary by what department they work at
salaries_clean |>
  group_by(department_name) |>
  summarize(
    avg_salary = mean(base_salary, na.rm = TRUE),
    count      = n()
  ) |>
  arrange(desc(avg_salary))
# A tibble: 40 × 3
   department_name                                  avg_salary count
   <fct>                                                 <dbl> <int>
 1 Office of Intergovernmental Relations Department    148088.     5
 2 Ethics Commission                                   131574.     2
 3 Office of Zoning and Administrative Hearings        131329.     5
 4 Non-Departmental Account                            126963.    20
 5 Department of Technology Services                   125009.   147
 6 Offices of the County Executive                     120121.    37
 7 Office of Management and Budget                     118137.    35
 8 Office of Legislative Oversight                     116642.    15
 9 Office of Labor Relations                           115765.     7
10 County Attorney's Office                            115513.    74
# ℹ 30 more rows
# Average salary for Males and Females
salaries_clean |>
  group_by(gender) |>
  summarize(
    avg_salary    = mean(base_salary, na.rm = TRUE),
    median_salary = median(base_salary, na.rm = TRUE),
    count         = n()
  )
# A tibble: 2 × 4
  gender avg_salary median_salary count
  <fct>       <dbl>         <dbl> <int>
1 F          76764.        72898.  4091
2 M          80171.        77292.  5867

#5 Multiple Linear Regression

I will now run a multiple linear regression to see what factors predict an employee’s base salary. I chose gender, department name, and overtime pay as predictors because I want to understand how each factor influences salary while controlling for the others.

# Fit multiple linear regression model
model <- lm(base_salary ~ gender + department_name + overtime_pay, data = salaries_clean)

summary(model)

Call:
lm(formula = base_salary ~ gender + department_name + overtime_pay, 
    data = salaries_clean)

Residuals:
    Min      1Q  Median      3Q     Max 
-100277  -18730   -2362   14484  162765 

Coefficients:
                                                                     Estimate
(Intercept)                                                         5.004e+04
genderM                                                             7.194e+03
department_nameBoard of Appeals Department                          4.400e+04
department_nameBoard of Elections                                   2.672e+04
department_nameCommunity Engagement Cluster                         1.415e+04
department_nameCommunity Use of Public Facilities                   3.592e+04
department_nameCorrection and Rehabilitation                        2.043e+04
department_nameCounty Attorney's Office                             6.332e+04
department_nameCounty Council                                       4.234e+04
department_nameDepartment of Environmental Protection               4.173e+04
department_nameDepartment of Finance                                5.034e+04
department_nameDepartment of General Services                       2.771e+04
department_nameDepartment of Health and Human Services              2.884e+04
department_nameDepartment of Housing and Community Affairs          3.967e+04
department_nameDepartment of Permitting Services                    4.220e+04
department_nameDepartment of Police                                 2.191e+04
department_nameDepartment of Public Libraries                       5.328e+03
department_nameDepartment of Recreation                             2.528e+04
department_nameDepartment of Technology Services                    7.016e+04
department_nameDepartment of Transportation                         7.635e+03
department_nameEthics Commission                                    7.794e+04
department_nameFire and Rescue Services                             2.374e+04
department_nameMerit System Protection Board Department             3.023e+04
department_nameNon-Departmental Account                             7.404e+04
department_nameOffice of Agriculture                                2.606e+04
department_nameOffice of Animal Services                            1.121e+04
department_nameOffice of Consumer Protection                        4.989e+04
department_nameOffice of Emergency Management and Homeland Security 5.173e+04
department_nameOffice of Human Resources                            5.350e+04
department_nameOffice of Human Rights                               5.104e+04
department_nameOffice of Intergovernmental Relations Department     9.805e+04
department_nameOffice of Labor Relations                            5.956e+04
department_nameOffice of Legislative Oversight                      6.372e+04
department_nameOffice of Management and Budget                      6.481e+04
department_nameOffice of Procurement                                4.439e+04
department_nameOffice of Public Information                         2.406e+04
department_nameOffice of Racial Equity and Social Justice           6.224e+04
department_nameOffice of the Inspector General                      6.275e+04
department_nameOffice of Zoning and Administrative Hearings         7.841e+04
department_nameOffices of the County Executive                      6.755e+04
department_nameSheriff's Office                                     2.485e+04
overtime_pay                                                        1.743e-01
                                                                    Std. Error
(Intercept)                                                          1.386e+03
genderM                                                              6.524e+02
department_nameBoard of Appeals Department                           1.560e+04
department_nameBoard of Elections                                    5.198e+03
department_nameCommunity Engagement Cluster                          3.223e+03
department_nameCommunity Use of Public Facilities                    5.439e+03
department_nameCorrection and Rehabilitation                         1.792e+03
department_nameCounty Attorney's Office                              3.398e+03
department_nameCounty Council                                        3.005e+03
department_nameDepartment of Environmental Protection                2.485e+03
department_nameDepartment of Finance                                 2.802e+03
department_nameDepartment of General Services                        1.843e+03
department_nameDepartment of Health and Human Services               1.498e+03
department_nameDepartment of Housing and Community Affairs           3.108e+03
department_nameDepartment of Permitting Services                     2.202e+03
department_nameDepartment of Police                                  1.430e+03
department_nameDepartment of Public Libraries                        1.911e+03
department_nameDepartment of Recreation                              2.572e+03
department_nameDepartment of Technology Services                     2.568e+03
department_nameDepartment of Transportation                          1.487e+03
department_nameEthics Commission                                     1.907e+04
department_nameFire and Rescue Services                              1.531e+03
department_nameMerit System Protection Board Department              1.907e+04
department_nameNon-Departmental Account                              6.159e+03
department_nameOffice of Agriculture                                 8.608e+03
department_nameOffice of Animal Services                             3.438e+03
department_nameOffice of Consumer Protection                         7.309e+03
department_nameOffice of Emergency Management and Homeland Security  7.068e+03
department_nameOffice of Human Resources                             3.775e+03
department_nameOffice of Human Rights                                9.063e+03
department_nameOffice of Intergovernmental Relations Department      1.211e+04
department_nameOffice of Labor Relations                             1.025e+04
department_nameOffice of Legislative Oversight                       7.071e+03
department_nameOffice of Management and Budget                       4.732e+03
department_nameOffice of Procurement                                 5.165e+03
department_nameOffice of Public Information                          3.493e+03
department_nameOffice of Racial Equity and Social Justice            1.560e+04
department_nameOffice of the Inspector General                       8.220e+03
department_nameOffice of Zoning and Administrative Hearings          1.211e+04
department_nameOffices of the County Executive                       4.616e+03
department_nameSheriff's Office                                      2.259e+03
overtime_pay                                                         2.821e-02
                                                                    t value
(Intercept)                                                          36.106
genderM                                                              11.026
department_nameBoard of Appeals Department                            2.821
department_nameBoard of Elections                                     5.140
department_nameCommunity Engagement Cluster                           4.391
department_nameCommunity Use of Public Facilities                     6.605
department_nameCorrection and Rehabilitation                         11.398
department_nameCounty Attorney's Office                              18.634
department_nameCounty Council                                        14.090
department_nameDepartment of Environmental Protection                16.790
department_nameDepartment of Finance                                 17.968
department_nameDepartment of General Services                        15.035
department_nameDepartment of Health and Human Services               19.255
department_nameDepartment of Housing and Community Affairs           12.767
department_nameDepartment of Permitting Services                     19.167
department_nameDepartment of Police                                  15.330
department_nameDepartment of Public Libraries                         2.787
department_nameDepartment of Recreation                               9.828
department_nameDepartment of Technology Services                     27.326
department_nameDepartment of Transportation                           5.133
department_nameEthics Commission                                      4.086
department_nameFire and Rescue Services                              15.508
department_nameMerit System Protection Board Department               1.585
department_nameNon-Departmental Account                              12.022
department_nameOffice of Agriculture                                  3.027
department_nameOffice of Animal Services                              3.261
department_nameOffice of Consumer Protection                          6.826
department_nameOffice of Emergency Management and Homeland Security   7.320
department_nameOffice of Human Resources                             14.169
department_nameOffice of Human Rights                                 5.632
department_nameOffice of Intergovernmental Relations Department       8.093
department_nameOffice of Labor Relations                              5.809
department_nameOffice of Legislative Oversight                        9.012
department_nameOffice of Management and Budget                       13.695
department_nameOffice of Procurement                                  8.593
department_nameOffice of Public Information                           6.887
department_nameOffice of Racial Equity and Social Justice             3.990
department_nameOffice of the Inspector General                        7.634
department_nameOffice of Zoning and Administrative Hearings           6.477
department_nameOffices of the County Executive                       14.633
department_nameSheriff's Office                                      10.998
overtime_pay                                                          6.178
                                                                    Pr(>|t|)
(Intercept)                                                          < 2e-16
genderM                                                              < 2e-16
department_nameBoard of Appeals Department                           0.00480
department_nameBoard of Elections                                   2.81e-07
department_nameCommunity Engagement Cluster                         1.14e-05
department_nameCommunity Use of Public Facilities                   4.17e-11
department_nameCorrection and Rehabilitation                         < 2e-16
department_nameCounty Attorney's Office                              < 2e-16
department_nameCounty Council                                        < 2e-16
department_nameDepartment of Environmental Protection                < 2e-16
department_nameDepartment of Finance                                 < 2e-16
department_nameDepartment of General Services                        < 2e-16
department_nameDepartment of Health and Human Services               < 2e-16
department_nameDepartment of Housing and Community Affairs           < 2e-16
department_nameDepartment of Permitting Services                     < 2e-16
department_nameDepartment of Police                                  < 2e-16
department_nameDepartment of Public Libraries                        0.00533
department_nameDepartment of Recreation                              < 2e-16
department_nameDepartment of Technology Services                     < 2e-16
department_nameDepartment of Transportation                         2.90e-07
department_nameEthics Commission                                    4.42e-05
department_nameFire and Rescue Services                              < 2e-16
department_nameMerit System Protection Board Department              0.11301
department_nameNon-Departmental Account                              < 2e-16
department_nameOffice of Agriculture                                 0.00247
department_nameOffice of Animal Services                             0.00111
department_nameOffice of Consumer Protection                        9.22e-12
department_nameOffice of Emergency Management and Homeland Security 2.68e-13
department_nameOffice of Human Resources                             < 2e-16
department_nameOffice of Human Rights                               1.83e-08
department_nameOffice of Intergovernmental Relations Department     6.49e-16
department_nameOffice of Labor Relations                            6.48e-09
department_nameOffice of Legislative Oversight                       < 2e-16
department_nameOffice of Management and Budget                       < 2e-16
department_nameOffice of Procurement                                 < 2e-16
department_nameOffice of Public Information                         6.06e-12
department_nameOffice of Racial Equity and Social Justice           6.65e-05
department_nameOffice of the Inspector General                      2.49e-14
department_nameOffice of Zoning and Administrative Hearings         9.81e-11
department_nameOffices of the County Executive                       < 2e-16
department_nameSheriff's Office                                      < 2e-16
overtime_pay                                                        6.76e-10
                                                                       
(Intercept)                                                         ***
genderM                                                             ***
department_nameBoard of Appeals Department                          ** 
department_nameBoard of Elections                                   ***
department_nameCommunity Engagement Cluster                         ***
department_nameCommunity Use of Public Facilities                   ***
department_nameCorrection and Rehabilitation                        ***
department_nameCounty Attorney's Office                             ***
department_nameCounty Council                                       ***
department_nameDepartment of Environmental Protection               ***
department_nameDepartment of Finance                                ***
department_nameDepartment of General Services                       ***
department_nameDepartment of Health and Human Services              ***
department_nameDepartment of Housing and Community Affairs          ***
department_nameDepartment of Permitting Services                    ***
department_nameDepartment of Police                                 ***
department_nameDepartment of Public Libraries                       ** 
department_nameDepartment of Recreation                             ***
department_nameDepartment of Technology Services                    ***
department_nameDepartment of Transportation                         ***
department_nameEthics Commission                                    ***
department_nameFire and Rescue Services                             ***
department_nameMerit System Protection Board Department                
department_nameNon-Departmental Account                             ***
department_nameOffice of Agriculture                                ** 
department_nameOffice of Animal Services                            ** 
department_nameOffice of Consumer Protection                        ***
department_nameOffice of Emergency Management and Homeland Security ***
department_nameOffice of Human Resources                            ***
department_nameOffice of Human Rights                               ***
department_nameOffice of Intergovernmental Relations Department     ***
department_nameOffice of Labor Relations                            ***
department_nameOffice of Legislative Oversight                      ***
department_nameOffice of Management and Budget                      ***
department_nameOffice of Procurement                                ***
department_nameOffice of Public Information                         ***
department_nameOffice of Racial Equity and Social Justice           ***
department_nameOffice of the Inspector General                      ***
department_nameOffice of Zoning and Administrative Hearings         ***
department_nameOffices of the County Executive                      ***
department_nameSheriff's Office                                     ***
overtime_pay                                                        ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 26910 on 9916 degrees of freedom
Multiple R-squared:  0.2068,    Adjusted R-squared:  0.2035 
F-statistic: 63.04 on 41 and 9916 DF,  p-value: < 2.2e-16
# R-squared and Adjusted R-squared
r_squared     <- summary(model)$r.squared
adj_r_squared <- summary(model)$adj.r.squared

cat("R-squared:", round(r_squared, 3),
    "\nAdjusted R-squared:", round(adj_r_squared, 3))
R-squared: 0.207 
Adjusted R-squared: 0.203

The model equation is: Base Salary = intercept + gender coefficient + department coefficient + overtime pay coefficient. The R-squared tells us how much of the variation in base salary is explained by the model. The adjusted R-squared accounts for the number of predictors. Each coefficient tells us how much the base salary changes for that category compared to the reference group, while holding everything else constant.

# Check assumptions: Residuals compared to Fitted
plot(model$fitted.values, model$residuals,
     main = "Residuals vs Fitted",
     xlab = "Fitted Values",
     ylab = "Residuals")
abline(h = 0)

# Check normality of residuals
qqnorm(model$residuals)
qqline(model$residuals)

# The RMSE
residuals <- model$residuals
rmse <- sqrt(mean(residuals^2))
cat("RMSE:", round(rmse, 2))
RMSE: 26854.33

The residuals vs fitted plot checks linearity and homoscedasticity, and I want a random scatter around zero. The Q-Q plot checks normality of residuals, and the points should follow the diagonal line. Independence is reasonable since each row is a separate employee. No multicollinearity is expected since gender, department, and overtime pay measure different things.

#6 Visualization 1: Average Base Salary by Department and Gender

# Get the top 6 departments by employee count
top_6_depts <- salaries_clean |>
  count(department_name, sort = TRUE) |>
  head(6) |>
  pull(department_name)

# Filter to only those 6 departments then summarize
dept_summary <- salaries_clean |>
  filter(department_name %in% top_6_depts) |>
  group_by(department_name, gender) |>
  summarize(
    avg_salary = mean(base_salary, na.rm = TRUE),
    count      = n()
  )
`summarise()` has grouped output by 'department_name'. You can override using
the `.groups` argument.
# Plot average salary by top 6 departments filled by gender
ggplot(dept_summary, aes(x = reorder(department_name, -avg_salary, sum),
                          y = avg_salary,
                          fill = gender)) +
  geom_col(position = "dodge", color = "black") +
  scale_fill_brewer(palette = "Set4") +
  scale_y_continuous(labels = scales::comma) +
  annotate("text", x = 1, y = 145000,
           label = "Highest avg\nsalary dept",
           size = 3, color = "black", fontface = "italic") +
  labs(
    title   = "Average Base Salary by Department and Gender:Top 6 Departments (2020)",
    x       = "Department",
    y       = "Average Base Salary ($)",
    fill    = "Gender",
    caption = "Source: Montgomery County Open Data Portal:  Employee Salaries 2020"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
Warning: Unknown palette: "Set4"

This bar chart compares average base salaries across the six largest departments in Montgomery County broken down by gender. I made it only look at the top 6 departments by employee count to keep the chart readable. It shows that public safety departments like Police and Fire and Rescue tend to have higher average salaries overall. In most departments, male employees have a higher average base salary than female employees, which is consistent with the gender pay gap research I did in background research. Departments like Health and Human Services, which have more female workers, tend to have lower average salaries overall.

#7 Visualization 2: Base Salary compared to Overtime Pay by Gender

# Sample down to 500 points to keep the scatter plot clean
set.seed(42)
salaries_plot <- salaries_clean |>
  slice_sample(n = 500)

# My graph
plot_ly(
  data   = salaries_plot,
  x      = ~overtime_pay,
  y      = ~base_salary,
  color  = ~gender,
  colors = c("F" = "red", "M" = "blue"),
  type   = "scatter",
  mode   = "markers",
  marker = list(size = 7, opacity = 0.7),
  text   = ~paste0("Department: ", department_name,
                   "<br>Gender: ", gender,
                   "<br>Base Salary: $", format(base_salary, big.mark = ","),
                   "<br>Overtime Pay: $", format(overtime_pay, big.mark = ",")),
  hoverinfo = "text"
) |>
  layout(
    title  = "Base Salary vs Overtime Pay by Gender (2020)",
    xaxis  = list(title = "Overtime Pay ($)"),
    yaxis  = list(title = "Base Salary ($)"),
    legend = list(title = list(text = "Gender")),
    annotations = list(
      list(
        x = max(salaries_plot$overtime_pay, na.rm = TRUE) * 0.7,
        y = max(salaries_plot$base_salary, na.rm = TRUE) * 0.95,
        text = "Source: Montgomery County Open Data Portal",
        showarrow = FALSE,
        font = list(size = 10, color = "black")
      )
    )
  )

The scatter plot shows the relationship between overtime pay and base salary for a random sample of 500 employees, colored by gender. When I hover over any point it will reveal the department, gender, base salary, and overtime pay for that employee. The plot shows that higher overtime pay does not always mean a higher base salary as many employees with high overtime are in departments like Police and Fire where base salaries vary. Male employees are more spread across the overtime range, while female employees are at lower overtime levels. This is follows my research showing women are less likely to work overtime in public safety roles.

#8 Conclusion

Process:

I cleaned the dataset by removing dollar signs and commas from the salary columns using gsub() so they could be converted to numeric values. I then filtered out rows with missing salary values or zero salaries. I used tolower() and gsub() to clean the column names, and converted categorical variables to factors. I used group_by() and summarize() to explore salary patterns by department and gender, filter() to focus on the top departments, and arrange() to sort by average salary. I used slice_sample() to sample down to 500 points for the scatter plot.

Visualization Insight:

The bar chart shows that public safety departments have the highest average salaries and that male employees earn more on average than female employees in most departments. The scatter plot adds onto that by showing the relationship between overtime pay and base salary for employees. The most interesting thing I saw was that the gender pay gap is not always the same. It is the widest in departments like Police and Fire, and smaller in departments like Health and Human Services. This suggests the gap is partly driven by which roles men and women play rather than direct discrimination. One thing I wish the dataset had so I could add it was years of experience for each employee, which would help explain salary differences more accurately.

Limitations:

This dataset only covers 2020 and does not include years of experience, job title, or hours worked, which are all factors that would help explain salary differences more accurately. A future version of this analysis could include multiple years to track salary trends over time.

Sources/AI Usage:

Dataset: Montgomery County Open Data Portal: Employee Salaries 2020. https://data.montgomerycountymd.gov

American Association of University Women.The Simple Truth About the Gender Pay Gap. https://www.aauw.org/resources/research/simple-truth/

I used Gemini to get gsub([$]) to remove dollar signs and commas from the salary columns so they could be converted to numeric. I did not know how to clean it normally

I used Gemini to get pull() inside the filter step to extract the top department names as a vector to use in the filter.

I used Gemini to get plot_ly() from the plotly library to make the scatter plot interactive. I saw you suggest the function in things to do for our interactive and decided to search up what it did. After finding out what it did I saw that it would fit my purposes and used it.

I used Gemini to get scale_y_continuous(labels = scales::comma) because my salary numbers were showing as scientific notation. I did not know how to fix it so I asked Gemini and it gave me that command to turn them into normal numbers like 80,000 and 100,000.

I used Gemini to get reorder() because I wanted my bar chart to go from highest average salary to lowest instead of alphabetically. I did not know how to sort bars in ggplot so I asked Gemini.

I used Gemini to get annotate() because I wanted to add a text label directly onto my bar chart pointing out the highest salary department. I did not know how to add text to a ggplot without it being part of the data so I asked Gemini.

I used Gemini to get slice_sample() because I needed to randomly pick 500 rows from my dataset so the scatter plot wouldn’t be too cluttered and the data wouldn’t be biased toward specific rows I picked manually.