Capstone Project - Employee Salaries

Introduction

I chose the employee salaries dataset for this project. It lists the salaries of Montgomery County, Maryland government employees, with each record including the employee's department, gender, base salary, and overtime pay. The data comes from the Data Montgomery website.

Outline

Q1. Salaries by Department

Q2. Correlation between Base Salary and Overtime Pay

Q3. Top 10 Overtime Departments

Q4. Relationship between Gender and Base Salary

Q5. US Yearly Inflation Rate vs. Base Salary Increase Rate in 2019 - 2021

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ tibble  3.1.2     ✓ purrr   0.3.4
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Exploratory Data Analysis (EDA) is the process of observing and understanding the characteristics of a dataset by examining, transforming, and organizing it before performing the actual analysis. Without this step, the chosen analysis method may be inappropriate, or the data may turn out to be insufficient for the intended modeling. Broadly, EDA can be divided into two parts: preprocessing and visualization.

First, let's load the data and look at its structure.

setwd("~/Desktop/학교/Data 205")   # local project folder ("학교" means "school"); adjust to your own path
ES2020 <- read_csv("Employee_Salaries_-_2020.csv")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Department = col_character(),
##   `Department Name` = col_character(),
##   Division = col_character(),
##   Gender = col_character(),
##   `Base Salary` = col_double(),
##   `2020 Overtime Pay` = col_double(),
##   `2020 Longevity Pay` = col_double(),
##   Grade = col_character()
## )

str(ES2020)   # Show the structure: dimensions, column types, and the first few values

## spec_tbl_df [9,958 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Department        : chr [1:9958] "ABS" "ABS" "ABS" "ABS" ...
##  $ Department Name   : chr [1:9958] "Alcohol Beverage Services" "Alcohol Beverage Services" "Alcohol Beverage Services" "Alcohol Beverage Services" ...
##  $ Division          : chr [1:9958] "Wholesale Administration" "Administrative Services" "Administration" "Wholesale Operations" ...
##  $ Gender            : chr [1:9958] "F" "F" "M" "F" ...
##  $ Base Salary       : num [1:9958] 78902 35926 167345 90848 78902 ...
##  $ 2020 Overtime Pay : num [1:9958] 199 0 0 0 205 ...
##  $ 2020 Longevity Pay: num [1:9958] 0 4039 0 5718 2460 ...
##  $ Grade             : chr [1:9958] "18" "16" "M2" "21" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Department = col_character(),
##   ..   `Department Name` = col_character(),
##   ..   Division = col_character(),
##   ..   Gender = col_character(),
##   ..   `Base Salary` = col_double(),
##   ..   `2020 Overtime Pay` = col_double(),
##   ..   `2020 Longevity Pay` = col_double(),
##   ..   Grade = col_character()
##   .. )
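Q2 later removes employees with zero overtime pay, and the 2021 file shows `NA` values in its longevity pay column, so counting missing and zero values per numeric column is a useful check at this stage. The sketch below runs on a small hypothetical tibble (`toy`, with made-up values) rather than the real files; substituting `ES2020` for `toy` would apply the same check to the 2020 data.

```r
# Hypothetical toy data (made-up values) standing in for ES2020
toy <- tibble::tibble(
  Gender       = c("F", "F", "M"),
  BaseSalary   = c(78902, 35926, 167345),
  OvertimePay  = c(199, 0, 0),
  LongevityPay = c(NA, 4039, 0)
)

# Only numeric columns are checked (Grade, for example, contains the string "0")
num <- toy[sapply(toy, is.numeric)]

# Count missing values and zero values in each numeric column
check <- data.frame(
  column = names(num),
  n_na   = sapply(num, function(x) sum(is.na(x))),
  n_zero = sapply(num, function(x) sum(x == 0, na.rm = TRUE)),
  row.names = NULL
)
check
```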

Data Cleaning

ES20 <- ES2020 %>%
  select(Department, 
         `Department Name`, 
         Gender, 
         `Base Salary`, 
         Grade, 
         `2020 Overtime Pay`, 
         `2020 Longevity Pay`)

names(ES20) <- gsub(" ", "", names(ES20))   # remove spaces
str(ES20)

## tibble [9,958 × 7] (S3: tbl_df/tbl/data.frame)
##  $ Department      : chr [1:9958] "ABS" "ABS" "ABS" "ABS" ...
##  $ DepartmentName  : chr [1:9958] "Alcohol Beverage Services" "Alcohol Beverage Services" "Alcohol Beverage Services" "Alcohol Beverage Services" ...
##  $ Gender          : chr [1:9958] "F" "F" "M" "F" ...
##  $ BaseSalary      : num [1:9958] 78902 35926 167345 90848 78902 ...
##  $ Grade           : chr [1:9958] "18" "16" "M2" "21" ...
##  $ 2020OvertimePay : num [1:9958] 199 0 0 0 205 ...
##  $ 2020LongevityPay: num [1:9958] 0 4039 0 5718 2460 ...

# Column names that start with a digit (e.g. 2020OvertimePay) are not syntactic R names and would need backquotes every time they are used, so rename all columns explicitly with names().

names(ES20) <- c('Department', 
                 'DepartmentName', 
                 'Gender', 
                 'BaseSalary', 
                 'Grade', 
                 'OvertimePay', 
                 'LongevityPay')
ES20

## # A tibble: 9,958 x 7
##    Department DepartmentName    Gender BaseSalary Grade OvertimePay LongevityPay
##    <chr>      <chr>             <chr>       <dbl> <chr>       <dbl>        <dbl>
##  1 ABS        Alcohol Beverage… F          78902  18           199.           0 
##  2 ABS        Alcohol Beverage… F          35926  16             0         4039.
##  3 ABS        Alcohol Beverage… M         167345  M2             0            0 
##  4 ABS        Alcohol Beverage… F          90848  21             0         5718.
##  5 ABS        Alcohol Beverage… F          78902  18           205.        2460.
##  6 ABS        Alcohol Beverage… F         109761  25             0            0 
##  7 ABS        Alcohol Beverage… M          68575  15          6606.           0 
##  8 ABS        Alcohol Beverage… M          50604. 10           805.           0 
##  9 ABS        Alcohol Beverage… M          50866. 10          4140.           0 
## 10 ABS        Alcohol Beverage… M          49360. 10          4084.           0 
## # … with 9,948 more rows
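The renaming above assigns names by position, so it would silently mislabel columns if their order ever changed. An alternative is dplyr's `rename()`, which matches columns by their current names; non-syntactic names such as `2020 Overtime Pay` only need backquotes. A minimal sketch on a hypothetical two-row tibble (`toy`, made-up values):

```r
library(dplyr)

# Hypothetical toy data (made-up values) with the raw column names of the 2020 file
toy <- tibble::tibble(
  `Base Salary`       = c(78902, 35926),
  `2020 Overtime Pay` = c(199, 0)
)

# rename(new = old): backquotes refer to names containing spaces or a leading digit
toy_clean <- toy %>%
  rename(BaseSalary  = `Base Salary`,
         OvertimePay = `2020 Overtime Pay`)

names(toy_clean)
```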

ES20 %>%
  arrange(desc(BaseSalary))

## # A tibble: 9,958 x 7
##    Department DepartmentName    Gender BaseSalary Grade OvertimePay LongevityPay
##    <chr>      <chr>             <chr>       <dbl> <chr>       <dbl>        <dbl>
##  1 CEX        Offices of the C… M          280000 0               0           0 
##  2 CEX        Offices of the C… M          250000 0               0           0 
##  3 POL        Department of Po… M          225000 0               0           0 
##  4 HHS        Department of He… M          223953 MD4             0        4653.
##  5 ABS        Alcohol Beverage… M          220000 0               0           0 
##  6 HHS        Department of He… M          218500 0               0           0 
##  7 DTS        Department of Te… M          215120 0               0           0 
##  8 IGR        Office of Interg… F          212556 0               0           0 
##  9 CAT        County Attorney'… M          210143 0               0           0 
## 10 DGS        Department of Ge… M          210143 0               0           0 
## # … with 9,948 more rows

Q1. Salaries by Department

BSTOP <- ES20 %>%
  group_by(DepartmentName) %>%
  summarise(BSAverage = mean(BaseSalary)) %>%
  arrange(desc(BSAverage)) %>%
  head(10)
BSTOP

## # A tibble: 10 x 2
##    DepartmentName                                   BSAverage
##    <chr>                                                <dbl>
##  1 Office of Intergovernmental Relations Department   148088.
##  2 Ethics Commission                                  131574.
##  3 Office of Zoning and Administrative Hearings       131329.
##  4 Non-Departmental Account                           126963.
##  5 Department of Technology Services                  125009.
##  6 Offices of the County Executive                    120121.
##  7 Office of Management and Budget                    118137.
##  8 Office of Legislative Oversight                    116642.
##  9 Office of Labor Relations                          115765.
## 10 County Attorney's Office                           115513.

BSTAIL <- ES20 %>%
  group_by(DepartmentName) %>%
  summarise(BSAverage = mean(BaseSalary)) %>%
  arrange(BSAverage) %>%
  head(10)
BSTAIL

## # A tibble: 10 x 2
##    DepartmentName                          BSAverage
##    <chr>                                       <dbl>
##  1 Alcohol Beverage Services                  56071.
##  2 Department of Public Libraries             56980.
##  3 Office of Animal Services                  63544.
##  4 Department of Transportation               64448.
##  5 Community Engagement Cluster               69482.
##  6 Department of Police                       77313.
##  7 Office of Public Information               77382.
##  8 Correction and Rehabilitation              77746.
##  9 Department of Recreation                   78356.
## 10 Department of Health and Human Services    80121.

library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
## 
##     discard

## The following object is masked from 'package:readr':
## 
##     col_factor

# reorder() sorts the bars by average salary instead of alphabetically
TOP <- ggplot(BSTOP, aes(x = BSAverage, y = reorder(DepartmentName, BSAverage))) +
  geom_col(fill = 'darkgreen', alpha = .5) +
  geom_text(aes(label = dollar(BSAverage)), size = 3, hjust = 1.1) +
  ggtitle("Top 10 Salaries by Department") +
  ylab("Department") +
  xlab("Average Base Salary") +
  scale_x_continuous(labels = dollar)

# The x-axis is reversed so the bars grow from the right, mirroring the top-10 plot;
# the lowest average ends up at the top (the unused scale_fill_manual() was removed)
TAIL <- ggplot(BSTAIL, aes(x = BSAverage, y = reorder(DepartmentName, -BSAverage))) +
  geom_col(fill = 'darkorange', alpha = .5) +
  geom_text(aes(label = dollar(BSAverage)), size = 3, hjust = -.1) +
  ggtitle("Bottom 10 Salaries by Department") +
  scale_x_reverse(labels = dollar) +
  scale_y_discrete(position = 'right') +
  ylab("Department") +
  xlab("Average Base Salary")

library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

grid.arrange(TOP, TAIL, nrow = 2)

The green graph shows the 10 departments with the highest average base salary, and the orange graph shows the 10 departments with the lowest. In the top 10 graph, the Office of Intergovernmental Relations Department stands out: its average base salary ($148,088) is about $16,500 higher than that of the next department, the Ethics Commission ($131,574). In the bottom 10 graph, Alcohol Beverage Services ($56,071) and the Department of Public Libraries ($56,980) have noticeably lower average base salaries than the other departments.
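BSTOP and BSTAIL repeat the same grouping and averaging step. A sketch of an alternative that computes the department averages once and takes both ends with dplyr's `slice_max()` and `slice_min()`; it runs on hypothetical toy data (`toy`, departments "A" to "C" with made-up salaries) and uses `n = 2` where the real analysis would use `n = 10`.

```r
library(dplyr)

# Hypothetical toy data (made-up values) standing in for ES20
toy <- tibble::tibble(
  DepartmentName = c("A", "A", "B", "B", "C"),
  BaseSalary     = c(100, 120, 50, 70, 90)
)

# Average base salary per department, computed once
dept_avg <- toy %>%
  group_by(DepartmentName) %>%
  summarise(BSAverage = mean(BaseSalary), .groups = "drop")

# Highest averages (sorted descending) and lowest averages (sorted ascending)
top_dept    <- dept_avg %>% slice_max(BSAverage, n = 2)
bottom_dept <- dept_avg %>% slice_min(BSAverage, n = 2)
```

Computing `dept_avg` once keeps the two lists consistent with each other if the grouping or filtering ever changes.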

Q2. Correlation between Base Salary and Overtime Pay

# Distribution of base salary for all employees
# (the original geom_density() layer was dropped: its values are on a density
#  scale far below the bin counts, and mapping fill to a continuous x is meaningless)
BS <- ES20 %>%
  ggplot(aes(x = BaseSalary)) +
  geom_area(stat = 'bin', bins = 30, fill = 'pink') +
  scale_x_continuous(labels = dollar)

# Distribution of overtime pay, excluding employees with no overtime
OT <- ES20 %>%
  filter(OvertimePay != 0) %>%
  ggplot(aes(x = OvertimePay)) +
  geom_area(stat = 'bin', bins = 30, fill = 'light blue') +
  scale_x_continuous(labels = dollar)

library(ggpubr)

ggarrange(BS, OT, labels = c("Base Salary", "Overtime Pay"))

To look at the relationship between base salary and overtime pay, I first plotted the distribution of each. Many employees had no overtime pay at all, so I removed the zero values from the overtime data. The base salary graph shows that most employees earn between $50,000 and $100,000 a year. Even with the zeros removed, the overtime pay graph is concentrated at small amounts, meaning many employees received only a little overtime pay. This made me wonder which departments receive such low overtime pay, so I drew the following graph.

CO <- ES20 %>%
  select(OvertimePay, BaseSalary, DepartmentName) %>%
  filter(OvertimePay != 0,
         DepartmentName %in% c("Fire and Rescue Services",
                               "Correction and Rehabilitation",
                               "Department of Police"))

ggplot(CO, aes(x = BaseSalary, y = OvertimePay, col = DepartmentName)) +
  geom_point(alpha = .2) +
  geom_smooth() +
  ggtitle("Correlation between Base Salary and Overtime Pay") +
  scale_x_continuous(labels = dollar) +
  scale_y_continuous(labels = dollar)

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Fire and Rescue Services, the Department of Police, and Correction and Rehabilitation are the three departments with the most employees receiving overtime pay (see Q3), so I compared overtime pay across them. Employees in Correction and Rehabilitation and in Fire and Rescue Services appear to be paid substantially for their overtime work. The Department of Police, however, has much lower overtime pay than the other two departments, even though it has the most employees working overtime. Fire and Rescue Services shows a positive relationship: employees with a higher base salary tend to earn more overtime pay. A positive relationship alone does not show which department earns the most, though; the Q3 averages put Correction and Rehabilitation ($19,065) slightly ahead of Fire and Rescue Services ($18,279). Correction and Rehabilitation also trends upward, but its curve departs from a straight line, so the relationship looks weaker. The Department of Police curve is harder to read: it rises over part of the salary range and falls over another, so the relationship is not consistently positive or negative, although the points lie fairly close to the fitted curve. Of the three departments, Department of Police employees earn the least overtime pay.
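The smoothed curves show each department's trend only visually; `cor()` can put a number on it. The sketch below computes the Pearson correlation per department on hypothetical toy data (`toy`, with made-up departments "Dept X" and "Dept Y"); the same pipeline applied to `CO` would give one correlation per real department. Pearson's r measures only linear association, so a curve that rises over part of the range and falls over another, as described for the Department of Police, can produce an r close to 0 even when the points cluster tightly.

```r
library(dplyr)

# Hypothetical toy data (made-up values) standing in for CO
toy <- tibble::tibble(
  DepartmentName = rep(c("Dept X", "Dept Y"), each = 4),
  BaseSalary     = c(50, 60, 70, 80, 50, 60, 70, 80),
  OvertimePay    = c(10, 20, 30, 40, 40, 30, 20, 10)
)

# Pearson correlation between base salary and overtime pay within each department
dept_cor <- toy %>%
  group_by(DepartmentName) %>%
  summarise(n = n(),
            r = cor(BaseSalary, OvertimePay),
            .groups = "drop")
dept_cor
```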

A side note: I recall hearing that Fire and Rescue Services and Correction and Rehabilitation need multiple employees at fire stations and detention facilities 24/7 (working in rotating shifts), and overtime pay is earned on the shifts that fall outside normal business hours. This may explain the large amount of overtime pay for Fire and Rescue Services and Correction and Rehabilitation employees.

Q3. Top 10 Overtime Departments

OT10 <- ES20 %>%
  filter(OvertimePay != 0) %>%
  group_by(DepartmentName) %>%
  summarise(OTPAverage = mean(OvertimePay), count = n()) %>%
  mutate(OTaverage = round(OTPAverage, 2)) %>%
  arrange(desc(count))
OT101 <- head(OT10, 10)
OT101

## # A tibble: 10 x 4
##    DepartmentName                          OTPAverage count OTaverage
##    <chr>                                        <dbl> <int>     <dbl>
##  1 Department of Police                         5647.  1511     5647 
##  2 Fire and Rescue Services                    18279.  1168    18279.
##  3 Department of Transportation                 6791.  1096     6791.
##  4 Correction and Rehabilitation               19065.   440    19065.
##  5 Department of Health and Human Services      2965.   383     2965.
##  6 Alcohol Beverage Services                    2272.   348     2272.
##  7 Department of General Services               6178.   245     6178.
##  8 Sheriff's Office                             2134.   172     2134.
##  9 Department of Permitting Services            2138.    73     2138.
## 10 Office of Animal Services                    1601.    50     1601.

OT101 %>%
  ggplot() +
  # geom_col() already uses stat = "identity", so passing stat triggered a warning
  geom_col(aes(x = reorder(OTaverage, -count), y = count, fill = DepartmentName),
           position = "dodge") +
  xlab("Overtime Pay Average ($)") +
  ylab("Number of Employees with Overtime") +
  ggtitle("Top 10 Overtime Departments") +
  coord_flip()

The bar graph shows the top 10 departments by the number of employees receiving overtime pay. The Department of Police has the most, with 1,511 employees, followed by Fire and Rescue Services with 1,168. At a glance, the Department of Police also appears to have low average overtime pay.

library(plotly)

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

pie <- plot_ly(OT101, labels = ~DepartmentName, values = ~OTaverage, type = "pie") %>%
  layout(title = "Overtime Pay Proportion of Top 10 Overtime Departments")
pie

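The pie chart, which shows each department's share of the combined average overtime pay of these 10 departments, also indicates that Department of Police employees receive little overtime pay per person. Fire and Rescue Services' pay is in line with how much overtime work it does, while Correction and Rehabilitation receives a larger share of overtime pay than its number of overtime employees would suggest.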
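The pie chart divides each department's average overtime pay by the sum of the ten averages, so its slices compare pay per employee rather than total overtime spending. The same shares can be computed directly; this sketch copies the OT101 averages rounded to whole dollars, so the percentages are approximate.

```r
# Values copied from the OT101 table above (averages rounded to whole dollars)
ot <- data.frame(
  DepartmentName = c("Department of Police", "Fire and Rescue Services",
                     "Department of Transportation", "Correction and Rehabilitation",
                     "Department of Health and Human Services", "Alcohol Beverage Services",
                     "Department of General Services", "Sheriff's Office",
                     "Department of Permitting Services", "Office of Animal Services"),
  count     = c(1511, 1168, 1096, 440, 383, 348, 245, 172, 73, 50),
  OTaverage = c(5647, 18279, 6791, 19065, 2965, 2272, 6178, 2134, 2138, 1601)
)

# Each pie slice = department average / sum of the ten averages, in percent
ot$Share <- round(ot$OTaverage / sum(ot$OTaverage) * 100, 1)

# Largest share first, alongside the number of overtime employees
ot[order(-ot$Share), c("DepartmentName", "count", "Share")]
```

With these values, the Department of Police's slice is about 8.4% even though it has the most overtime employees, while Correction and Rehabilitation has the largest slice at about 28.4%.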

Q4. Relationship between Gender and Base Salary

ggplot(ES20, aes(x = BaseSalary, y = Gender)) +
  geom_jitter(col = 'gray') +
  geom_boxplot(alpha = .5, outlier.color = 'red') +
  # fun replaces the deprecated fun.y argument
  stat_summary(fun = "mean", geom = "point", shape = 23, size = 3, fill = "red") +
  ggtitle("Relationship between Gender and Base Salary") +
  scale_x_continuous(labels = dollar) +
  theme_classic()

I used a box plot to show the relationship between gender and base salary. The median base salaries of male and female employees do not differ much, but the red diamonds, which mark the means, show that male employees' base salaries are slightly higher on average.

However, the mean is sensitive to outliers, so I compared the mean with outliers and the mean without outliers.

mES20 <- ES20 %>%
  group_by(Gender) %>%
  summarise(mean = mean(BaseSalary))

names(mES20) <- c('Gender', 'Mean w/ Outliers')
mES20

## # A tibble: 2 x 2
##   Gender `Mean w/ Outliers`
##   <chr>               <dbl>
## 1 F                  76764.
## 2 M                  80171.

library(outliers)

## Warning: package 'outliers' was built under R version 4.1.2

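MES20 <- ES20 %>%
  group_by(Gender) %>%
  # outlier() returns only the single value farthest from the group mean,
  # so this drops the most extreme base salary (and any identical values) per gender
  filter(!BaseSalary %in% outlier(BaseSalary)) %>%
  summarise(Mean = mean(BaseSalary, na.rm = TRUE))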

names(MES20) <- c('Gender', 'Mean w/o Outliers')
MES20

## # A tibble: 2 x 2
##   Gender `Mean w/o Outliers`
##   <chr>                <dbl>
## 1 F                   76731.
## 2 M                   80137.

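Removing the outliers changes each mean by only about $33-$34, so the gap between genders remains: on average, male employees' base salary is about $3,400 higher than female employees'. Note that outlier() removes only the single most extreme salary in each group, so this is a limited check.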
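To remove every point the box plot flags in red, rather than one value per group, one option is Tukey's 1.5 × IQR rule, the rule behind stat_boxplot()'s default `coef = 1.5`. This is a swapped-in technique, not the method used above. The sketch defines a hypothetical helper `within_fences()` and runs it on made-up toy data (`toy`); applying the same pipeline to `ES20` would give the per-gender means under this rule.

```r
library(dplyr)

# Hypothetical helper: TRUE for values inside [Q1 - k*IQR, Q3 + k*IQR]
within_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x >= q[[1]] - k * iqr & x <= q[[2]] + k * iqr
}

# Hypothetical toy data (made-up values) standing in for ES20
toy <- tibble::tibble(
  Gender     = rep(c("F", "M"), each = 5),
  BaseSalary = c(70, 72, 74, 76, 300, 78, 80, 82, 84, 400)
)

# Fences are computed separately within each gender group
means <- toy %>%
  group_by(Gender) %>%
  filter(within_fences(BaseSalary)) %>%
  summarise(Mean = mean(BaseSalary), .groups = "drop")
means
```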

Q5. US Yearly Inflation Rate vs. Base Salary Increase Rate in 2019 - 2021

ES19 <- read_csv("Employee_Salaries_-_2019.csv")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Department = col_character(),
##   `Department Name` = col_character(),
##   Division = col_character(),
##   Gender = col_character(),
##   `Base Salary` = col_double(),
##   `2019 Overtime Pay` = col_double(),
##   `2019 Longevity Pay` = col_double(),
##   Grade = col_character()
## )

ES21 <- read_csv("Employee_Salaries_-_2021.csv")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Department = col_character(),
##   `Department Name` = col_character(),
##   Division = col_character(),
##   Gender = col_character(),
##   `Base Salary` = col_double(),
##   `2021 Overtime Pay` = col_double(),
##   `2021 Longevity Pay` = col_double(),
##   Grade = col_character()
## )

names(ES19) <- gsub(" ", "", names(ES19))   # remove spaces
str(ES19)

## spec_tbl_df [10,105 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Department      : chr [1:10105] "BOA" "BOA" "BOA" "BOE" ...
##  $ DepartmentName  : chr [1:10105] "Board of Appeals Department" "Board of Appeals Department" "Board of Appeals Department" "Board of Elections" ...
##  $ Division        : chr [1:10105] "Board of Appeals Division" "Board of Appeals Division" "Executive Director" "Director" ...
##  $ Gender          : chr [1:10105] "F" "F" "F" "F" ...
##  $ BaseSalary      : num [1:10105] 78902 58482 144751 183654 62488 ...
##  $ 2019OvertimePay : num [1:10105] 0 0 0 0 0 0 0 0 0 0 ...
##  $ 2019LongevityPay: num [1:10105] 0 0 0 0 0 ...
##  $ Grade           : chr [1:10105] "18" "16" "M3" "M1" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Department = col_character(),
##   ..   `Department Name` = col_character(),
##   ..   Division = col_character(),
##   ..   Gender = col_character(),
##   ..   `Base Salary` = col_double(),
##   ..   `2019 Overtime Pay` = col_double(),
##   ..   `2019 Longevity Pay` = col_double(),
##   ..   Grade = col_character()
##   .. )

names(ES21) <- gsub(" ", "", names(ES21))   # remove spaces
str(ES21)

## spec_tbl_df [9,907 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Department      : chr [1:9907] "ABS" "ABS" "ABS" "ABS" ...
##  $ DepartmentName  : chr [1:9907] "Alcohol Beverage Services" "Alcohol Beverage Services" "Alcohol Beverage Services" "Alcohol Beverage Services" ...
##  $ Division        : chr [1:9907] "Beer Loading" "Liquor and Wine Delivery Operations" "Beer Delivery Operations" "Beer Delivery Operations" ...
##  $ Gender          : chr [1:9907] "M" "M" "M" "M" ...
##  $ BaseSalary      : num [1:9907] 87969 80086 80086 70814 76419 ...
##  $ 2021OvertimePay : num [1:9907] 32953 32656 31369 29838 21379 ...
##  $ 2021LongevityPay: num [1:9907] NA 1105 1591 NA NA ...
##  $ Grade           : chr [1:9907] "20" "18" "18" "18" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Department = col_character(),
##   ..   `Department Name` = col_character(),
##   ..   Division = col_character(),
##   ..   Gender = col_character(),
##   ..   `Base Salary` = col_double(),
##   ..   `2021 Overtime Pay` = col_double(),
##   ..   `2021 Longevity Pay` = col_double(),
##   ..   Grade = col_character()
##   .. )

# Average base salary for each year
# (the original stored this in a variable named `mean`, shadowing base::mean,
#  and typed the rounded averages back in by hand to compute the rates)
avg_salary <- data.frame(
  Year = c(2019, 2020, 2021),
  AvgBaseSalary = c(mean(ES19$BaseSalary),
                    mean(ES20$BaseSalary),
                    mean(ES21$BaseSalary))
)

# Year-over-year increase rate (%); 2019 has no earlier year, so its rate is set to 0
prev <- head(avg_salary$AvgBaseSalary, -1)
avg_salary$Increase_Rate <- c(0, diff(avg_salary$AvgBaseSalary) / prev * 100)

BS2 <- round(avg_salary, 2)
names(BS2) <- c('Year', 'Average Base Salary', 'Increase_Rate')
BS2

##   Year Average Base Salary Increase_Rate
## 1 2019            78936.02          0.00
## 2 2020            78771.46         -0.21
## 3 2021            81339.53          3.26
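The increase rate is the year-over-year percentage change in the average base salary. Writing $\bar{S}_t$ for the average base salary in year $t$ (notation introduced here), the 2021 value from the table above works out as:

$$
\text{Increase Rate}_t = \frac{\bar{S}_t - \bar{S}_{t-1}}{\bar{S}_{t-1}} \times 100,
\qquad
\text{Increase Rate}_{2021} = \frac{81339.53 - 78771.46}{78771.46} \times 100 = \frac{2568.07}{78771.46} \times 100 \approx 3.26\%
$$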

IFL <- read_csv("inflation_data.csv")  # https://www.in2013dollars.com/us/inflation/2019

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   year = col_double(),
##   amount = col_double(),
##   `inflation rate` = col_double()
## )

names(IFL) <- c('Year', 'Amount', 'Inflation_Rate')
IFL     # 2022 data is compared to previous annual rate. Not final.

## # A tibble: 4 x 3
##    Year Amount Inflation_Rate
##   <dbl>  <dbl>          <dbl>
## 1  2019   100            1.76
## 2  2020   101.           1.23
## 3  2021   106.           4.7 
## 4  2022   112.           6.1

According to https://www.in2013dollars.com/us/inflation/2019, the 2022 inflation rate is calculated by comparison with previous annual rates and is not final.

# Left join on Year: keep every inflation year, including 2022,
# which has no salary data yet (its Increase_Rate becomes NA)
inner <- merge(IFL, BS2, all.x = TRUE) %>%
  select(Year, Inflation_Rate, Increase_Rate)
inner

##   Year Inflation_Rate Increase_Rate
## 1 2019           1.76          0.00
## 2 2020           1.23         -0.21
## 3 2021           4.70          3.26
## 4 2022           6.10            NA

ggplot(inner) +
  geom_line(aes(x = Year, y = Inflation_Rate, col = 'Inflation_Rate')) +
  geom_line(aes(x = Year, y = Increase_Rate, col = 'Increase_Rate')) +
  geom_point(aes(x = Year, y = Inflation_Rate, col = 'Inflation_Rate')) +
  geom_point(aes(x = Year, y = Increase_Rate, col = 'Increase_Rate')) +
  xlab('Year') +
  ylab('Rate (%)') +
  ggtitle('US Yearly Inflation Rate vs. Base Salary Increase Rate in 2019 - 2021') +
  theme(legend.position = 'top')

## Warning: Removed 1 row(s) containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_point).


The blue line represents the inflation rate for 2019-2022, and the red line represents the base salary increase rate for 2019-2021. The two lines follow a similar pattern, so employees' salaries may increase along with the inflation rate this year (2022).
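To put a number on how far apart the two lines are, the percentage-point gap between the salary increase rate and the inflation rate can be computed directly. A minimal sketch using values copied from the `inner` table; 2019 is left out because its Increase_Rate of 0 is a baseline placeholder rather than an observed increase.

```r
# Rates copied from the inner table above (2019 excluded: its 0 is a baseline placeholder)
rates <- data.frame(
  Year           = c(2020, 2021),
  Inflation_Rate = c(1.23, 4.70),
  Increase_Rate  = c(-0.21, 3.26)
)

# Percentage-point gap; a negative value means salaries grew more slowly than prices
rates$Gap <- rates$Increase_Rate - rates$Inflation_Rate
rates
```

With these values, the average base salary trailed inflation by about 1.44 percentage points in both 2020 and 2021.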

Conclusion

In conclusion, employees working in Fire and Rescue Services received a lot of overtime pay and a relatively high base salary among government employees. In contrast, the Department of Police has the largest number of employees working overtime, but they receive comparatively little overtime pay. A likely reason is that Fire and Rescue Services and Correction and Rehabilitation require multiple employees on site at each station 24/7, which generates overtime pay outside regular working hours.

Also, the median base salaries of male and female employees are almost the same, but the average base salary of male employees is slightly higher than that of female employees.

Finally, based on the comparison with the 2019-2022 inflation rates, this year's (2022) employee base salary is expected to increase by more than 1.5%.

Datasets

https://data.montgomerycountymd.gov/Human-Resources/Employee-Salaries-2020/he7s-ebwb

https://data.montgomerycountymd.gov/Human-Resources/Employee-Salaries-2019/qatd-z57d

https://data.montgomerycountymd.gov/Human-Resources/Employee-Salaries-2021/kmkb-bmhe

Reference

My mentor, Ms. Luh, is a senior IT specialist at Data Montgomery. She suggested comparing the US yearly inflation rate with base salary increases using the outside source below.

https://www.in2013dollars.com/us/inflation/2019

Thank you. :)

Soojin Kim, 5/9/2022