Data Overview

The dataset that I will be analyzing is the San Francisco Employee Salary Compensation dataset. This dataset provides all compensation data from 2013-2021. For the purposes of this analysis, I will only be using the compensation data between 2018-2021. This timeframe provided the cleanest data regarding salary data for all the current departments. The whole dataset contains a total of 758,604 records and 20 columns.The columns we will be mainly using for this analysis will be the year,department, salary, overtime, and total salary.Here is a list of all of the columns.

str(df)
## Classes 'data.table' and 'data.frame':   758604 obs. of  22 variables:
##  $ Organization Group Code: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Job Family Code        : chr  "1000" "1000" "1000" "1000" ...
##  $ Job Code               : chr  "1021" "1023" "1031" "1054" ...
##  $ Year Type              : chr  "Calendar" "Calendar" "Calendar" "Calendar" ...
##  $ Year                   : int  2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
##  $ Organization Group     : chr  "Public Protection" "Public Protection" "Public Protection" "Public Protection" ...
##  $ Department Code        : chr  "ADP" "ADP" "ADP" "ADP" ...
##  $ Department             : chr  "ADP Adult Probation" "ADP Adult Probation" "ADP Adult Probation" "ADP Adult Probation" ...
##  $ Union Code             : int  21 21 21 21 21 21 21 21 21 21 ...
##  $ Union                  : chr  "Prof & Tech Engineers - Miscellaneous, Local 21" "Prof & Tech Engineers - Miscellaneous, Local 21" "Prof & Tech Engineers - Miscellaneous, Local 21" "Prof & Tech Engineers - Miscellaneous, Local 21" ...
##  $ Job Family             : chr  "Information Systems" "Information Systems" "Information Systems" "Information Systems" ...
##  $ Job                    : chr  "IS Administrator 1" "IS Administrator 3" "IS Trainer-Assistant" "IS Business Analyst-Principal" ...
##  $ Employee Identifier    : int  37730 26574 8148 27436 37730 36223 51426 26574 37983 16858 ...
##  $ Salaries               : num  57535 57679 63533 101275 5084 ...
##  $ Overtime               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Other Salaries         : num  0 0 0 -7059 0 ...
##  $ Total Salary           : num  57535 57679 63533 94216 5084 ...
##  $ Retirement             : num  11253 10285 12496 19644 1084 ...
##  $ Health and Dental      : num  11961 7177 12918 12918 957 ...
##  $ Other Benefits         : num  4533 4755 4703 7459 388 ...
##  $ Total Benefits         : num  27747 22217 30117 40021 2428 ...
##  $ Total Compensation     : num  85282 79895 93650 134237 7512 ...
##  - attr(*, ".internal.selfref")=<externalptr>

Total Salaries by Department in 2021

In this first chart, we are looking at the total salary of each department compared to the overall budget for 2021.When creating this chart, I combined all the departments who had a percentage less than 1%. This combined a total of 37 departments into the “Other” department. This shows us that the over 50% of the total salary budget is going to six departments. They are Public Health, Municipal Transportation Agency, Police, Public Utilities Commission, and Fire Department. This would give us a starting point to look at those departments when it comes to reducing the overall budget.

plot_ly(mysalaries2021_df, labels = ~myDepartment, values = ~totalsalaries) %>%
  add_pie(hole = 0.6) %>%
  layout(title="Total Salaries by Department in 2021") %>%
  layout(annotations = list(text=paste0("Total Salary Budget: \n", scales::comma(sum(mysalaries2021_df$totalsalaries))),
                            "showarrow"=F))

Number of Employees by Year by Department

In this second chart, I decide to look at the overall headcount of each department to see if the departments with large budgets can be contributed to a large headcount. It does show departments like Public Health and Municipal Transportation do have a large headcount. Also, the number of employees in each department has decreased over the last few years. This lets us know that San Francisco has been either laying off employees or they have not back filled open positions. This could be due to a cost savings measure or being unable to find employees for those positions.

ggplot(count_df, aes(x = Year, y = n, fill=Department)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_dark() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = comma) +
  labs(title = "Multiple Bar Charts - Number of Employees by Year by Department",
       x= "Years",
       y = "Count of Employees",
       fill = "Department") +
  scale_fill_manual(values = mycolors) +
  facet_wrap(~Department, ncol = 5, nrow = 3)

Average Salary per Department (2018-2021)

In the third chart, I wanted to see what was the average salary per department over a few years. This allowed me to evaluate the departments, if they were keeping up with the cost of living and inflation.All the departments had a steady increase except for the Public Library. They are the only department that had a relatively flat line that only went up about $3,000.00 per year. I was also surprised to see that the Fire Department has the highest average salary with $116,000.This shows us an opportunity to reduce salaries by reducing the annual increases.

plot_ly(Sal2018_2021Ave_df, x = ~Year, y = ~n, name = Sal2018_2021Ave_df$Department, 
        type = 'scatter', mode = 'lines') %>%
  layout(title = "Average Salary per Department (2018-2021)")

Salary and Overtime for 2021 by Department

In my fourth chart, I wanted to compare the salaries and overtime for the top departments for 2021. This would allow us to see how departments are utilizing their budgets. It will also inform us which departments need to increase their staffing levels to reduce overtime. This shows us that Municipal Transportation Agency and Police are spending more than half a million on overtime. By hiring more employees, they would be able to reduce their overall budget.

ggplot(salaries2021_df, aes(x = reorder(Department, totsalaries), y = totsalaries, fill = NULL)) +
  geom_bar(stat = "identity", color = "darkblue", fill = "green") +
  coord_flip() +
  labs(title = "2021 San Francisco Total Salary", x ="Departments over 1000 Employees", y ="") +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_text(data = salaries2021_df, aes(x = Department, y = totsalaries, label = scales::comma(totsalaries), fill = NULL), hjust = -0.2, size = 3) +
  scale_y_continuous(labels = comma,
                     breaks = seq(0, 1e10, by = 2e10),
                     limits = c(0, 1e10))

ggplot(overtime2021_df, aes(x = reorder(Department, totovertime), y = totovertime, fill = NULL)) +
  geom_bar(stat = "identity", color = "darkred", fill = "orange") +
  coord_flip() +
  labs(title = "2021 San Francisco Total Overtime", x ="Departments over 1000 Employees", y ="") +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_text(data = overtime2021_df, aes(x = Department, y = totovertime, label = scales::comma(totovertime), fill = NULL), hjust = -0.2, size = 3) +
  scale_y_continuous(labels = comma,
                     breaks = seq(0, 1e9, by = 2e9),
                     limits = c(0, 1e9))

Department Salary Quartiles 2021

In this fifth and final chart, I wanted to see the salary quartiles of each department and see if there were many outliers for each department and if each department quartile ranges were similar. All the departments were very similar except for the Police and Sheriff departments. This informs us that they need a job study to reevalute the salaries for their positions.

plot_ly(everysalaries2021_df, y = ~Salaries, type = "box", quartilemethod= "exclusive", 
        name = everysalaries2021_df$Department)%>%
  layout(title ="Department Salary Quartiles 2021")

Summary

Overall after looking at the data, there were many places that could be evaluated to help reduce the overall budget for San Francisco. Unfortunately, with many public positions, they are run by unions with salary contracts in place regulating all hiring, firing, and salary increase. This would make it very difficult to implement any of these changes but it could be a step in the right direction.

This data is located on the San Francisco Open Data website at https://data.sfgov.org/City-Management-and-Ethics/Employee-Compensation/88g8-5mnd