Baltimore City MD Employee Salary Analysis

Introduction

The demand for salary transparency is greater now than ever. This analysis aims to understand trends in Baltimore City, MD employee salary ranges. This analysis studies trends of salaries determined by job class and total capital deployed for hiring investments throughout the years.

This analysis is strictly run with the intent of understanding patterns in salary estimates and job class.

Data Set

The particular data set used for this analysis comes from Baltimore City’s Open Baltimore database (“Baltimore City Employee Salaries”). “Baltimore City Employee Salaries” logs the characteristics of city employees throughout the district. In it, there are 154,187 observations made across 10 different variables. Several of which variables were dropped in order to uphold high ethical standards and respect anonymity. Different variables used in the analysis include ‘jobClass’, ‘agencyName’, ‘annualSalary’ and more. This analysis will take a closer look at how these salary characteristics relate to job class and capital deployed for hiring usage.

colnames:

‘jobClass’, ‘agencyName’, ‘agencyID’, ‘annualSalary’, ‘grossPay’, ‘hireDate’, ‘fiscalYear’, ‘ObjectId’

It’s structure is comprised of numeric, integer, and character components. The data was rearranged to make over 5 different data frames that compare salaries to job class and the time in which capital was deployed throughout the year.

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'c:/users/pptallon/Anaconda3/Library/plugins/platforms'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings 

warnings.filterwarnings("ignore")

df = pd.read_csv('c:/users/pptallon/Baltimore_City_Employee_Salaries.csv')

df['hireDate'] = pd.Series(df['hireDate'])
df['hireDate']= pd.to_datetime(df['hireDate'])

df['Day'] = df['hireDate'].dt.day
df['Month'] = df['hireDate'].dt.month
df['Year'] = df['hireDate'].dt.year
df['Day'] = df['hireDate'].dt.strftime('%a')
df['MonthName'] = df['hireDate'].dt.strftime('%b')
df['Quarter'] = 'Quarter ' + df['hireDate'].dt.quarter.astype('string')

df.jobClass.fillna("Not Available", inplace = True)

df.hireDate.fillna("Not Available", inplace = True)
df['hireDate'] = pd.to_datetime(df['hireDate'], errors = 'coerce')

Findings

The findings of this analysis include a majority salary range for Baltimore City employees (Range 1), a top earning job class (State’s Attorney), a timeline for when most capital is deployed for hiring (Q3, August), and which city department utilizes the most capital for hiring (Baltimore City Police Department). Together, this information provides valuable insight that could encourage public leaders to drive investment resources into undervalued Job Classes.

Tab 1

Salary Range by Fiscal Year.

This scatter plot visualization depicts employee count by salary range. In other words, we can identify how many employees are paid within a specified range year after year. Five ranges were set as a precursor for the final visualization.

Range 1 30,000 - 50,000
Range 2 50,001 - 70,000
Range 3 70,001 - 90,000
Range 4 90,001 - 110,000
Range 5 110,001 - 300,000

There is a notable decrease in employees from salary range 1 to range 5, meaning the lower the salary range, the higher number of employees within that range. Conversely, the higher the salary range, the fewer employees. Understanding this trend is important because it allows us to conceptualize how the mean salary might be affected by these ranges. There is a vast difference between 300,000 and 30,000. Understanding where a majority of salaries fall in Baltimore City enables us to see how outliers may impact adjacent data reports.


scatter_df = df.groupby(['MonthName','fiscalYear', 'annualSalary'])['annualSalary'].sum().reset_index(name='TotalSalaries')
ranges = {
    'Range 1': scatter_df['annualSalary'].between(30000, 50000),
    'Range 2': scatter_df['annualSalary'].between(50001, 70000),
    'Range 3': scatter_df['annualSalary'].between(70001, 90000),
    'Range 4': scatter_df['annualSalary'].between(90001, 110000),
    'Range 5': scatter_df['annualSalary'].between(110001, 300000),
}
scatter_df['Range'] = np.select(ranges.values(), ranges.keys(), default='No range')

scatter2_df =scatter_df[scatter_df['Range']!= 'No range']

scatter3_df = scatter2_df.groupby(['MonthName', 'fiscalYear', 'Range'])['Range'].count().reset_index(name = 'Range Count')
scatter3_df = pd.DataFrame(scatter3_df)
scatter3_df = scatter3_df.groupby(['MonthName', 'Range', 'fiscalYear'])['Range Count'].sum().reset_index()
scatter3_df['Hundreds'] = round(scatter3_df['Range Count']/100,0)

plt.figure(figsize=(18,10))

plt.scatter(scatter3_df['Range'], scatter3_df['fiscalYear'], marker = '8', cmap = 'viridis', 
            c= scatter3_df['Range Count'], s=scatter3_df['Range Count'], edgecolors = 'black')
plt.title('Baltimore City Employee Salary Range by Fiscal Year', fontsize = 18)
plt.xlabel('Employee Salary Range', fontsize = 14)
plt.ylabel('Fiscal Year', fontsize = 14)

cbar = plt.colorbar()
cbar.set_label('Number of Employees Within Salary Range', rotation = 270, fontsize = 14, color = 'black', labelpad = 30)

my_colorbar_ticks = [*range(15, int(scatter3_df['Range Count'].max()),15 )]
cbar.set_ticks(my_colorbar_ticks)

plt.show()

Tab 2

Top Salaries by Job Class.

This data visualization is a horizontal bar graph. It shows Top Baltimore City Employee Salaries by Job Class. Additionally, the colors of the graph are representative of salary range. Light blue indicates this particular job class is paid above the City average. Green indicates the salary pay is within 1% of the average. Finally, yellow represents job classes paid below the City average. The most outstanding observation is that of the Top Salaries in Baltimore City, the lowest is the Mayors.


bar_df = df.groupby(['jobClass']).agg({'jobClass': ['count'], 'annualSalary': ['mean']}).reset_index()
bar_df.columns = ['JobClass','Count','AvgSalary']

bar_df = bar_df.sort_values(['JobClass', 'AvgSalary'], ascending = [False, False])
bar_df = bar_df.reset_index(drop = True)
bar_df.sort_values("AvgSalary", axis=0,  ascending=False, inplace=True)
bar_df = bar_df.reset_index(drop = True)

def pick_colors_according_to_mean_salary(this_data):
    colors = []
    avg = this_data.AvgSalary.mean()
    for each in this_data.AvgSalary:
        if each > avg*1.01:
            colors.append('lightblue')
        elif each < avg*0.99:
            colors.append('green')
        else:
            colors.append('yellow')
    return colors
  
import matplotlib.patches as mpatches

bottom7 = 1
top7 = 10
d7 = bar_df.loc[bottom7:top7]
my_colors7 = pick_colors_according_to_mean_salary(d7)

Above = mpatches.Patch(color = 'lightblue', label = 'Above Average Salary')
At = mpatches.Patch(color = 'green', label = 'Within 1% of Average Salary')
Below = mpatches.Patch(color = 'yellow', label = 'Below Average Salary')

fig = plt.figure(figsize= (18,10))
ax7 = fig.add_subplot(1,1,1)

ax7.barh(d7.JobClass, d7.AvgSalary, label = 'JobClass', color = my_colors7);
ax7.set_title('Baltimore City Employee Top Salaries by Job Class', size = 18)

current_values = ax7.get_xticks()
ax7.set_xticklabels(['{:,.0f}'.format(d7) for d7 in current_values])

ax7.set_ylabel('Job Class', fontsize = 18)
ax7.set_xlabel('Salary', fontsize = 18)

ax7.invert_yaxis()

ax7.legend(handles = [Above, At, Below], fontsize = 16, loc='lower right')

plt.show()

Tab 3

Average Salary Analysis.

The Average Salary Analysis takes all Employee City Salaries into consideration, whereas the previous chart looked at the City’s top earners and their respective job class. There are over 250+ salaries documented in the “Baltimore City Employee Salary” data set. These charts convey how the top 250 jobs and the top 10 jobs differ in average salary. The mean salary for 250 city jobs is ~$106k whereas the mean salary for the top 10 jobs is ~$196k.


import matplotlib.patches as mpatches

bottom1 = 1
top1 = 250
d1 = bar_df.loc[bottom1:top1]
my_colors1 = pick_colors_according_to_mean_salary(d1)

bottom2 = 1
top2 = 10
d2 = bar_df.loc[bottom2:top2]
my_colors2 = pick_colors_according_to_mean_salary(d2)

Above = mpatches.Patch(color = 'lightblue', label = 'Above Average Salary')
At = mpatches.Patch(color = 'green', label = 'Within 1% of Average Salary')
Below = mpatches.Patch(color = 'yellow', label = 'Below Average Salary')


fig = plt.figure(figsize= (18,10))
fig.suptitle('Average Salary Analysis Top ' + str(top1) + ' and Top ' + str(top2) + ' Salaries',
            fontsize = 18, fontweight = 'bold')

ax1 = fig.add_subplot(2,1,1)
ax1.bar(d1.JobClass, d1.AvgSalary, label = 'Average Salary', color = my_colors1);

current_values = plt.gca().get_yticks()
plt.gca().set_yticklabels(['{:,.0f}'.format(d1) for d1 in current_values])

ax1.legend(handles = [Above, At, Below], fontsize = 18)

ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.axes.xaxis.set_visible(False)
ax1.set_title('Baltimore City Employee Top '+ str(top1)  +' Salaries', size = 18);
ax1.text(top1-10, d1.AvgSalary.mean()+5, 'Mean = ' + str(d1.AvgSalary.mean()), rotation=0, fontsize=14)

ax2 = fig.add_subplot(2,1,2)
ax2.bar(d2.JobClass, d2.AvgSalary, label = 'Average Salary', color = my_colors2);

ax2.spines['right'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax2.axes.xaxis.set_visible(True)
ax2.set_title('Baltimore City Employee Top '+ str(top2)  +' Salaries', size = 18)
ax2.text(top2-1, d2.AvgSalary.mean()+5, 'Mean = ' + str(d2.AvgSalary.mean()), rotation=0, fontsize=14)

current_values2 = plt.gca().get_yticks()
plt.gca().set_yticklabels(['{:,.0f}'.format(d2) for d2 in current_values2])

plt.xticks(rotation = 80);

plt.show()

Tab 4

Capital Deployed by Quarter.

The pie chart and nested pie charts convey how much Baltimore City spends annually towards hiring. Each salary count or tally is representative of a new hire. Therefore, this chart represents the total capital deployed from the city budget in order to bring on new employees to their respective departments. This first visualization tells us Baltimore City pays out ~$82M annually for hiring.


pie_df = df.groupby(['Quarter','MonthName'])['annualSalary'].sum().reset_index(name='TotalSalaries')

number_outside_colors = len(pie_df.Quarter.unique())
outside_color_ref_number = np.arange(number_outside_colors)*4

fig = plt.figure(figsize = (10,10))
ax = fig.add_subplot(1,1,1)

colormap = plt.get_cmap("tab20c")
outer_colors = colormap(outside_color_ref_number)

all_salaries = pie_df.TotalSalaries.sum()

pie_df.groupby(['Quarter'])['TotalSalaries'].sum().plot(
       kind = 'pie', radius = 1, colors = outer_colors, pctdistance = 0.75, labeldistance = 1.1,
       wedgeprops = dict(edgecolor = 'white'), textprops = dict(fontsize = 18), 
       autopct = lambda p: '{:.2f}%\n(${:.1f}M)'.format(p, (p/100)*all_salaries/1e+8),
       startangle = 90)

hole = plt.Circle((0,0), 0.3, fc= 'white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Total Salaries by Quarter', fontsize = 18)

ax.text(0,0, 'Total Salaries\n' + '$' + str(round(all_salaries/1e+8, 2)) + 'M', size=18, ha= 'center', va= 'center')
ax.axis('equal');
plt.tight_layout()

plt.show()

Capital Deployed Quarter and by Month.

This nested pie chart breaks down how capital is deployed even further by showing what percent of money has been agreed to be paid towards new hires. We observe August as the month in which the greatest amount of capital is deployed. This is unsurprising given the amount of recruiting and hiring that takes place within Q3 nationwide.

This analysis does not take into consideration bonus pay, which would call for additional due diligence and data cleaning.


npie_df = df.groupby(['Quarter','MonthName', 'Month'])['annualSalary'].sum().reset_index(name='TotalSalaries')
npie_df.sort_values(by=['Month'], inplace = True)
npie_df.reset_index(inplace = True, drop = True)
del npie_df['Month']

number_colors = len(npie_df.Quarter.unique())
outside_number = np.arange(number_colors)*4
number_colors = len(npie_df.MonthName.unique())
all_number = np.arange(number_colors + number_colors)

inside_ref = []
for each in all_number:
    if each not in outside_number:
        inside_ref.append(each)

fig = plt.figure(figsize = (10,10))
ax = fig.add_subplot(1,1,1)

colormap = plt.get_cmap("tab20c")
outer_colors = colormap(outside_number)

all_salaries = npie_df.TotalSalaries.sum()

npie_df.groupby(['Quarter'])['TotalSalaries'].sum().plot(
       kind = 'pie', radius = 1, colors = outer_colors, pctdistance = 0.84, labeldistance = 1.1,
       wedgeprops = dict(edgecolor = 'white'), textprops = dict(fontsize = 18), 
       autopct = lambda p: '{:.2f}%\n(${:.1f}M)'.format(p, (p/100)*all_salaries/1e+8),
       startangle = 90)

inner_colors = colormap(inside_ref)
npie_df.TotalSalaries.plot(
       kind = 'pie', radius = 0.69, colors = inner_colors, pctdistance = 0.55, labeldistance = 0.8,
       wedgeprops = dict(edgecolor = 'white'), textprops = dict(fontsize = 12),
       labels = npie_df.MonthName, 
       autopct = '%1.2f%%',
       startangle = 90)

hole = plt.Circle((0,0), 0.3, fc= 'white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Total Salaries by Quarter and Month', fontsize = 18)

ax.text(0,0, 'Total Salaries\n' + '$' + str(round(all_salaries/1e+8, 2)) + 'M', size=18, ha= 'center', va= 'center')
ax.axis('equal');
plt.tight_layout()

plt.show()

Tab 5

Total Capital Deployed.

This multi-line plot visualization identifies the top 5 agency spenders. In other words, we are able to identify who the top agencies are in terms of hiring percentage and how much they are effectively spending annually.


spending_df = df.groupby(['agencyName'])['annualSalary'].sum().reset_index(name='TotalSalaries')
spending_df.sort_values(by=['TotalSalaries'], inplace = True, ascending= False)
spending_df.reset_index(inplace = True, drop = True)

top_spending_agencies = spending_df.loc[0:4, 'agencyName']

spending2_df = df[ df['agencyName'].isin(top_spending_agencies) ]
spending2_df = spending2_df.groupby(['agencyName','MonthName'])['annualSalary'].sum().reset_index(name='TotalSalaries')

fig = plt.figure(figsize = (12,8))
ax = fig.add_subplot(1,1,1)

num_plots = len(pd.unique(spending2_df['agencyName']))
 
colormap = plt.cm.gist_ncar
plt.gca().set_prop_cycle(plt.cycler('color', plt.cm.jet(np.linspace(0, 1, num_plots))))

for key, grp in spending2_df.groupby(['agencyName']):
    grp.plot(ax=ax,kind = 'line', x='MonthName', y='TotalSalaries', label= key, marker='8', markersize=12)
    
plt.title('Total Deployed Capital for Top Employee Salaries by Department', fontsize = 18)
ax.set_xlabel('Month', fontsize = 18)
ax.set_ylabel('Total Deployed Capital ($M)', fontsize = 18, labelpad=20)


plt.show()

Conclusion

Salary trends in Baltimore City build the framework for subsequent analyses regarding City budget, spending, and supplementary research.

Next steps and recommendations include subsequent research and investment in these job classes within Baltimore City to ensure appropriate salary negotiations for years to come.