library(tidyverse)
library(readxl)
library(ggalt)
library(scales)
library(ineq)

DATA COLLECTION

The salary data come from the Texas Tribune Government Salary Explorer website. I extracted the police department data from each city’s data set and read it into R, keeping all employees of these police departments, not just uniformed officers. The non-uniformed, or civilian, employees include administrative staff, human resources, accounting, and similar roles. I feel these employees are a vital part of their respective police departments and should be represented in this analysis.

SA_police.data = read_excel("/Users/uax742/Documents/San Antonio Police.xlsx")
Houston_police.data = read_excel("/Users/uax742/Documents/Houston Police.xlsx") 
Austin_police.data = read_excel("/Users/uax742/Documents/Austin Police.xlsx")
Dallas_police.data = read_excel("/Users/uax742/Documents/Dallas Police.xlsx")
FW_police.data = read_excel("/Users/uax742/Documents/Fort Worth Police.xlsx")

I then cut each data set down to just salaries and job titles, renamed those columns consistently, and combined the five data sets into one, adding a third column to identify the city.

variables_sa = list(salary=sym("FY16 ANNUAL SALARY2"),  jobtitle=sym("JOB TITLE"))
SA_police.data %>%
  select(., !!!variables_sa) -> SA_salary.only

variables_houston = list(salary=sym("Annual Salary"),
jobtitle=sym("Title"))
Houston_police.data %>%
  select(., !!!variables_houston) -> Houston_salary.only

variables_austin = list(salary=sym("Annual Salary"),
jobtitle=sym("Title"))
Austin_police.data %>%
  select(., !!!variables_austin) -> Austin_salary.only

variables_dallas = list(salary=sym("Annual Salary"),
jobtitle=sym("Job Code Description"))
Dallas_police.data %>%
  select(., !!!variables_dallas) -> Dallas_salary.only

variables_fw = list(salary=sym("Annual Rt"),
jobtitle=sym("Job"))
FW_police.data %>%
  select(., !!!variables_fw) -> FW_salary.only

# Tag each data set with its city, then stack them into one data frame
SA_salary.only <- mutate(SA_salary.only, city="San Antonio")
Houston_salary.only <- mutate(Houston_salary.only, city="Houston")
Austin_salary.only <- mutate(Austin_salary.only, city="Austin")
Dallas_salary.only <- mutate(Dallas_salary.only, city="Dallas")
FW_salary.only <- mutate(FW_salary.only, city="Fort Worth")
Cities_salary <- bind_rows(SA_salary.only, Houston_salary.only, Austin_salary.only, Dallas_salary.only, FW_salary.only)
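
The select-rename-tag-stack steps above repeat the same pattern for each city, so they could also be wrapped in one small function. The following is only a sketch of that idea, not the code used for this analysis: `standardize_salaries()` is a hypothetical helper, and the two toy data frames are made-up rows standing in for the real spreadsheets.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): keep the salary and
# job title columns under common names and tag every row with its city.
standardize_salaries <- function(data, salary_col, title_col, city_name) {
  data %>%
    select(salary = all_of(salary_col), jobtitle = all_of(title_col)) %>%
    mutate(city = city_name)
}

# Made-up rows that mimic the column names of two of the source files.
toy_houston <- tibble(`Annual Salary` = c(52000, 61000),
                      Title = c("Officer", "Clerk"))
toy_fw <- tibble(`Annual Rt` = 58000, Job = "Sergeant")

# Stack the standardized pieces into one data frame.
toy_combined <- bind_rows(
  standardize_salaries(toy_houston, "Annual Salary", "Title", "Houston"),
  standardize_salaries(toy_fw, "Annual Rt", "Job", "Fort Worth")
)
toy_combined
```

Passing the column names as strings keeps the per-city differences in one place, which may make it easier to add another department later.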

DATA EXPLORATION

To summarize the data, I looked at the average annual salary and the salary distribution of Texas police departments by city. I also calculated the Gini coefficient for each city’s police department and visualized the underlying distributions using Lorenz curves. The goal is to compare the San Antonio police department to the four other largest departments in the state of Texas.

Average Annual Salary by City

Cities_salary %>%
  group_by(., city) %>%
  summarise(., employees = n(), MeanAnnual=mean(salary, na.rm=TRUE)) %>%
  print.data.frame(., digits=3)
##          city employees MeanAnnual
## 1      Austin      2508      77588
## 2      Dallas      3778      62287
## 3  Fort Worth      2345      63364
## 4     Houston      7371      62966
## 5 San Antonio      3013      59681

The average annual salary for the San Antonio police department is in line with those of the other large Texas cities’ police departments, with the exception of Austin, whose average is considerably higher than the rest. The two police departments with the fewest employees, Austin and Fort Worth, have the two highest average annual salaries, although Fort Worth’s average is much closer to those of the other three departments than it is to Austin’s.

Salary Distribution by City

Cities_salary %>%
  ggplot(., aes(city, salary)) +
    geom_boxplot() + scale_y_continuous(labels = comma) +
    coord_flip() +
    labs(title ="Texas Police Department Salary Distributions")

The San Antonio police department’s salary distribution is similar to those of the Houston and Dallas police departments. The Austin and Fort Worth distributions are more spread out, indicating greater variability in salaries. The largest outliers for the San Antonio and Fort Worth police departments are their respective police chiefs. The Houston police department lists two police chiefs, most likely because it has over 7,300 employees, roughly double the size of the Dallas department and more than double the size of the other three. The Austin and Dallas data sets did not list a police chief. The highest-paid employee of the Austin PD is a lab director, and the highest-paid employee of the Dallas PD is an assistant police chief.
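
One way to identify the highest-paid employee in each department, as discussed above, is to take the top salary row per city. This is a minimal sketch on made-up data, not the lookup actually used for this report; `toy_salaries` and its values are invented for illustration, and the same pipeline could presumably be pointed at `Cities_salary`.

```r
library(dplyr)

# Invented example rows; the real data set has thousands of employees per city.
toy_salaries <- tibble(
  salary   = c(50000, 210000, 48000, 175000),
  jobtitle = c("Officer", "Police Chief", "Clerk", "Lab Director"),
  city     = c("City A", "City A", "City B", "City B")
)

# Keep the single highest-paid row within each city.
top_earners <- toy_salaries %>%
  group_by(city) %>%
  slice_max(salary, n = 1, with_ties = FALSE) %>%
  ungroup()

top_earners
```

Setting `with_ties = FALSE` returns exactly one row per city; dropping it would show every employee tied for the top salary.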

Gini Coefficients by City

City_gini = matrix(c(0.1507, 0.1486, 0.1509, 0.1779, 0.2310), ncol = 5, byrow = T)
colnames(City_gini) = c("San Antonio ", "Dallas ", "Houston ", "Austin ", "Fort Worth")
rownames(City_gini) = c("Gini")
City_gini = as.table(City_gini)
City_gini
##      San Antonio  Dallas  Houston  Austin  Fort Worth
## Gini       0.1507  0.1486   0.1509  0.1779     0.2310

The Gini coefficient is a measure of statistical dispersion intended to represent income distribution, and is the most commonly used measurement of inequality. It ranges from 0 (everyone earns the same) to 1 (one person earns everything). The San Antonio police department’s Gini coefficient is the second lowest among the five Texas departments and indicates a high level of salary equality within the department. The Dallas and Houston coefficients are nearly identical to San Antonio’s, and Austin’s is only modestly higher, so these departments also show a high level of equality. The Fort Worth police department’s coefficient is noticeably higher than those of the other four departments, but it is still relatively low for a Gini coefficient.

Cities_salary %>%
  group_by(., city) %>%
  summarise(., Gini=ineq(salary, type="Gini")) %>%
  ggplot(., aes(reorder(city, Gini), Gini)) +
    geom_lollipop(point.colour="blue", point.size=3) +
    coord_flip() +
    labs(title="Texas Police Department Salary Inequality")
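
To make the Gini coefficient more concrete, it can be computed by hand from sorted salaries. The sketch below uses the standard sorted-rank formula without a small-sample correction; `gini_by_hand()` is a hypothetical name, the salary vectors are invented, and the result should agree with `ineq(salary, type="Gini")` under its default settings, though that has not been checked against the real data here.

```r
# Hypothetical helper: uncorrected Gini coefficient via the sorted-rank formula
#   G = sum((2 * i - n - 1) * x_i) / (n * sum(x)),  with x sorted ascending.
gini_by_hand <- function(x) {
  x <- sort(x[!is.na(x)])   # drop missing salaries, then sort ascending
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Invented examples at the two extremes.
equal_pay   <- gini_by_hand(c(50000, 50000, 50000, 50000))  # everyone earns the same
one_earner  <- gini_by_hand(c(0, 0, 0, 200000))             # one person earns everything

equal_pay
one_earner
```

With identical salaries the coefficient is 0. With one earner among four people the uncorrected coefficient is 0.75 rather than 1, because without the correction the maximum is (n - 1) / n.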

Lorenz Curves

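A Lorenz curve is a graphical representation of the distribution of income: it plots the cumulative share of total salary against the cumulative share of employees, ordered from lowest to highest paid. The diagonal line represents perfect equality, a Gini coefficient of 0. The further the curve bows below the diagonal, the more unequal the distribution.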

plot(Lc(SA_salary.only$salary), col = "blue", lwd = 2, sub = "San Antonio")

par(mfrow = c(2,2))
plot(Lc(Houston_salary.only$salary), col = "blue", lwd = 2, sub = "Houston")
plot(Lc(Austin_salary.only$salary), col = "blue", lwd = 2, sub = "Austin")
plot(Lc(Dallas_salary.only$salary), col = "blue", lwd = 2, sub = "Dallas")
plot(Lc(FW_salary.only$salary), col = "blue", lwd = 2, sub = "Fort Worth")
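
The points behind a Lorenz curve, and their link to the Gini coefficient, can be sketched in base R. This is an illustration on four invented salaries, not a replacement for `Lc()`; `lorenz_points()` is a hypothetical helper, and the Gini value is taken as one minus twice the area under the curve using the trapezoid rule.

```r
# Hypothetical helper: cumulative employee share vs. cumulative salary share.
lorenz_points <- function(x) {
  x <- sort(x[!is.na(x)])   # order employees from lowest to highest paid
  n <- length(x)
  data.frame(
    pop_share    = (0:n) / n,
    income_share = c(0, cumsum(x) / sum(x))
  )
}

# Invented salaries for illustration.
toy <- lorenz_points(c(30000, 40000, 50000, 80000))

# Area under the Lorenz curve by the trapezoid rule, then Gini = 1 - 2 * area.
area <- sum(diff(toy$pop_share) *
              (head(toy$income_share, -1) + tail(toy$income_share, -1)) / 2)
toy_gini <- 1 - 2 * area

toy
toy_gini
```

If every salary were equal, `income_share` would match `pop_share` at every point, the curve would lie on the diagonal, and the computed Gini would be 0.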

CONCLUSION

Overall, the San Antonio police department compares favorably with the other four large police departments in the state of Texas. The average annual salary for the San Antonio police department is the lowest of the five departments, but not by much. The salary distribution for the San Antonio PD is similar to those of Houston and Dallas; together, these are the three largest departments in the state. The San Antonio police department’s Gini coefficient is the second lowest among the departments, indicating a high level of salary equality across the department.