Data wrangling

Use the original data files: IL.TXT, census_2010_race.csv, and pa446_chicago_full.csv. Clean all of them and create 1 master table. The master table need to have the following columns: last_name, first_name, job_titles, department, annual_salary (for both salaried and hourly employees), race, gender. SHOW YOUR WORK.

The final master table is titled “finaldf” and you can find it in line 174.

library(readr)
library(here)
library(tidyverse)
library(report) # useful to find statistical significance results
library(nortest) # normality test 


df <- read_csv("pa446_chicago_full.csv")
df_gender<- read.csv("IL.Txt", header=FALSE)
df_race <- read_csv("census_2010_race.csv")

Clean up the Chicago_Full dataset

#separate name column
df_sep <- df %>%
  separate(Name, c('last_name', 'first_mid_name'), sep= "," )

df_sep2 <- df_sep %>%
  mutate(first_mid_name = str_trim(first_mid_name)) %>% 
  separate(first_mid_name, c('first_name', 'mid_name'), sep= " ")

df_sep2$first_name <- tolower(df_sep2$first_name) # change to lower




#unite annual_salary into one


df_sep2$annual_salary = 
  paste(df_sep2$`Annual Salary`, df_sep2$`Hourly Rate`)


df_sep2$annual_salary <- gsub("[^0-9.-]", "", df_sep2$annual_salary) #remove NA values

Clean up gender dataset

colnames(df_gender) <- c("state","gender", "born_year", "first_name", "count")

# change classes
df_gender= 
  df_gender %>%
  mutate(state = as.character(state),
         gender = as.character(gender),
         born_year = as.integer(born_year),
         first_name = as.character(first_name),
         count = as.integer(count))

# distinct name regardless of year
df_gender_distinct <- df_gender %>% 
  distinct(gender, first_name) # distinct by name regardless of year

dim(df_gender_distinct) #12232 x 2
## [1] 12232     2
# remove duplicates
df_gender_slice <- df_gender %>%
  group_by(first_name) %>% 
  slice(which.max(count))     #

dim(df_gender_slice) # 11,402 x 5
## [1] 11402     5
df_gender_slice$first_name <- tolower(df_gender_slice$first_name)

Merge salary and gender dataset

df_joined_3= left_join(df_sep2, df_gender_slice)
df_joined_3= select(df_joined_3, -born_year, -count) # remove born_year, count column

Clean up race column

df_race$pctaian <- as.numeric(df_race$pctaian) # change classes
df_race$pctblack <- as.numeric(df_race$pctblack)
df_race$pctwhite <- as.numeric(df_race$pctwhite)
df_race$pctapi <- as.numeric(df_race$pctapi)
df_race$pct2prace <- as.numeric(df_race$pct2prace)
df_race$pcthispanic <- as.numeric(df_race$pcthispanic)



final_col <- c('name', 'pctwhite', 'pctblack', 'pctapi', 'pctaian', 'pct2prace', 'pcthispanic')

race_col <- c('pctwhite', 'pctblack', 'pctapi', 'pctaian', 'pct2prace', 'pcthispanic')

# select cols above

df_race_filter <- df_race %>%
  select(final_col)

# pivot longer
df_long <- 
  df_race_filter %>%
  pivot_longer(
    col= !name,  # or race_col
    names_to= 'race', 
    values_to= 'percent'
  )



# slice max

df_long_slice <- df_long %>%
  group_by(name) %>%
  slice(which.max(percent))

# remove pct
df_long_slice$race <- str_replace(df_long_slice$race, 
                                      '^pct', 
                                      '' )

# change to lower
df_long_slice$name <- tolower(df_long_slice$name)

# FINAL Dataset

df_finalrace = select(df_long_slice, name, race)

Merge gender and race

DATA= 
  left_join(df_joined_3, df_finalrace, by = c("first_name" = "name"))
#rename 
DATA= rename(DATA, c("job_titles" = "Job Titles"))

#make salary numeric

DATA$annual_salary <- as.numeric(DATA$annual_salary)

FINAL Master Table with the following columns: last_name, first_name, job_titles, department, annual_salary (for both salaried and hourly employees), race, gender.

finaldf= 
  DATA %>% select(last_name, first_name, job_titles, Department, annual_salary, race, gender)

sum(is.na(finaldf$race)) #5174 missing race values
## [1] 5174
sum(is.na(finaldf$gender)) #2683 missing gender
## [1] 2683

Problem 2: Exploratory Data Analysis

” As you already know, one of the mayor of Chicago’s priority this year is equity in pay for city employees, especially in some of the city’s largest departments.

Furthermore, equity is defined as pay equality between 1) different genders and 2) different races

Use your master table and see if there are general pay differences between genders in the city’s 5 largest departments. Please also calculate the n-size for males and females in each department. ”

In the Police Department, there is evidence of pay inequity. There are 2970 females and 9105 males. The male median wage is $90,024 which is $3,018 more than the median female wage of $87,006. The maximum male earner makes $260,004 while the maximum female earner makes $185,364. However, this is explained because the Police SuperIntendent is a male, and the position of “Superintendent” comes with a higher salary than the rest.

In the Fire Department, there is evidence of pay inequity. There are 422 females and 4133 males. The male median wage is $103,914 while the female median wage is $100,560. When looking at the median wage, males make ~$3,000 more. The maximum male earner in the Fire Department also makes $217,728 while the maximum female earner makes $197,736. The pay inequity between the max male and female earners in the Fire Department is much larger than the Police Department.

In the Streets & Sanitation Department, there is some evidence of pay inequity. There are 234 females and 1499 males. While the median pay for both male ($38.35/hr) and females ($39.02) are almost the same, females make 67 cents more. However, the max male earner makes $175,092 while the max female earner makes $131,664 a difference of $43,428.

In the water management department, there is some evidence of pay inequity. There are 234 females and 1499 males. The median pay per hr is basically the same for both male and females. They both make around $49/hr. Compared to the other departments, the Water Management Department pays the max female earner $159,036 and the max male earner $152,832.

In the aviation department, similarly to the water management department, there is some evidence of pay inequity in favor of females. There are 378 females and 1185 Males. The median pay per hr is $4 higher for females and the max female earner makes $87,276 more than the max male earner.

First, let’s define the 5 largest departments. We will define this as the department with the most number of employees. These are: Police, Fire, Streets & San, Water Management, and Aviation.

table(finaldf$Department)
## 
##        ADMIN HEARNG       ANIMAL CONTRL            AVIATION   BOARD OF ELECTION 
##                  35                  67                1766                 104 
##     BOARD OF ETHICS       BUDGET & MGMT           BUILDINGS    BUSINESS AFFAIRS 
##                   8                  38                 242                 167 
##          CITY CLERK        CITY COUNCIL                COPA    CULTURAL AFFAIRS 
##                  82                 370                 125                  59 
##                DAIS        DISABILITIES    FAMILY & SUPPORT             FINANCE 
##                1003                  26                 621                 505 
##                FIRE              HEALTH             HOUSING  HOUSING & ECON DEV 
##                4730                 513                  68                 155 
##     HUMAN RELATIONS     HUMAN RESOURCES       INSPECTOR GEN                 LAW 
##                  15                  69                  91                 377 
##   LICENSE APPL COMM      MAYOR'S OFFICE                OEMC              POLICE 
##                   1                 104                 932               13135 
##        POLICE BOARD         PROCUREMENT      PUBLIC LIBRARY PUBLIC SAFETY ADMIN 
##                   2                  78                1004                 259 
##       STREETS & SAN          TRANSPORTN           TREASURER         WATER MGMNT 
##                2040                1083                  29                1867
#Police: 13,135
#Fire: 4730 
#Streets & San: 2040
# Water MGMNT: 1867
# Aviation: 1766

Now, let’s try to find pay differences between genders. We will use the fivenum() function to find minimum, 1st quartile, median, 3rd quartile, max.

#Police Dept: 

POLICE= finaldf %>%
  filter(Department == "POLICE")

POLICE %>%
  group_by(gender) %>%
  summarize(salarysummary= fivenum(annual_salary, na.rm=TRUE))
## # A tibble: 15 × 2
## # Groups:   gender [3]
##    gender salarysummary
##    <chr>          <dbl>
##  1 F               9.74
##  2 F           76266   
##  3 F           87006   
##  4 F           93354   
##  5 F          185364   
##  6 M              12   
##  7 M           84054   
##  8 M           90024   
##  9 M           96060   
## 10 M          260004   
## 11 <NA>           14   
## 12 <NA>        76266   
## 13 <NA>        84054   
## 14 <NA>        93354   
## 15 <NA>       162684
table(POLICE$gender) #2970 FEMALES, 9105 MALES 
## 
##    F    M 
## 2970 9105
#Females                        Males
#minimum: $9.74                 minimum: $12    
#1st quartile: $76,266          1st quartile:$84,054
# median: $87,006               median: $90,024
#3rd quartile: $93,354          3rd quartile: $96,060
#max: $185,364                  max: $260,004

Fire Department!

#Fire Dept:

FIRE= finaldf %>%
  filter(Department == "FIRE")

FIRE %>%
  group_by(gender) %>%
  summarize(salarysummary= fivenum(annual_salary, na.rm=TRUE))
## # A tibble: 15 × 2
## # Groups:   gender [3]
##    gender salarysummary
##    <chr>          <dbl>
##  1 F              45288
##  2 F              88338
##  3 F             100560
##  4 F             114096
##  5 F             197736
##  6 M              52896
##  7 M              98424
##  8 M             103914
##  9 M             116808
## 10 M             217728
## 11 <NA>           62160
## 12 <NA>           88338
## 13 <NA>          100764
## 14 <NA>          110226
## 15 <NA>          165048
  table(FIRE$gender)  #422 F, 4133 M
## 
##    F    M 
##  422 4133
#Females                        Males
#minimum: $45288                 minimum: $52896    
#1st quartile: $88338          1st quartile:$98424  
# median: $100560              median: $103914
#3rd quartile: $114096         3rd quartile: $116808
#max: $197736                     max: $217728

Streets & San Department

#Streets & San

STREETS= finaldf %>%
  filter(Department == "STREETS & SAN") 

STREETS %>%
  group_by(gender) %>%
  summarize(salarysummary= fivenum(annual_salary, na.rm=TRUE))
## # A tibble: 15 × 2
## # Groups:   gender [3]
##    gender salarysummary
##    <chr>          <dbl>
##  1 F               21.7
##  2 F               30.7
##  3 F               39.0
##  4 F               39.4
##  5 F           131664  
##  6 M               21.7
##  7 M               38.4
##  8 M               38.4
##  9 M               39.4
## 10 M           175092  
## 11 <NA>            21.7
## 12 <NA>            30.7
## 13 <NA>            38.4
## 14 <NA>            39.4
## 15 <NA>        115656
table(STREETS$gender) #367 F, 1472 M
## 
##    F    M 
##  367 1472
#Females                        Males
#minimum: $21.73                minimum: $21.73    
#1st quartile: $30.68           1st quartile:$38.35
# median: $39.02                median: $38.35
#3rd quartile: $39.39            3rd quartile: $39.39
#max: $131,664                 max: $175,092

Water Management Department

# Water MGMNT

WATER= finaldf %>%
  filter(Department == "WATER MGMNT")

WATER%>%
  group_by(gender) %>%
  summarize(salarysummary= fivenum(annual_salary, na.rm=TRUE))
## # A tibble: 15 × 2
## # Groups:   gender [3]
##    gender salarysummary
##    <chr>          <dbl>
##  1 F               19.0
##  2 F               44.4
##  3 F               49.9
##  4 F            76932  
##  5 F           159036  
##  6 M               20.6
##  7 M               44.4
##  8 M               49.8
##  9 M               52.3
## 10 M           152832  
## 11 <NA>            19.0
## 12 <NA>            44.4
## 13 <NA>            46  
## 14 <NA>         72024  
## 15 <NA>        132972
table(WATER$gender) # 234 F, 1499 M
## 
##    F    M 
##  234 1499
#Females                        Males
#minimum: $18.950               minimum: $20.550    
#1st quartile: $44.400          1st quartile:$44.400
# median: $49.915               median: $49.830
#3rd quartile: $76932            3rd quartile: $52.320
#max: $159036                    max: $152832

Aviation Department

# Aviation

AVIATION= finaldf %>%
  filter(Department == "AVIATION")

AVIATION %>%
  group_by(gender) %>%
  summarize(salarysummary= fivenum(annual_salary, na.rm=TRUE))
## # A tibble: 15 × 2
## # Groups:   gender [3]
##    gender salarysummary
##    <chr>          <dbl>
##  1 F               14  
##  2 F               38.4
##  3 F               50.4
##  4 F            80484  
##  5 F           275004  
##  6 M               15  
##  7 M               38.4
##  8 M               46.5
##  9 M            66852  
## 10 M           187728  
## 11 <NA>            14  
## 12 <NA>            38.4
## 13 <NA>            38.4
## 14 <NA>         71550  
## 15 <NA>        126504
table(AVIATION$gender) #378 F, 1185 M
## 
##    F    M 
##  378 1185
275004-187728
## [1] 87276
#Females                        Males
#minimum: $14                 minimum: $15    
#1st quartile: $38.35          1st quartile:$38.35
# median: $50.37              median: $46.50
#3rd quartile: $80,484        3rd quartile: $66,852
#max: $275,004                 max: $187,728

Problem 3: T-tests

” Is the difference you observed in problem 2 statistically significant? ”

POLICE: Yes, at a 95% confidence level, there is a significant difference in pay in the Police Dept among Males and Females.Males on average make $4,874 more than females. This is statistically significant p< .05.

FIRE: Yes, at a 95% confidence level, there is a significant difference in pay between gender in FIRE Dept. This is border line significant p<.05. Males on average make $2537 more than females.

Streets & San: No, no significant difference in pay between gender in Streets & Sanitation department.

Water Management: Yes, at 95% confidence interval, there is a significant difference in gender pay in the Water Management Department. p<.05. Females on average make $21,070 more than males.

Aviation: Yes, at 95% confidence interval, there is a significant difference in gender pay in the Aviation Department.p<.05 Females on average make $13977 more than males.

For this, we will use two sample t-Test (Independent Samples t-Test) for the 5 departments:

# Police
t.test(POLICE$annual_salary ~ POLICE$gender) # Yes, at a 95% confidence level, there is a significant difference in pay in the Police Dept among Males and Females.
## 
##  Welch Two Sample t-test
## 
## data:  POLICE$annual_salary by POLICE$gender
## t = -12.921, df = 4991.2, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5613.325 -4134.329
## sample estimates:
## mean in group F mean in group M 
##        87267.68        92141.50
#Fire

t.test(FIRE$annual_salary ~ FIRE$gender) # Yes, at a 95% confidence level, there is a significant difference in pay between gender in FIRE Dept. This is border line significant p<.05 
## 
##  Welch Two Sample t-test
## 
## data:  FIRE$annual_salary by FIRE$gender
## t = -2.0214, df = 495.99, p-value = 0.04377
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5003.88746   -71.13431
## sample estimates:
## mean in group F mean in group M 
##        103619.4        106156.9
#Streets & San

t.test(STREETS$annual_salary ~ STREETS$gender) # No, no significant difference in pay between gender in Streets & San department.
## 
##  Welch Two Sample t-test
## 
## data:  STREETS$annual_salary by STREETS$gender
## t = 1.1382, df = 583.1, p-value = 0.2555
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1602.186  6018.760
## sample estimates:
## mean in group F mean in group M 
##        15770.17        13561.88
# Water MGMNT: 

t.test(WATER$annual_salary ~ WATER$gender) # Yes, at 95% confidence interval, there is a significant difference in gender pay in the Water Management Department. p<.05 
## 
##  Welch Two Sample t-test
## 
## data:  WATER$annual_salary by WATER$gender
## t = 6.7751, df = 284.77, p-value = 7.129e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  14949.13 27192.32
## sample estimates:
## mean in group F mean in group M 
##        36036.95        14966.23
# Aviation: 

t.test(AVIATION$annual_salary ~ AVIATION$gender) # Yes, at 95% confidence interval, there is a significant diff. in gender pay in the Aviation Dept. 
## 
##  Welch Two Sample t-test
## 
## data:  AVIATION$annual_salary by AVIATION$gender
## t = 5.484, df = 599.37, p-value = 6.136e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   8971.532 18982.357
## sample estimates:
## mean in group F mean in group M 
##        40080.86        26103.92

Exploratory Data Analysis II

” Use your master table and see if there are general pay differences between races in the city’s 5 largest departments. Please also calculate the n-size for each race subgroup in each department. ”

In the Police Department, this is the n-size for each race. 2 races Akaska Native(aian) Asian or Pacific Islander(API) black hispanic white 1 36 297 1007 1232 8524 By looking at the median pay, Black and white workers seem to make a little more than the rest. Alaska Native and those classified as 2 races ( which is only 1 worker) make less than the rest. There is evidence of pay inequity in the Police Dept.

In the Fire department, this is the n-size for each race. aian api black hispanic white 14 115 391 292 3583 Looking at the median pay, black and white still make more than everybody else while Alaska Native make the least. These results are similar in the Police Department. There is evidence of pay inequity.

In the Streets & Sanitation department, this is the n-size for each race. aian api black hispanic white 9 41 189 245 1203 Looking at the median-pay, all races make the same, $38.35 per hr, except for Alaska Native workers who make $39.39.

In the water management department, this is the n-size for each race. aian api black hispanic white 7 45 191 150 1277 Looking at the median pay, Alaska Native and Asian or Pacific Islander(API) workers make a little more compared to the rest. Hispanic workers make the least in this department. There is evidence of pay inequity here.

In the aviation department, this is the n-size for each race. aian api black hispanic white 6 38 172 169 1051

Looking at the median pay, Hispanic workers make significantly more than the rest, followed closely behind by white workers. API, AIAN, and black workers make $12 less than Hispanic workers at the median pay.

We will use the median function to find pay differences between races in the 5 departments: Police, Fire, Streets & San, Water Management, and Aviation.

Visualize it first using boxplots. There seems to be more outliers in the white category compared to the rest. Also, 2 races seem to have only 1 value.This particular person who identifies as 2 races works for the Police Department only.

ggplot(finaldf) +
  aes(x = race, y = annual_salary) +
  geom_boxplot()

Police Department:

#Police Dept: 
POLICE %>%
  group_by(race) %>%
  summarize(salarysummary= median(annual_salary, na.rm=TRUE)) %>%
  arrange(salarysummary)
## # A tibble: 7 × 2
##   race     salarysummary
##   <chr>            <dbl>
## 1 2prace           66336
## 2 aian             84054
## 3 <NA>             84054
## 4 api              87006
## 5 hispanic         87006
## 6 black            90024
## 7 white            90024
table(POLICE$race)
## 
##   2prace     aian      api    black hispanic    white 
##        1       36      297     1007     1232     8524
  # 2 races     Alaska Native(aian)      Asian or Pacific Islander(API)      black   hispanic    white 
      #1            36                      297                               1007     1232      8524 

Fire Department

#Fire Dept:

FIRE %>%
  group_by(race) %>%
  summarize(salarysummary= median(annual_salary, na.rm=TRUE)) %>%
  arrange(salarysummary)
## # A tibble: 6 × 2
##   race     salarysummary
##   <chr>            <dbl>
## 1 aian            100149
## 2 <NA>            100560
## 3 api             101874
## 4 hispanic        101874
## 5 black           103914
## 6 white           103914
  table(FIRE$race) 
## 
##     aian      api    black hispanic    white 
##       14      115      391      292     3583
  # aian      api    black hispanic    white 
   # 14      115      391      292     3583 

Streets & Sanitation

#Streets & San

STREETS %>%
  group_by(race) %>%
  summarize(salarysummary= median(annual_salary, na.rm=TRUE)) %>%
  arrange(salarysummary)
## # A tibble: 6 × 2
##   race     salarysummary
##   <chr>            <dbl>
## 1 api               38.4
## 2 black             38.4
## 3 hispanic          38.4
## 4 white             38.4
## 5 <NA>              38.4
## 6 aian              39.4
table(STREETS$race) 
## 
##     aian      api    black hispanic    white 
##        9       41      189      245     1203
#aian      api    black   hispanic    white 
# 9        41      189      245       1203 

Water Management

# Water MGMNT

WATER%>%
  group_by(race) %>%
  summarize(salarysummary= median(annual_salary, na.rm=TRUE)) %>%
  arrange(salarysummary)
## # A tibble: 6 × 2
##   race     salarysummary
##   <chr>            <dbl>
## 1 <NA>              44.9
## 2 hispanic          47.3
## 3 white             49.8
## 4 black             50  
## 5 aian              52.1
## 6 api               52.1
table(WATER$race) 
## 
##     aian      api    black hispanic    white 
##        7       45      191      150     1277
#aian      api    black hispanic    white 
# 7       45      191      150     1277 

Aviation Department

# Aviation

AVIATION %>%
  group_by(race) %>%
  summarize(salarysummary= median(annual_salary, na.rm=TRUE)) %>%
  arrange(salarysummary)
## # A tibble: 6 × 2
##   race     salarysummary
##   <chr>            <dbl>
## 1 api               38.4
## 2 aian              38.6
## 3 black             38.8
## 4 <NA>              38.8
## 5 white             48.3
## 6 hispanic          50.8
table(AVIATION$race) 
## 
##     aian      api    black hispanic    white 
##        6       38      172      169     1051
 #aian      api    black hispanic    white 
  # 6       38      172      169     1051 

Problem 5: ANOVA and Tukey-Post Hoc Tests

” Is the differences statistically significant? ”

In the Police Department, when running the ANOVA test, there is a significant difference (p < .001) in pay between races. The Tukey-post Hoc result shows that there are significant pay differences between Hispanic and Black Officers as well as Hispanic and White officers.

In the Fire Department, when running the ANOVA test, there is a significant difference in pay between races (p < .001). The Tukey Post-Hoc test shows significant pay differences between Hispanic and Black workers and Hispanic and White workers. The results are similar in the Police Department.

In the Streets & San Department, when running the ANOVA test, the P-value is 0.33, so we can conclude that there are no significant differences in pay between races.

In the Water Management Department, the ANOVA test shows p = 0.018 (p<.05), so there is a significant difference in pay between races. When running the post-hoc test, there appears to be significant pay differences between Asian or Pacific Islander (API) and Black workers, API and Hispanic workers, and API and White workers.

In the Aviation Department, the ANOVA test shows p = 0.008 (p<.05), so there is significant difference in pay between races. When running the post-hoc test, there appears to be significant pay differences between Hispanic and Black workers.

Police: p<.001, main effect of race is statistically significant

#POLICE

### Anova 
police_aov <- aov(annual_salary ~ race,
  data = POLICE
)

summary(police_aov)
##                Df    Sum Sq   Mean Sq F value   Pr(>F)    
## race            5 2.145e+10 4.289e+09   13.66 2.56e-13 ***
## Residuals   11091 3.483e+12 3.141e+08                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2038 observations deleted due to missingness
report(police_aov) # Yes, the main effect of race is statistically significant
## The ANOVA (formula: annual_salary ~ race) suggests that:
## 
##   - The main effect of race is statistically significant and very small (F(5, 11091) = 13.66, p < .001; Eta2 = 6.12e-03, 95% CI [3.61e-03, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.
# Post hoc

TukeyHSD(aov(POLICE$annual_salary ~ POLICE$race), conf.level=0.95) # The only significant difference in pay is between: Hispanic and Blacks, Hispanic and White
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = POLICE$annual_salary ~ POLICE$race)
## 
## $`POLICE$race`
##                         diff         lwr       upr     p adj
## aian-2prace     18872.666667 -32334.7294 70080.063 0.9006700
## api-2prace      22917.111111 -27678.5166 73512.739 0.7901553
## black-2prace    25574.340616 -24961.3973 76110.079 0.7009686
## hispanic-2prace 21503.868912 -29027.2907 72035.029 0.8307554
## white-2prace    25572.990380 -24940.6367 76086.617 0.7006211
## api-aian         4044.444444  -4869.6190 12958.508 0.7889571
## black-aian       6701.673949  -1865.9274 15269.275 0.2241973
## hispanic-aian    2631.202246  -5909.3529 11171.757 0.9518125
## white-aian       6700.323713  -1735.8787 15136.526 0.2091544
## black-api        2657.229505   -678.0245  5992.484 0.2061438
## hispanic-api    -1413.242199  -4678.3930  1851.909 0.8203920
## white-api        2655.879269   -325.6704  5637.429 0.1130430
## hispanic-black  -4070.471703  -6216.2756 -1924.668 0.0000010
## white-black        -1.350236  -1684.4733  1681.773 1.0000000
## white-hispanic   4069.121468   2529.5782  5608.665 0.0000000

For the Fire Dept: effect of race on pay is statistically significant

#FIRE

## Anova
fire_aov <- aov(annual_salary ~ race,
  data = FIRE
)

summary(fire_aov)
##               Df    Sum Sq   Mean Sq F value   Pr(>F)    
## race           4 1.188e+10 2.969e+09   5.712 0.000139 ***
## Residuals   4390 2.282e+12 5.198e+08                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 335 observations deleted due to missingness
report(fire_aov)
## The ANOVA (formula: annual_salary ~ race) suggests that:
## 
##   - The main effect of race is statistically significant and very small (F(4, 4390) = 5.71, p < .001; Eta2 = 5.18e-03, 95% CI [1.68e-03, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.
#Tukey post hoc
TukeyHSD(aov(FIRE$annual_salary ~ FIRE$race), conf.level=0.95) # The only significant difference is between: Hispanic and Blacks, Hispanic and White
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = FIRE$annual_salary ~ FIRE$race)
## 
## $`FIRE$race`
##                     diff         lwr       upr     p adj
## api-aian        2948.631 -14662.2048 20559.467 0.9910324
## black-aian      8775.302  -8147.5346 25698.139 0.6179301
## hispanic-aian   1821.590 -15200.1253 18843.305 0.9984155
## white-aian      7276.761  -9383.4639 23936.985 0.7560887
## black-api       5826.671   -773.2084 12426.551 0.1129629
## hispanic-api   -1127.041  -7976.4786  5722.397 0.9916077
## white-api       4328.130  -1565.8576 10222.117 0.2641640
## hispanic-black -6953.712 -11765.7453 -2141.679 0.0007766
## white-black    -1498.542  -4812.1426  1815.059 0.7313721
## white-hispanic  5455.171   1668.8352  9241.506 0.0008148

In the Streets & San: no evidence of pay inequity. Main effect of race is statistically not significant, p= 0.33

#Streets & San


## Anova
streets_aov <- aov(annual_salary ~ race,
  data = STREETS
)

summary(streets_aov)
##               Df    Sum Sq   Mean Sq F value Pr(>F)
## race           4 5.363e+09 1.341e+09   1.152   0.33
## Residuals   1682 1.957e+12 1.164e+09               
## 353 observations deleted due to missingness
report(streets_aov)
## The ANOVA (formula: annual_salary ~ race) suggests that:
## 
##   - The main effect of race is statistically not significant and very small (F(4, 1682) = 1.15, p = 0.330; Eta2 = 2.73e-03, 95% CI [0.00, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.
#Tukey post hoc
TukeyHSD(aov(STREETS$annual_salary ~ STREETS$race), conf.level=0.95) # No significant difference to report here
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = STREETS$annual_salary ~ STREETS$race)
## 
## $`STREETS$race`
##                      diff        lwr       upr     p adj
## api-aian        3813.8663 -30475.372 38103.105 0.9981508
## black-aian       879.6089 -30901.311 32660.528 0.9999926
## hispanic-aian  -2651.5876 -34266.980 28963.805 0.9993910
## white-aian      2330.4643 -28835.693 33496.622 0.9996133
## black-api      -2934.2574 -18982.486 13113.971 0.9874417
## hispanic-api   -6465.4539 -22183.337  9252.430 0.7942772
## white-api      -1483.4020 -16276.920 13310.116 0.9987687
## hispanic-black -3531.1965 -12549.336  5486.943 0.8223354
## white-black     1450.8555  -5837.708  8739.419 0.9827265
## white-hispanic  4982.0520  -1547.065 11511.169 0.2275587

Water management:

# Water MGMNT


## Anova
water_aov <- aov(annual_salary ~ race,
  data = WATER
)

summary(water_aov)
##               Df    Sum Sq   Mean Sq F value Pr(>F)  
## race           4 1.817e+10 4.543e+09       3 0.0176 *
## Residuals   1665 2.521e+12 1.514e+09                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 197 observations deleted due to missingness
report(water_aov)
## The ANOVA (formula: annual_salary ~ race) suggests that:
## 
##   - The main effect of race is statistically significant and very small (F(4, 1665) = 3.00, p = 0.018; Eta2 = 7.16e-03, 95% CI [6.77e-04, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.
# Tukey post hoc
TukeyHSD(aov(WATER$annual_salary ~ WATER$race), conf.level=0.95) # There is significant difference in pay between Asian or Pacific Islander (API)  and Black workers, API and Hispanic workers, and API and White workers. 
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = WATER$annual_salary ~ WATER$race)
## 
## $`WATER$race`
##                       diff        lwr       upr     p adj
## api-aian        21259.6517 -21916.938 64436.242 0.6632503
## black-aian       1801.0934 -39093.777 42695.964 0.9999529
## hispanic-aian    -772.6726 -41864.658 40319.312 0.9999984
## white-aian       1491.3841 -38784.028 41766.796 0.9999764
## black-api      -19458.5583 -37067.571 -1849.546 0.0217421
## hispanic-api   -22032.3243 -40094.388 -3970.261 0.0078669
## white-api      -19768.2675 -35886.446 -3650.089 0.0073927
## hispanic-black  -2573.7660 -14167.326  9019.794 0.9741412
## white-black      -309.7092  -8553.991  7934.573 0.9999750
## white-hispanic   2264.0568  -6908.129 11436.243 0.9620028

In the Aviation Department, there is significant difference in pay, p = 0.008

# Aviation


## Anova
aviation_aov <- aov(annual_salary ~ race,
  data = AVIATION
)

summary(aviation_aov) 
##               Df    Sum Sq  Mean Sq F value  Pr(>F)   
## race           4 2.440e+10 6.10e+09   3.446 0.00821 **
## Residuals   1431 2.533e+12 1.77e+09                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 330 observations deleted due to missingness
report(aviation_aov)
## The ANOVA (formula: annual_salary ~ race) suggests that:
## 
##   - The main effect of race is statistically significant and very small (F(4, 1431) = 3.45, p = 0.008; Eta2 = 9.54e-03, 95% CI [1.41e-03, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.
#Tukey post hoc test
TukeyHSD(aov(AVIATION$annual_salary ~ AVIATION$race), conf.level=0.95) #Yes, only 1 sig. difference between Hispanic and Black p<.05
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = AVIATION$annual_salary ~ AVIATION$race)
## 
## $`AVIATION$race`
##                     diff        lwr       upr     p adj
## api-aian       28009.127 -22468.817 78487.072 0.5524841
## black-aian     22368.127 -25353.243 70089.496 0.7034981
## hispanic-aian  37341.565 -10394.080 85077.210 0.2051528
## white-aian     29174.345 -17869.549 76218.238 0.4380739
## black-api      -5641.000 -26237.640 14955.639 0.9450306
## hispanic-api    9332.437 -11297.256 29962.131 0.7305194
## white-api       1165.217 -17808.985 20139.420 0.9998228
## hispanic-black 14973.438   2527.938 27418.938 0.0091913
## white-black     6806.218  -2645.057 16257.493 0.2828403
## white-hispanic -8167.220 -17690.311  1355.871 0.1322420