Use the original data files: IL.TXT, census_2010_race.csv, and pa446_chicago_full.csv. Clean all of them and create 1 master table. The master table need to have the following columns: last_name, first_name, job_titles, department, annual_salary (for both salaried and hourly employees), race, gender. SHOW YOUR WORK.
The final master table is titled “finaldf” and you can find it in line 174.
library(readr)
library(here)
library(tidyverse)
library(report) # useful to find statistical significance results
library(nortest) # normality test
df <- read_csv("pa446_chicago_full.csv")
df_gender<- read.csv("IL.Txt", header=FALSE)
df_race <- read_csv("census_2010_race.csv")
Clean up the Chicago_Full dataset
#separate name column
df_sep <- df %>%
separate(Name, c('last_name', 'first_mid_name'), sep= "," )
df_sep2 <- df_sep %>%
mutate(first_mid_name = str_trim(first_mid_name)) %>%
separate(first_mid_name, c('first_name', 'mid_name'), sep= " ")
df_sep2$first_name <- tolower(df_sep2$first_name) # change to lower
#unite annual_salary into one
df_sep2$annual_salary =
paste(df_sep2$`Annual Salary`, df_sep2$`Hourly Rate`)
df_sep2$annual_salary <- gsub("[^0-9.-]", "", df_sep2$annual_salary) #remove NA values
Clean up gender dataset
colnames(df_gender) <- c("state","gender", "born_year", "first_name", "count")
# change classes
df_gender=
df_gender %>%
mutate(state = as.character(state),
gender = as.character(gender),
born_year = as.integer(born_year),
first_name = as.character(first_name),
count = as.integer(count))
# distinct name regardless of year
df_gender_distinct <- df_gender %>%
distinct(gender, first_name) # distinct by name regardless of year
dim(df_gender_distinct) #12232 x 2
## [1] 12232 2
# remove duplicates
df_gender_slice <- df_gender %>%
group_by(first_name) %>%
slice(which.max(count)) #
dim(df_gender_slice) # 11,402 x 5
## [1] 11402 5
df_gender_slice$first_name <- tolower(df_gender_slice$first_name)
Merge salary and gender dataset
df_joined_3= left_join(df_sep2, df_gender_slice)
df_joined_3= select(df_joined_3, -born_year, -count) # remove born_year, count column
Clean up race column
df_race$pctaian <- as.numeric(df_race$pctaian) # change classes
df_race$pctblack <- as.numeric(df_race$pctblack)
df_race$pctwhite <- as.numeric(df_race$pctwhite)
df_race$pctapi <- as.numeric(df_race$pctapi)
df_race$pct2prace <- as.numeric(df_race$pct2prace)
df_race$pcthispanic <- as.numeric(df_race$pcthispanic)
final_col <- c('name', 'pctwhite', 'pctblack', 'pctapi', 'pctaian', 'pct2prace', 'pcthispanic')
race_col <- c('pctwhite', 'pctblack', 'pctapi', 'pctaian', 'pct2prace', 'pcthispanic')
# select cols above
df_race_filter <- df_race %>%
select(final_col)
# pivot longer
df_long <-
df_race_filter %>%
pivot_longer(
col= !name, # or race_col
names_to= 'race',
values_to= 'percent'
)
# slice max
df_long_slice <- df_long %>%
group_by(name) %>%
slice(which.max(percent))
# remove pct
df_long_slice$race <- str_replace(df_long_slice$race,
'^pct',
'' )
# change to lower
df_long_slice$name <- tolower(df_long_slice$name)
# FINAL Dataset
df_finalrace = select(df_long_slice, name, race)
Merge gender and race
DATA=
left_join(df_joined_3, df_finalrace, by = c("first_name" = "name"))
#rename
DATA= rename(DATA, c("job_titles" = "Job Titles"))
#make salary numeric
DATA$annual_salary <- as.numeric(DATA$annual_salary)
FINAL Master Table with the following columns: last_name, first_name, job_titles, department, annual_salary (for both salaried and hourly employees), race, gender.
finaldf=
DATA %>% select(last_name, first_name, job_titles, Department, annual_salary, race, gender)
sum(is.na(finaldf$race)) #5174 missing race values
## [1] 5174
sum(is.na(finaldf$gender)) #2683 missing gender
## [1] 2683
” As you already know, one of the mayor of Chicago’s priority this year is equity in pay for city employees, especially in some of the city’s largest departments.
Furthermore, equity is defined as pay equality between 1) different genders and 2) different races
Use your master table and see if there are general pay differences between genders in the city’s 5 largest departments. Please also calculate the n-size for males and females in each department. ”
In the Police Department, there is evidence of pay inequity. There are 2970 females and 9105 males. The male median wage is $90,024 which is $3,018 more than the median female wage of $87,006. The maximum male earner makes $260,004 while the maximum female earner makes $185,364. However, this is explained because the Police SuperIntendent is a male, and the position of “Superintendent” comes with a higher salary than the rest.
In the Fire Department, there is evidence of pay inequity. There are 422 females and 4133 males. The male median wage is $103,914 while the female median wage is $100,560. When looking at the median wage, males make ~$3,000 more. The maximum male earner in the Fire Department also makes $217,728 while the maximum female earner makes $197,736. The pay inequity between the max male and female earners in the Fire Department is much larger than the Police Department.
In the Streets & Sanitation Department, there is some evidence of pay inequity. There are 234 females and 1499 males. While the median pay for both male ($38.35/hr) and females ($39.02) are almost the same, females make 67 cents more. However, the max male earner makes $175,092 while the max female earner makes $131,664 a difference of $43,428.
In the water management department, there is some evidence of pay inequity. There are 234 females and 1499 males. The median pay per hr is basically the same for both male and females. They both make around $49/hr. Compared to the other departments, the Water Management Department pays the max female earner $159,036 and the max male earner $152,832.
In the aviation department, similarly to the water management department, there is some evidence of pay inequity in favor of females. There are 378 females and 1185 Males. The median pay per hr is $4 higher for females and the max female earner makes $87,276 more than the max male earner.
First, let’s define the 5 largest departments. We will define this as the department with the most number of employees. These are: Police, Fire, Streets & San, Water Management, and Aviation.
table(finaldf$Department)
##
## ADMIN HEARNG ANIMAL CONTRL AVIATION BOARD OF ELECTION
## 35 67 1766 104
## BOARD OF ETHICS BUDGET & MGMT BUILDINGS BUSINESS AFFAIRS
## 8 38 242 167
## CITY CLERK CITY COUNCIL COPA CULTURAL AFFAIRS
## 82 370 125 59
## DAIS DISABILITIES FAMILY & SUPPORT FINANCE
## 1003 26 621 505
## FIRE HEALTH HOUSING HOUSING & ECON DEV
## 4730 513 68 155
## HUMAN RELATIONS HUMAN RESOURCES INSPECTOR GEN LAW
## 15 69 91 377
## LICENSE APPL COMM MAYOR'S OFFICE OEMC POLICE
## 1 104 932 13135
## POLICE BOARD PROCUREMENT PUBLIC LIBRARY PUBLIC SAFETY ADMIN
## 2 78 1004 259
## STREETS & SAN TRANSPORTN TREASURER WATER MGMNT
## 2040 1083 29 1867
#Police: 13,135
#Fire: 4730
#Streets & San: 2040
# Water MGMNT: 1867
# Aviation: 1766
Now, let’s try to find pay differences between genders. We will use the fivenum() function to find minimum, 1st quartile, median, 3rd quartile, max.
#Police Dept:
POLICE= finaldf %>%
filter(Department == "POLICE")
POLICE %>%
group_by(gender) %>%
summarize(salarysummary= fivenum(annual_salary, na.rm=TRUE))
## # A tibble: 15 × 2
## # Groups: gender [3]
## gender salarysummary
## <chr> <dbl>
## 1 F 9.74
## 2 F 76266
## 3 F 87006
## 4 F 93354
## 5 F 185364
## 6 M 12
## 7 M 84054
## 8 M 90024
## 9 M 96060
## 10 M 260004
## 11 <NA> 14
## 12 <NA> 76266
## 13 <NA> 84054
## 14 <NA> 93354
## 15 <NA> 162684
table(POLICE$gender) #2970 FEMALES, 9105 MALES
##
## F M
## 2970 9105
#Females Males
#minimum: $9.74 minimum: $12
#1st quartile: $76,266 1st quartile:$84,054
# median: $87,006 median: $90,024
#3rd quartile: $93,354 3rd quartile: $96,060
#max: $185,364 max: $260,004
Fire Department!
#Fire Dept:
FIRE= finaldf %>%
filter(Department == "FIRE")
FIRE %>%
group_by(gender) %>%
summarize(salarysummary= fivenum(annual_salary, na.rm=TRUE))
## # A tibble: 15 × 2
## # Groups: gender [3]
## gender salarysummary
## <chr> <dbl>
## 1 F 45288
## 2 F 88338
## 3 F 100560
## 4 F 114096
## 5 F 197736
## 6 M 52896
## 7 M 98424
## 8 M 103914
## 9 M 116808
## 10 M 217728
## 11 <NA> 62160
## 12 <NA> 88338
## 13 <NA> 100764
## 14 <NA> 110226
## 15 <NA> 165048
table(FIRE$gender) #422 F, 4133 M
##
## F M
## 422 4133
#Females Males
#minimum: $45288 minimum: $52896
#1st quartile: $88338 1st quartile:$98424
# median: $100560 median: $103914
#3rd quartile: $114096 3rd quartile: $116808
#max: $197736 max: $217728
Streets & San Department
#Streets & San
STREETS= finaldf %>%
filter(Department == "STREETS & SAN")
STREETS %>%
group_by(gender) %>%
summarize(salarysummary= fivenum(annual_salary, na.rm=TRUE))
## # A tibble: 15 × 2
## # Groups: gender [3]
## gender salarysummary
## <chr> <dbl>
## 1 F 21.7
## 2 F 30.7
## 3 F 39.0
## 4 F 39.4
## 5 F 131664
## 6 M 21.7
## 7 M 38.4
## 8 M 38.4
## 9 M 39.4
## 10 M 175092
## 11 <NA> 21.7
## 12 <NA> 30.7
## 13 <NA> 38.4
## 14 <NA> 39.4
## 15 <NA> 115656
table(STREETS$gender) #367 F, 1472 M
##
## F M
## 367 1472
#Females Males
#minimum: $21.73 minimum: $21.73
#1st quartile: $30.68 1st quartile:$38.35
# median: $39.02 median: $38.35
#3rd quartile: $39.39 3rd quartile: $39.39
#max: $131,664 max: $175,092
Water Management Department
# Water MGMNT
WATER= finaldf %>%
filter(Department == "WATER MGMNT")
WATER%>%
group_by(gender) %>%
summarize(salarysummary= fivenum(annual_salary, na.rm=TRUE))
## # A tibble: 15 × 2
## # Groups: gender [3]
## gender salarysummary
## <chr> <dbl>
## 1 F 19.0
## 2 F 44.4
## 3 F 49.9
## 4 F 76932
## 5 F 159036
## 6 M 20.6
## 7 M 44.4
## 8 M 49.8
## 9 M 52.3
## 10 M 152832
## 11 <NA> 19.0
## 12 <NA> 44.4
## 13 <NA> 46
## 14 <NA> 72024
## 15 <NA> 132972
table(WATER$gender) # 234 F, 1499 M
##
## F M
## 234 1499
#Females Males
#minimum: $18.950 minimum: $20.550
#1st quartile: $44.400 1st quartile:$44.400
# median: $49.915 median: $49.830
#3rd quartile: $76932 3rd quartile: $52.320
#max: $159036 max: $152832
Aviation Department
# Aviation
AVIATION= finaldf %>%
filter(Department == "AVIATION")
AVIATION %>%
group_by(gender) %>%
summarize(salarysummary= fivenum(annual_salary, na.rm=TRUE))
## # A tibble: 15 × 2
## # Groups: gender [3]
## gender salarysummary
## <chr> <dbl>
## 1 F 14
## 2 F 38.4
## 3 F 50.4
## 4 F 80484
## 5 F 275004
## 6 M 15
## 7 M 38.4
## 8 M 46.5
## 9 M 66852
## 10 M 187728
## 11 <NA> 14
## 12 <NA> 38.4
## 13 <NA> 38.4
## 14 <NA> 71550
## 15 <NA> 126504
table(AVIATION$gender) #378 F, 1185 M
##
## F M
## 378 1185
275004-187728
## [1] 87276
#Females Males
#minimum: $14 minimum: $15
#1st quartile: $38.35 1st quartile:$38.35
# median: $50.37 median: $46.50
#3rd quartile: $80,484 3rd quartile: $66,852
#max: $275,004 max: $187,728
” Is the difference you observed in problem 2 statistically significant? ”
POLICE: Yes, at a 95% confidence level, there is a significant difference in pay in the Police Dept among Males and Females.Males on average make $4,874 more than females. This is statistically significant p< .05.
FIRE: Yes, at a 95% confidence level, there is a significant difference in pay between gender in FIRE Dept. This is border line significant p<.05. Males on average make $2537 more than females.
Streets & San: No, no significant difference in pay between gender in Streets & Sanitation department.
Water Management: Yes, at 95% confidence interval, there is a significant difference in gender pay in the Water Management Department. p<.05. Females on average make $21,070 more than males.
Aviation: Yes, at 95% confidence interval, there is a significant difference in gender pay in the Aviation Department.p<.05 Females on average make $13977 more than males.
For this, we will use two sample t-Test (Independent Samples t-Test) for the 5 departments:
# Police
t.test(POLICE$annual_salary ~ POLICE$gender) # Yes, at a 95% confidence level, there is a significant difference in pay in the Police Dept among Males and Females.
##
## Welch Two Sample t-test
##
## data: POLICE$annual_salary by POLICE$gender
## t = -12.921, df = 4991.2, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5613.325 -4134.329
## sample estimates:
## mean in group F mean in group M
## 87267.68 92141.50
#Fire
t.test(FIRE$annual_salary ~ FIRE$gender) # Yes, at a 95% confidence level, there is a significant difference in pay between gender in FIRE Dept. This is border line significant p<.05
##
## Welch Two Sample t-test
##
## data: FIRE$annual_salary by FIRE$gender
## t = -2.0214, df = 495.99, p-value = 0.04377
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5003.88746 -71.13431
## sample estimates:
## mean in group F mean in group M
## 103619.4 106156.9
#Streets & San
t.test(STREETS$annual_salary ~ STREETS$gender) # No, no significant difference in pay between gender in Streets & San department.
##
## Welch Two Sample t-test
##
## data: STREETS$annual_salary by STREETS$gender
## t = 1.1382, df = 583.1, p-value = 0.2555
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1602.186 6018.760
## sample estimates:
## mean in group F mean in group M
## 15770.17 13561.88
# Water MGMNT:
t.test(WATER$annual_salary ~ WATER$gender) # Yes, at 95% confidence interval, there is a significant difference in gender pay in the Water Management Department. p<.05
##
## Welch Two Sample t-test
##
## data: WATER$annual_salary by WATER$gender
## t = 6.7751, df = 284.77, p-value = 7.129e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 14949.13 27192.32
## sample estimates:
## mean in group F mean in group M
## 36036.95 14966.23
# Aviation:
t.test(AVIATION$annual_salary ~ AVIATION$gender) # Yes, at 95% confidence interval, there is a significant diff. in gender pay in the Aviation Dept.
##
## Welch Two Sample t-test
##
## data: AVIATION$annual_salary by AVIATION$gender
## t = 5.484, df = 599.37, p-value = 6.136e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 8971.532 18982.357
## sample estimates:
## mean in group F mean in group M
## 40080.86 26103.92
” Use your master table and see if there are general pay differences between races in the city’s 5 largest departments. Please also calculate the n-size for each race subgroup in each department. ”
In the Police Department, this is the n-size for each race. 2 races Akaska Native(aian) Asian or Pacific Islander(API) black hispanic white 1 36 297 1007 1232 8524 By looking at the median pay, Black and white workers seem to make a little more than the rest. Alaska Native and those classified as 2 races ( which is only 1 worker) make less than the rest. There is evidence of pay inequity in the Police Dept.
In the Fire department, this is the n-size for each race. aian api black hispanic white 14 115 391 292 3583 Looking at the median pay, black and white still make more than everybody else while Alaska Native make the least. These results are similar in the Police Department. There is evidence of pay inequity.
In the Streets & Sanitation department, this is the n-size for each race. aian api black hispanic white 9 41 189 245 1203 Looking at the median-pay, all races make the same, $38.35 per hr, except for Alaska Native workers who make $39.39.
In the water management department, this is the n-size for each race. aian api black hispanic white 7 45 191 150 1277 Looking at the median pay, Alaska Native and Asian or Pacific Islander(API) workers make a little more compared to the rest. Hispanic workers make the least in this department. There is evidence of pay inequity here.
In the aviation department, this is the n-size for each race. aian api black hispanic white 6 38 172 169 1051
Looking at the median pay, Hispanic workers make significantly more than the rest, followed closely behind by white workers. API, AIAN, and black workers make $12 less than Hispanic workers at the median pay.
We will use the median function to find pay differences between races in the 5 departments: Police, Fire, Streets & San, Water Management, and Aviation.
Visualize it first using boxplots. There seems to be more outliers in the white category compared to the rest. Also, 2 races seem to have only 1 value.This particular person who identifies as 2 races works for the Police Department only.
ggplot(finaldf) +
aes(x = race, y = annual_salary) +
geom_boxplot()
Police Department:
#Police Dept:
POLICE %>%
group_by(race) %>%
summarize(salarysummary= median(annual_salary, na.rm=TRUE)) %>%
arrange(salarysummary)
## # A tibble: 7 × 2
## race salarysummary
## <chr> <dbl>
## 1 2prace 66336
## 2 aian 84054
## 3 <NA> 84054
## 4 api 87006
## 5 hispanic 87006
## 6 black 90024
## 7 white 90024
table(POLICE$race)
##
## 2prace aian api black hispanic white
## 1 36 297 1007 1232 8524
# 2 races Alaska Native(aian) Asian or Pacific Islander(API) black hispanic white
#1 36 297 1007 1232 8524
Fire Department
#Fire Dept:
FIRE %>%
group_by(race) %>%
summarize(salarysummary= median(annual_salary, na.rm=TRUE)) %>%
arrange(salarysummary)
## # A tibble: 6 × 2
## race salarysummary
## <chr> <dbl>
## 1 aian 100149
## 2 <NA> 100560
## 3 api 101874
## 4 hispanic 101874
## 5 black 103914
## 6 white 103914
table(FIRE$race)
##
## aian api black hispanic white
## 14 115 391 292 3583
# aian api black hispanic white
# 14 115 391 292 3583
Streets & Sanitation
#Streets & San
STREETS %>%
group_by(race) %>%
summarize(salarysummary= median(annual_salary, na.rm=TRUE)) %>%
arrange(salarysummary)
## # A tibble: 6 × 2
## race salarysummary
## <chr> <dbl>
## 1 api 38.4
## 2 black 38.4
## 3 hispanic 38.4
## 4 white 38.4
## 5 <NA> 38.4
## 6 aian 39.4
table(STREETS$race)
##
## aian api black hispanic white
## 9 41 189 245 1203
#aian api black hispanic white
# 9 41 189 245 1203
Water Management
# Water MGMNT
WATER%>%
group_by(race) %>%
summarize(salarysummary= median(annual_salary, na.rm=TRUE)) %>%
arrange(salarysummary)
## # A tibble: 6 × 2
## race salarysummary
## <chr> <dbl>
## 1 <NA> 44.9
## 2 hispanic 47.3
## 3 white 49.8
## 4 black 50
## 5 aian 52.1
## 6 api 52.1
table(WATER$race)
##
## aian api black hispanic white
## 7 45 191 150 1277
#aian api black hispanic white
# 7 45 191 150 1277
Aviation Department
# Aviation
AVIATION %>%
group_by(race) %>%
summarize(salarysummary= median(annual_salary, na.rm=TRUE)) %>%
arrange(salarysummary)
## # A tibble: 6 × 2
## race salarysummary
## <chr> <dbl>
## 1 api 38.4
## 2 aian 38.6
## 3 black 38.8
## 4 <NA> 38.8
## 5 white 48.3
## 6 hispanic 50.8
table(AVIATION$race)
##
## aian api black hispanic white
## 6 38 172 169 1051
#aian api black hispanic white
# 6 38 172 169 1051
” Is the differences statistically significant? ”
In the Police Department, when running the ANOVA test, there is a significant difference (p < .001) in pay between races. The Tukey-post Hoc result shows that there are significant pay differences between Hispanic and Black Officers as well as Hispanic and White officers.
In the Fire Department, when running the ANOVA test, there is a significant difference in pay between races (p < .001). The Tukey Post-Hoc test shows significant pay differences between Hispanic and Black workers and Hispanic and White workers. The results are similar in the Police Department.
In the Streets & San Department, when running the ANOVA test, the P-value is 0.33, so we can conclude that there are no significant differences in pay between races.
In the Water Management Department, the ANOVA test shows p = 0.018 (p<.05), so there is a significant difference in pay between races. When running the post-hoc test, there appears to be significant pay differences between Asian or Pacific Islander (API) and Black workers, API and Hispanic workers, and API and White workers.
In the Aviation Department, the ANOVA test shows p = 0.008 (p<.05), so there is significant difference in pay between races. When running the post-hoc test, there appears to be significant pay differences between Hispanic and Black workers.
Police: p<.001, main effect of race is statistically significant
#POLICE
### Anova
police_aov <- aov(annual_salary ~ race,
data = POLICE
)
summary(police_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## race 5 2.145e+10 4.289e+09 13.66 2.56e-13 ***
## Residuals 11091 3.483e+12 3.141e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2038 observations deleted due to missingness
report(police_aov) # Yes, the main effect of race is statistically significant
## The ANOVA (formula: annual_salary ~ race) suggests that:
##
## - The main effect of race is statistically significant and very small (F(5, 11091) = 13.66, p < .001; Eta2 = 6.12e-03, 95% CI [3.61e-03, 1.00])
##
## Effect sizes were labelled following Field's (2013) recommendations.
# Post hoc
TukeyHSD(aov(POLICE$annual_salary ~ POLICE$race), conf.level=0.95) # The only significant difference in pay is between: Hispanic and Blacks, Hispanic and White
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = POLICE$annual_salary ~ POLICE$race)
##
## $`POLICE$race`
## diff lwr upr p adj
## aian-2prace 18872.666667 -32334.7294 70080.063 0.9006700
## api-2prace 22917.111111 -27678.5166 73512.739 0.7901553
## black-2prace 25574.340616 -24961.3973 76110.079 0.7009686
## hispanic-2prace 21503.868912 -29027.2907 72035.029 0.8307554
## white-2prace 25572.990380 -24940.6367 76086.617 0.7006211
## api-aian 4044.444444 -4869.6190 12958.508 0.7889571
## black-aian 6701.673949 -1865.9274 15269.275 0.2241973
## hispanic-aian 2631.202246 -5909.3529 11171.757 0.9518125
## white-aian 6700.323713 -1735.8787 15136.526 0.2091544
## black-api 2657.229505 -678.0245 5992.484 0.2061438
## hispanic-api -1413.242199 -4678.3930 1851.909 0.8203920
## white-api 2655.879269 -325.6704 5637.429 0.1130430
## hispanic-black -4070.471703 -6216.2756 -1924.668 0.0000010
## white-black -1.350236 -1684.4733 1681.773 1.0000000
## white-hispanic 4069.121468 2529.5782 5608.665 0.0000000
For the Fire Dept: effect of race on pay is statistically significant
#FIRE
## Anova
fire_aov <- aov(annual_salary ~ race,
data = FIRE
)
summary(fire_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## race 4 1.188e+10 2.969e+09 5.712 0.000139 ***
## Residuals 4390 2.282e+12 5.198e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 335 observations deleted due to missingness
report(fire_aov)
## The ANOVA (formula: annual_salary ~ race) suggests that:
##
## - The main effect of race is statistically significant and very small (F(4, 4390) = 5.71, p < .001; Eta2 = 5.18e-03, 95% CI [1.68e-03, 1.00])
##
## Effect sizes were labelled following Field's (2013) recommendations.
#Tukey post hoc
TukeyHSD(aov(FIRE$annual_salary ~ FIRE$race), conf.level=0.95) # The only significant difference is between: Hispanic and Blacks, Hispanic and White
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = FIRE$annual_salary ~ FIRE$race)
##
## $`FIRE$race`
## diff lwr upr p adj
## api-aian 2948.631 -14662.2048 20559.467 0.9910324
## black-aian 8775.302 -8147.5346 25698.139 0.6179301
## hispanic-aian 1821.590 -15200.1253 18843.305 0.9984155
## white-aian 7276.761 -9383.4639 23936.985 0.7560887
## black-api 5826.671 -773.2084 12426.551 0.1129629
## hispanic-api -1127.041 -7976.4786 5722.397 0.9916077
## white-api 4328.130 -1565.8576 10222.117 0.2641640
## hispanic-black -6953.712 -11765.7453 -2141.679 0.0007766
## white-black -1498.542 -4812.1426 1815.059 0.7313721
## white-hispanic 5455.171 1668.8352 9241.506 0.0008148
In the Streets & San: no evidence of pay inequity. Main effect of race is statistically not significant, p= 0.33
#Streets & San
## Anova
streets_aov <- aov(annual_salary ~ race,
data = STREETS
)
summary(streets_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## race 4 5.363e+09 1.341e+09 1.152 0.33
## Residuals 1682 1.957e+12 1.164e+09
## 353 observations deleted due to missingness
report(streets_aov)
## The ANOVA (formula: annual_salary ~ race) suggests that:
##
## - The main effect of race is statistically not significant and very small (F(4, 1682) = 1.15, p = 0.330; Eta2 = 2.73e-03, 95% CI [0.00, 1.00])
##
## Effect sizes were labelled following Field's (2013) recommendations.
#Tukey post hoc
TukeyHSD(aov(STREETS$annual_salary ~ STREETS$race), conf.level=0.95) # No significant difference to report here
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = STREETS$annual_salary ~ STREETS$race)
##
## $`STREETS$race`
## diff lwr upr p adj
## api-aian 3813.8663 -30475.372 38103.105 0.9981508
## black-aian 879.6089 -30901.311 32660.528 0.9999926
## hispanic-aian -2651.5876 -34266.980 28963.805 0.9993910
## white-aian 2330.4643 -28835.693 33496.622 0.9996133
## black-api -2934.2574 -18982.486 13113.971 0.9874417
## hispanic-api -6465.4539 -22183.337 9252.430 0.7942772
## white-api -1483.4020 -16276.920 13310.116 0.9987687
## hispanic-black -3531.1965 -12549.336 5486.943 0.8223354
## white-black 1450.8555 -5837.708 8739.419 0.9827265
## white-hispanic 4982.0520 -1547.065 11511.169 0.2275587
Water management:
# Water MGMNT
## Anova
water_aov <- aov(annual_salary ~ race,
data = WATER
)
summary(water_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## race 4 1.817e+10 4.543e+09 3 0.0176 *
## Residuals 1665 2.521e+12 1.514e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 197 observations deleted due to missingness
report(water_aov)
## The ANOVA (formula: annual_salary ~ race) suggests that:
##
## - The main effect of race is statistically significant and very small (F(4, 1665) = 3.00, p = 0.018; Eta2 = 7.16e-03, 95% CI [6.77e-04, 1.00])
##
## Effect sizes were labelled following Field's (2013) recommendations.
# Tukey post hoc
TukeyHSD(aov(WATER$annual_salary ~ WATER$race), conf.level=0.95) # There is significant difference in pay between Asian or Pacific Islander (API) and Black workers, API and Hispanic workers, and API and White workers.
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = WATER$annual_salary ~ WATER$race)
##
## $`WATER$race`
## diff lwr upr p adj
## api-aian 21259.6517 -21916.938 64436.242 0.6632503
## black-aian 1801.0934 -39093.777 42695.964 0.9999529
## hispanic-aian -772.6726 -41864.658 40319.312 0.9999984
## white-aian 1491.3841 -38784.028 41766.796 0.9999764
## black-api -19458.5583 -37067.571 -1849.546 0.0217421
## hispanic-api -22032.3243 -40094.388 -3970.261 0.0078669
## white-api -19768.2675 -35886.446 -3650.089 0.0073927
## hispanic-black -2573.7660 -14167.326 9019.794 0.9741412
## white-black -309.7092 -8553.991 7934.573 0.9999750
## white-hispanic 2264.0568 -6908.129 11436.243 0.9620028
In the Aviation Department, there is significant difference in pay, p = 0.008
# Aviation
## Anova
aviation_aov <- aov(annual_salary ~ race,
data = AVIATION
)
summary(aviation_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## race 4 2.440e+10 6.10e+09 3.446 0.00821 **
## Residuals 1431 2.533e+12 1.77e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 330 observations deleted due to missingness
report(aviation_aov)
## The ANOVA (formula: annual_salary ~ race) suggests that:
##
## - The main effect of race is statistically significant and very small (F(4, 1431) = 3.45, p = 0.008; Eta2 = 9.54e-03, 95% CI [1.41e-03, 1.00])
##
## Effect sizes were labelled following Field's (2013) recommendations.
#Tukey post hoc test
TukeyHSD(aov(AVIATION$annual_salary ~ AVIATION$race), conf.level=0.95) #Yes, only 1 sig. difference between Hispanic and Black p<.05
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = AVIATION$annual_salary ~ AVIATION$race)
##
## $`AVIATION$race`
## diff lwr upr p adj
## api-aian 28009.127 -22468.817 78487.072 0.5524841
## black-aian 22368.127 -25353.243 70089.496 0.7034981
## hispanic-aian 37341.565 -10394.080 85077.210 0.2051528
## white-aian 29174.345 -17869.549 76218.238 0.4380739
## black-api -5641.000 -26237.640 14955.639 0.9450306
## hispanic-api 9332.437 -11297.256 29962.131 0.7305194
## white-api 1165.217 -17808.985 20139.420 0.9998228
## hispanic-black 14973.438 2527.938 27418.938 0.0091913
## white-black 6806.218 -2645.057 16257.493 0.2828403
## white-hispanic -8167.220 -17690.311 1355.871 0.1322420