The data I have used for my R project is a public data set from Kaggle.com called the “IBM HR Analytics Employee Attrition and Performance”. This is a fictional data set created by IBM data scientists. While this data set has a lot of attributes related to attrition of employees, I decided to use this data set to study the gender inequality in the corporate world. We will take a look at the job level, education level, attrition and income differences between men and women employees in this data set.
We have the employee related details for 1470 employees that span across 35 different columns. Some of the columns that we have primarily used for our analysis are:
-Age -Attrition -Gender -Monthly Income -Education Level -Job Level -Business Travel
summary(hrdata)
## Age Attrition BusinessTravel DailyRate
## Min. :18.00 Length:1470 Length:1470 Min. : 102.0
## 1st Qu.:30.00 Class :character Class :character 1st Qu.: 465.0
## Median :36.00 Mode :character Mode :character Median : 802.0
## Mean :36.92 Mean : 802.5
## 3rd Qu.:43.00 3rd Qu.:1157.0
## Max. :60.00 Max. :1499.0
## Department DistanceFromHome Education EducationField
## Length:1470 Min. : 1.000 Min. :1.000 Length:1470
## Class :character 1st Qu.: 2.000 1st Qu.:2.000 Class :character
## Mode :character Median : 7.000 Median :3.000 Mode :character
## Mean : 9.193 Mean :2.913
## 3rd Qu.:14.000 3rd Qu.:4.000
## Max. :29.000 Max. :5.000
## EmployeeCount EmployeeNumber EnvironmentSatisfaction Gender
## Min. :1 Min. : 1.0 Min. :1.000 Length:1470
## 1st Qu.:1 1st Qu.: 491.2 1st Qu.:2.000 Class :character
## Median :1 Median :1020.5 Median :3.000 Mode :character
## Mean :1 Mean :1024.9 Mean :2.722
## 3rd Qu.:1 3rd Qu.:1555.8 3rd Qu.:4.000
## Max. :1 Max. :2068.0 Max. :4.000
## HourlyRate JobInvolvement JobLevel JobRole
## Min. : 30.00 Min. :1.00 Min. :1.000 Length:1470
## 1st Qu.: 48.00 1st Qu.:2.00 1st Qu.:1.000 Class :character
## Median : 66.00 Median :3.00 Median :2.000 Mode :character
## Mean : 65.89 Mean :2.73 Mean :2.064
## 3rd Qu.: 83.75 3rd Qu.:3.00 3rd Qu.:3.000
## Max. :100.00 Max. :4.00 Max. :5.000
## JobSatisfaction MaritalStatus MonthlyIncome MonthlyRate
## Min. :1.000 Length:1470 Min. : 1009 Min. : 2094
## 1st Qu.:2.000 Class :character 1st Qu.: 2911 1st Qu.: 8047
## Median :3.000 Mode :character Median : 4919 Median :14236
## Mean :2.729 Mean : 6503 Mean :14313
## 3rd Qu.:4.000 3rd Qu.: 8379 3rd Qu.:20462
## Max. :4.000 Max. :19999 Max. :26999
## NumCompaniesWorked Over18 OverTime PercentSalaryHike
## Min. :0.000 Length:1470 Length:1470 Min. :11.00
## 1st Qu.:1.000 Class :character Class :character 1st Qu.:12.00
## Median :2.000 Mode :character Mode :character Median :14.00
## Mean :2.693 Mean :15.21
## 3rd Qu.:4.000 3rd Qu.:18.00
## Max. :9.000 Max. :25.00
## PerformanceRating RelationshipSatisfaction StandardHours StockOptionLevel
## Min. :3.000 Min. :1.000 Min. :80 Min. :0.0000
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:80 1st Qu.:0.0000
## Median :3.000 Median :3.000 Median :80 Median :1.0000
## Mean :3.154 Mean :2.712 Mean :80 Mean :0.7939
## 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:80 3rd Qu.:1.0000
## Max. :4.000 Max. :4.000 Max. :80 Max. :3.0000
## TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany
## Min. : 0.00 Min. :0.000 Min. :1.000 Min. : 0.000
## 1st Qu.: 6.00 1st Qu.:2.000 1st Qu.:2.000 1st Qu.: 3.000
## Median :10.00 Median :3.000 Median :3.000 Median : 5.000
## Mean :11.28 Mean :2.799 Mean :2.761 Mean : 7.008
## 3rd Qu.:15.00 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.: 9.000
## Max. :40.00 Max. :6.000 Max. :4.000 Max. :40.000
## YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
## Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 2.000 1st Qu.: 0.000 1st Qu.: 2.000
## Median : 3.000 Median : 1.000 Median : 3.000
## Mean : 4.229 Mean : 2.188 Mean : 4.123
## 3rd Qu.: 7.000 3rd Qu.: 3.000 3rd Qu.: 7.000
## Max. :18.000 Max. :15.000 Max. :17.000
The analysis that I am doing on this data set will help us understand or realize if some gender based myths or pre-notions that we have about corporate employment are true or not based on this data set.
This visualization helps us get a general idea of the spread of employee population by age. Age of employee data used for analysis ranges from 18 to 60 years.
As you can see from the visualization 73% of the employees are between the age of 26 and 45.
age_bracket1 <- nrow(hrdata[(hrdata$Age >= 18 & hrdata$Age <= 20),])
age_bracket2 <- nrow(hrdata[(hrdata$Age >= 21 & hrdata$Age <= 25),])
age_bracket3 <- nrow(hrdata[(hrdata$Age >= 26 & hrdata$Age <= 30),])
age_bracket4 <- nrow(hrdata[(hrdata$Age >= 31 & hrdata$Age <= 35),])
age_bracket5 <- nrow(hrdata[(hrdata$Age >= 36 & hrdata$Age <= 40),])
age_bracket6 <- nrow(hrdata[(hrdata$Age >= 41 & hrdata$Age <= 45),])
age_bracket7 <- nrow(hrdata[(hrdata$Age >= 46 & hrdata$Age <= 50),])
age_bracket8 <- nrow(hrdata[(hrdata$Age >= 51 & hrdata$Age <= 55),])
age_bracket9 <- nrow(hrdata[(hrdata$Age >= 56 & hrdata$Age <= 60),])
age_brackets <- c(age_bracket1, age_bracket2, age_bracket3, age_bracket4, age_bracket5, age_bracket6,
age_bracket7, age_bracket8, age_bracket9)
range <- c("18-20", "21-25", "26-30", "31-35", "36-40", "41-45", "45-50", "51-55", "56-60")
age_count <- data.frame(range, age_brackets)
fig1 <- ggplot(age_count, aes(x= range, y=age_brackets)) +
geom_bar(colour="black", fill="lightblue", stat="identity") +
labs(title = "Number of Employees by age range", x= "Age Range (18 to 60)", y = "Employee Count") +
theme(plot.title = element_text(hjust=0.5)) +
theme_replace()
fig1
This visualization should help us understand who are more likely to leave the organization based on gender.
As you see from the the visualization men are more likely (17%) to leave the organization compared to women employees (14%).
attrition <- hrdata %>%
select(Attrition, Gender) %>%
group_by(Attrition, Gender) %>%
summarise(n = length(Attrition), .groups = 'keep') %>%
data.frame()
attrition
## Attrition Gender n
## 1 No Female 501
## 2 No Male 732
## 3 Yes Female 87
## 4 Yes Male 150
tot_attrition <- hrdata %>%
select(Attrition, Gender) %>%
group_by(Attrition) %>%
summarise(n = length(Attrition)) %>%
data.frame()
max_y <- round_any(max(tot_attrition$n),200, ceiling)
fig2 <- ggplot(attrition, aes(x = Attrition, y = n , fill = Gender)) +
geom_bar(stat= "identity") +
coord_flip() +
labs(title = "Attrition Count by Gender", x= "Atrition", y = "Count of Employees", fill ="Gender") +
theme_light() +
theme(plot.title = element_text(hjust = 0.5)) +
scale_fill_brewer(palette="Paired", guide = guide_legend(reverse = TRUE)) +
geom_text(data=tot_attrition,aes(x=Attrition, y=n, label=scales::comma(n), fill= NULL), hjust = -0.3, size=4) +
scale_y_continuous(labels = comma,
breaks = seq(0, max_y, by = 200),
limits=c(0, max_y))
fig2
While modern day corporate companies are fighting hard and working to bridge the gender wage gap, I wanted to analyze if a gender wage exists in our data set.
So I tried comparing the mean monthly income between men and women employees across all job levels. This comparison revealed that, in 4 of the 5 job levels in the organization women employees’ average monthly income is less than men employees.
avg_income <- hrdata %>%
select(Gender, MonthlyIncome, JobLevel) %>%
group_by(Gender, JobLevel) %>%
summarise(a_income = mean(MonthlyIncome), .groups = 'keep') %>%
data.frame()
avg_income
## Gender JobLevel a_income
## 1 Female 1 2780.487
## 2 Female 2 5435.327
## 3 Female 3 9962.702
## 4 Female 4 15431.373
## 5 Female 5 19129.917
## 6 Male 1 2790.634
## 7 Male 2 5549.185
## 8 Male 3 9706.992
## 9 Male 4 15570.927
## 10 Male 5 19224.844
max_y2 <- round_any(max(avg_income$a_income),2500, ceiling)
fig3 <- ggplot(avg_income, aes(x = JobLevel, y = a_income, fill=Gender)) +
geom_bar(stat= "identity", position= "dodge") +
coord_flip() +
labs(title = "Average monthly income across joblevels by Gender",x= "JobLevel" , y = "Avg.Monthly Income in $", fill ="Gender") +
theme_light() +
theme(plot.title = element_text(hjust = 0.5)) +
scale_fill_brewer(palette="Set2", guide = guide_legend(reverse = TRUE)) +
scale_y_continuous(labels = comma,
breaks = seq(0, max_y2, by = 2500),
limits=c(0, max_y2))
fig3
Now that we know women employees make less compared to their peers in the opposite gender, we now want to understand if they have the same opportunity as men for career development in the organization.
In this visualization we are comparing the average age of men and women employees across each job level. As you can see, in 3 of the 5 job levels it takes longer for women to get to these positions compared to men. It is also to be noted that this gap is the highest at the top most job level.
avg_age <- hrdata %>%
select(Gender, Age, JobLevel) %>%
group_by(Gender, JobLevel) %>%
summarise(a_age = mean(Age), .groups = 'keep') %>%
data.frame()
avg_age
## Gender JobLevel a_age
## 1 Female 1 32.57286
## 2 Female 2 36.99091
## 3 Female 3 39.84043
## 4 Female 4 47.29412
## 5 Female 5 48.87500
## 6 Male 1 32.49709
## 7 Male 2 36.32484
## 8 Male 3 40.27419
## 9 Male 4 47.76364
## 10 Male 5 47.15556
max_y2 <- round_any(max(avg_age$a_age),5, ceiling)
fig3a <- ggplot(avg_age, aes(x = JobLevel, y = a_age, fill=Gender)) +
geom_bar(stat= "identity", position="dodge") +
# coord_flip() +
labs(title = "Average Age across joblevels by Gender",x= "JobLevel" , y = "Avg. Age in years", fill ="Gender") +
theme_light() +
theme(plot.title = element_text(hjust = 0.5)) +
scale_fill_brewer(palette="Set1", guide = guide_legend(reverse = FALSE)) +
scale_y_continuous(breaks = seq(0, max_y2, by = 5),
limits=c(0, max_y2))
fig3a
If women are underpaid and get less opportunities to go up the corporate ladder, is it possible that their educational qualification play a role in to this ? Let’s find out!
This visualization compares and contrasts the educational level of employees based on gender. It is important to note that % of women employees with a College degree or Bachelor degree or Doctorate is higher than their male peers.
In general, % of women employees with a college or higher degree (90%) is more than % of male employees with a college or higher degree(87.5%).
education_df <- hrdata %>%
select(Education, Gender) %>%
mutate(edu_description = ifelse(hrdata$Education == "1" , "Below College",
ifelse(hrdata$Education == "2", "College",
ifelse(hrdata$Education == "3", "Bachelor",
ifelse(hrdata$Education == "4", "Master", "Doctor"))))) %>%
group_by(Gender, Education, edu_description) %>%
summarise(n= length(Education), .groups ='keep') %>%
group_by(Gender) %>%
mutate(percent_of_total = round(100*n/sum(n),1)) %>%
ungroup() %>%
data.frame()
education_df
## Gender Education edu_description n percent_of_total
## 1 Female 1 Below College 60 10.2
## 2 Female 2 College 117 19.9
## 3 Female 3 Bachelor 235 40.0
## 4 Female 4 Master 154 26.2
## 5 Female 5 Doctor 22 3.7
## 6 Male 1 Below College 110 12.5
## 7 Male 2 College 165 18.7
## 8 Male 3 Bachelor 337 38.2
## 9 Male 4 Master 244 27.7
## 10 Male 5 Doctor 26 2.9
education_df$edu_description = factor(education_df$edu_description, levels = c("Below College", "College", "Bachelor",
"Master", "Doctor"))
fig4 <- ggplot(data = education_df, aes(x="", y=n, fill = edu_description)) +
geom_bar(stat="identity", position="fill") +
coord_polar(theta="y", start=0) +
labs(fill = "Education Level", x= NULL, y= NULL, title = "Education level of Employees by gender",
caption = "Source: IBM HR Analytics Employee Attrition data") +
theme_light() +
theme(plot.title = element_text(hjust = 0.5),
axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid =element_blank()) +
facet_wrap(~Gender, ncol = 2, nrow=1) +
scale_fill_brewer(palette = "Oranges") +
geom_text(aes(x=1.7, label= paste0(percent_of_total,"%")),
size=4,
position=position_fill(vjust =0.5))
fig4
We just learned that women employees have better qualifications compared to their male peers. So that cannot be a reason for being underpaid or having less opportunities to grow in an organization. Could it be their business travel preferences ?
This visualization clearly highlights that higher % of women employees travel frequently compared to men in similar job profiles that need frequent traveling. On the flip side, it also indicates that there is a higher % of male employees who don’t travel at all for business compared to women employees in the same category.
travel_df <- hrdata %>%
select(BusinessTravel, Gender) %>%
group_by(Gender, BusinessTravel) %>%
summarise(n= length(BusinessTravel), .groups ='keep') %>%
group_by(Gender) %>%
mutate(percent_of_total = round(100*n/sum(n),1)) %>%
ungroup() %>%
data.frame()
travel_df
## Gender BusinessTravel n percent_of_total
## 1 Female Non-Travel 49 8.3
## 2 Female Travel_Frequently 117 19.9
## 3 Female Travel_Rarely 422 71.8
## 4 Male Non-Travel 101 11.5
## 5 Male Travel_Frequently 160 18.1
## 6 Male Travel_Rarely 621 70.4
fig5 <- plot_ly(hole=0.7) %>%
layout(title = "Business Travel requirements for Employees by Gender") %>%
add_trace(data= travel_df[travel_df$Gender == "Female",],
labels = ~BusinessTravel,
values =~travel_df[travel_df$Gender == "Female","n"],
type ="pie",
textposition ="inside",
hovertemplate= "Gender: Female<br>Travel Type:%{label}<br>Percent:%{percent}<br>Number of Employees: %{value}<extra></extra>") %>%
add_trace(data= travel_df[travel_df$Gender == "Male",],
labels = ~BusinessTravel,
values =~travel_df[travel_df$Gender == "Male","n"],
type ="pie",
textposition ="inside",
hovertemplate= "Gender: Male<br>Travel Type:%{label}<br>Percent:%{percent}<br>Number of Employees: %{value}<extra></extra>",
domain = list(
x = c(0.16,0.84),
y = c(0.16,0.84)))
fig5
Why do we have this disparity? Is it possible that something is wrong in the system with the way Job level, Education and monthly income are tied together.
This heatmap visualization makes it clear that monthly income is directly proportional to job levels in the organization and education levels for the most part.
edu_jl <- hrdata %>%
select(Education, MonthlyIncome, JobLevel) %>%
mutate(edu_description = ifelse(hrdata$Education == "1" , "Below College",
ifelse(hrdata$Education == "2", "College",
ifelse(hrdata$Education == "3", "Bachelor",
ifelse(hrdata$Education == "4", "Master", "Doctor"))))) %>%
group_by(edu_description, JobLevel) %>%
summarise(avgincome =mean(MonthlyIncome), .groups= 'keep') %>%
data.frame()
edu_jl
## edu_description JobLevel avgincome
## 1 Bachelor 1 2720.359
## 2 Bachelor 2 5549.456
## 3 Bachelor 3 9612.020
## 4 Bachelor 4 15279.318
## 5 Bachelor 5 19151.679
## 6 Below College 1 2773.202
## 7 Below College 2 5617.000
## 8 Below College 3 10513.200
## 9 Below College 4 15183.250
## 10 Below College 5 19392.167
## 11 College 1 2771.170
## 12 College 2 5301.816
## 13 College 3 9444.121
## 14 College 4 16086.765
## 15 College 5 19043.538
## 16 Doctor 1 3127.375
## 17 Doctor 2 5427.650
## 18 Doctor 3 9506.333
## 19 Doctor 4 15590.222
## 20 Doctor 5 18943.000
## 21 Master 1 2913.785
## 22 Master 2 5578.830
## 23 Master 3 10184.586
## 24 Master 4 15566.357
## 25 Master 5 19309.200
breaks <- c(seq(0, max(edu_jl$avgincome), by=2500))
myorder <- c("Below College","College","Bachelor","Master","Doctor")
edu_jl$edu_description <- factor(edu_jl$edu_description , levels = myorder)
fig6 <- ggplot(edu_jl, aes(x=JobLevel, y=edu_description, fill=avgincome)) +
geom_tile(color ="black") +
geom_text(aes(label=comma(avgincome))) +
coord_equal(ratio=1) +
labs(title="Average Monthly Income across Job Levels & Education Levels",
x= "Job Level",
y="Education Level",
fill = "Average Monthly Income in $") +
theme_minimal() +
theme(plot.title = element_text(hjust=0.5)) +
scale_fill_continuous(low= "white", high= "orange", breaks= breaks) +
guides(fill= guide_legend(reverse= TRUE, override.aes=list(colour="black")))
fig6
ggplotly(fig6, tooltip = c("avgincome","JobLevel","edu_description")) %>%
style(hoverlabel = list(bgcolor = "yellow"))
It is very obvious from all the visualization that gender wage gap does exist in this organization. In spite of women employees in this organization being less attrition prone, holding higher education levels and more willing to travel for business, they are still compensated financially lesser and take longer for climbing the corporate ladder compared to their peers from opposite gender.
This is a systematic issue of gender inequality at workplace and we cannot turn blind eyes to it anymore. The HR policies including compensation policies should be revived with a pledge to fix gender wage gap.