IBM_HR_Data <- read.csv("/cloud/project/IBM_HR_Data.csv")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.3
## ✓ tidyr 1.0.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Are there significant differences in Job Satisfaction amongst Departments?
IBM_HR_Data%>%group_by(Department)%>%filter(DistanceFromHome>=20)%>%tally()
## # A tibble: 3 x 2
## Department n
## <fct> <int>
## 1 Human Resources 10
## 2 Research & Development 147
## 3 Sales 72
IBM_HR_Data%>%group_by(Department)%>%summarize(JobSatMean=mean(JobSatisfaction), MaxJobSat=max(JobSatisfaction), MinJobSat=min(JobSatisfaction), JobSatRange=MaxJobSat-MinJobSat)
## # A tibble: 3 x 5
## Department JobSatMean MaxJobSat MinJobSat JobSatRange
## <fct> <dbl> <int> <int> <int>
## 1 Human Resources 2.60 4 1 3
## 2 Research & Development 2.73 4 1 3
## 3 Sales 2.75 4 1 3
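Comparing means alone does not say whether the departments differ significantly. Because JobSatisfaction is an ordinal 1 to 4 rating, one option is a Kruskal-Wallis rank test. The sketch below is only an illustration of the call: `toy_satisfaction` is a hypothetical, simulated stand-in for IBM_HR_Data (the real CSV is not bundled here), and the original analysis did not run this test.

```r
# Hypothetical stand-in for IBM_HR_Data: simulated 1-4 ratings, NOT the real survey data.
set.seed(1)
toy_satisfaction <- data.frame(
  Department = rep(
    c("Human Resources", "Research & Development", "Sales"),
    times = c(20, 60, 40)
  ),
  JobSatisfaction = sample(1:4, size = 120, replace = TRUE)
)

# Kruskal-Wallis test: do the rating distributions differ across departments?
satisfaction_test <- kruskal.test(JobSatisfaction ~ Department, data = toy_satisfaction)

# A small p-value would suggest at least one department differs from the others.
satisfaction_test$p.value
```

With the real data, the same call would take `data = IBM_HR_Data` in place of the simulated data frame.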
Are there significant differences in Job Involvement amongst Departments?
IBM_HR_Data%>%group_by(Department)%>%summarize(JobInvolvementMean=mean(JobInvolvement))
## # A tibble: 3 x 2
## Department JobInvolvementMean
## <fct> <dbl>
## 1 Human Resources 2.75
## 2 Research & Development 2.74
## 3 Sales 2.70
To count the employees in each department, I tabulated the Department column:
IBM_HR_Data%>%select(Department)%>%table()
## .
## Human Resources Research & Development Sales
## 63 961 446
To see the levels of job involvement across the departments, I drew a stacked bar graph with geom_bar(). It shows that most employees report level 3 job involvement. This layout makes the overall distribution of job involvement easy to read, but because the departments differ so much in size, it is hard to tell what share of each department falls at each level.
IBM_HR_Data%>%ggplot(aes(x=JobInvolvement, fill=Department))+geom_bar()
The proportional bar graph below shows that the distribution of job involvement levels is very similar across departments. A smaller share of Sales employees report level 4 Job Involvement, and a larger share report level 1.
IBM_HR_Data%>%ggplot(aes(Department,fill=factor(JobInvolvement)))+geom_bar(position="fill")
With so little variation in JobInvolvement across the Departments (i.e. Human Resources, Research & Development, and Sales), I’ll shift my focus to differences in attrition between Departments.
IBM_HR_Data%>% group_by(Department)%>%select(Attrition)%>%table()
## Adding missing grouping variables: `Department`
## Attrition
## Department No Yes
## Human Resources 51 12
## Research & Development 828 133
## Sales 354 92
IBM_HR_Data%>%ggplot(aes(Department, fill=factor(Attrition)))+geom_bar(position="fill")
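The Research & Development Department has the lowest attrition rate (133 of 961 employees, about 13.8%), even though its attrition count is the highest. Sales (92 of 446, about 20.6%) and Human Resources (12 of 63, about 19.0%) lose a larger share of their employees.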
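The rates can be computed directly from the counts in the Department by Attrition table above. This is a minimal sketch in base R: `attrition_counts`, `attrition_rate`, and `attrition_test` are names introduced here for illustration, and the chi-squared test of independence is one possible way to check whether the rates differ, not a step the original analysis ran.

```r
# Counts copied from the Department x Attrition table above.
attrition_counts <- matrix(
  c( 51,  12,
    828, 133,
    354,  92),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Department = c("Human Resources", "Research & Development", "Sales"),
    Attrition  = c("No", "Yes")
  )
)

# margin = 1 divides each row by its own total, giving per-department shares.
attrition_rate <- prop.table(attrition_counts, margin = 1)[, "Yes"]
round(attrition_rate, 3)

# Chi-squared test of independence between department and attrition.
attrition_test <- chisq.test(attrition_counts)
attrition_test$p.value
```

Working from the table keeps the example independent of the CSV file; with the data loaded, `table(IBM_HR_Data$Department, IBM_HR_Data$Attrition)` should give the same counts.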
IBM_HR_Data%>%group_by(Department, JobSatisfaction, Attrition)%>%tally()
## # A tibble: 24 x 4
## # Groups: Department, JobSatisfaction [12]
## Department JobSatisfaction Attrition n
## <fct> <int> <fct> <int>
## 1 Human Resources 1 No 6
## 2 Human Resources 1 Yes 5
## 3 Human Resources 2 No 18
## 4 Human Resources 2 Yes 2
## 5 Human Resources 3 No 12
## 6 Human Resources 3 Yes 3
## 7 Human Resources 4 No 15
## 8 Human Resources 4 Yes 2
## 9 Research & Development 1 No 154
## 10 Research & Development 1 Yes 38
## # … with 14 more rows
Is there a significant difference between females and males in the organization?
IBM_HR_Data%>%select(Gender)%>%table()
## .
## Female Male
## 588 882
IBM_HR_Data%>%ggplot(aes(Attrition, fill=Gender))+geom_bar(position=position_dodge())
Because men outnumber women (882 to 588), the raw counts are not directly comparable. One would need to compare attrition percentages within each gender to determine whether attrition differs by gender.
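A within-gender percentage can be obtained by normalising each row of a Gender by Attrition table. The sketch below uses `toy_attrition`, a hypothetical ten-row data frame made up for illustration (the Gender by Attrition counts are not shown in this report), so the numbers it prints say nothing about the real data.

```r
# Hypothetical toy rows standing in for IBM_HR_Data; NOT the real counts.
toy_attrition <- data.frame(
  Gender    = c(rep("Female", 4), rep("Male", 6)),
  Attrition = c("No", "No", "No", "Yes",
                "No", "No", "No", "No", "Yes", "Yes")
)

# Cross-tabulate, then divide each row by its own total (margin = 1),
# so each gender's "No" and "Yes" shares sum to 1.
gender_counts <- table(toy_attrition$Gender, toy_attrition$Attrition)
gender_rates  <- prop.table(gender_counts, margin = 1)
gender_rates
```

With the real data, replacing `toy_attrition` with `IBM_HR_Data` would give the attrition share for each gender, which can then be compared regardless of headcount.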
The level of job involvement by gender is depicted below. For both men and women, levels 3 and 2 have the highest counts, followed by levels 4 and 1.
IBM_HR_Data%>%ggplot(aes(JobInvolvement, fill=Gender))+geom_bar(position=position_dodge())
IBM_HR_Data%>%group_by(Gender)%>%select(EducationField)%>%table()
## Adding missing grouping variables: `Gender`
## EducationField
## Gender Human Resources Life Sciences Marketing Medical Other Technical Degree
## Female 8 240 69 190 29 52
## Male 19 366 90 274 53 80
Although men outnumber women in every education field, the distribution across education fields is similar for both genders. Life Sciences is the most common education field for both females and males.
IBM_HR_Data%>%ggplot(aes(HourlyRate,fill=Gender))+geom_bar(position=position_dodge())
IBM_HR_Data%>%ggplot(aes(JobSatisfaction, fill=Gender))+geom_bar(position="fill")
The proportional bars show that men make up a larger share of the job satisfaction levels than women. However, one should keep in mind that men outnumber women in the organization overall (588 women, 882 men), so a gap of this kind is expected even if job satisfaction does not differ by gender.