Homework 1

IBM_HR_Data <- read.csv("/cloud/project/IBM_HR_Data.csv")

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Are there significant differences in Job Satisfaction amongst Departments?

IBM_HR_Data%>%group_by(Department)%>%filter(DistanceFromHome>=20)%>%tally()

## # A tibble: 3 x 2
##   Department                 n
##   <fct>                  <int>
## 1 Human Resources           10
## 2 Research & Development   147
## 3 Sales                     72

IBM_HR_Data%>%group_by(Department)%>%summarize(JobSatMean=mean(JobSatisfaction), MaxJobSat=max(JobSatisfaction), MinJobSat=min(JobSatisfaction), JobSatRange=MaxJobSat-MinJobSat)

## # A tibble: 3 x 5
##   Department             JobSatMean MaxJobSat MinJobSat JobSatRange
##   <fct>                       <dbl>     <int>     <int>       <int>
## 1 Human Resources              2.60         4         1           3
## 2 Research & Development       2.73         4         1           3
## 3 Sales                        2.75         4         1           3

Are there significant differences in Job Involvement amongst Departments?

IBM_HR_Data%>%group_by(Department)%>%summarize(JobInvolvementMean=mean(JobInvolvement))

## # A tibble: 3 x 2
##   Department             JobInvolvementMean
##   <fct>                               <dbl>
## 1 Human Resources                      2.75
## 2 Research & Development               2.74
## 3 Sales                                2.70

To count the employees in each department, I tabulated the Department column:

IBM_HR_Data%>%select(Department)%>%table()

## .
##        Human Resources Research & Development                  Sales 
##                     63                    961                    446
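
As a rough sketch, the same counts can be turned into shares of the workforce, which is useful background for the percentage plots further down. The counts are copied from the table output above; the variable names `dept_counts` and `dept_share` are my own and not part of the original analysis.

```r
# Department head counts copied from the table() output above.
dept_counts <- c(
  "Human Resources" = 63,
  "Research & Development" = 961,
  "Sales" = 446
)

# Share of all employees in each department, as a percentage.
dept_share <- round(100 * dept_counts / sum(dept_counts), 1)
print(dept_share)
```

With the CSV loaded, `prop.table(table(IBM_HR_Data$Department))` should give the same shares as proportions.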

To see the levels of job involvement across the departments, I plotted the following bar graph with geom_bar(). It shows that a majority of the employees report level 3 job involvement. Because the bars stack raw counts, I can see how the employees as a whole are distributed across the job involvement levels, but it is difficult to tell what percentage of each department falls at each level.

IBM_HR_Data%>%group_by(Department)%>% select(JobInvolvement)%>%ggplot(aes(x=JobInvolvement, fill=Department))+geom_bar()

## Adding missing grouping variables: `Department`

The bar graph below scales each department's bar to 100% and shows that the distribution of job involvement levels is very similar across departments. Compared with the other departments, Sales has a smaller percentage of employees at level 4 Job Involvement and a larger percentage at level 1.

IBM_HR_Data%>%ggplot(aes(Department,fill=factor(JobInvolvement)))+geom_bar(position="fill")

With so little variation in JobInvolvement across the Departments (i.e. Human Resources, Research & Development, and Sales), I’ll shift my focus to differences in attrition across Departments.

IBM_HR_Data%>% group_by(Department)%>%select(Attrition)%>%table()

## Adding missing grouping variables: `Department`

##                         Attrition
## Department                No Yes
##   Human Resources         51  12
##   Research & Development 828 133
##   Sales                  354  92

IBM_HR_Data%>%ggplot(aes(Department, fill=factor(Attrition)))+geom_bar(position="fill")

The Research & Development Department has the lowest attrition rate (133 of 961 employees, about 13.8%), even though its raw attrition count is higher than that of Sales (92 of 446, about 20.6%) or Human Resources (12 of 63, about 19.0%).
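
The attrition rates behind that comparison can be recomputed from the counts in the table above. The sketch below uses base R only, so it runs without the CSV. The chi-squared test of independence at the end is my own addition, as one possible way to ask whether the department differences are larger than chance; it was not part of the original analysis, and I have not listed its result here.

```r
# Department x Attrition counts copied from the table() output above.
attrition <- matrix(
  c( 51,  12,
    828, 133,
    354,  92),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Department = c("Human Resources", "Research & Development", "Sales"),
    Attrition  = c("No", "Yes")
  )
)

# Row-wise proportions: each department's row sums to 1.
rates <- prop.table(attrition, margin = 1)

# Attrition rate per department, as a percentage.
attrition_pct <- round(100 * rates[, "Yes"], 1)
print(attrition_pct)

# Assumption: a chi-squared test of independence is a reasonable check here.
test <- chisq.test(attrition)
print(test)
```

A small p-value from `test$p.value` would suggest that attrition is not independent of department, which is closer to answering the "significant differences" question than comparing bar heights by eye.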

IBM_HR_Data%>%group_by(Department, JobSatisfaction, Attrition)%>% tally()

## # A tibble: 24 x 4
## # Groups:   Department, JobSatisfaction [12]
##    Department             JobSatisfaction Attrition     n
##    <fct>                            <int> <fct>     <int>
##  1 Human Resources                      1 No            6
##  2 Human Resources                      1 Yes           5
##  3 Human Resources                      2 No           18
##  4 Human Resources                      2 Yes           2
##  5 Human Resources                      3 No           12
##  6 Human Resources                      3 Yes           3
##  7 Human Resources                      4 No           15
##  8 Human Resources                      4 Yes           2
##  9 Research & Development               1 No          154
## 10 Research & Development               1 Yes          38
## # … with 14 more rows
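
The raw tally is hard to read as rates, so here is a minimal sketch that converts the eight Human Resources rows shown above into an attrition percentage per job satisfaction level. Only the rows visible in the output are used; the hidden 14 rows for the other departments are not reconstructed. The names `hr`, `hr_table`, and `hr_attrition_pct` are my own.

```r
# Human Resources rows copied from the first eight rows of the tally above.
hr <- data.frame(
  JobSatisfaction = c(1, 1, 2, 2, 3, 3, 4, 4),
  Attrition = c("No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"),
  n = c(6, 5, 18, 2, 12, 3, 15, 2)
)

# Cross-tabulate the counts: rows are satisfaction levels, columns are No/Yes.
hr_table <- xtabs(n ~ JobSatisfaction + Attrition, data = hr)

# Percentage of employees at each satisfaction level who left.
hr_attrition_pct <- round(100 * prop.table(hr_table, margin = 1)[, "Yes"], 1)
print(hr_attrition_pct)
```

Human Resources has only 63 employees, so these percentages rest on very small counts and should be read with caution. To see the rows that the tibble printout hides, the tally could be piped into `print(n = Inf)`.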

Is there a significant difference between females and males in the organization?

IBM_HR_Data%>%select(Gender)%>%table()

## .
## Female   Male 
##    588    882

IBM_HR_Data%>%ggplot(aes(Attrition, fill=Gender))+geom_bar(position=position_dodge())

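Because men outnumber women in the organization, raw counts alone are not enough. One would need to compare attrition percentages within each gender in order to determine whether there is a difference due to gender.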
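
A minimal sketch of that percentage comparison is below. The Gender by Attrition counts are not printed anywhere in this report, so the example uses a tiny made-up data frame purely for illustration; it is NOT the IBM HR data. The helper name `attrition_rate_by` is hypothetical.

```r
# Hypothetical helper: share of "Yes" in an outcome column within each group.
attrition_rate_by <- function(data, group_col, outcome_col = "Attrition") {
  counts <- table(data[[group_col]], data[[outcome_col]])
  prop.table(counts, margin = 1)[, "Yes"]
}

# Made-up toy data (illustration only): 4 women with 1 leaver, 8 men with 2.
toy <- data.frame(
  Gender = c(rep("Female", 4), rep("Male", 8)),
  Attrition = c("No", "No", "No", "Yes",
                "No", "No", "No", "No", "No", "No", "Yes", "Yes")
)

# Raw counts differ (1 vs 2 leavers), but the rates are the same.
toy_rates <- attrition_rate_by(toy, "Gender")
print(toy_rates)
```

In the toy data men have twice as many leavers as women, yet both groups have the same attrition rate, which is exactly why the dodged count plot above cannot settle the question. With the CSV loaded, `attrition_rate_by(IBM_HR_Data, "Gender")` should give the real rates.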

The level of job involvement based on gender is depicted below. Both men and women have the highest tally in levels 3 and 2 followed by levels 4 and 1.

IBM_HR_Data%>%ggplot(aes(JobInvolvement, fill=Gender))+geom_bar(position=position_dodge())

IBM_HR_Data%>%group_by(Gender)%>%select(EducationField)%>%table()

## Adding missing grouping variables: `Gender`

##         EducationField
## Gender   Human Resources Life Sciences Marketing Medical Other Technical Degree
##   Female               8           240        69     190    29               52
##   Male                19           366        90     274    53               80

Although the number of men and women in each field differs considerably, the distribution of education fields is similar for men and women. Life Sciences is the most common education field for both females (240 of 588) and males (366 of 882), followed by Medical.
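
To make the "similar distribution" claim easier to check, the counts from the table above can be converted to row percentages. This is a sketch in base R using the printed counts; the names `edu` and `edu_pct` are my own.

```r
# Gender x EducationField counts copied from the table() output above.
edu <- matrix(
  c( 8, 240, 69, 190, 29, 52,
    19, 366, 90, 274, 53, 80),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Gender = c("Female", "Male"),
    EducationField = c("Human Resources", "Life Sciences", "Marketing",
                       "Medical", "Other", "Technical Degree")
  )
)

# Percentage of each gender in each education field (each row sums to ~100).
edu_pct <- round(100 * prop.table(edu, margin = 1), 1)
print(edu_pct)
```

Comparing the two rows of `edu_pct` field by field shows how closely the distributions match once the difference in head count is removed.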

IBM_HR_Data%>%group_by(Gender)%>%select(HourlyRate)%>%ggplot(aes(HourlyRate,fill=Gender))+geom_bar(position=position_dodge())

## Adding missing grouping variables: `Gender`

IBM_HR_Data%>%ggplot(aes(JobSatisfaction, fill=Gender))+geom_bar(position="fill")

The proportional bar chart of job satisfaction by gender shows that men make up a larger share of the responses than women. However, one should not forget that men outnumber women in the organization (Women: 588, Men: 882), so a larger male share is expected even if job satisfaction does not differ by gender.
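
One way to read that chart is against the overall gender split. The sketch below computes that baseline from the counts reported earlier; the names `gender_counts` and `baseline_share` are my own.

```r
# Head counts by gender, copied from the table() output earlier in the report.
gender_counts <- c(Female = 588, Male = 882)

# Overall share of each gender in the organization.
baseline_share <- gender_counts / sum(gender_counts)
print(round(baseline_share, 2))
```

If job satisfaction were unrelated to gender, each bar in the position="fill" chart would split at roughly this baseline. Only bars that depart noticeably from it would hint at a gender difference, and a formal test would still be needed to call it significant.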