Vijay Lakshmanan Iyer (s3797863), Rohit Gupta (s3798988), Bijo B Thomas (s3758150)
Last updated: 27 October, 2019
Every organization invests heavily in hiring and training employees, and that investment is measured in time as well as money.
Every company therefore needs an analysis of which parameters affect the attrition rate, so that it can use them to improve employee satisfaction and reduce attrition.
Among the most common parameters that can lead to attrition in an organization are experience in the industry and pay scale.
A parameter can affect not only the attrition rate but also other parameters, for example gender and marital status, or job role and pay rate.
This report examines the difference in mean hourly pay rate between employees of two specific departments: Sales and Research & Development.
This gives an idea of how large the pay gap is, since a large gap could raise the chance of attrition in one of the departments.
The main objective is to examine the difference in mean hourly pay rate between employees of Sales and Research & Development.
To carry out the analysis, we use a two-sample t-test to assess whether the difference is statistically significant.
The dataset used in the analysis comes from Kaggle: “https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset”
This HR analytics dataset is provided by IBM and contains observations of employees working in a total of 3 departments: Sales, Human Resources and Research & Development.
The dataset has 35 variables and 1470 employee records.
Some of the important variables are “Hourly Rate”, “Department”, “Job Role”, “Age”, “TotalWorkingYears” and “YearsAtCompany”.
library(readr); library(dplyr)  # read_csv(); %>%, mutate(), filter(), group_by(), summarise()
attrition <- read_csv("C:/Users/iyerv/Downloads/ibm-hr-analytics-attrition-dataset/WA_Fn-UseC_-HR-Employee-Attrition.csv")
dim(attrition)
## [1] 1470 35
For the analysis, we use two variables:
- Department
- Hourly Rate
Within Department, we restrict the analysis to Sales and Research & Development.
# First convert Department to a factor variable
attrition_new <- attrition %>% mutate(Department = factor(Department))
# Filter the dataset to the two required departments
attrition <- attrition_new %>% filter(Department == "Sales" | Department == "Research & Development")
dim(attrition)
## [1] 1407 35
## [1] Sales Research & Development
## Levels: Human Resources Research & Development Sales
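The `Levels:` line above still lists Human Resources because filtering rows does not remove unused factor levels. A minimal sketch of the behaviour, using a small hypothetical vector rather than the real `Department` column, with base R's `droplevels()` as one way to tidy it up:

```r
# Toy data standing in for the real Department column (values are illustrative).
dept <- factor(c("Sales", "Human Resources", "Research & Development", "Sales"))

# Subsetting keeps the now-unused "Human Resources" level in the factor.
kept <- dept[dept %in% c("Sales", "Research & Development")]
levels(kept)

# droplevels() removes levels that no longer occur in the data.
kept_clean <- droplevels(kept)
levels(kept_clean)
```

Leaving the extra level in place does not change the tests that follow, but dropping it avoids an empty Human Resources group in summaries and plots.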
hourlyRateSummary <- attrition %>% group_by(Department) %>% summarise(Min = min(HourlyRate,na.rm = TRUE),
Q1 = quantile(HourlyRate,probs = .25,na.rm = TRUE),
Median = median(HourlyRate, na.rm = TRUE),
Q3 = quantile(HourlyRate,probs = .75,na.rm = TRUE),
Max = max(HourlyRate,na.rm = TRUE),
Mean = mean(HourlyRate, na.rm = TRUE),
SD = sd(HourlyRate, na.rm = TRUE),
n = n(),
Missing = sum(is.na(HourlyRate)))
hourlyRateSummary
Looking at the descriptive statistics, it is hard to see any difference in hourly rate between the two departments.
Let’s see the visualisation of these statistics.
library(car)  # qqPlot()
attrition_RnD <- attrition %>% filter(Department == "Research & Development")
attrition_RnD$HourlyRate %>% qqPlot(dist = "norm")
## [1] 43 61
attrition_Sales <- attrition %>% filter(Department == "Sales")
attrition_Sales$HourlyRate %>% qqPlot(dist = "norm")
## [1] 49 91
The distributions of hourly rate for Sales and Research & Development are roughly the same. It is very difficult to identify any difference between the departments from the histograms of hourly rate.
The Q-Q plots of hourly rate for each department show that the distribution is not normal.
However, under the central limit theorem, because the number of observations in each department is far greater than 30, the sampling distribution of the mean can be treated as approximately normal, so the t-test can still be applied.
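The plotting code for the histograms is not shown in the report. A minimal sketch of how per-department histograms could be drawn with base R follows; the data frame `toy`, its group sizes, and the 30 to 100 value range are assumptions made purely for illustration, not the real records.

```r
set.seed(1)
# Synthetic stand-in for the filtered data; NOT the real HourlyRate values.
toy <- data.frame(
  Department = rep(c("Research & Development", "Sales"), times = c(300, 150)),
  HourlyRate = round(runif(450, min = 30, max = 100))
)

# Shared breaks make the two histograms directly comparable.
breaks <- seq(30, 100, by = 10)

# Draw one panel per department, side by side.
op <- par(mfrow = c(1, 2))
for (d in unique(toy$Department)) {
  hist(toy$HourlyRate[toy$Department == d], breaks = breaks,
       main = d, xlab = "Hourly rate")
}
par(op)  # restore the previous plotting layout
```

Using the same breaks in both panels matters here, since the argument rests on comparing shapes rather than counts.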
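The central limit theorem argument can be checked with a small simulation. The sketch below assumes a flat, clearly non-normal population (uniform on 30 to 100) and a sample size of 400; both are illustrative choices, not properties of the dataset.

```r
set.seed(42)
# Assumption: a uniform population on [30, 100], which is far from normal.
draw_mean <- function(n) mean(runif(n, min = 30, max = 100))

# 5000 sample means, each computed from a sample of size 400.
sample_means <- replicate(5000, draw_mean(400))

# The means centre on the population mean (65) ...
mean(sample_means)
# ... and their spread is close to the theoretical standard error sd / sqrt(n).
sd(sample_means)
(100 - 30) / sqrt(12) / sqrt(400)

# A Q-Q plot of the sample means should follow the reference line closely.
qqnorm(sample_means)
qqline(sample_means)
```

The raw values stay non-normal however large the sample is; it is only the distribution of the sample mean that approaches normality, which is what the t-test relies on.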
Hypothesis: \[H_0: \sigma_1^2 = \sigma_2^2\] \[H_A: \sigma_1^2 \ne \sigma_2^2\]
Looking at the statistical output, the p-value of 0.8746 is well above the significance level \(\alpha = 0.05\).
Thus we fail to reject the null hypothesis and assume that the variance of hourly rate is roughly the same in the two departments.
Having assumed equal variances of hourly rate across the two departments, we can proceed with the t test.
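The report does not show which test produced this p-value. One common choice for comparing variances is Levene's test centred on the group median (also known as the Brown-Forsythe variant), which can be sketched in base R as a one-way ANOVA on absolute deviations. The data below are synthetic and the group sizes are assumptions; this is an illustration of the technique, not a reproduction of the reported result.

```r
set.seed(7)
# Synthetic data: two groups drawn with the same spread (illustrative only).
toy <- data.frame(
  Department = factor(rep(c("Research & Development", "Sales"), times = c(300, 150))),
  HourlyRate = c(rnorm(300, mean = 66, sd = 20), rnorm(150, mean = 65, sd = 20))
)

# Absolute deviation of each observation from its own group's median.
group_median <- ave(toy$HourlyRate, toy$Department, FUN = median)
toy$abs_dev <- abs(toy$HourlyRate - group_median)

# One-way ANOVA on the deviations; H0 is equal spread across the groups.
levene <- anova(lm(abs_dev ~ Department, data = toy))
levene_p <- levene[["Pr(>F)"]][1]
levene_p
```

A p-value above 0.05 here would lead to the same conclusion as in the text: no evidence against equal variances, so the pooled t-test is reasonable.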
Hypothesis: \[H_0: \mu_1 - \mu_2 = 0\]
\[H_A: \mu_1 - \mu_2 \ne 0\]
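For reference, the equal-variance (pooled) two-sample statistic used to test these hypotheses can be written as follows. Here \(\bar{x}_i\), \(s_i^2\) and \(n_i\) denote the sample mean, sample variance and size of group \(i\); which department is labelled 1 or 2 is an assumption, since the report does not say.

```latex
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2} \]
\[ \text{df} = n_1 + n_2 - 2 = 1407 - 2 = 1405 \]
```

The degrees of freedom follow from the 1407 filtered records in the two departments.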
##
## Two Sample t-test
##
## data: HourlyRate by Department
## t = 0.55706, df = 1405, p-value = 0.5776
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.632279 2.926988
## sample estimates:
## mean in group Research & Development mean in group Sales
## 66.16753 65.52018
From the t test, the computed p-value is 0.5776, which is greater than the significance level of 0.05, so we fail to reject the null hypothesis.
The 95% confidence interval for the difference in means, (-1.63, 2.93), also contains 0, which supports the decision not to reject the null hypothesis.
The analysis therefore found no statistically significant difference in mean hourly rate between Sales and Research & Development.
This indicates that employees in Sales and Research & Development earn roughly the same per hour on average, so a pay gap between these two departments is unlikely to explain attrition in the organization. Attrition itself was not tested in this analysis.
Limitations: The sample size of 1407 could have been larger, which would give a better understanding of how the hourly rate of employees in different departments relates to attrition.
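The call that produced the output above is not shown. A minimal sketch of an equal-variance two-sample t-test in R follows, using synthetic data with assumed group sizes and means; it also checks that the p-value rule and the confidence-interval rule lead to the same decision.

```r
set.seed(2019)
# Synthetic data with a small true difference in means (illustrative only).
toy <- data.frame(
  Department = factor(rep(c("Research & Development", "Sales"), times = c(300, 150))),
  HourlyRate = c(rnorm(300, mean = 66, sd = 20), rnorm(150, mean = 65, sd = 20))
)

# var.equal = TRUE requests the pooled-variance (Student) two-sample t-test.
res <- t.test(HourlyRate ~ Department, data = toy, var.equal = TRUE)

res$parameter  # degrees of freedom: n1 + n2 - 2
res$p.value    # compared against alpha = 0.05
res$conf.int   # 95% confidence interval for the difference in means

# For a two-sided test at alpha = 0.05, p > 0.05 exactly when the 95% CI contains 0.
ci_has_zero <- res$conf.int[1] <= 0 && res$conf.int[2] >= 0
ci_has_zero == (res$p.value > 0.05)
```

Without `var.equal = TRUE`, `t.test()` defaults to Welch's test, which does not assume equal variances and reports fractional degrees of freedom.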