Introduction

Every organization invests heavily in hiring and training employees, and this investment costs time as well as money.
Every company therefore needs to analyse which parameters affect the attrition rate and use them to improve employee satisfaction and reduce attrition.
The parameters most commonly associated with attrition are experience in the industry and pay scale.
A parameter can affect not only the attrition rate but also other parameters, for example gender and marital status, or job role and pay rate.

Introduction Cont.

This report examines the difference in mean hourly pay rate between employees of two specific departments: Sales and Research & Development.
This indicates how large the pay-rate gap is, since a large gap could raise the chance of attrition in one of the departments.

Problem Statement

The main objective is to examine the difference in mean hourly pay rate between employees of Sales and Research & Development.
For the analysis, we will use a two-sample t-test to check whether the difference is statistically significant.
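
For reference, the textbook form of the equal-variance (pooled) two-sample t statistic is sketched below. The symbols \(\bar{x}_i\), \(s_i\) and \(n_i\) (sample mean, sample standard deviation and size of group \(i\)) are notation introduced here for illustration, and the equal-variance form is an assumption that the report checks with Levene's test before running the t-test.

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \]

Under the null hypothesis of equal means, this statistic follows a t distribution with \(n_1 + n_2 - 2\) degrees of freedom.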

Data

The dataset used in the analysis comes from Kaggle: “https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset”
This HR analytics dataset is provided by IBM and contains observations of employees working in three departments: Sales, Human Resources and Research & Development.
The dataset has 35 variables and 1470 employee records.
Some of the important variables are “HourlyRate”, “Department”, “JobRole”, “Age”, “TotalWorkingYears” and “YearsAtCompany”.

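# readr provides read_csv(); dplyr provides %>%, mutate(), filter() and summarise()
library(readr)
library(dplyr)
attrition <- read_csv("C:/Users/iyerv/Downloads/ibm-hr-analytics-attrition-dataset/WA_Fn-UseC_-HR-Employee-Attrition.csv")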

dim(attrition)

## [1] 1470   35

Data Cont.

For the analysis, we use two variables: Department and HourlyRate.

Within Department, we restrict the analysis to Sales and Research & Development.

#First convert Department to factor variable
attrition_new <- attrition %>% mutate(Department = factor(Department))

#Filter dataset based on the required department
attrition <- attrition_new %>% filter(Department == "Sales" | Department == "Research & Development")

dim(attrition)

## [1] 1407   35

unique(attrition$Department)

## [1] Sales                  Research & Development
## Levels: Human Resources Research & Development Sales
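
The output above still lists Human Resources as a level even though no such rows remain, because subsetting a factor keeps its unused levels. A minimal sketch of dropping them, using base R and a small made-up data frame (`toy_df` and its values are hypothetical, not the report's data):

```r
# Toy data standing in for the real dataset (hypothetical values)
toy_df <- data.frame(
  Department = factor(c("Sales", "Human Resources",
                        "Research & Development", "Sales")),
  HourlyRate = c(50, 60, 70, 80)
)

# Keep only the two departments of interest
keep <- c("Sales", "Research & Development")
toy_two <- toy_df[toy_df$Department %in% keep, ]

# The unused level is still attached to the factor after subsetting
levels(toy_two$Department)

# droplevels() removes levels that no longer occur in the data
toy_two$Department <- droplevels(toy_two$Department)
levels(toy_two$Department)
```

This is optional tidying: the tests later in the report run on the filtered data as it is, but dropping the empty level keeps printed factor summaries and plots free of a department that is no longer present.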

Descriptive Statistics and Visualisation

First we will check the descriptive statistics for the hourly rates for both Sales and Research & Development.

hourlyRateSummary <- attrition %>%
  group_by(Department) %>%
  summarise(
    Min = min(HourlyRate, na.rm = TRUE),
    Q1 = quantile(HourlyRate, probs = .25, na.rm = TRUE),
    Median = median(HourlyRate, na.rm = TRUE),
    Q3 = quantile(HourlyRate, probs = .75, na.rm = TRUE),
    Max = max(HourlyRate, na.rm = TRUE),
    Mean = mean(HourlyRate, na.rm = TRUE),
    SD = sd(HourlyRate, na.rm = TRUE),
    n = n(),
    Missing = sum(is.na(HourlyRate))
  )

hourlyRateSummary

From the descriptive statistics, it is very hard to see any difference in hourly rate between the two departments.

For both departments, the minimum and maximum hourly rates are 30 and 100 respectively.
For both departments, the median is 66, but the means differ slightly: 66.167 for Research & Development and 65.520 for Sales.
The standard deviations do not reveal any difference in the distribution of hourly rates between the two departments either.

Let’s see the visualisation of these statistics.

Descriptive Statistics Cont.

library(lattice)
library(car) # provides qqPlot() and leveneTest()
attrition %>% histogram(~HourlyRate | Department, data = .,layout=c(1,2))

attrition_RnD <- attrition %>% filter(Department == "Research & Development")
attrition_RnD$HourlyRate %>% qqPlot(distribution = "norm")

## [1] 43 61

attrition_Sales <- attrition %>% filter(Department == "Sales")
attrition_Sales$HourlyRate %>% qqPlot(distribution = "norm")

## [1] 49 91

The distributions of hourly rate for Sales and Research & Development are roughly the same. It is very difficult to identify any difference between the departments from the histograms.
The Q-Q plots of hourly rate for each department show that the distributions are not normal.
However, because the number of observations is greater than 30, the central limit theorem implies that the sampling distribution of the mean is approximately normal, so the t-test can still be applied.
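
The central limit theorem argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: it supposes rates are equally likely whole numbers from 30 to 100 (a clearly non-normal shape chosen for illustration, not fitted to the data), and `sample_size`, `n_reps` and the seed are hypothetical values.

```r
set.seed(1324)  # arbitrary seed so the sketch is reproducible

# Assumption: rates are equally likely integers from 30 to 100 (non-normal)
rates <- 30:100
sample_size <- 100   # hypothetical group size, larger than 30
n_reps <- 2000       # number of simulated samples

# Mean of each simulated sample
sample_means <- replicate(
  n_reps,
  mean(sample(rates, sample_size, replace = TRUE))
)

# The sample means cluster around the population mean of the assumed rates
c(simulated = mean(sample_means), population = mean(rates))

# Their spread is close to the theoretical standard error sigma / sqrt(n)
theoretical_se <- sqrt(mean((rates - mean(rates))^2)) / sqrt(sample_size)
c(simulated = sd(sample_means), theoretical = theoretical_se)
```

Calling `hist(sample_means)` on the result should show a roughly bell-shaped histogram even though the individual rates are flat, which is why the t-test on means remains reasonable here.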

Hypothesis Testing

Before running the two-sample t-test, we first use Levene's test to check for equal variances.

Hypotheses: \[H_0: \sigma_1^2 = \sigma_2^2\] \[H_A: \sigma_1^2 \ne \sigma_2^2\]

leveneTest(HourlyRate ~ Department, data = attrition)

From the test output, the p-value of 0.8746 is well above the significance level \(\alpha = 0.05\).
Thus we fail to reject the null hypothesis and assume that the variances of hourly rate in the two departments are equal.
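
As a rough illustration of what this test computes: with its default settings, `leveneTest()` from the car package centres each group at its median and then runs a one-way ANOVA on the absolute deviations. The sketch below reproduces that idea in base R on made-up numbers (`toy_rates`, `toy_dept` and the simulated values are hypothetical, not the report's data).

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical hourly rates for two groups of different sizes
toy_rates <- c(sample(30:100, 60, replace = TRUE),
               sample(30:100, 40, replace = TRUE))
toy_dept <- factor(rep(c("Research & Development", "Sales"),
                       times = c(60, 40)))

# Step 1: absolute deviation of each value from its own group's median
abs_dev <- abs(toy_rates - ave(toy_rates, toy_dept, FUN = median))

# Step 2: one-way ANOVA on those deviations; its F test is the test statistic
levene_table <- anova(lm(abs_dev ~ toy_dept))
levene_F <- levene_table[["F value"]][1]
levene_p <- levene_table[["Pr(>F)"]][1]
c(F = levene_F, p = levene_p)
```

A large p-value from this ANOVA means the typical distance from the group median is similar in both groups, which is the sense in which the variances are treated as equal.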

Hypothesis Testing Cont.

Having assumed equal variances of hourly rate in the two departments, we can proceed with the t-test.

Hypotheses: \[H_0: \mu_1 - \mu_2 = 0\]

\[H_A: \mu_1 - \mu_2 \ne 0\]

t.test(
  HourlyRate ~ Department,
  data = attrition,
  var.equal = TRUE,
  alternative = "two.sided"
)

## 
##  Two Sample t-test
## 
## data:  HourlyRate by Department
## t = 0.55706, df = 1405, p-value = 0.5776
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.632279  2.926988
## sample estimates:
## mean in group Research & Development                  mean in group Sales 
##                             66.16753                             65.52018
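
As a consistency check on the output above, the p-value and confidence interval can be approximately reconstructed from the reported t statistic, degrees of freedom and group means. This is a sketch in base R; the variable names are chosen here for illustration, and small discrepancies are expected because the printed inputs are rounded.

```r
# Values copied from the t-test output above
t_stat <- 0.55706
df <- 1405
mean_rnd <- 66.16753    # Research & Development
mean_sales <- 65.52018  # Sales

# Observed difference in means and the standard error implied by t = diff / SE
mean_diff <- mean_rnd - mean_sales
std_error <- mean_diff / t_stat

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval: difference +/- critical value * standard error
t_crit <- qt(0.975, df)
ci <- mean_diff + c(-1, 1) * t_crit * std_error

round(p_value, 4)
round(ci, 3)
```

The reconstructed values should land close to the reported p-value and interval, which shows that the three pieces of the output are just different views of the same difference in means and its standard error.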

Discussion

In the t-test, the computed p-value is 0.5776, which is greater than the significance level of 0.05, so we fail to reject the null hypothesis.
The 95% confidence interval for the difference in means also contains 0, which supports the decision not to reject the null hypothesis.
The analysis therefore found no statistically significant evidence of a difference in mean hourly rate between Sales and Research & Development.
This suggests that employees of Sales and Research & Development earn roughly the same per hour on average, so a pay gap between these two departments is unlikely to be what drives attrition in the organization.
Limitations: The sample of 1407 employees could have been larger to give a better understanding of how the hourly rate of employees in different departments affects attrition.

MATH1324 Introduction to Statistics Assignment 3

Analysis of pay rates for two different departments.
