MATH1324 Introduction to Statistics Assignment 3

Analysis of pay rates for two different departments.

Vijay Lakshmanan Iyer (s3797863), Rohit Gupta (s3798988), Bijo B Thomas (s3758150)

Last updated: 27 October, 2019

Introduction

Introduction Cont.

Problem Statement

Data

library(readr)  # read_csv()
library(dplyr)  # %>%, mutate(), filter(), group_by(), summarise()
attrition <- read_csv("C:/Users/iyerv/Downloads/ibm-hr-analytics-attrition-dataset/WA_Fn-UseC_-HR-Employee-Attrition.csv")

dim(attrition)
## [1] 1470   35

Data Cont.

For the analysis, we use two variables:
- Department
- Hourly Rate

Within Department, we restrict the analysis to two groups: Sales and Research & Development.

# First convert Department to a factor variable
attrition_new <- attrition %>% mutate(Department = factor(Department))
# Keep only the rows for the two departments of interest
attrition <- attrition_new %>% filter(Department %in% c("Sales", "Research & Development"))

dim(attrition)
## [1] 1407   35
unique(attrition$Department)
## [1] Sales                  Research & Development
## Levels: Human Resources Research & Development Sales
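The `Levels:` line above still lists Human Resources because filtering rows does not remove unused factor levels. As a minimal sketch on a small made-up data frame (`toy` and `kept` are hypothetical names, not part of the dataset), base R's `droplevels()` could be used if the unused level needs to be removed:

```r
# Hypothetical toy data standing in for the attrition data frame
toy <- data.frame(
  Department = factor(c("Sales", "Human Resources",
                        "Research & Development", "Sales")),
  HourlyRate = c(50, 60, 70, 80)
)

# Keep only the two departments of interest
kept <- toy[toy$Department %in% c("Sales", "Research & Development"), ]
levels(kept$Department)  # the unused "Human Resources" level is still listed

# droplevels() removes factor levels that no longer occur in the data
kept <- droplevels(kept)
levels(kept$Department)
```

The tests used later in this analysis work on the two groups that actually contain data, so this step is optional here; it mainly keeps printed output and plot panels tidy.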

Descriptive Statistics and Visualisation

hourlyRateSummary <- attrition %>%
  group_by(Department) %>%
  summarise(
    Min     = min(HourlyRate, na.rm = TRUE),
    Q1      = quantile(HourlyRate, probs = .25, na.rm = TRUE),
    Median  = median(HourlyRate, na.rm = TRUE),
    Q3      = quantile(HourlyRate, probs = .75, na.rm = TRUE),
    Max     = max(HourlyRate, na.rm = TRUE),
    Mean    = mean(HourlyRate, na.rm = TRUE),
    SD      = sd(HourlyRate, na.rm = TRUE),
    n       = n(),
    Missing = sum(is.na(HourlyRate))
  )

hourlyRateSummary

The descriptive statistics show very little difference in hourly rate between the two departments.

Next, we visualise the distribution of hourly rate for each department.

Descriptive Statistics Cont.

library(lattice)  # histogram()
library(car)      # assumed source of qqPlot() and leveneTest()
histogram(~ HourlyRate | Department, data = attrition, layout = c(1, 2))

attrition_RnD <- attrition %>% filter(Department == "Research & Development")
attrition_RnD$HourlyRate %>% qqPlot(dist="norm")

## [1] 43 61
attrition_Sales <- attrition %>% filter(Department == "Sales")
attrition_Sales$HourlyRate %>% qqPlot(dist = "norm")

## [1] 49 91

Hypothesis Testing

Hypothesis (Levene's test for equality of variances):

\[H_0: \sigma_1^2 = \sigma_2^2\]

\[H_A: \sigma_1^2 \ne \sigma_2^2\]

leveneTest(HourlyRate ~ Department, data = attrition)
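To make the test statistic less of a black box, the sketch below reproduces the idea behind Levene's test in base R: take each observation's absolute deviation from its group median, then run a one-way ANOVA on those deviations. The data are simulated (`toy`, `group` and `y` are hypothetical names, not the attrition data), and the median-centred variant is used because that is, as far as we know, the default in `car::leveneTest`.

```r
set.seed(1)  # simulated data, not the attrition dataset
toy <- data.frame(
  group = factor(rep(c("A", "B"), each = 50)),
  y     = c(rnorm(50, mean = 65, sd = 20), rnorm(50, mean = 66, sd = 20))
)

# Absolute deviation of each observation from its own group median
group_median <- ave(toy$y, toy$group, FUN = median)
toy$abs_dev  <- abs(toy$y - group_median)

# One-way ANOVA on the absolute deviations; its F test is the Levene statistic
levene_fit <- anova(lm(abs_dev ~ group, data = toy))
levene_F   <- levene_fit[["F value"]][1]
levene_p   <- levene_fit[["Pr(>F)"]][1]
c(F = levene_F, p = levene_p)
```

A large p-value here means the data give no evidence against equal variances, which is the condition for using the pooled (equal-variance) t-test.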

Hypothesis Testing Cont.

Assuming equal variances of hourly rate in the two departments (the null hypothesis of Levene's test), we can proceed with the two-sample t-test using a pooled variance.

Hypothesis:

\[H_0: \mu_1 - \mu_2 = 0\]

\[H_A: \mu_1 - \mu_2 \ne 0\]

t.test(
  HourlyRate ~ Department,
  data = attrition,
  var.equal = TRUE,
  alternative = "two.sided"
)
## 
##  Two Sample t-test
## 
## data:  HourlyRate by Department
## t = 0.55706, df = 1405, p-value = 0.5776
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.632279  2.926988
## sample estimates:
## mean in group Research & Development                  mean in group Sales 
##                             66.16753                             65.52018
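The reported df = 1405 is n1 + n2 - 2 for the 1407 filtered rows, which is what a pooled-variance test gives. As a rough sketch of what `t.test(..., var.equal = TRUE)` computes, the following works through the pooled t statistic by hand on simulated data (`x1` and `x2` are hypothetical samples, not the attrition data) and cross-checks it against the built-in function:

```r
set.seed(2)  # simulated data, not the attrition dataset
x1 <- rnorm(40, mean = 66, sd = 20)
x2 <- rnorm(30, mean = 65, sd = 20)
n1 <- length(x1)
n2 <- length(x2)

# Pooled variance: the two sample variances weighted by their degrees of freedom
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)

# t statistic and two-sided p-value on n1 + n2 - 2 degrees of freedom
t_stat <- (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
dof    <- n1 + n2 - 2
p_val  <- 2 * pt(-abs(t_stat), dof)

# Cross-check against the built-in equal-variance test
builtin <- t.test(x1, x2, var.equal = TRUE)
all.equal(unname(builtin$statistic), t_stat)
```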

Discussion