Assessing factors influencing job satisfaction

Employee job satisfaction matters because it influences productivity. Employers globally are recognizing human resources as a key element in determining overall organizational performance. Since performance is a key factor in realizing organizational goals, human resource managers are increasingly concerned with factors that enhance job satisfaction, and hence productivity. The purpose of this study is to identify factors that influence job satisfaction and to examine their relationships.

The data

The data set contains twelve variables. These include employee demographic data such as gender and age, together with an employee ID. Presumed factors affecting job satisfaction are also included, namely hourly rate, education level, weekly hours worked, job level, last performance rating, years at the company, years in current role and years since last promotion. The outcome is the job satisfaction rating. This study assesses the relationship between these factors and employee job satisfaction.

library(readxl)
# Read the Excel workbook; the path is specific to the author's machine
Job_data <- read_excel("C:/Users/Christine/Desktop/Rpubs/Job data.xlsx")
# Preview the first three rows
head(Job_data, 3)
# A tibble: 3 x 12
  `Employee ID`   Age Gender `Education Leve~ `Hourly\r\nRate` `Weekly\r\nHour~
          <dbl> <dbl>  <dbl>            <dbl>            <dbl>            <dbl>
1             1    41      0                2               94               40
2             2    49      1                1               61               47
3             3    37      1                2               92               40
# ... with 6 more variables: `Job\r\nLevel` <dbl>,
#   `Last\r\nPerformance\r\nRating` <dbl>, `Years At Company` <dbl>, `Years
#   In\r\nCurrent\r\nRole` <dbl>, `Years Since\r\nLast\r\nPromotion` <dbl>,
#   `Job\r\nSatisfaction\r\nRating` <dbl>

Descriptives

library(tidyr)
library(dplyr)
library(ggplot2)
library(ggpubr)
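# Replace every whitespace character in the column names with "_".
# Line breaks in the Excel headers ("\r\n") are two characters, hence the double underscores.
names(Job_data) <- gsub("\\s", "_", names(Job_data))
# Scatter plots of job satisfaction against each factor, each with a fitted regression line
p_age <- Job_data %>%
  ggplot(aes(x = Age, y = Job__Satisfaction__Rating)) + geom_point() + geom_smooth(method = "lm")
p_education <- Job_data %>%
  ggplot(aes(x = Education_Level, y = Job__Satisfaction__Rating)) + geom_point() + geom_smooth(method = "lm")
p_rate <- Job_data %>%
  ggplot(aes(x = Hourly__Rate, y = Job__Satisfaction__Rating)) + geom_point() + geom_smooth(method = "lm")
p_hours <- Job_data %>%
  ggplot(aes(x = Weekly__Hours__Worked, y = Job__Satisfaction__Rating)) + geom_point() + geom_smooth(method = "lm")
p_level <- Job_data %>%
  ggplot(aes(x = Job__Level, y = Job__Satisfaction__Rating)) + geom_point() + geom_smooth(method = "lm")
p_promotion <- Job_data %>%
  ggplot(aes(x = Years_Since__Last__Promotion, y = Job__Satisfaction__Rating)) + geom_point() + geom_smooth(method = "lm")
# Arrange the six plots in a grid of three columns and two rows
ggarrange(p_age, p_education, p_rate, p_hours, p_level, p_promotion, ncol = 3, nrow = 2)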


There does not seem to be any clear relationship between age or education level and job satisfaction. On the other hand, hourly rate and weekly hours worked appear to influence job satisfaction positively, while more years since the last promotion go with lower job satisfaction. The relationships appear roughly linear, so a linear model is a reasonable choice. Before fitting it, we drop incomplete rows and split the data; the linear regression assumptions are checked at the end.
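
The visual impressions from the scatter plots can be backed with correlation coefficients. The sketch below is only illustrative: it runs on simulated data, and the data frame `sim` and its column names are hypothetical stand-ins for `Job_data`, not the study data.

```r
# Simulated stand-in for Job_data (hypothetical values, not the study data)
set.seed(1)
n <- 200
sim <- data.frame(
  Age = round(runif(n, 20, 60)),
  Hourly_Rate = round(runif(n, 30, 200)),
  Years_Since_Last_Promotion = rpois(n, 3)
)
# By construction, satisfaction rises with rate, falls with years since promotion,
# and does not depend on age
sim$Job_Satisfaction <- 4 + 0.03 * sim$Hourly_Rate -
  0.07 * sim$Years_Since_Last_Promotion + rnorm(n, sd = 1.2)

# Pearson correlation of each factor with job satisfaction
predictors <- setdiff(names(sim), "Job_Satisfaction")
cors <- sapply(sim[predictors], cor, y = sim$Job_Satisfaction)
print(round(cors, 2))
```

Applied to the real data, the same `sapply()` call over the candidate factor columns would give one coefficient per factor, with values near zero indicating little linear association.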

library(tidyverse)
library(caret)
Job_data<-na.omit(Job_data)
sample_n(Job_data,3)
# A tibble: 3 x 12
  Employee_ID   Age Gender Education_Level Hourly__Rate Weekly__Hours__~
        <dbl> <dbl>  <dbl>           <dbl>        <dbl>            <dbl>
1          35    24      1               3           61               53
2         439    35      1               3           72               45
3         552    39      0               3          179               40
# ... with 6 more variables: Job__Level <dbl>, Last__Performance__Rating <dbl>,
#   Years_At_Company <dbl>, Years_In__Current__Role <dbl>,
#   Years_Since__Last__Promotion <dbl>, Job__Satisfaction__Rating <dbl>
set.seed(123)
training.samples<-Job_data$Job__Satisfaction__Rating %>%
  createDataPartition(p=0.8,list= FALSE)
train.set<-Job_data[training.samples,]
test.set<-Job_data[-training.samples, ]

The data is split into a training set (80% of the rows), used to fit the model, and a test set (the remaining 20%), used for prediction.
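
To make the idea of the split concrete, here is a minimal base R sketch of an 80/20 partition. It is a simplification of what `createDataPartition()` does: that function also balances the distribution of the outcome across the two sets, while this sketch draws rows purely at random. The data frame `sim` is hypothetical.

```r
set.seed(123)
# Hypothetical data frame standing in for Job_data
sim <- data.frame(id = 1:625, rating = rnorm(625, mean = 7, sd = 1.8))

# Draw 80% of the row indices at random for training
train_idx <- sample(seq_len(nrow(sim)), size = floor(0.8 * nrow(sim)))
train_sim <- sim[train_idx, ]
test_sim <- sim[-train_idx, ]

# The two sets are disjoint and together cover every row
c(train = nrow(train_sim), test = nrow(test_sim))
```

Setting the seed before sampling makes the partition reproducible, which is why the original code calls `set.seed(123)` before splitting.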

model_1 <-lm(Job__Satisfaction__Rating ~ Hourly__Rate+ Weekly__Hours__Worked+ Job__Level+Years_Since__Last__Promotion, train.set)
summary(model_1)

Call:
lm(formula = Job__Satisfaction__Rating ~ Hourly__Rate + Weekly__Hours__Worked + 
    Job__Level + Years_Since__Last__Promotion, data = train.set)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2660 -0.8131  0.0469  0.8305  2.7144 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   4.177453   0.321358  12.999  < 2e-16 ***
Hourly__Rate                  0.028037   0.001306  21.475  < 2e-16 ***
Weekly__Hours__Worked        -0.003364   0.006985  -0.482     0.63    
Job__Level                    0.115363   0.021105   5.466 7.29e-08 ***
Years_Since__Last__Promotion -0.072722   0.015653  -4.646 4.35e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.182 on 497 degrees of freedom
Multiple R-squared:  0.5594,    Adjusted R-squared:  0.5559 
F-statistic: 157.8 on 4 and 497 DF,  p-value: < 2.2e-16


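The data is fitted to a linear regression model with hourly rate, weekly hours worked, job level and years since last promotion as input variables. Hourly rate, job level and years since last promotion are statistically significant (p-values below 0.05), whereas weekly hours worked is not (p = 0.63), so the hypothesis is confirmed for three of the four factors. The adjusted R-squared of 0.556 means the model explains about 56% of the variance in job satisfaction.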
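
As a worked example of how the fitted coefficients translate into a prediction, the sketch below plugs a hypothetical employee into the estimated equation. The coefficients are copied from the `summary(model_1)` output; the employee's values are made up for illustration and are not a row from the data.

```r
# Coefficients copied from the summary(model_1) output
coefs <- c(intercept = 4.177453, hourly_rate = 0.028037,
           weekly_hours = -0.003364, job_level = 0.115363,
           years_since_promotion = -0.072722)

# Hypothetical employee (illustrative values only); the intercept is multiplied by 1
employee <- c(intercept = 1, hourly_rate = 94, weekly_hours = 40,
              job_level = 2, years_since_promotion = 1)

# Prediction = sum over terms of coefficient * value
predicted_rating <- sum(coefs * employee[names(coefs)])
predicted_rating  # about 6.84
```

Most of the predicted rating comes from the intercept and the hourly rate term (0.028037 × 94 ≈ 2.64); the other three terms together shift it by less than 0.1.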

# Predict job satisfaction ratings for the held-out test set
predicted.ratings<-predict(model_1, test.set)
head(round(predicted.ratings))
1 2 3 4 5 6 
9 7 5 6 6 7 
# aov() refits the same formula on test.set, so this table describes the test data
summary(aov(model_1, test.set))
                              Df Sum Sq Mean Sq F value  Pr(>F)    
Hourly__Rate                   1 201.55  201.55 181.204 < 2e-16 ***
Weekly__Hours__Worked          1   3.88    3.88   3.489 0.06426 .  
Job__Level                     1  12.01   12.01  10.795 0.00134 ** 
Years_Since__Last__Promotion   1   4.67    4.67   4.198 0.04268 *  
Residuals                    118 131.25    1.11                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
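
A common way to judge predictions on a held-out set is to compare them with the observed ratings through the root mean squared error (RMSE) and the squared correlation. The sketch below shows the idea on simulated data; `sim`, `rate`, `years` and `rating` are hypothetical names, and the numbers it produces say nothing about the study data.

```r
set.seed(123)
# Simulated stand-in for Job_data (hypothetical values)
n <- 625
sim <- data.frame(rate = runif(n, 30, 200), years = rpois(n, 3))
sim$rating <- 4.3 + 0.03 * sim$rate - 0.075 * sim$years + rnorm(n, sd = 1.2)

# 80/20 split; the model is fitted on the training rows only
idx <- sample(seq_len(n), size = floor(0.8 * n))
fit <- lm(rating ~ rate + years, data = sim[idx, ])
test <- sim[-idx, ]
pred <- predict(fit, newdata = test)

# Out-of-sample error and squared correlation between predictions and observations
rmse <- sqrt(mean((test$rating - pred)^2))
r2 <- cor(test$rating, pred)^2
c(RMSE = rmse, R2 = r2)
```

If the test-set RMSE is close to the residual standard error from the training fit, the model generalizes about as well as it fits.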

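Next, a reduced model is fitted on the full data set with only hourly rate and years since last promotion. Weekly hours worked, which was not significant, is dropped; job level is also left out although it was significant in the first model.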

Model_fit<-lm (Job__Satisfaction__Rating~Hourly__Rate+ Years_Since__Last__Promotion, Job_data)
summary(Model_fit)

Call:
lm(formula = Job__Satisfaction__Rating ~ Hourly__Rate + Years_Since__Last__Promotion, 
    data = Job_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2534 -0.7495  0.0492  0.7989  2.9829 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   4.34070    0.12651  34.310  < 2e-16 ***
Hourly__Rate                  0.02994    0.00116  25.806  < 2e-16 ***
Years_Since__Last__Promotion -0.07517    0.01454  -5.168 3.19e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.192 on 622 degrees of freedom
Multiple R-squared:  0.5419,    Adjusted R-squared:  0.5405 
F-statistic:   368 on 2 and 622 DF,  p-value: < 2.2e-16
summary(aov(Model_fit))
                              Df Sum Sq Mean Sq F value   Pr(>F)    
Hourly__Rate                   1 1008.3  1008.3  709.21  < 2e-16 ***
Years_Since__Last__Promotion   1   38.0    38.0   26.71 3.19e-07 ***
Residuals                    622  884.3     1.4                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

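From the analysis, hourly rate influences job satisfaction positively (p-value < 0.05): each additional unit of hourly rate is associated with an increase of about 0.03 in the satisfaction rating. Years since last promotion affect job satisfaction negatively: each additional year is associated with a decrease of about 0.075. In this data set, these two are the main factors that influence job satisfaction.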
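
Whether dropping terms costs explanatory power can be checked with a partial F-test between nested models fitted on the same data. The sketch below illustrates this with `anova()` on simulated data; the variable names are hypothetical, and `hours` is given no effect by construction.

```r
set.seed(42)
# Simulated stand-in for Job_data (hypothetical values)
n <- 625
sim <- data.frame(rate = runif(n, 30, 200), years = rpois(n, 3),
                  hours = round(rnorm(n, 42, 5)))
# hours has no effect on the simulated rating by construction
sim$rating <- 4.3 + 0.03 * sim$rate - 0.075 * sim$years + rnorm(n, sd = 1.2)

reduced <- lm(rating ~ rate + years, data = sim)
full <- lm(rating ~ rate + years + hours, data = sim)

# Partial F-test: does the extra term significantly reduce the residual variance?
comparison <- anova(reduced, full)
print(comparison)

# Adjusted R-squared of both models side by side
c(reduced = summary(reduced)$adj.r.squared, full = summary(full)$adj.r.squared)
```

A large p-value in the second row of the table would support the smaller model. Both models must be fitted on the same rows for the comparison to be valid.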

The diagnostic plots below (residuals vs fitted, normal Q-Q, scale-location and residuals vs leverage) show that the models satisfy the assumptions of linear regression.

plot(model_1)

plot(Model_fit)
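
The diagnostic plots can be complemented with a few numeric checks. The sketch below is one possible set of checks, run on simulated data with hypothetical names. It uses the Shapiro-Wilk test for normality of residuals, a rough correlation check for non-constant variance, and Cook's distance for influential points.

```r
set.seed(7)
# Simulated stand-in for Job_data (hypothetical values)
n <- 300
sim <- data.frame(rate = runif(n, 30, 200))
sim$rating <- 4.3 + 0.03 * sim$rate + rnorm(n, sd = 1.2)
fit <- lm(rating ~ rate, data = sim)

res <- residuals(fit)
# Normality of residuals: a large p-value gives no evidence against normality
normality <- shapiro.test(res)
# Rough check for non-constant variance: |residuals| should not trend with fitted values
spread_trend <- cor(abs(res), fitted(fit))
# Influential observations: Cook's distance of the most extreme point
max_cook <- max(cooks.distance(fit))

list(shapiro_p = normality$p.value, spread_trend = spread_trend, max_cook = max_cook)
```

For the plots themselves, calling `par(mfrow = c(2, 2))` before `plot(model_1)` places the four diagnostic plots on a single page.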