Job Satisfaction Survey

Introduction

The following report is an analysis of a job satisfaction survey. There were 32 respondents answering 10 questions (a portion is shown below). The objective of this survey is to assess the overall satisfaction of the employees and understand some of the factors that affect it. Areas of concern and areas of excellence are noted. The report examines some key graphs and evaluates predictive methods for future surveys. The resulting confusion matrices and other factors will determine the usefulness of the predictive methods.

Data sample
Department Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction
Administrative 16 2 3 2 2 4 5 2 3
Administrative 2 4 4 3 4 4 5 3 9
Administrative 14 4 3 2 2 5 5 5 6
Maintenance 17 5 4 3 5 5 5 3 8
Maintenance 15 5 5 5 5 5 5 5 9

Notes on Data

With only 32 responses, there is a limit to the quality of analysis. There are no outliers in Satisfaction, nor are there any missing values in the data table. Some data were transformed for particular analyses; in each case a disclaimer and an explanation are provided.

Histogram Summary

The following is a summary of the given numerical values of each attribute. Years has been binned with a bin width of 5 years.
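
The binning of Years can be sketched as follows. This is a minimal Python illustration, not the code used to produce the report's histograms. The Years values are copied from the survey table; the bin edges (starting at 0, closed on the left) are an assumption, since the report states only the bin width.

```python
from collections import Counter

# Years of service for the 32 respondents, copied from the survey table.
years = [16, 2, 14, 17, 15, 1, 3, 3, 16, 15, 13, 3, 6, 1, 3, 2,
         3, 2, 2, 15, 5, 8, 17, 15, 5, 1, 11, 21, 8, 32, 2, 18]

BIN_WIDTH = 5  # bin width stated in the report


def bin_years(values, width=BIN_WIDTH):
    """Count values per bin. Bins start at 0 and are left-closed (an assumption)."""
    counts = Counter((v // width) * width for v in values)
    top = max(counts)
    # Keep empty bins so gaps in the distribution stay visible.
    return {(lo, lo + width): counts.get(lo, 0)
            for lo in range(0, top + width, width)}


histogram = bin_years(years)
for (lo, hi), n in histogram.items():
    print(f"[{lo:2d}, {hi:2d}): {'#' * n} {n}")
```

With these edges, most respondents fall either in the first bin (under 5 years) or in the 15 to 20 year bin, which is worth keeping in mind when reading the tenure findings.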

Findings

Distribution: The only category whose data appears to resemble a normal distribution is Recognition. The rest appear to be skewed.

Satisfaction by Department

Breaking the data down by department lets department managers better understand their individual departments. They can evaluate how each department is doing in each category, and they may be able to do follow-up studies or customize their approach to their employees.

The following boxplots examine the overall satisfaction of the company by department.

A further breakdown shows each department's average score in each scored category. To keep the graph focused, Satisfaction scores have been transformed to a scale of 0 to 5, meaning they are half of their original scores.
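
The rescaling described above can be sketched in a few lines of Python. This is an illustration only; it uses the five rows from the data sample (which happen to be all of the Administrative and Maintenance respondents), and the function names are hypothetical.

```python
from statistics import mean

# (department, Satisfaction on its original scale), from the data sample.
sample = [
    ("Administrative", 3),
    ("Administrative", 9),
    ("Administrative", 6),
    ("Maintenance", 8),
    ("Maintenance", 9),
]


def rescale(score, factor=0.5):
    """Halve a Satisfaction score so it fits the 0-5 scale of the other categories."""
    return score * factor


def mean_by_department(rows):
    """Return {department: mean of the rescaled Satisfaction scores}."""
    grouped = {}
    for dept, score in rows:
        grouped.setdefault(dept, []).append(rescale(score))
    return {dept: mean(scores) for dept, scores in grouped.items()}


scaled_means = mean_by_department(sample)
print(scaled_means)
```

Because the transformation is a simple halving, it changes only the axis scale; the ordering of departments by mean Satisfaction is unaffected.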

Findings

Areas of success: The departments with the highest median satisfaction score were Maintenance and QC, each with a score of 8.5. With the exception of Balance, these two had the first and second highest mean values in every category. Tools is the category that received the highest mean score from every department. The workers seem to believe they have the equipment needed for their tasks.

Areas of Concern: The Administrative department finished with the lowest average satisfaction. Its mean Recognition score was the lowest mean category score of any department. Further investigation may be warranted. SR had the lowest median satisfaction. Management had very low Training scores. This, combined with other deficiencies, resulted in this department having the third lowest satisfaction. Low satisfaction among management may negatively impact the efficiency and morale of other workers, so this should be investigated further.

Satisfaction by Tenure

Correlation Heat Map

Correlation matrix of the numerical survey variables
Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction
Years 1.00 -0.25 -0.38 -0.34 0.13 -0.17 0.10 -0.38 -0.32
Ideas -0.25 1.00 0.63 0.78 0.52 0.49 0.22 0.63 0.85
Communication -0.38 0.63 1.00 0.69 0.51 0.43 0.34 0.54 0.72
Recognition -0.34 0.78 0.69 1.00 0.55 0.56 0.25 0.67 0.84
Training 0.13 0.52 0.51 0.55 1.00 0.55 0.21 0.29 0.69
Conditions -0.17 0.49 0.43 0.56 0.55 1.00 0.23 0.52 0.65
Tools 0.10 0.22 0.34 0.25 0.21 0.23 1.00 0.19 0.20
Balance -0.38 0.63 0.54 0.67 0.29 0.52 0.19 1.00 0.71
Satisfaction -0.32 0.85 0.72 0.84 0.69 0.65 0.20 0.71 1.00

Findings

Tenure and Satisfaction: There is a negative correlation between years worked at the company and the scores given to most of the other variables, including Satisfaction (-0.32); only Training (0.13) and Tools (0.10) are positive. Though the correlations are weak, Years is the only predetermined numerical variable for each employee, so these relationships should be examined.

Satisfaction and other variables: The perceived value of employee ideas (Ideas, 0.85) and the corresponding recognition (Recognition, 0.84) are the variables most strongly correlated with overall satisfaction.
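
For readers who want to check an entry of the correlation matrix, the Pearson coefficient can be computed directly. The sketch below is a plain-Python illustration, not the report's own code; it uses the Ideas and Recognition columns copied from the 32-row survey table.

```python
from math import sqrt

# Ideas and Recognition scores for the 32 respondents, in table order.
ideas = [2, 4, 4, 5, 5, 5, 3, 2, 2, 2, 3, 5, 2, 5, 3, 4,
         3, 4, 4, 5, 4, 5, 4, 5, 2, 5, 3, 3, 3, 2, 5, 4]
recognition = [2, 3, 2, 3, 5, 4, 3, 2, 2, 1, 3, 5, 1, 4, 3, 4,
               3, 4, 4, 3, 3, 3, 4, 4, 2, 5, 4, 2, 2, 2, 5, 4]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson(ideas, recognition)
print(round(r, 2))
```

The result rounds to 0.78, which agrees with the Ideas/Recognition entry in the matrix.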

Prediction Methods

KNN

Translated Data

The objective of using KNN is to predict whether an individual will have above-average satisfaction. No partitioning of the data was done.
Satisfaction Level compared to mean Satisfaction
Department Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction.Level
Administrative 16 2 3 2 2 4 5 2 Low
Administrative 2 4 4 3 4 4 5 3 High
Administrative 14 4 3 2 2 5 5 5 Low
Maintenance 17 5 4 3 5 5 5 3 High
Maintenance 15 5 5 5 5 5 5 5 High
Management 1 5 4 4 3 5 3 5 High
Management 3 3 4 3 3 4 5 5 High
Management 3 2 2 2 2 3 5 3 Low
Production 16 2 3 2 4 4 4 2 Low
Production 15 2 3 1 4 4 4 2 Low
Production 13 3 3 3 4 4 4 3 High
Production 3 5 5 5 5 5 5 5 High
Production 6 2 2 1 3 3 4 2 Low
Production 1 5 4 4 3 4 5 5 High
Production 3 3 4 3 4 5 5 4 High
Production 2 4 4 4 4 5 5 5 High
Production 3 3 4 3 3 2 4 4 Low
Production 2 4 3 4 3 3 4 4 Low
Production 2 4 5 4 4 4 4 4 High
Production 15 5 4 3 4 3 5 3 High
Production 5 4 5 3 2 3 5 4 High
Production 8 5 5 3 5 3 5 3 High
Production 17 4 3 4 3 3 5 2 Low
Production 15 5 3 4 5 5 5 5 High
Production 5 2 4 2 2 2 5 3 Low
QC 1 5 5 5 5 5 5 5 High
QC 11 3 4 4 4 5 5 2 High
SR 21 3 2 2 3 2 4 3 Low
SR 8 3 2 2 2 2 4 2 Low
SR 32 2 3 2 4 2 5 3 Low
SR 2 5 5 5 5 5 5 5 High
SR 18 4 4 4 5 5 5 5 High
## k-Nearest Neighbors 
## 
## 32 samples
##  9 predictor
##  2 classes: 'High', 'Low' 
## 
## No pre-processing
## Resampling: Cross-Validated (7 fold, repeated 3 times) 
## Summary of sample sizes: 27, 27, 28, 28, 27, 27, ... 
## Resampling results across tuning parameters:
## 
##   k  Accuracy   Kappa    
##   5  0.8111111  0.5890887
##   7  0.7587302  0.4466835
##   9  0.6682540  0.2232878
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 5.
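
To make the classifier concrete, here is a minimal Python sketch of k-nearest-neighbour voting with k = 5. It is not the R workflow that produced the output above. As simplifying assumptions, it uses only the first eight rows of the table as training data, drops Department (a categorical predictor), and measures plain Euclidean distance on unscaled values, in line with the "No pre-processing" note in the output. The two query respondents are hypothetical.

```python
from collections import Counter
from math import dist

# First eight respondents from the table. Features, in order:
# Years, Ideas, Communication, Recognition, Training, Conditions, Tools, Balance
train = [
    ((16, 2, 3, 2, 2, 4, 5, 2), "Low"),
    ((2, 4, 4, 3, 4, 4, 5, 3), "High"),
    ((14, 4, 3, 2, 2, 5, 5, 5), "Low"),
    ((17, 5, 4, 3, 5, 5, 5, 3), "High"),
    ((15, 5, 5, 5, 5, 5, 5, 5), "High"),
    ((1, 5, 4, 4, 3, 5, 3, 5), "High"),
    ((3, 3, 4, 3, 3, 4, 5, 5), "High"),
    ((3, 2, 2, 2, 2, 3, 5, 3), "Low"),
]


def knn_predict(train, query, k=5):
    """Label a query by majority vote among its k nearest training rows."""
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical respondents, invented for illustration only.
new_hire = (2, 5, 5, 5, 5, 5, 5, 5)
veteran = (16, 2, 3, 2, 3, 4, 4, 2)
print(knn_predict(train, new_hire), knn_predict(train, veteran))
```

Note that Years ranges from 1 to 32 while the other scores range from 1 to 5, so on unscaled data Years contributes most of the distance. Whether that is desirable is a modelling choice worth revisiting.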

KNN Classifier Result
Department Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction.Level prediction
Administrative 16 2 3 2 2 4 5 2 Low Low
Administrative 2 4 4 3 4 4 5 3 High High
Administrative 14 4 3 2 2 5 5 5 Low High
Maintenance 17 5 4 3 5 5 5 3 High High
Maintenance 15 5 5 5 5 5 5 5 High High
Management 1 5 4 4 3 5 3 5 High High
Management 3 3 4 3 3 4 5 5 High High
Management 3 2 2 2 2 3 5 3 Low Low
Production 16 2 3 2 4 4 4 2 Low Low
Production 15 2 3 1 4 4 4 2 Low Low
Production 13 3 3 3 4 4 4 3 High High
Production 3 5 5 5 5 5 5 5 High High
Production 6 2 2 1 3 3 4 2 Low Low
Production 1 5 4 4 3 4 5 5 High High
Production 3 3 4 3 4 5 5 4 High High
Production 2 4 4 4 4 5 5 5 High High
Production 3 3 4 3 3 2 4 4 Low Low
Production 2 4 3 4 3 3 4 4 Low High
Production 2 4 5 4 4 4 4 4 High High
Production 15 5 4 3 4 3 5 3 High High
Production 5 4 5 3 2 3 5 4 High High
Production 8 5 5 3 5 3 5 3 High High
Production 17 4 3 4 3 3 5 2 Low Low
Production 15 5 3 4 5 5 5 5 High High
Production 5 2 4 2 2 2 5 3 Low Low
QC 1 5 5 5 5 5 5 5 High High
QC 11 3 4 4 4 5 5 2 High High
SR 21 3 2 2 3 2 4 3 Low Low
SR 8 3 2 2 2 2 4 2 Low Low
SR 32 2 3 2 4 2 5 3 Low Low
SR 2 5 5 5 5 5 5 5 High High
SR 18 4 4 4 5 5 5 5 High High
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction High Low
##       High   19   2
##       Low     0  11
##                                           
##                Accuracy : 0.9375          
##                  95% CI : (0.7919, 0.9923)
##     No Information Rate : 0.5938          
##     P-Value [Acc > NIR] : 1.452e-05       
##                                           
##                   Kappa : 0.8672          
##                                           
##  Mcnemar's Test P-Value : 0.4795          
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.8462          
##          Pos Pred Value : 0.9048          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.5938          
##          Detection Rate : 0.5938          
##    Detection Prevalence : 0.6562          
##       Balanced Accuracy : 0.9231          
##                                           
##        'Positive' Class : High            
## 
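
The headline statistics in this output follow directly from the four cell counts. The Python sketch below is an illustration, not the code that produced the output above; it recomputes them for the KNN matrix with "High" as the positive class. The function name is hypothetical.

```python
def confusion_stats(tp, fp, fn, tn):
    """Summary statistics for a 2x2 confusion matrix.

    tp: predicted High, actually High    fp: predicted High, actually Low
    fn: predicted Low, actually High     tn: predicted Low, actually Low
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Cohen's kappa: agreement beyond what the marginal totals alone would give.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": kappa,
    }


knn_stats = confusion_stats(tp=19, fp=2, fn=0, tn=11)
for name, value in knn_stats.items():
    print(f"{name:>18}: {value:.4f}")
```

The printed values match the Accuracy, Sensitivity, Specificity, Balanced Accuracy and Kappa lines of the output.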

Findings

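Optimal K: Cross-validation selected k = 5, with a cross-validated accuracy of 81%. When the final model was applied back to the same 32 responses, it classified 94% of them correctly (30 of 32).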

Concerns: KNN is limited here by the number of predictor variables relative to the small number of respondents, which restricts this tool's ability to predict accurately. To remedy this problem, add more respondents or remove dimensions.

Naive Bayes

Translated Data

The objective of using Naive Bayes is to predict whether an individual will have above-average satisfaction. No partitioning of the data was done. However, the data were transformed: each numerical attribute was converted to a categorical value of “High” or “Low”, according to whether the original value is above or below the mean score of that attribute.
Scores Compared Against Category Averages
Department Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction
Administrative High Low Low Low Low High High Low Low
Administrative Low High High Low High High High Low High
Administrative High High Low Low Low High High High Low
Maintenance High High High Low High High High Low High
Maintenance High High High High High High High High High
Management Low High High High Low High Low High High
Management Low Low High Low Low High High High High
Management Low Low Low Low Low Low High Low Low
Production High Low Low Low High High Low Low Low
Production High Low Low Low High High Low Low Low
Production High Low Low Low High High Low Low High
Production Low High High High High High High High High
Production Low Low Low Low Low Low Low Low Low
Production Low High High High Low High High High High
Production Low Low High Low High High High High High
Production Low High High High High High High High High
Production Low Low High Low Low Low Low High Low
Production Low High Low High Low Low Low High Low
Production Low High High High High High Low High High
Production High High High Low High Low High Low High
Production Low High High Low Low Low High High High
Production Low High High Low High Low High Low High
Production High High Low High Low Low High Low Low
Production High High Low High High High High High High
Production Low Low High Low Low Low High Low Low
QC Low High High High High High High High High
QC High Low High High High High High Low High
SR High Low Low Low Low Low Low Low Low
SR Low Low Low Low Low Low Low Low Low
SR High Low Low Low High Low High Low Low
SR Low High High High High High High High High
SR High High High High High High High High High
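
The High/Low conversion can be sketched as follows, using the Ideas column from the survey table. This is a Python illustration rather than the report's own code. It assumes "High" means strictly above the column mean; the report does not say how a value exactly equal to the mean would be treated.

```python
# Ideas scores for the 32 respondents, in table order.
ideas = [2, 4, 4, 5, 5, 5, 3, 2, 2, 2, 3, 5, 2, 5, 3, 4,
         3, 4, 4, 5, 4, 5, 4, 5, 2, 5, 3, 3, 3, 2, 5, 4]


def to_levels(values):
    """Label each value High or Low relative to the column mean.

    Assumption: a value equal to the mean is labelled Low.
    """
    average = sum(values) / len(values)
    return ["High" if v > average else "Low" for v in values]


ideas_levels = to_levels(ideas)
print(ideas_levels[:3])  # first three respondents
```

For Ideas the mean is about 3.66, so scores of 4 and 5 become "High" and scores of 2 and 3 become "Low", which matches the Ideas column of the table above.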

Predicted Classes and Values

Naive Bayes Result
Department Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction P(High) P(Low) Predicted
Administrative High Low Low Low Low High High Low Low 0.0032 0.9968 Low
Administrative Low High High Low High High High Low High 0.9794 0.0206 High
Administrative High High Low Low Low High High High Low 0.2269 0.7731 Low
Maintenance High High High Low High High High Low High 0.9999 0.0001 High
Maintenance High High High High High High High High High 1.0000 0.0000 High
Management Low High High High Low High Low High High 0.9926 0.0074 High
Management Low Low High Low Low High High High High 0.8978 0.1022 High
Management Low Low Low Low Low Low High Low Low 0.0022 0.9978 Low
Production High Low Low Low High High Low Low Low 0.0183 0.9817 Low
Production High Low Low Low High High Low Low Low 0.0183 0.9817 Low
Production High Low Low Low High High Low Low High 0.0183 0.9817 Low
Production Low High High High High High High High High 0.9999 0.0001 High
Production Low Low Low Low Low Low Low Low Low 0.0002 0.9998 Low
Production Low High High High Low High High High High 0.9983 0.0017 High
Production Low Low High Low High High High High High 0.9874 0.0126 High
Production Low High High High High High High High High 0.9999 0.0001 High
Production Low Low High Low Low Low Low High Low 0.0775 0.9225 Low
Production Low High Low High Low Low Low High Low 0.1452 0.8548 Low
Production Low High High High High High Low High High 0.9992 0.0008 High
Production High High High Low High Low High Low High 0.8497 0.1503 High
Production Low High High Low Low Low High High High 0.8673 0.1327 High
Production Low High High Low High Low High Low High 0.9188 0.0812 High
Production High High Low High Low Low High Low Low 0.0682 0.9318 Low
Production High High Low High High High High High High 0.9875 0.0125 High
Production Low Low High Low Low Low High Low Low 0.0675 0.9325 Low
QC Low High High High High High High High High 1.0000 0.0000 High
QC High Low High High High High High Low High 0.9998 0.0002 High
SR High Low Low Low Low Low Low Low Low 0.0001 0.9999 Low
SR Low Low Low Low Low Low Low Low Low 0.0001 0.9999 Low
SR High Low Low Low High Low High Low Low 0.0045 0.9955 Low
SR Low High High High High High High High High 0.9997 0.0003 High
SR High High High High High High High High High 0.9994 0.0006 High
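
The High and Low columns are posterior probabilities. As a rough illustration of how such values arise, the Python sketch below fits a Naive Bayes model on only two of the predictors (Ideas and Recognition levels) taken from the table above. It uses no smoothing and is an assumption-laden simplification, so it will not reproduce the nine-predictor probabilities shown in the table.

```python
from collections import Counter, defaultdict

# One code per respondent, in table order: Ideas level, Recognition level,
# Satisfaction level, each H(igh) or L(ow).
rows = ("LLL HLH HLL HLH HHH HHH LLH LLL LLL LLL LLH HHH LLL HHH LLH HHH "
        "LLL HHL HHH HLH HLH HLH HHL HHH LLL HHH LHH LLL LLL LLL HHH HHH").split()


def fit(rows):
    """Count class frequencies and per-class frequencies of each feature level."""
    priors = Counter(r[-1] for r in rows)
    cond = defaultdict(Counter)  # (feature index, class) -> Counter of levels
    for r in rows:
        for i, level in enumerate(r[:-1]):
            cond[(i, r[-1])][level] += 1
    return priors, cond


def posterior(priors, cond, features):
    """Return {class: P(class | features)} under the naive independence assumption."""
    total = sum(priors.values())
    scores = {}
    for cls, n in priors.items():
        score = n / total  # prior
        for i, level in enumerate(features):
            score *= cond[(i, cls)][level] / n  # unsmoothed likelihood
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


priors, cond = fit(rows)
both_high = posterior(priors, cond, "HH")
both_low = posterior(priors, cond, "LL")
print(both_high, both_low)
```

Even with two predictors, a respondent who rates both Ideas and Recognition as High is assigned to the High satisfaction class with a probability of roughly 0.95.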

Confusion Matrix Data

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction High Low
##       High   18   0
##       Low     1  13
##                                           
##                Accuracy : 0.9688          
##                  95% CI : (0.8378, 0.9992)
##     No Information Rate : 0.5938          
##     P-Value [Acc > NIR] : 1.303e-06       
##                                           
##                   Kappa : 0.936           
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9474          
##             Specificity : 1.0000          
##          Pos Pred Value : 1.0000          
##          Neg Pred Value : 0.9286          
##              Prevalence : 0.5938          
##          Detection Rate : 0.5625          
##    Detection Prevalence : 0.5625          
##       Balanced Accuracy : 0.9737          
##                                           
##        'Positive' Class : High            
## 

Findings

Using Naive Bayes, we can predict whether an individual’s satisfaction is above or below the mean with 97% accuracy (31 of 32) on the data the model was fit to.

Concerns: Some combinations of predictor category and outcome are absent from the data (e.g., no respondents from QC have low Satisfaction). This limits the tool's ability to predict for such cases. To remedy this, more data is needed.
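
The effect of an absent category can be shown with the Department counts among the 13 low-satisfaction respondents, tallied from the table. The Python sketch below is an illustration only. The `alpha` parameter implements additive (Laplace) smoothing, a commonly used workaround that is not part of this report's analysis and is shown purely for comparison.

```python
# Number of low-satisfaction respondents per department, tallied from the table.
low_counts = {
    "Administrative": 2,
    "Maintenance": 0,
    "Management": 1,
    "Production": 7,
    "QC": 0,
    "SR": 3,
}


def likelihood(counts, category, alpha=0.0):
    """Estimate P(category | class) with optional additive smoothing.

    alpha = 0 gives the raw relative frequency; alpha > 0 adds alpha
    pseudo-counts to every category (an assumption, not the report's method).
    """
    total = sum(counts.values())
    return (counts[category] + alpha) / (total + alpha * len(counts))


raw = likelihood(low_counts, "QC")
smoothed = likelihood(low_counts, "QC", alpha=1.0)
print(raw, smoothed)
```

With the raw estimate, the likelihood of QC given low satisfaction is zero, so the product of likelihoods for the Low class is zero for any QC respondent regardless of their other answers. More data, as recommended above, addresses the underlying cause.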