Job Satisfaction Assignment

A small company, ABC Company, employs 32 people. The owners have raised employee satisfaction as a key issue and want us to analyze it so that they can understand it more thoroughly. We are tasked with predicting job satisfaction from the data provided, so satisfaction is our response variable and the remaining variables are our predictors.

Visualizations and Exploratory Analysis

Box Plot

Fig. 1.1

This series of box plots shows the distribution of satisfaction within each department. Some departments report higher satisfaction than others, but with so few employees in most departments we cannot say whether these differences would hold with more data. Maintenance and Quality Control have the highest satisfaction levels, but they also have the narrowest ranges, which is expected since each of these departments has only two employees. The main department to examine is Production, which accounts for 17 of the company’s 32 employees. Its satisfaction is close to the company average on every measure, so we found nothing unusual to report about it.
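The department sizes behind this reasoning can be tallied from the KNN classification results table. The Python sketch below hard-codes each employee’s department and binary satisfaction label from that table. The box plots presumably use the underlying satisfaction scores, which this report does not list, so the sketch only checks group sizes and the share of satisfied employees per department.

```python
from collections import Counter

# Department and satisfaction label (U = Unsatisfied, S = Satisfied) for each
# of the 32 employees, copied in order from the KNN classification results table.
EMPLOYEES = (
    [("Administrative", s) for s in "USU"]
    + [("Maintenance", s) for s in "SS"]
    + [("Management", s) for s in "SSU"]
    + [("Production", s) for s in "UUSSUSSSUUSSSSUSU"]
    + [("QC", s) for s in "SS"]
    + [("SR", s) for s in "UUUSS"]
)

size = Counter(dept for dept, _ in EMPLOYEES)
satisfied = Counter(dept for dept, label in EMPLOYEES if label == "S")

for dept in sorted(size):
    share = satisfied[dept] / size[dept]
    print(f"{dept:<15} employees={size[dept]:>2}  satisfied={satisfied[dept]:>2} ({share:.0%})")

overall = sum(satisfied.values()) / len(EMPLOYEES)
print(f"{'All':<15} employees={len(EMPLOYEES):>2}  satisfied share={overall:.0%}")
```

On the table’s data this gives two employees each for Maintenance and QC, consistent with their narrow ranges, and 17 for Production. Production’s satisfied share (10 of 17) is close to the company-wide share (19 of 32).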

Histogram

Fig. 1.2

The satisfaction histogram is roughly symmetrical but slightly skewed left: most responses sit toward the high end of the scale, so there are somewhat more satisfied employees than unsatisfied ones.

Fig. 1.3

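This histogram of years worked shows that a large share of employees are recent hires: 13 of the 32 have been with the company for three years or less. The distribution is bimodal and skewed right. Because it is not symmetrical, we use the median rather than the mean to describe its centre. The median (7 years) sits below the mean (about 9.2 years) and closer to the large group of recent hires, because the mean is pulled upward by the long-tenured employees in the right tail.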
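The median and mean tenure can be computed directly from the years column of the KNN classification results table. This Python sketch uses only the standard `statistics` module.

```python
import statistics

# Years worked for each of the 32 employees, copied in order from the
# KNN classification results table.
YEARS = [16, 2, 14, 17, 15, 1, 3, 3, 16, 15, 13, 3, 6, 1, 3, 2,
         3, 2, 2, 15, 5, 8, 17, 15, 5, 1, 11, 21, 8, 32, 2, 18]

median_years = statistics.median(YEARS)   # average of the two middle sorted values
mean_years = statistics.mean(YEARS)       # pulled upward by the long right tail
recent = sum(1 for y in YEARS if y <= 3)  # employees with three years or less

print(f"median = {median_years}, mean = {mean_years:.2f}, recent hires = {recent}/{len(YEARS)}")
# prints: median = 7.0, mean = 9.22, recent hires = 13/32
```

A median below the mean is what a right-skewed distribution like this one should produce.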

Heat Map

Fig. 1.4

The heat map of correlations between the variables contains a lot of information.

For instance, years worked is negatively correlated with most variables. The only two exceptions are tools and training, which is understandable: the longer employees stay with the company, the better trained and equipped they should be.

Another notable pair is ideas and recognition, which are strongly correlated with each other, plausibly because employees whose ideas create value for the company are more likely to be recognized for it. These two variables are also more strongly correlated with satisfaction than any of the other variables. This suggests that, for these employees, satisfaction comes most strongly from feeling that they provide value through their work.
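The correlations discussed above can be checked against the raw table with the Python sketch below. It assumes the heat map shows Pearson correlations, since the report does not say which coefficient it uses. Satisfaction is available here only as the binary label, coded 1 for Satisfied and 0 for Unsatisfied. Its correlations are therefore point-biserial and will not match a heat map built on the original satisfaction scores.

```python
import math

# Assumption: the heat map shows Pearson correlations. Satisfaction is only
# available here as the 0/1 label, so its correlations are point-biserial.
COLUMNS = ("years", "ideas", "communication", "recognition",
           "training", "conditions", "tools", "balance")

# One tuple per employee, copied in order from the KNN classification results table.
ROWS = [
    (16, 2, 3, 2, 2, 4, 5, 2), (2, 4, 4, 3, 4, 4, 5, 3), (14, 4, 3, 2, 2, 5, 5, 5),
    (17, 5, 4, 3, 5, 5, 5, 3), (15, 5, 5, 5, 5, 5, 5, 5), (1, 5, 4, 4, 3, 5, 3, 5),
    (3, 3, 4, 3, 3, 4, 5, 5), (3, 2, 2, 2, 2, 3, 5, 3), (16, 2, 3, 2, 4, 4, 4, 2),
    (15, 2, 3, 1, 4, 4, 4, 2), (13, 3, 3, 3, 4, 4, 4, 3), (3, 5, 5, 5, 5, 5, 5, 5),
    (6, 2, 2, 1, 3, 3, 4, 2), (1, 5, 4, 4, 3, 4, 5, 5), (3, 3, 4, 3, 4, 5, 5, 4),
    (2, 4, 4, 4, 4, 5, 5, 5), (3, 3, 4, 3, 3, 2, 4, 4), (2, 4, 3, 4, 3, 3, 4, 4),
    (2, 4, 5, 4, 4, 4, 4, 4), (15, 5, 4, 3, 4, 3, 5, 3), (5, 4, 5, 3, 2, 3, 5, 4),
    (8, 5, 5, 3, 5, 3, 5, 3), (17, 4, 3, 4, 3, 3, 5, 2), (15, 5, 3, 4, 5, 5, 5, 5),
    (5, 2, 4, 2, 2, 2, 5, 3), (1, 5, 5, 5, 5, 5, 5, 5), (11, 3, 4, 4, 4, 5, 5, 2),
    (21, 3, 2, 2, 3, 2, 4, 3), (8, 3, 2, 2, 2, 2, 4, 2), (32, 2, 3, 2, 4, 2, 5, 3),
    (2, 5, 5, 5, 5, 5, 5, 5), (18, 4, 4, 4, 5, 5, 5, 5),
]
SATISFIED = "USUSSSSUUUSSUSSSUUSSSSUSUSSUUUSS"  # U = Unsatisfied, S = Satisfied


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


col = {name: [row[j] for row in ROWS] for j, name in enumerate(COLUMNS)}
col["satisfied"] = [1 if s == "S" else 0 for s in SATISFIED]

# Years worked against every other variable
for name in COLUMNS[1:] + ("satisfied",):
    print(f"years vs {name:<13} r = {pearson(col['years'], col[name]):+.2f}")

# The pair highlighted above, and each member's link to satisfaction
print(f"ideas vs recognition r = {pearson(col['ideas'], col['recognition']):+.2f}")
for name in ("ideas", "recognition"):
    print(f"{name} vs satisfied r = {pearson(col[name], col['satisfied']):+.2f}")
```

On this table, ideas and recognition have a Pearson correlation of about 0.78.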

Scatter Plot

Fig. 1.5

Building on the heat map analysis, we plot satisfaction against recognition. The scatter plot shows a clear positive trend: employees who receive more recognition also tend to report higher satisfaction.

Fig. 1.6

Next, we plot recognition against ideas. The plot shows a strong positive trend: the more highly management regards an employee’s ideas, the more recognition that employee tends to receive.

Fig. 1.7

The final scatter plot shows satisfaction against years worked at the company, colour-coded by department. The small amount of data limits this analysis, but satisfaction generally appears highest within the first four years of working for the company.
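The four-year pattern can be checked against the binary labels in the KNN classification results table, using the same four-year cutoff as the Naive Bayes tenure variable. A Python sketch:

```python
# Years worked and satisfaction label (U = Unsatisfied, S = Satisfied) for the
# 32 employees, copied in order from the KNN classification results table.
YEARS = [16, 2, 14, 17, 15, 1, 3, 3, 16, 15, 13, 3, 6, 1, 3, 2,
         3, 2, 2, 15, 5, 8, 17, 15, 5, 1, 11, 21, 8, 32, 2, 18]
SATISFIED = "USUSSSSUUUSSUSSSUUSSSSUSUSSUUUSS"


def satisfied_share(keep):
    """Return (number satisfied, group size) for employees whose years pass `keep`."""
    labels = [label for years, label in zip(YEARS, SATISFIED) if keep(years)]
    return labels.count("S"), len(labels)


early = satisfied_share(lambda years: years <= 4)
later = satisfied_share(lambda years: years > 4)
print(f"first four years: {early[0]}/{early[1]} satisfied")
print(f"after four years: {later[0]}/{later[1]} satisfied")
```

This gives 10 of 13 employees satisfied within their first four years versus 9 of 19 after that, in line with the scatter plot.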

K-Nearest-Neighbours

## k-Nearest Neighbors 
## 
## 32 samples
##  9 predictor
##  2 classes: 'Unsatisfied', 'Satisfied' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 29, 29, 28, 29, 29, 28, ... 
## Resampling results across tuning parameters:
## 
##   k  Accuracy   Kappa    
##   5  0.8972222  0.7566667
##   7  0.7833333  0.4900000
##   9  0.6833333  0.2900000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 5.

Classification Results

department	years	ideas	communication	recognition	training	conditions	tools	balance	satisfaction	prediction
Administrative	16	2	3	2	2	4	5	2	Unsatisfied	Unsatisfied
Administrative	2	4	4	3	4	4	5	3	Satisfied	Satisfied
Administrative	14	4	3	2	2	5	5	5	Unsatisfied	Satisfied
Maintenance	17	5	4	3	5	5	5	3	Satisfied	Satisfied
Maintenance	15	5	5	5	5	5	5	5	Satisfied	Satisfied
Management	1	5	4	4	3	5	3	5	Satisfied	Satisfied
Management	3	3	4	3	3	4	5	5	Satisfied	Satisfied
Management	3	2	2	2	2	3	5	3	Unsatisfied	Unsatisfied
Production	16	2	3	2	4	4	4	2	Unsatisfied	Unsatisfied
Production	15	2	3	1	4	4	4	2	Unsatisfied	Unsatisfied
Production	13	3	3	3	4	4	4	3	Satisfied	Satisfied
Production	3	5	5	5	5	5	5	5	Satisfied	Satisfied
Production	6	2	2	1	3	3	4	2	Unsatisfied	Unsatisfied
Production	1	5	4	4	3	4	5	5	Satisfied	Satisfied
Production	3	3	4	3	4	5	5	4	Satisfied	Satisfied
Production	2	4	4	4	4	5	5	5	Satisfied	Satisfied
Production	3	3	4	3	3	2	4	4	Unsatisfied	Unsatisfied
Production	2	4	3	4	3	3	4	4	Unsatisfied	Satisfied
Production	2	4	5	4	4	4	4	4	Satisfied	Satisfied
Production	15	5	4	3	4	3	5	3	Satisfied	Satisfied
Production	5	4	5	3	2	3	5	4	Satisfied	Satisfied
Production	8	5	5	3	5	3	5	3	Satisfied	Satisfied
Production	17	4	3	4	3	3	5	2	Unsatisfied	Unsatisfied
Production	15	5	3	4	5	5	5	5	Satisfied	Satisfied
Production	5	2	4	2	2	2	5	3	Unsatisfied	Unsatisfied
QC	1	5	5	5	5	5	5	5	Satisfied	Satisfied
QC	11	3	4	4	4	5	5	2	Satisfied	Satisfied
SR	21	3	2	2	3	2	4	3	Unsatisfied	Unsatisfied
SR	8	3	2	2	2	2	4	2	Unsatisfied	Unsatisfied
SR	32	2	3	2	4	2	5	3	Unsatisfied	Unsatisfied
SR	2	5	5	5	5	5	5	5	Satisfied	Satisfied
SR	18	4	4	4	5	5	5	5	Satisfied	Satisfied

Confusion Matrix

## Confusion Matrix and Statistics
## 
##              Reference
## Prediction    Unsatisfied Satisfied
##   Unsatisfied          11         0
##   Satisfied             2        19
##                                           
##                Accuracy : 0.9375          
##                  95% CI : (0.7919, 0.9923)
##     No Information Rate : 0.5938          
##     P-Value [Acc > NIR] : 1.452e-05       
##                                           
##                   Kappa : 0.8672          
##                                           
##  Mcnemar's Test P-Value : 0.4795          
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.8462          
##          Pos Pred Value : 0.9048          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.5938          
##          Detection Rate : 0.5938          
##    Detection Prevalence : 0.6562          
##       Balanced Accuracy : 0.9231          
##                                           
##        'Positive' Class : Satisfied       
##

KNN Model Output

Cross-validated accuracy was highest at k = 5 (89.7%, compared with 78.3% at k = 7 and 68.3% at k = 9), so k = 5 was used for the final model.
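The choice of k can be illustrated with a from-scratch sketch. This is not caret’s implementation. It uses only the numeric predictors (department is dropped rather than dummy-coded) and unscaled Euclidean distance, to mirror the "No pre-processing" setting. It also swaps in leave-one-out cross-validation for caret’s 10-fold cross-validation repeated 3 times, so its accuracies should not be expected to match the caret output above.

```python
import math
from collections import Counter

# Sketch only: numeric predictors, no scaling, leave-one-out CV
# (caret used repeated 10-fold CV and also included department).
# Columns: years, ideas, communication, recognition, training, conditions,
# tools, balance, copied in order from the KNN classification results table.
ROWS = [
    (16, 2, 3, 2, 2, 4, 5, 2), (2, 4, 4, 3, 4, 4, 5, 3), (14, 4, 3, 2, 2, 5, 5, 5),
    (17, 5, 4, 3, 5, 5, 5, 3), (15, 5, 5, 5, 5, 5, 5, 5), (1, 5, 4, 4, 3, 5, 3, 5),
    (3, 3, 4, 3, 3, 4, 5, 5), (3, 2, 2, 2, 2, 3, 5, 3), (16, 2, 3, 2, 4, 4, 4, 2),
    (15, 2, 3, 1, 4, 4, 4, 2), (13, 3, 3, 3, 4, 4, 4, 3), (3, 5, 5, 5, 5, 5, 5, 5),
    (6, 2, 2, 1, 3, 3, 4, 2), (1, 5, 4, 4, 3, 4, 5, 5), (3, 3, 4, 3, 4, 5, 5, 4),
    (2, 4, 4, 4, 4, 5, 5, 5), (3, 3, 4, 3, 3, 2, 4, 4), (2, 4, 3, 4, 3, 3, 4, 4),
    (2, 4, 5, 4, 4, 4, 4, 4), (15, 5, 4, 3, 4, 3, 5, 3), (5, 4, 5, 3, 2, 3, 5, 4),
    (8, 5, 5, 3, 5, 3, 5, 3), (17, 4, 3, 4, 3, 3, 5, 2), (15, 5, 3, 4, 5, 5, 5, 5),
    (5, 2, 4, 2, 2, 2, 5, 3), (1, 5, 5, 5, 5, 5, 5, 5), (11, 3, 4, 4, 4, 5, 5, 2),
    (21, 3, 2, 2, 3, 2, 4, 3), (8, 3, 2, 2, 2, 2, 4, 2), (32, 2, 3, 2, 4, 2, 5, 3),
    (2, 5, 5, 5, 5, 5, 5, 5), (18, 4, 4, 4, 5, 5, 5, 5),
]
SATISFIED = "USUSSSSUUUSSUSSSUUSSSSUSUSSUUUSS"  # U = Unsatisfied, S = Satisfied


def knn_predict(train_x, train_y, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], point))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def loocv_accuracy(x, y, k):
    """Leave-one-out accuracy: predict each employee from the other 31."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += knn_predict(train_x, train_y, x[i], k) == y[i]
    return correct / len(x)


scores = {k: loocv_accuracy(ROWS, SATISFIED, k) for k in (5, 7, 9)}
best_k = max(scores, key=scores.get)  # ties go to the first (smallest) k listed
for k, acc in scores.items():
    print(f"k = {k}: leave-one-out accuracy = {acc:.3f}")
print("selected k =", best_k)
```

Leave-one-out is used here only because it needs no random fold assignment. It is a different resampling scheme from caret’s, so the selected k may differ.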

The confusion matrix shows several indicators of the model’s predictive power. Accuracy is 93.75%: only 2 of the 32 employees are misclassified. This is much higher than the no-information rate (NIR) of 59.38%, which is the accuracy obtained by always predicting the most frequent class (Satisfied). Because these predictions appear to be made for the same 32 employees used to fit the model, this figure is likely optimistic; the cross-validated accuracy of 89.7% for k = 5 is a more realistic estimate.

The p-value for the test that accuracy exceeds the NIR (1.452e-05) is far below 0.05, so the model is significantly more accurate than always predicting the most frequent class.
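caret’s `confusionMatrix()` computes this p-value with a one-sided exact binomial test. It gives the probability of getting at least the observed number of correct predictions if each prediction were correct with probability equal to the NIR. A standard-library Python sketch of that calculation follows, using the counts from both confusion matrices in this report (the Naive Bayes matrix appears further below).

```python
from math import comb


def binom_upper_tail(successes, n, p):
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1))


n = 32
nir = 19 / 32  # share of the most frequent class (Satisfied) in the data

p_knn = binom_upper_tail(30, n, nir)  # KNN: 30 of 32 correct
p_nb = binom_upper_tail(31, n, nir)   # Naive Bayes: 31 of 32 correct
print(f"KNN p = {p_knn:.3e}, Naive Bayes p = {p_nb:.3e}")
# prints: KNN p = 1.452e-05, Naive Bayes p = 1.303e-06
```

With 30 and 31 correct predictions out of 32 and an NIR of 19/32, this reproduces the p-values printed in the two confusion matrices.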

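The sensitivity and negative predictive value are both 100%: every satisfied employee is predicted as satisfied, and every Unsatisfied prediction is correct. The specificity (84.62%) and positive predictive value (90.48%) are lower, because two unsatisfied employees are predicted as satisfied. The model therefore performs well both at identifying each class and at avoiding false classifications.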
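Every rate in the two confusion matrices follows from the four cell counts. The Python sketch below recomputes them with Satisfied as the positive class, as in the caret output. The cell counts are read off the KNN matrix above and the Naive Bayes matrix later in this report.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard 2x2 confusion-matrix metrics with Satisfied as the positive class.

    tp: Satisfied predicted Satisfied      fn: Satisfied predicted Unsatisfied
    fp: Unsatisfied predicted Satisfied    tn: Unsatisfied predicted Unsatisfied
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / n,
        "no_information_rate": max(tp + fn, fp + tn) / n,  # always guess the larger class
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Cell counts read off the two confusion matrices in this report
knn = confusion_metrics(tp=19, fn=0, fp=2, tn=11)
nb = confusion_metrics(tp=18, fn=1, fp=0, tn=13)
for name in knn:
    print(f"{name:<22} KNN {knn[name]:.4f}   Naive Bayes {nb[name]:.4f}")
```

Printing both models side by side makes it easy to confirm which metrics are perfect for each one.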

The 95% confidence interval for accuracy, (0.7919, 0.9923), gives a plausible range for the model’s true accuracy. Even its lower bound of about 79% is well above the NIR, and its upper bound is close to perfect.
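The confidence interval reported by caret is the exact (Clopper-Pearson) binomial interval for the proportion of correct predictions, as produced by R’s `binom.test`. The Python sketch below finds the same bounds by bisection on the binomial distribution function. The function names are illustrative.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def bisect(f, lo=0.0, hi=1.0, iters=100):
    """Find the root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, level=0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    alpha = 1 - level
    # Lower bound: P(X >= x | p) = alpha/2, and P(X >= x) increases with p.
    lower = 0.0 if x == 0 else bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: P(X <= x | p) = alpha/2; P(X <= x) decreases with p, so flip the sign.
    upper = 1.0 if x == n else bisect(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


knn_ci = clopper_pearson(30, 32)  # KNN: 30 of 32 correct
nb_ci = clopper_pearson(31, 32)   # Naive Bayes: 31 of 32 correct
print(f"KNN 95% CI = ({knn_ci[0]:.4f}, {knn_ci[1]:.4f})")
print(f"Naive Bayes 95% CI = ({nb_ci[0]:.4f}, {nb_ci[1]:.4f})")
```

These bounds match the intervals in the two confusion matrices. The intervals are wide because they rest on only 32 predictions.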

The Kappa value of 0.867 is close to 1. Kappa measures the agreement between predicted and actual classes after removing the agreement expected by chance, so this value shows that the model’s accuracy is not simply a product of chance or of the imbalance between the two classes.
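Kappa compares the observed agreement p_o with the agreement p_e expected if predictions and actual classes were independent: kappa = (p_o - p_e) / (1 - p_e). For the KNN matrix, p_o = 30/32 = 0.9375 and p_e = (21 × 19 + 11 × 13) / 32² = 542/1024 ≈ 0.529, giving kappa ≈ 0.867. A Python sketch of the same calculation, applied to both confusion matrices in this report:

```python
def cohens_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 confusion matrix (positive class = Satisfied)."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement: predicted share times actual share, summed over both classes
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (observed - expected) / (1 - expected)


knn_kappa = cohens_kappa(tp=19, fn=0, fp=2, tn=11)
nb_kappa = cohens_kappa(tp=18, fn=1, fp=0, tn=13)
print(f"KNN kappa = {knn_kappa:.4f}, Naive Bayes kappa = {nb_kappa:.4f}")
```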

Naive Bayes

Classification Results

department	years	ideas	communication	recognition	training	conditions	tools	balance	satisfaction	P(Unsatisfied)	P(Satisfied)	prediction
Administrative	Hired more than 4 years ago	2	3	2	2	4	5	2	Unsatisfied	1.0000000	0.0000000	Unsatisfied
Administrative	Hired within last 4 years	4	4	3	4	4	5	3	Satisfied	0.0142491	0.9857509	Satisfied
Administrative	Hired more than 4 years ago	4	3	2	2	5	5	5	Unsatisfied	0.9985680	0.0014320	Unsatisfied
Maintenance	Hired more than 4 years ago	5	4	3	5	5	5	3	Satisfied	0.0000000	1.0000000	Satisfied
Maintenance	Hired more than 4 years ago	5	5	5	5	5	5	5	Satisfied	0.0000000	1.0000000	Satisfied
Management	Hired within last 4 years	5	4	4	3	5	3	5	Satisfied	0.0000000	1.0000000	Satisfied
Management	Hired within last 4 years	3	4	3	3	4	5	5	Satisfied	0.0021911	0.9978089	Satisfied
Management	Hired within last 4 years	2	2	2	2	3	5	3	Unsatisfied	1.0000000	0.0000000	Unsatisfied
Production	Hired more than 4 years ago	2	3	2	4	4	4	2	Unsatisfied	1.0000000	0.0000000	Unsatisfied
Production	Hired more than 4 years ago	2	3	1	4	4	4	2	Unsatisfied	0.9999999	0.0000001	Unsatisfied
Production	Hired more than 4 years ago	3	3	3	4	4	4	3	Satisfied	0.7927756	0.2072244	Unsatisfied
Production	Hired within last 4 years	5	5	5	5	5	5	5	Satisfied	0.0000000	1.0000000	Satisfied
Production	Hired more than 4 years ago	2	2	1	3	3	4	2	Unsatisfied	1.0000000	0.0000000	Unsatisfied
Production	Hired within last 4 years	5	4	4	3	4	5	5	Satisfied	0.0000122	0.9999878	Satisfied
Production	Hired within last 4 years	3	4	3	4	5	5	4	Satisfied	0.0007979	0.9992021	Satisfied
Production	Hired within last 4 years	4	4	4	4	5	5	5	Satisfied	0.0002190	0.9997810	Satisfied
Production	Hired within last 4 years	3	4	3	3	2	4	4	Unsatisfied	0.9882210	0.0117790	Unsatisfied
Production	Hired within last 4 years	4	3	4	3	3	4	4	Unsatisfied	0.9315183	0.0684817	Unsatisfied
Production	Hired within last 4 years	4	5	4	4	4	4	4	Satisfied	0.0008345	0.9991655	Satisfied
Production	Hired more than 4 years ago	5	4	3	4	3	5	3	Satisfied	0.0000902	0.9999098	Satisfied
Production	Hired more than 4 years ago	4	5	3	2	3	5	4	Satisfied	0.0037451	0.9962549	Satisfied
Production	Hired more than 4 years ago	5	5	3	5	3	5	3	Satisfied	0.0000000	1.0000000	Satisfied
Production	Hired more than 4 years ago	4	3	4	3	3	5	2	Unsatisfied	0.9798306	0.0201694	Unsatisfied
Production	Hired more than 4 years ago	5	3	4	5	5	5	5	Satisfied	0.0000001	0.9999999	Satisfied
Production	Hired more than 4 years ago	2	4	2	2	2	5	3	Unsatisfied	1.0000000	0.0000000	Unsatisfied
QC	Hired within last 4 years	5	5	5	5	5	5	5	Satisfied	0.0000000	1.0000000	Satisfied
QC	Hired more than 4 years ago	3	4	4	4	5	5	2	Satisfied	0.0005646	0.9994354	Satisfied
SR	Hired more than 4 years ago	3	2	2	3	2	4	3	Unsatisfied	1.0000000	0.0000000	Unsatisfied
SR	Hired more than 4 years ago	3	2	2	2	2	4	2	Unsatisfied	1.0000000	0.0000000	Unsatisfied
SR	Hired more than 4 years ago	2	3	2	4	2	5	3	Unsatisfied	1.0000000	0.0000000	Unsatisfied
SR	Hired within last 4 years	5	5	5	5	5	5	5	Satisfied	0.0000000	1.0000000	Satisfied
SR	Hired more than 4 years ago	4	4	4	5	5	5	5	Satisfied	0.0000066	0.9999934	Satisfied
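The two probability columns in the table are the model’s posterior probabilities for each class, and the predicted class is the one with the larger probability. The Python sketch below illustrates how such posteriors are formed: a prior for each class is multiplied by per-feature likelihoods and the results are normalised. It is an illustration under stated assumptions, not the report’s model. It treats every column, including the ratings, as categorical with Laplace smoothing (alpha = 1), and it splits years at four years as the table’s tenure variable does. The package and settings used for the original model are not stated, so these probabilities will differ from those in the table.

```python
import math
from collections import Counter, defaultdict

# (department, years, ideas, communication, recognition, training, conditions,
#  tools, balance) for each employee, copied in order from the KNN results table.
ROWS = [
    ("Administrative", 16, 2, 3, 2, 2, 4, 5, 2), ("Administrative", 2, 4, 4, 3, 4, 4, 5, 3),
    ("Administrative", 14, 4, 3, 2, 2, 5, 5, 5), ("Maintenance", 17, 5, 4, 3, 5, 5, 5, 3),
    ("Maintenance", 15, 5, 5, 5, 5, 5, 5, 5), ("Management", 1, 5, 4, 4, 3, 5, 3, 5),
    ("Management", 3, 3, 4, 3, 3, 4, 5, 5), ("Management", 3, 2, 2, 2, 2, 3, 5, 3),
    ("Production", 16, 2, 3, 2, 4, 4, 4, 2), ("Production", 15, 2, 3, 1, 4, 4, 4, 2),
    ("Production", 13, 3, 3, 3, 4, 4, 4, 3), ("Production", 3, 5, 5, 5, 5, 5, 5, 5),
    ("Production", 6, 2, 2, 1, 3, 3, 4, 2), ("Production", 1, 5, 4, 4, 3, 4, 5, 5),
    ("Production", 3, 3, 4, 3, 4, 5, 5, 4), ("Production", 2, 4, 4, 4, 4, 5, 5, 5),
    ("Production", 3, 3, 4, 3, 3, 2, 4, 4), ("Production", 2, 4, 3, 4, 3, 3, 4, 4),
    ("Production", 2, 4, 5, 4, 4, 4, 4, 4), ("Production", 15, 5, 4, 3, 4, 3, 5, 3),
    ("Production", 5, 4, 5, 3, 2, 3, 5, 4), ("Production", 8, 5, 5, 3, 5, 3, 5, 3),
    ("Production", 17, 4, 3, 4, 3, 3, 5, 2), ("Production", 15, 5, 3, 4, 5, 5, 5, 5),
    ("Production", 5, 2, 4, 2, 2, 2, 5, 3), ("QC", 1, 5, 5, 5, 5, 5, 5, 5),
    ("QC", 11, 3, 4, 4, 4, 5, 5, 2), ("SR", 21, 3, 2, 2, 3, 2, 4, 3),
    ("SR", 8, 3, 2, 2, 2, 2, 4, 2), ("SR", 32, 2, 3, 2, 4, 2, 5, 3),
    ("SR", 2, 5, 5, 5, 5, 5, 5, 5), ("SR", 18, 4, 4, 4, 5, 5, 5, 5),
]
SATISFIED = "USUSSSSUUUSSUSSSUUSSSSUSUSSUUUSS"  # U = Unsatisfied, S = Satisfied


def features(row):
    """Assumption: every column is categorical; years is split at four years."""
    dept, years, *ratings = row
    tenure = "within 4 years" if years <= 4 else "more than 4 years"
    return (dept, tenure, *ratings)


def fit(rows, labels, alpha=1.0):
    """Count class frequencies and per-class value frequencies for each feature."""
    xs = [features(r) for r in rows]
    class_counts = Counter(labels)
    value_counts = {c: defaultdict(Counter) for c in class_counts}
    for x, c in zip(xs, labels):
        for j, v in enumerate(x):
            value_counts[c][j][v] += 1
    n_values = [len({x[j] for x in xs}) for j in range(len(xs[0]))]
    return class_counts, value_counts, n_values, alpha


def posterior(model, row):
    """Posterior probability of each class for one employee."""
    class_counts, value_counts, n_values, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, v in enumerate(features(row)):
            # Laplace-smoothed log likelihood of value v for feature j in class c
            score += math.log((value_counts[c][j][v] + alpha) / (n_c + alpha * n_values[j]))
        log_scores[c] = score
    top = max(log_scores.values())  # log-sum-exp trick for numerical stability
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}


model = fit(ROWS, SATISFIED)
for row, label in zip(ROWS[:3], SATISFIED):
    probs = posterior(model, row)
    print(row[0], row[1], "actual:", label, {c: round(p, 4) for c, p in probs.items()})
```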

Confusion Matrix

## Confusion Matrix and Statistics
## 
##              Reference
## Prediction    Unsatisfied Satisfied
##   Unsatisfied          13         1
##   Satisfied             0        18
##                                           
##                Accuracy : 0.9688          
##                  95% CI : (0.8378, 0.9992)
##     No Information Rate : 0.5938          
##     P-Value [Acc > NIR] : 1.303e-06       
##                                           
##                   Kappa : 0.936           
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9474          
##             Specificity : 1.0000          
##          Pos Pred Value : 1.0000          
##          Neg Pred Value : 0.9286          
##              Prevalence : 0.5938          
##          Detection Rate : 0.5625          
##    Detection Prevalence : 0.5625          
##       Balanced Accuracy : 0.9737          
##                                           
##        'Positive' Class : Satisfied       
##

Naive Bayes Model Output

The confusion matrix shows several indicators of this model’s predictive power. Accuracy is 96.88%: only one of the 32 employees is misclassified. This is also much higher than the NIR of 59.38%.

The p-value for the test that accuracy exceeds the NIR (1.303e-06) is far below 0.05, so this model is also significantly more accurate than always predicting the most frequent class.

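The specificity and positive predictive value are both 100%: no unsatisfied employee is predicted as satisfied, and every Satisfied prediction is correct. The sensitivity (94.74%) and negative predictive value (92.86%) are slightly lower, because one satisfied employee is predicted as unsatisfied. The model therefore performs very well both at identifying each class and at avoiding false classifications.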

Interestingly, the strengths of the Naive Bayes model mirror those of the KNN model. The KNN model had perfect sensitivity and negative predictive value, with somewhat lower specificity and positive predictive value, while the Naive Bayes model had perfect specificity and positive predictive value, with somewhat lower sensitivity and negative predictive value. Could the two models be used together so that each compensates for the other’s weaker metrics?
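The mirror-image pattern comes from the kind of mistake each model makes. The Python sketch below hard-codes the actual and predicted labels from the two classification results tables and lists each model’s false positives and false negatives by row number, with Satisfied as the positive class.

```python
# Actual and predicted labels (U = Unsatisfied, S = Satisfied) for the 32
# employees, in the row order of the two classification results tables.
ACTUAL = "USUSSSSUUUSSUSSSUUSSSSUSUSSUUUSS"
KNN_PRED = "USSSSSSUUUSSUSSSUSSSSSUSUSSUUUSS"
NB_PRED = "USUSSSSUUUUSUSSSUUSSSSUSUSSUUUSS"


def error_breakdown(actual, predicted):
    """Return 1-based row numbers of false positives and false negatives."""
    pairs = list(enumerate(zip(actual, predicted), start=1))
    false_pos = [i for i, (a, p) in pairs if a == "U" and p == "S"]
    false_neg = [i for i, (a, p) in pairs if a == "S" and p == "U"]
    return false_pos, false_neg


for name, pred in (("KNN", KNN_PRED), ("Naive Bayes", NB_PRED)):
    fp, fn = error_breakdown(ACTUAL, pred)
    print(f"{name}: false positives at rows {fp}, false negatives at rows {fn}")
```

Both of the KNN model’s errors are unsatisfied employees predicted as satisfied (rows 3 and 18), while the Naive Bayes model’s single error is a satisfied employee predicted as unsatisfied (row 11).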

The 95% confidence interval for accuracy, (0.8378, 0.9992), is also strong: its lower bound of about 84% is well above the NIR, and its upper bound is close to perfect.

The Kappa value of 0.936 is close to 1, showing that the model’s agreement with the actual classes goes well beyond what chance alone would produce. It is also noticeably higher than the already strong Kappa of 0.867 from the KNN model.

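Judged solely on predictive performance, the Naive Bayes model appears slightly more reliable on this dataset than the KNN model. This judgement rests mainly on its higher accuracy, balanced accuracy, and Kappa, and on its performance metrics overall; both models have very small p-values, so that measure does not separate them. The difference is small, however: the Naive Bayes model misclassifies one employee and the KNN model two, and both sets of predictions appear to be made for the same employees used to fit the models.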

Radoslaw Budzik

2025-03-31
