To visualize the data, we analyze the satisfaction scores, the number of employees who gave each score, and the candidate predictors, to see whether job satisfaction can be meaningfully predicted.
The histogram shows how employees rated their overall job satisfaction on a scale of 1 to 10. Most responses are clustered around the mid-to-high end of the scale, with peaks around 6 to 8. A few employees reported very low or very high satisfaction.
This histogram shows that the majority of employees rate their work-life balance quite positively, mostly between 4 and 5. Very few employees gave ratings of 1 or 2, suggesting that overall work-life balance is perceived favorably across the organization.
This histogram shows how many employees selected each satisfaction level, highlighting which scores are most common.
This scatter plot explores the relationship between employees’ years of service and their job satisfaction.
This scatter plot shows a clear positive relationship — employees with higher work-life balance ratings tend to report higher job satisfaction.
This boxplot shows how satisfaction varies by employee tenure. Satisfaction looks broadly stable across tenure groups, with a slight decline in median satisfaction once tenure exceeds 15 years.
Employees who rated work-life balance higher also report a higher level of satisfaction. Median satisfaction rises as the balance rating increases from 1 to 5, which indicates a strong relationship between work-life balance and how satisfied employees feel overall.
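The per-group medians that a boxplot like this summarizes can be sketched as follows. This is a minimal illustration, assuming each response is a (balance rating, satisfaction score) pair; the function name `median_by_group` and the sample values are hypothetical and are not the survey data.

```python
from collections import defaultdict
from statistics import median

def median_by_group(groups, values):
    """Median of `values` within each distinct entry of `groups`."""
    buckets = defaultdict(list)
    for group, value in zip(groups, values):
        buckets[group].append(value)
    return {group: median(vals) for group, vals in sorted(buckets.items())}

# Hypothetical (balance rating, satisfaction score) responses, not the survey data
balance = [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5]
satisfaction = [2, 3, 4, 3, 5, 6, 5, 7, 8, 9, 8, 10]

medians = median_by_group(balance, satisfaction)
for rating, med in medians.items():
    print(f"balance {rating}: median satisfaction {med}")
```

A median that increases steadily from one rating to the next is the pattern described in the text.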
Next, we look at correlation heatmaps for these same predictors to see whether they offer more insight into predicting job satisfaction.
This heatmap indicates the correlation between years of service and satisfaction. There is a small negative correlation between Years of Service and Satisfaction, suggesting that employees with more tenure may be slightly less satisfied on average.
This heatmap shows the strength and direction of the relationship between job satisfaction and years of service (employee tenure). The correlation is weak and negative: on average, employees who have been at the company longer tend to report slightly lower satisfaction than newer employees. While the correlation is not strong, it may highlight areas to explore for long-tenured employee engagement.
This heatmap shows the relationship between employees’ perception of work-life balance and their overall job satisfaction. It reveals a strong positive correlation: employees who rate their work-life balance highly also tend to be more satisfied with their jobs overall. This indicates that improving work-life balance could be a key driver of employee satisfaction.
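Each cell of a correlation heatmap is a single correlation coefficient. The sketch below assumes the heatmaps show Pearson correlation, which the report does not state; the function name `pearson_r` and the sample ratings are hypothetical and are not the survey data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical ratings, not the survey data
balance = [1, 2, 3, 4, 5, 5, 4, 3]
satisfaction = [2, 4, 5, 7, 9, 8, 8, 6]

r_balance = pearson_r(balance, satisfaction)
print(f"r(balance, satisfaction) = {r_balance:.3f}")
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 a weak one. The coefficient describes association only and does not by itself show that one variable drives the other.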
## Confusion Matrix and Statistics
##
##           Reference
## Prediction High Low Medium
##     High     13   0      6
##     Low       1   9      0
##     Medium    1   0      2
##
## Overall Statistics
##
##                Accuracy : 0.75
##                  95% CI : (0.566, 0.8854)
##     No Information Rate : 0.4688
##     P-Value [Acc > NIR] : 0.001154
##
##                   Kappa : 0.5904
##
##  Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: High Class: Low Class: Medium
## Sensitivity               0.8667     1.0000       0.25000
## Specificity               0.6471     0.9565       0.95833
## Pos Pred Value            0.6842     0.9000       0.66667
## Neg Pred Value            0.8462     1.0000       0.79310
## Prevalence                0.4688     0.2812       0.25000
## Detection Rate            0.4062     0.2812       0.06250
## Detection Prevalence      0.5938     0.3125       0.09375
## Balanced Accuracy         0.7569     0.9783       0.60417
The KNN model classified employee satisfaction into three categories: Low, Medium, and High. The confusion matrix shows how many predictions the model got right (diagonal values) and where it misclassified responses (off-diagonal values). K = 5 was used as a starting point, meaning each prediction was based on the 5 nearest observations.
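The idea behind a K = 5 prediction can be sketched as a majority vote among the five closest training observations. This is a minimal illustration, assuming Euclidean distance on two rating-scale predictors; the function name `knn_predict`, the predictor pair, and the training values are hypothetical and are not the model or data used in this report.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training observation
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest observations
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over their labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (work-life balance, recognition) ratings with satisfaction labels
train_X = [(1, 1), (1, 2), (2, 1), (3, 3), (3, 2),
           (2, 3), (5, 5), (4, 5), (5, 4), (4, 4)]
train_y = ["Low", "Low", "Low", "Medium", "Medium",
           "Medium", "High", "High", "High", "High"]

prediction = knn_predict(train_X, train_y, query=(5, 5), k=5)
print(prediction)
```

Because the vote depends on distances, predictors measured on very different scales would usually be standardized first. Different values of K can also be compared on held-out data.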
The confusion matrix generated by the model shows how well the KNN classifier performed. It provides a breakdown of:
- True Positives: Correctly predicted satisfaction categories
- False Positives/Negatives: Misclassifications
- Accuracy: The proportion of correctly classified instances out of the total
From this output, we can analyze how well each satisfaction level was predicted. The Low category was identified perfectly (sensitivity 1.00) and the High category well (sensitivity 0.87), but the Medium category had a sensitivity of only 0.25: six of the eight Medium cases were predicted as High. This may indicate overlap in how Medium and High employees responded to other survey factors.
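The accuracy and per-class statistics in the KNN output above can be reproduced directly from the confusion matrix counts. The sketch below is a plain recomputation, not the code that produced the report; the function name `class_metrics` is hypothetical, and the matrix is copied from the output, with rows as predictions and columns as reference classes.

```python
labels = ["High", "Low", "Medium"]
# Rows are predictions, columns are reference (true) classes
knn_matrix = [
    [13, 0, 6],
    [1, 9, 0],
    [1, 0, 2],
]

def class_metrics(matrix, labels):
    """Overall accuracy plus one-vs-rest statistics for each class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    results = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[i]) - tp                 # predicted as this class, truly another
        fn = sum(row[i] for row in matrix) - tp  # truly this class, predicted as another
        tn = total - tp - fp - fn
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        results[label] = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "balanced_accuracy": (sensitivity + specificity) / 2,
        }
    return results

knn_metrics = class_metrics(knn_matrix, labels)
print(knn_metrics["accuracy"])
print(knn_metrics["Medium"])
```

The low Medium sensitivity comes from the first row of the matrix, where six Medium cases sit in the High prediction row.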
## Confusion Matrix and Statistics
##
##           Reference
## Prediction High Low Medium
##     High     14   0      1
##     Low       0   9      0
##     Medium    1   0      7
##
## Overall Statistics
##
##                Accuracy : 0.9375
##                  95% CI : (0.7919, 0.9923)
##     No Information Rate : 0.4688
##     P-Value [Acc > NIR] : 1.991e-08
##
##                   Kappa : 0.9021
##
##  Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: High Class: Low Class: Medium
## Sensitivity               0.9333     1.0000        0.8750
## Specificity               0.9412     1.0000        0.9583
## Pos Pred Value            0.9333     1.0000        0.8750
## Neg Pred Value            0.9412     1.0000        0.9583
## Prevalence                0.4688     0.2812        0.2500
## Detection Rate            0.4375     0.2812        0.2188
## Detection Prevalence      0.4688     0.2812        0.2500
## Balanced Accuracy         0.9373     1.0000        0.9167
In this approach, we used Naive Bayes to classify employee job satisfaction into three categories: Low, Medium, and High. Binning the satisfaction score in this way shifted the task from a regression problem to a classification task. All numerical predictors, such as Years of Service, Work-Life Balance, Recognition, Communication, and others, were also binned into categorical ranges based on their values (e.g., “Low”, “Medium”, “High” or ranges like “0–5”, “6–10”, etc.) to align with the requirements of the model and the assignment. These categories allowed the Naive Bayes algorithm to calculate the likelihood of each satisfaction level given the observed input categories. This process simplifies the data structure, makes the data compatible with a categorical Naive Bayes model, and provides a better understanding of how specific work-related factors contribute to different levels of employee satisfaction.
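The binning and likelihood calculation described above can be sketched as follows. This is a minimal illustration, not the report's implementation. The cut points in `bin_rating`, the add-one (Laplace) smoothing, the two predictors, and the sample ratings are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def bin_rating(value, low_max=2, high_min=4):
    """Map a 1-5 rating to a category. The cut points are assumed for illustration."""
    if value <= low_max:
        return "Low"
    if value >= high_min:
        return "High"
    return "Medium"

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-feature category frequencies within each class."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> category counts
    for row, label in zip(rows, labels):
        for j, category in enumerate(row):
            feature_counts[(j, label)][category] += 1
    return class_counts, feature_counts

def predict_naive_bayes(model, row, categories=("Low", "Medium", "High")):
    """Pick the class with the largest prior times product of smoothed likelihoods."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for j, category in enumerate(row):
            # Add-one smoothing (an assumption here) avoids zero probabilities
            seen = feature_counts[(j, label)][category]
            score *= (seen + 1) / (count + len(categories))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical raw ratings: (work-life balance, recognition), not the survey data
raw = [(1, 2), (2, 1), (3, 3), (3, 4), (5, 5), (4, 5), (5, 4)]
labels = ["Low", "Low", "Medium", "Medium", "High", "High", "High"]
rows = [tuple(bin_rating(v) for v in r) for r in raw]

model = train_naive_bayes(rows, labels)
predicted = predict_naive_bayes(model, (bin_rating(5), bin_rating(4)))
print(predicted)
```

The product of per-feature likelihoods reflects the model's assumption that predictors are independent within each satisfaction class.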
The confusion matrix provides insights into:
- How well the model classifies each satisfaction level
- The accuracy (overall correct predictions)
- Any misclassifications, especially where the model confuses Medium and High categories
These results suggest that work-related factors such as balance, recognition, and communication carry meaningful predictive power when it comes to classifying satisfaction, though some nuances between the Medium and High satisfaction groups may be harder for the model to differentiate precisely: the only two errors were one High case predicted as Medium and one Medium case predicted as High.
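The Kappa values reported in the two outputs above measure agreement beyond what chance alone would produce, which is useful here because the classes are not equally common. A minimal sketch of the calculation follows, using the two confusion matrices copied from the outputs; the function name `cohen_kappa` is hypothetical.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: predicted, columns: reference)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of cases on the diagonal
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column totals for each class
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

knn_matrix = [[13, 0, 6], [1, 9, 0], [1, 0, 2]]
nb_matrix = [[14, 0, 1], [0, 9, 0], [1, 0, 7]]

kappa_knn = cohen_kappa(knn_matrix)
kappa_nb = cohen_kappa(nb_matrix)
print(round(kappa_knn, 4), round(kappa_nb, 4))
```

The gap between the two values mirrors the gap in accuracy (0.75 versus 0.9375) after adjusting for chance agreement.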
To conclude, this analysis explored how various factors influence employee job satisfaction, using predictors such as work-life balance and years of service alongside the distribution of satisfaction scores across the workforce. The scatter plot between work-life balance and satisfaction revealed a strong positive relationship: employees who rated their work-life balance highly also tended to report higher levels of satisfaction. This finding was also supported by the correlation heatmap, which indicated a strong positive correlation between the two variables. In contrast, the relationship between years of service and satisfaction was weaker and slightly negative, indicating that longer-tenured employees are marginally less satisfied on average.
For prediction, we applied both K-Nearest Neighbors (KNN) and Naive Bayes classification models to the three satisfaction levels (Low, Medium, High). The KNN model classified satisfaction with reasonable accuracy (75%) and identified the Low and High groups well. However, it showed overlap between the Medium and High categories, with six of the eight Medium cases predicted as High, possibly because predictor values are very similar among those groups.
The Naive Bayes model, which was trained on the binned categorical data, provided clear analytical outputs based on conditional probabilities and reached 93.75% accuracy. Like KNN, it classified every Low case correctly and was weakest in the mid-range, though only one Medium case and one High case were misclassified. These results suggest that while both models are useful, work-life balance stands out as a consistently strong predictor of job satisfaction and could be a key focus area for improving employee well-being. This extracted information can be of valuable use to employers and managers.