This report presents a detailed analysis of a job satifaction dataset, aiming to explore the relationships between various factors that influence job satisfaction across different departments. This dataset includes key factors:
This analysis aims to identify trends and insights into employee satisfaction.
## Department Years Ideas Communication
## Length:32 Min. : 1.000 Min. :2.000 Min. :2.000
## Class :character 1st Qu.: 2.750 1st Qu.:3.000 1st Qu.:3.000
## Mode :character Median : 7.000 Median :4.000 Median :4.000
## Mean : 9.219 Mean :3.656 Mean :3.688
## 3rd Qu.:15.000 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :32.000 Max. :5.000 Max. :5.000
## Recognition Training Conditions Tools
## Min. :1.000 Min. :2.000 Min. :2.000 Min. :3.000
## 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:4.000
## Median :3.000 Median :4.000 Median :4.000 Median :5.000
## Mean :3.156 Mean :3.625 Mean :3.844 Mean :4.656
## 3rd Qu.:4.000 3rd Qu.:4.250 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## Balance Satisfaction
## Min. :2.000 Min. : 3.000
## 1st Qu.:3.000 1st Qu.: 5.000
## Median :3.500 Median : 7.000
## Mean :3.625 Mean : 6.844
## 3rd Qu.:5.000 3rd Qu.: 8.250
## Max. :5.000 Max. :10.000
## 'data.frame': 32 obs. of 10 variables:
## $ Department : chr "Administrative" "Administrative" "Administrative" "Maintenance" ...
## $ Years : int 16 2 14 17 15 1 3 3 16 15 ...
## $ Ideas : int 2 4 4 5 5 5 3 2 2 2 ...
## $ Communication: int 3 4 3 4 5 4 4 2 3 3 ...
## $ Recognition : int 2 3 2 3 5 4 3 2 2 1 ...
## $ Training : int 2 4 2 5 5 3 3 2 4 4 ...
## $ Conditions : int 4 4 5 5 5 5 4 3 4 4 ...
## $ Tools : int 5 5 5 5 5 3 5 5 4 4 ...
## $ Balance : int 2 3 5 3 5 5 5 3 2 2 ...
## $ Satisfaction : int 3 9 6 8 9 9 8 3 5 4 ...
## [1] 0
The dataset underwent thorough preprocessing to ensure accuracy and readiness for analysis. First, data cleaning was performed to check for missing values across all 32 observations and 10 variables. No missing values were found, ensuring data completeness. Additionally, data types were verified to confirm that categorical variables, such as “Department,” were correctly stored as character data, while numerical variables, including satisfaction metrics, were properly formatted for statistical analysis.
Standardization and formatting were also applied to maintain consistency. The “Department” variable was kept as a character variable, while all other columns were converted into numerical formats. Outlier detection was conducted, revealing that the “Years” variable ranged from 1 to 32, indicating significant variation in employee tenure. Further analysis may be needed to assess whether extreme values (e.g., over 30 years) influence the results. Additionally, while most satisfaction-related metrics are rated on a 1–5 scale, the “Satisfaction” variable ranges from 3 to 10, suggesting potential inconsistencies in scoring methodology. To ensure uniformity, normalization or rescaling may be necessary.
Despite the dataset’s structured format, certain limitations should be addressed. The absence of explicit department sizes makes it difficult to assess whether variations in satisfaction scores are influenced by department size. Aggregating employee counts or normalizing satisfaction scores by department size would provide a clearer perspective. Additionally, the “Department” variable is stored as character data without encoding, which may limit its usability in advanced modeling techniques. Applying categorical encoding could improve the efficiency of analysis.
Furthermore, while the dataset includes “Years” as an indicator of tenure, it lacks time-based trends, making it challenging to analyze changes over time. A temporal breakdown of employee satisfaction or other metrics would enhance workforce insights. Additionally, several satisfaction categories, such as “Tools,” “Conditions,” and “Balance,” exhibit a ceiling effect, where many employees rated these factors at the highest possible score (5/5). This could either indicate genuine high satisfaction or limited variability in responses, warranting further qualitative validation.
The dataset reveals several key insights into employee experience and workplace satisfaction.
This section presents an analysis of employee data across various department and workplace dimensions.
The company operates under a production-centric structure, where the Production Department constitutes the majority of the workforce, with 17 employees. This overwhelming concentration of personnel in production indicates that the company prioritizes manufacturing or service delivery as its core function, while other departments remain significantly smaller.
This lean operational model suggests that the company emphasizes efficiency and cost control, potentially reducing overhead costs associated with larger administrative or support functions. With a minimal number of employees in other departments, functions such as finance, human resources, and marketing may be either highly streamlined, automated, or outsourced.
The histogram of satisfaction scores reveals a skewed distribution, with most values clustered in the lower and mid-range. This suggests that employee satisfaction is not uniformly distributed and may be influenced by factors such as job role, company policies, or work environment. The limited variability in responses indicates that employees may have polarized opinions, with fewer responses in the extreme high or low ranges.
The histogram of tenure exhibits a strong right-skewed distribution, with the majority of employees having short tenures and only a small fraction remaining with the company for extended periods. This pattern may indicate a high turnover rate or a workforce composed primarily of newer employees. Additionally, the presence of outliers with significantly longer tenures suggests that a minority of employees have remained with the company for decades.
The skewed nature of both distributions highlights potential challenges in employee retention and satisfaction. A deeper investigation into the underlying causes, such as workplace culture, career growth opportunities, and compensation, would be beneficial. Addressing these factors could help improve overall job satisfaction and reduce turnover rates.
The visualization presents a series of boxplots that illustrate the distribution of various job attributes. Each boxplot represents the spread, central tendency, and variability of different factors influencing job satisfaction. The analysis of these visualizations provides insight into how employees perceive different aspects of their work environment, allowing for the identification of strengths and areas needing improvement.
The distribution of Ideas appears relatively balanced, with the median positioned near the center of the interquartile range (IQR). This suggests that employees generally have a shared perception regarding the exchange of ideas in their workplace. However, some mild outliers indicate that a few employees may feel that idea-sharing opportunities are limited. The spread of the data is moderate, reflecting some variability in how different individuals experience this aspect of their work environment.
The attributes Communication and Recognition display moderate variation, as indicated by relatively compact IQRs. This suggests that most employees experience a consistent level of communication and recognition in their workplace. The whiskers extend symmetrically, implying that the distribution is normal or near-normal. Moreover, the absence of extreme outliers indicates a general consensus among employees regarding these aspects of their jobs, suggesting that communication channels and recognition programs are well-established.
In contrast, the distributions of Training and Conditions appear more balanced, with their medians positioned centrally within their respective IQRs. This implies that employees have similar experiences regarding professional training opportunities and working conditions. However, Training exhibits a slight positive skew, meaning that while most employees rate their training experiences positively, a minority perceives them as inadequate. Ensuring consistency in training programs could help bridge this gap and enhance professional development opportunities for all employees.
The attributes Tools and Balance exhibit noticeable skewness, indicating significant disparities among employees’ experiences in these areas. The distribution of Tools suggests that while many employees have access to the necessary resources, some report insufficient support, which could negatively impact their performance. Similarly, Balance, which likely refers to work-life balance, shows negative skewness, meaning that although many employees experience a moderate balance between work and personal life, a subset of respondents reports significant dissatisfaction. This could indicate excessive workloads or inadequate support for maintaining a sustainable work-life dynamic.
The overall analysis suggests that while most job attributes are perceived consistently across employees, certain areas, such as Training, Tools, and Work-Life Balance, require targeted improvements. Organizations can enhance workplace satisfaction by standardizing training opportunities, ensuring equitable access to necessary resources, and implementing flexible working arrangements to support employees struggling with work-life balance. Addressing these disparities will foster a more equitable and productive work environment, ultimately improving employee morale and performance.
This box plot of Years by Department provides valuable insights into the distribution of employee tenure across different departments, highlighting trends in workforce stability and experience levels within the organization.
This box plot of Satisfaction by Department provides key insights into employee sentiment across different areas of the organization, revealing variations in job satisfaction levels that may be influenced by factors such as tenure, job responsibilities, and workplace conditions.
This section transitions from exploratory analysis to predictive modeling, leveraging statistical and machine learning techniques to forecast job satisfaction levels. The analysis will focus on identifying relationships between key workplace factors and employee satisfaction, while evaulating the predictive power of these variables. The three methodologies will be employed:
The correlation analysis reveals several important insights regarding the relationships between various workplace factors and employee engagement outcomes. Key drivers of employee satisfaction and engagement include recognition, communication, work-life balance, and training opportunities. These factors significantly contribute to a positive workplace environment, while others such as tenure and departmental differences show more complex or varying effects.
One of the most notable findings from the analysis is the strong positive correlation between recognition and satisfaction, with a correlation coefficient of 0.837. This indicates that employees who feel recognized for their contributions tend to report much higher satisfaction levels. Similarly, communication within teams is also closely linked to employee satisfaction, with a correlation of 0.719. These results emphasize the importance of fostering a culture where employees feel valued and have clear communication channels with colleagues and management, as these factors seem to drive overall job satisfaction.
The analysis also shows a surprising negative relationship between tenure and employee engagement metrics across various factors. For example, the correlation between years of employment and ideas is -0.249, suggesting that long-term employees may contribute fewer new ideas, possibly due to feelings of stagnation or burnout. This pattern implies that employees who have been with the organization for an extended period may experience a decline in engagement, which could lead to lower productivity and creativity. Organizations may need to implement strategies to keep long-tenured employees motivated and engaged, possibly through leadership roles, new challenges, or professional development opportunities.
Another key finding is the positive correlation between work-life balance and satisfaction. The correlation of 0.709 between these two factors underscores the significance of supporting employees in maintaining a healthy work-life balance. Employees who feel they have control over their personal and professional lives are more likely to be satisfied with their jobs. Organizations that prioritize policies such as flexible working hours or remote work options may see improvements in employee happiness and engagement.
Training also plays a critical role in employee engagement, with positive correlations seen across various metrics. The correlation of 0.552 between training and recognition, for example, suggests that employees who engage in training programs are more likely to feel recognized for their efforts. Additionally, the correlation between training and communication (0.507) indicates that training opportunities can foster better interactions and relationships within teams, further enhancing employee satisfaction. Therefore, investment in training programs can not only improve employees’ skills but also contribute to higher levels of engagement and morale.
Finally, the analysis reveals different correlation patterns across departments. For instance, employees in the Department Maintenance show a moderate positive correlation between training and balance (0.337), suggesting that training initiatives may help employees in this department manage their work-life balance more effectively. In contrast, employees in the Department Production report a lower satisfaction score, with a negative correlation between satisfaction and training (-0.039). This indicates that additional strategies might be needed to boost employee engagement in this department, as the current training offerings may not be sufficient to increase satisfaction.
In conclusion, the correlation analysis highlights that recognition, communication, work-life balance, and training are critical to fostering employee satisfaction and engagement. Employees who feel recognized and maintain good communication with their peers are more likely to be satisfied with their jobs. Additionally, work-life balance and training opportunities contribute positively to engagement levels. However, longer tenure appears to be associated with a decline in engagement, signaling that organizations need to take steps to rejuvenate long-term employees’ involvement. The variation across departments also suggests that tailored strategies may be necessary to address the unique needs of each group and improve overall employee engagement.
## k-Nearest Neighbors
##
## 28 samples
## 14 predictors
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 25, 26, 25, 25, 25, 25, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 1.141341 0.8074640 1.0000000
## 7 1.111349 0.8262237 0.9880952
## 9 1.115417 0.8201739 0.9388889
## 11 1.165789 0.7522140 1.0303030
## 13 1.178090 0.8982354 1.0474359
## 15 1.285533 0.8907228 1.1555556
## 17 1.389204 0.8502994 1.2509804
## 19 1.535875 0.9162286 1.4000000
## 21 1.691944 0.8413039 1.5595238
## 23 1.856736 0.9383157 1.7268116
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 7.
| Years | Ideas | Communication | Recognition | Training | Conditions | Tools | Balance | Satisfaction | DepartmentAdministrative | DepartmentMaintenance | DepartmentManagement | DepartmentProduction | DepartmentQC | DepartmentSR | prediction | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 3 | 3 | 4 | 3 | 3 | 4 | 5 | 5 | 8 | 0 | 0 | 1 | 0 | 0 | 0 | 7.285714 |
| 10 | 15 | 2 | 3 | 1 | 4 | 4 | 4 | 2 | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 5.571429 |
| 20 | 15 | 5 | 4 | 3 | 4 | 3 | 5 | 3 | 7 | 0 | 0 | 0 | 1 | 0 | 0 | 7.285714 |
| 26 | 1 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 10 | 0 | 0 | 0 | 0 | 1 | 0 | 8.714286 |
## Confusion Matrix and Statistics
##
## Reference
## Prediction 4 7 8 10
## 4 0 0 0 0
## 7 0 0 0 0
## 8 0 0 0 0
## 10 0 0 0 0
##
## Overall Statistics
##
## Accuracy : NaN
## 95% CI : (NA, NA)
## No Information Rate : NA
## P-Value [Acc > NIR] : NA
##
## Kappa : NaN
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 4 Class: 7 Class: 8 Class: 10
## Sensitivity NA NA NA NA
## Specificity NA NA NA NA
## Pos Pred Value NA NA NA NA
## Neg Pred Value NA NA NA NA
## Prevalence NaN NaN NaN NaN
## Detection Rate NaN NaN NaN NaN
## Detection Prevalence NaN NaN NaN NaN
## Balanced Accuracy NA NA NA NA
The k-Nearest Neighbors (KNN) model provides valuable insights into predicting employee satisfaction and engagement outcomes within the organization. Trained on 28 samples with 14 predictors, the model was built using 10-fold cross-validation to ensure robustness. This method evaluates relationships between factors such as recognition, communication, work-life balance, and training, helping to identify which variables have the greatest influence on employee experiences.
Performance metrics such as Root Mean Square Error (RMSE), R-squared, and Mean Absolute Error (MAE) were used to assess the model. The RMSE values ranged from 1.14 to 1.86, with lower values indicating better predictive accuracy. R-squared ranged from 0.75 to 0.94, showing that the model explains a significant portion of the variance in employee satisfaction. The MAE values ranged from 0.94 to 1.73, with lower values representing more accurate predictions. At k = 7, the model achieved its best performance, with an RMSE of 1.11, R-squared at 0.83, and MAE of 0.99.
The optimal model was selected based on the smallest RMSE value, which occurred at k = 7. This choice balances predictive accuracy with model complexity, ensuring that the model is neither overfit nor underfit. The optimal k value explains approximately 83% of the variance in employee satisfaction outcomes, demonstrating strong predictive power while maintaining generalizability across different employee groups.
In conclusion, the KNN model serves as an effective tool for predicting employee satisfaction and engagement. The model’s strong predictive power provides organizations with insights into the key factors influencing employee experiences, allowing them to implement targeted strategies for improving satisfaction and fostering a positive work environment. By selecting k = 7 as the optimal parameter, the model achieves a reliable balance between accuracy and complexity.
## Naive Bayes
##
## 28 samples
## 15 predictors
## 3 classes: 'Low', 'Medium', 'High'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 22, 24, 21, 21, 24
## Resampling results across tuning parameters:
##
## usekernel Accuracy Kappa
## FALSE NaN NaN
## TRUE 0.9714286 0.9548387
##
## Tuning parameter 'fL' was held constant at a value of 0
## Tuning
## parameter 'adjust' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were fL = 0, usekernel = TRUE and adjust
## = 1.
## Low Medium High
## 7 3.111219e-05 9.988599e-01 1.108978e-03
## 10 9.999997e-01 2.835385e-07 1.502776e-10
## 20 1.291018e-06 9.999752e-01 2.348049e-05
## 26 1.163134e-12 2.466123e-05 9.999753e-01
## [1] Medium Low Medium High
## Levels: Low Medium High
| Years | Ideas | Communication | Recognition | Training | Conditions | Tools | Balance | Satisfaction | DepartmentAdministrative | DepartmentMaintenance | DepartmentManagement | DepartmentProduction | DepartmentQC | DepartmentSR | Satisfaction_cat | Low | Medium | High | pred.class | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 3 | 3 | 4 | 3 | 3 | 4 | 5 | 5 | 8 | 0 | 0 | 1 | 0 | 0 | 0 | Medium | 0.0000311 | 0.9988599 | 0.0011090 | Medium |
| 10 | 15 | 2 | 3 | 1 | 4 | 4 | 4 | 2 | 4 | 0 | 0 | 0 | 1 | 0 | 0 | Low | 0.9999997 | 0.0000003 | 0.0000000 | Low |
| 20 | 15 | 5 | 4 | 3 | 4 | 3 | 5 | 3 | 7 | 0 | 0 | 0 | 1 | 0 | 0 | Medium | 0.0000013 | 0.9999752 | 0.0000235 | Medium |
| 26 | 1 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 10 | 0 | 0 | 0 | 0 | 1 | 0 | High | 0.0000000 | 0.0000247 | 0.9999753 | High |
## Confusion Matrix and Statistics
##
## Reference
## Prediction Low Medium High
## Low 1 0 0
## Medium 0 2 0
## High 0 0 1
##
## Overall Statistics
##
## Accuracy : 1
## 95% CI : (0.3976, 1)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : 0.0625
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Low Class: Medium Class: High
## Sensitivity 1.00 1.0 1.00
## Specificity 1.00 1.0 1.00
## Pos Pred Value 1.00 1.0 1.00
## Neg Pred Value 1.00 1.0 1.00
## Prevalence 0.25 0.5 0.25
## Detection Rate 0.25 0.5 0.25
## Detection Prevalence 0.25 0.5 0.25
## Balanced Accuracy 1.00 1.0 1.00
The Naive Bayes classification model demonstrates outstanding performance in predicting employee satisfaction levels categorized as Low, Medium, or High. Trained on 28 samples with 15 predictors and utilizing 5-fold cross-validation, the model achieved an impressive accuracy of 97.14% and a Kappa score of 0.9548. This suggests that the model is exceptionally effective at capturing the underlying patterns in the data, with the classification results showing near-perfect accuracy in predicting satisfaction levels.
The final model was tuned with a fixed value of fL = 0, usekernel = TRUE, and adjust = 1, which helped to optimize the model’s performance. The probabilities of each class (Low, Medium, and High) are calculated with very high precision, further emphasizing the model’s reliability. For example, when predicting a Medium satisfaction level, the model assigns a probability of 99.89%, while for Low and High, the probabilities are close to zero, indicating strong confidence in its classifications.
The confusion matrix shows that the model achieved perfect classification with no misclassifications across all three classes (Low, Medium, and High). The sensitivity, specificity, positive predictive value, and negative predictive value for each class were all 1.00, further confirming the model’s precision. The overall accuracy, paired with the Kappa score of 1, signifies that the model has a high level of agreement between predicted and actual satisfaction levels, ensuring reliable predictions.
In conclusion, the Naive Bayes model excels at predicting employee satisfaction levels and provides clear insights into the factors that influence satisfaction across the organization. Its high accuracy and robust performance metrics make it an invaluable tool for improving employee engagement and enhancing satisfaction-based initiatives.
This comprehensive analysis of the job satisfaction dataset offers a multifaceted understanding of the factors shaping employee experiences across departments, revealing critical insights that organizations can leverage to foster a more engaged and satisfied workforce. The study underscores the profound impact of interpersonal and organizational dynamics on job satisfaction, with recognition, communication, and work-life balance emerging as the most influential drivers. The strong positive correlation between recognition and satisfaction (r = 0.837) highlights the human need for acknowledgment; employees who feel valued for their contributions are significantly more likely to report higher job satisfaction. Similarly, effective communication (r = 0.719) reinforces trust and alignment within teams, acting as a cornerstone for collaborative success. These findings align with established organizational psychology principles, emphasizing that a culture of appreciation and transparent dialogue is not merely beneficial but essential for sustaining employee morale.
The analysis also uncovers nuanced departmental trends that demand tailored interventions. For instance, the Production Department, which constitutes the majority of the workforce (17 out of 32 employees), exhibits moderate satisfaction levels (median = 7/10), suggesting that while operational efficiency is prioritized, there may be untapped opportunities to enhance engagement through targeted initiatives. In contrast, the SR Department displays the widest variability in satisfaction scores (3–10), reflecting divergent experiences among its employees. This disparity could stem from role heterogeneity, leadership inconsistencies, or uneven workload distribution, necessitating deeper qualitative investigation. Meanwhile, the QC Department demonstrates stable satisfaction levels, likely due to structured workflows and specialized expertise, offering a model for consistency that other departments might emulate. Such variability across departments underscores the importance of avoiding one-size-fits-all strategies and instead adopting context-specific approaches to address unique challenges.
Predictive modeling further validates the dataset’s utility in forecasting satisfaction trends. The K-Nearest Neighbors (KNN) model, optimized at k = 7, achieved an RMSE of 1.11 and an R² of 0.83, demonstrating robust predictive capability. However, the Naive Bayes classifier outperformed expectations, attaining 97.14% accuracy in categorizing satisfaction into Low, Medium, and High tiers. This exceptional performance, validated by a Kappa score of 0.95, confirms the model’s reliability in identifying patterns among predictors such as recognition, training, and departmental affiliation. These models collectively provide a data-driven foundation for preemptive action, enabling organizations to identify at-risk employees and proactively address dissatisfaction triggers before they escalate.
The study also reveals counterintuitive findings, such as the mild negative correlation between tenure and engagement. Employees with longer tenure (e.g., those exceeding 15 years) reported lower scores in metrics like idea contribution (r = -0.249) and training satisfaction, suggesting potential stagnation or burnout. This trend signals a critical need for organizations to revitalize long-term employees through career development opportunities, mentorship roles, or rotational assignments that reinvigorate their sense of purpose. Additionally, while factors like working conditions and tools received high ratings (mean = 4.66/5), the presence of ceiling effects in these metrics raises questions about survey design—whether employees genuinely experience near-perfect conditions or if the scale lacks granularity to capture subtle grievances.
Despite its contributions, the study has limitations. The small sample size (32 employees) restricts generalizability, while the absence of demographic data (e.g., age, gender) and temporal trends limits the exploration of intersectional factors. Furthermore, the reliance on self-reported data introduces potential response biases, such as social desirability or acquiescence. Future research should expand the dataset longitudinally, incorporate qualitative interviews to contextualize quantitative findings, and explore the interplay between satisfaction and productivity metrics.