
I. Visualization

1. Histograms

Fig. 1.1


Output:

  1. The Administrative department has satisfaction scores clustered mostly at the higher end of the scale.
  2. The Maintenance department shows a narrow range of satisfaction scores, indicating consistent responses.
  3. The Management department has satisfaction scores in a tight range, mostly in the high category, suggesting overall contentment in leadership roles.
  4. The Production department has a more varied distribution, with most employees reporting satisfaction scores between 6 and 9, suggesting a more diverse range of experiences.
  5. The Quality Control department has only a few employees, all of whom report high satisfaction scores, indicating an overall positive experience.
  6. The Shipping/Receiving department shows a mixed distribution, with a few employees in the lower range and others in the mid to high range, suggesting varying perceptions of job satisfaction within the department.
_________________________________________________________________________________________

2. Box plots for categorical variables

Fig. 1.2


Output:

  1. Employees with low recognition (1-2) have lower satisfaction scores with a wider spread, while those with high recognition (5) report consistently high satisfaction, showing a strong positive relationship.
  2. Employees with higher training levels (5) tend to have higher satisfaction scores, while those with lower training (2-3) show greater variability and lower satisfaction.
  3. Better working conditions correlate with higher satisfaction, as employees rating conditions 4-5 show higher median satisfaction scores compared to those rating 2-3.
  4. Employees with better work-life balance (5) consistently report high satisfaction, whereas those with poor balance (2-3) have more variation and lower median satisfaction scores.
_________________________________________________________________________________________

3. Scatter Plot

Fig. 1.3


Output:

The scatter plot illustrates the relationship between Years in Job and Satisfaction Score.
- The negative trend (red regression line) suggests that as years in the job increase, satisfaction tends to decrease.
- Employees with fewer years of experience exhibit a wider range of satisfaction scores, including both high and low values.
- Employees with more than 20 years tend to have lower satisfaction scores, possibly indicating job dissatisfaction over time.
- While variability exists, the overall trend shows a decline in satisfaction as tenure increases.
_________________________________________________________________________________________

4. Heat Map showing pairwise correlations

Fig. 1.4


Output:

The heatmap visualizes pairwise correlations between numerical variables in the dataset, with values scaled between 0 and 1.
- Satisfaction has a strong positive correlation with Recognition (0.85) and Ideas (0.72), indicating that employees who feel recognized and generate more ideas tend to report higher satisfaction.
- Communication (0.63) has a moderate correlation with Satisfaction, suggesting that better workplace communication is linked to higher employee satisfaction.
- Tools (0.2) shows a weak correlation with Satisfaction, suggesting that access to tools alone is only weakly associated with job satisfaction.
- Years in Job (0) shows no linear correlation with Satisfaction. This indicates that the downward trend line in Fig. 1.3 is weak and that tenure by itself explains little of the variation in satisfaction.
- Ideas and Recognition are equally strongly correlated (0.85), indicating that employees who contribute ideas are also more likely to feel recognized.
The color gradient shows stronger correlations in red and weaker correlations in yellow, making the relationships between variables easier to interpret.
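
The code behind Fig. 1.4 is not shown in this report, so the following is only a minimal sketch of how such a correlation matrix could be computed in base R. It assumes Pearson correlation via `cor()` and uses just the first eight rows of four predictor columns from the job dataset listed in Section II, so its values will not match the figure. Plotting absolute values to obtain a 0 to 1 scale is also an assumption.

```r
# First eight rows of four numeric columns from the job dataset (Section II).
job_sample <- data.frame(
  Years         = c(16, 2, 14, 17, 15, 1, 3, 3),
  Ideas         = c(2, 4, 4, 5, 5, 5, 3, 2),
  Communication = c(3, 4, 3, 4, 5, 4, 4, 2),
  Recognition   = c(2, 3, 2, 3, 5, 4, 3, 2)
)

# Pairwise Pearson correlations between all columns.
corr <- cor(job_sample)
print(round(corr, 2))

# Simple heat map of the absolute correlations, without dendrograms or rescaling.
heatmap(abs(corr), Rowv = NA, Colv = NA, scale = "none")
```

The matrix is symmetric with ones on the diagonal, so only one triangle needs to be read.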


5. Stacked Bar Chart

Fig. 1.5


Output:

The stacked bar chart displays the proportion of employees with high and low satisfaction across different departments.
- The Production department accounts for the largest share of employees. It has a mix of high and low satisfaction, with the majority in the high satisfaction category.
- Administrative, Maintenance, and Management departments have smaller employee counts, with a higher proportion of high satisfaction employees compared to low.
- QC and SR departments also have a majority of high satisfaction employees, with only a few reporting low satisfaction.
- Low satisfaction is plotted as negative values, which produces a diverging bar effect and makes it easier to compare satisfied and dissatisfied employees.
_________________________________________________________________________________________

II. KNN

1. Implementation in R

Job dataset for KNN
Department Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction
Administrative 16 2 3 2 2 4 5 2 0
Administrative 2 4 4 3 4 4 5 3 1
Administrative 14 4 3 2 2 5 5 5 1
Maintenance 17 5 4 3 5 5 5 3 1
Maintenance 15 5 5 5 5 5 5 5 1
Management 1 5 4 4 3 5 3 5 1
Management 3 3 4 3 3 4 5 5 1
Management 3 2 2 2 2 3 5 3 0
Production 16 2 3 2 4 4 4 2 1
Production 15 2 3 1 4 4 4 2 0
Production 13 3 3 3 4 4 4 3 1
Production 3 5 5 5 5 5 5 5 1
Production 6 2 2 1 3 3 4 2 0
Production 1 5 4 4 3 4 5 5 1
Production 3 3 4 3 4 5 5 4 1
Production 2 4 4 4 4 5 5 5 1
Production 3 3 4 3 3 2 4 4 1
Production 2 4 3 4 3 3 4 4 1
Production 2 4 5 4 4 4 4 4 1
Production 15 5 4 3 4 3 5 3 1
Production 5 4 5 3 2 3 5 4 1
Production 8 5 5 3 5 3 5 3 1
Production 17 4 3 4 3 3 5 2 1
Production 15 5 3 4 5 5 5 5 1
Production 5 2 4 2 2 2 5 3 0
QC 1 5 5 5 5 5 5 5 1
QC 11 3 4 4 4 5 5 2 1
SR 21 3 2 2 3 2 4 3 1
SR 8 3 2 2 2 2 4 2 0
SR 32 2 3 2 4 2 5 3 1
SR 2 5 5 5 5 5 5 5 1
SR 18 4 4 4 5 5 5 5 1

Note:

0: low satisfaction level (score 1 to 4)

1: high satisfaction level (score 5 to 10)
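
The recoding described in the note can be sketched in base R as follows. The `score` vector holds made-up illustrative values on the 1 to 10 scale, not data from this report, and the variable names are assumptions.

```r
# Hypothetical raw satisfaction scores (illustrative values only).
score <- c(2, 4, 5, 7, 10)

# Scores 1 to 4 become class 0 (low), scores 5 to 10 become class 1 (high).
satisfaction <- factor(ifelse(score >= 5, 1, 0), levels = c(0, 1))

print(table(satisfaction))
```

Storing the result as a factor lets classification functions treat it as a class label rather than a number.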

2. Choosing K value

Using Caret

## k-Nearest Neighbors 
## 
## 32 samples
##  8 predictor
##  2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 29, 29, 28, 28, 28, 29, ... 
## Resampling results across tuning parameters:
## 
##   k  Accuracy   Kappa    
##   5  0.8611111  0.2894737
##   7  0.8361111  0.0000000
##   9  0.8361111  0.0000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 5.
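
The call that produced this output is not included in the report. The sketch below shows one way it could be reproduced with `caret`, assuming the package is installed. The data are the 32 rows of numeric predictors printed above, the seed is arbitrary, and `tuneLength = 3` is an assumption chosen because caret's default KNN grid then contains k = 5, 7, 9. Resampled accuracy depends on the seed, so the numbers may differ from those shown.

```r
library(caret)  # assumed to be installed

# The 32 observations from the job dataset (numeric predictors and class label).
job <- read.table(header = TRUE, text = "
Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction
16 2 3 2 2 4 5 2 0
2 4 4 3 4 4 5 3 1
14 4 3 2 2 5 5 5 1
17 5 4 3 5 5 5 3 1
15 5 5 5 5 5 5 5 1
1 5 4 4 3 5 3 5 1
3 3 4 3 3 4 5 5 1
3 2 2 2 2 3 5 3 0
16 2 3 2 4 4 4 2 1
15 2 3 1 4 4 4 2 0
13 3 3 3 4 4 4 3 1
3 5 5 5 5 5 5 5 1
6 2 2 1 3 3 4 2 0
1 5 4 4 3 4 5 5 1
3 3 4 3 4 5 5 4 1
2 4 4 4 4 5 5 5 1
3 3 4 3 3 2 4 4 1
2 4 3 4 3 3 4 4 1
2 4 5 4 4 4 4 4 1
15 5 4 3 4 3 5 3 1
5 4 5 3 2 3 5 4 1
8 5 5 3 5 3 5 3 1
17 4 3 4 3 3 5 2 1
15 5 3 4 5 5 5 5 1
5 2 4 2 2 2 5 3 0
1 5 5 5 5 5 5 5 1
11 3 4 4 4 5 5 2 1
21 3 2 2 3 2 4 3 1
8 3 2 2 2 2 4 2 0
32 2 3 2 4 2 5 3 1
2 5 5 5 5 5 5 5 1
18 4 4 4 5 5 5 5 1
")
job$Satisfaction <- factor(job$Satisfaction, levels = c(0, 1))

set.seed(123)  # arbitrary seed (assumption); the report does not state one

# 10-fold cross-validation repeated 3 times, as stated in the output above.
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# No pre-processing, matching the "No pre-processing" line of the output.
knn_fit <- train(Satisfaction ~ ., data = job, method = "knn",
                 trControl = ctrl, tuneLength = 3)
print(knn_fit)
```

With only six low satisfaction cases, some folds contain few or none of them, which helps explain why Kappa drops to 0 for the larger values of k.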

3. Plot shows that the elbow occurs at k = 5

4. KNN Prediction

KNN classifier result
Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction prediction
16 2 3 2 2 4 5 2 0 1
2 4 4 3 4 4 5 3 1 1
14 4 3 2 2 5 5 5 1 1
17 5 4 3 5 5 5 3 1 1
15 5 5 5 5 5 5 5 1 1
1 5 4 4 3 5 3 5 1 1
3 3 4 3 3 4 5 5 1 1
3 2 2 2 2 3 5 3 0 1
16 2 3 2 4 4 4 2 1 1
15 2 3 1 4 4 4 2 0 1
13 3 3 3 4 4 4 3 1 1
3 5 5 5 5 5 5 5 1 1
6 2 2 1 3 3 4 2 0 0
1 5 4 4 3 4 5 5 1 1
3 3 4 3 4 5 5 4 1 1
2 4 4 4 4 5 5 5 1 1
3 3 4 3 3 2 4 4 1 1
2 4 3 4 3 3 4 4 1 1
2 4 5 4 4 4 4 4 1 1
15 5 4 3 4 3 5 3 1 1
5 4 5 3 2 3 5 4 1 1
8 5 5 3 5 3 5 3 1 1
17 4 3 4 3 3 5 2 1 1
15 5 3 4 5 5 5 5 1 1
5 2 4 2 2 2 5 3 0 0
1 5 5 5 5 5 5 5 1 1
11 3 4 4 4 5 5 2 1 1
21 3 2 2 3 2 4 3 1 1
8 3 2 2 2 2 4 2 0 0
32 2 3 2 4 2 5 3 1 1
2 5 5 5 5 5 5 5 1 1
18 4 4 4 5 5 5 5 1 1

5. Confusion matrix

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0  3  0
##          1  3 26
##                                           
##                Accuracy : 0.9062          
##                  95% CI : (0.7498, 0.9802)
##     No Information Rate : 0.8125          
##     P-Value [Acc > NIR] : 0.1246          
##                                           
##                   Kappa : 0.619           
##                                           
##  Mcnemar's Test P-Value : 0.2482          
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.5000          
##          Pos Pred Value : 0.8966          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.8125          
##          Detection Rate : 0.8125          
##    Detection Prevalence : 0.9062          
##       Balanced Accuracy : 0.7500          
##                                           
##        'Positive' Class : 1               
## 
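
As a check on the statistics above, the following base R sketch recomputes accuracy, Kappa, sensitivity, and specificity directly from the four cell counts. The variable names are illustrative. The only inputs are the counts printed in the confusion matrix.

```r
# Cell counts from the KNN confusion matrix: rows = prediction, columns = reference.
cm <- matrix(c(3, 0,
               3, 26),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

n        <- sum(cm)
accuracy <- sum(diag(cm)) / n                      # observed agreement: 29 / 32
expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance

# Cohen's Kappa: agreement beyond chance, scaled by the maximum possible.
cohen_kappa <- (accuracy - expected) / (1 - expected)

# "1" (high satisfaction) is the positive class in this output.
sensitivity <- cm["1", "1"] / sum(cm[, "1"])   # 26 / 26
specificity <- cm["0", "0"] / sum(cm[, "0"])   # 3 / 6

print(round(c(accuracy = accuracy, kappa = cohen_kappa,
              sensitivity = sensitivity, specificity = specificity), 3))
```

The chance agreement is high (about 0.75) because 26 of the 32 employees belong to class 1, which is why a 90.6% accuracy corresponds to a Kappa of only about 0.62.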

Output:

The k-Nearest Neighbors (KNN) model was trained to classify employee satisfaction into two classes:

0 = Low Satisfaction (scores 1–4)

1 = High Satisfaction (scores 5–10)

The KNN classifier achieved an accuracy of 90.6% with a Kappa score of 0.619, indicating moderate agreement beyond chance. These figures are computed on the same 32 observations that were used to fit the model. The sensitivity for high satisfaction was 100%: all 26 high satisfaction employees were correctly identified. However, the specificity was only 50%: the model misclassified 3 of the 6 low satisfaction employees as high satisfaction.
_________________________________________________________________________________________

III. Naive Bayes

Job dataset for Naive Bayes
Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction
16 2 3 2 2 4 5 2 0
2 4 4 3 4 4 5 3 1
14 4 3 2 2 5 5 5 1
17 5 4 3 5 5 5 3 1
15 5 5 5 5 5 5 5 1
1 5 4 4 3 5 3 5 1
3 3 4 3 3 4 5 5 1
3 2 2 2 2 3 5 3 0
16 2 3 2 4 4 4 2 1
15 2 3 1 4 4 4 2 0
13 3 3 3 4 4 4 3 1
3 5 5 5 5 5 5 5 1
6 2 2 1 3 3 4 2 0
1 5 4 4 3 4 5 5 1
3 3 4 3 4 5 5 4 1
2 4 4 4 4 5 5 5 1
3 3 4 3 3 2 4 4 1
2 4 3 4 3 3 4 4 1
2 4 5 4 4 4 4 4 1
15 5 4 3 4 3 5 3 1
5 4 5 3 2 3 5 4 1
8 5 5 3 5 3 5 3 1
17 4 3 4 3 3 5 2 1
15 5 3 4 5 5 5 5 1
5 2 4 2 2 2 5 3 0
1 5 5 5 5 5 5 5 1
11 3 4 4 4 5 5 2 1
21 3 2 2 3 2 4 3 1
8 3 2 2 2 2 4 2 0
32 2 3 2 4 2 5 3 1
2 5 5 5 5 5 5 5 1
18 4 4 4 5 5 5 5 1
## 
## Naive Bayes Classifier for Discrete Predictors
## 
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
## 
## A-priori probabilities:
## Y
##     0     1 
## 0.188 0.812 
## 
## Conditional probabilities:
##    Department
## Y   Administrative Maintenance Management Production     QC     SR
##   0         0.1667      0.0000     0.1667     0.5000 0.0000 0.1667
##   1         0.0769      0.0769     0.0769     0.5385 0.0769 0.1538
## 
##    Years
## Y   [,1] [,2]
##   0 8.83 5.42
##   1 9.31 8.20
## 
##    Ideas
## Y   [,1]  [,2]
##   0 2.17 0.408
##   1 4.00 0.980
## 
##    Communication
## Y   [,1]  [,2]
##   0 2.67 0.816
##   1 3.92 0.845
## 
##    Recognition
## Y   [,1]  [,2]
##   0 1.67 0.516
##   1 3.50 0.949
## 
##    Training
## Y   [,1]  [,2]
##   0 2.50 0.837
##   1 3.88 0.952
## 
##    Conditions
## Y   [,1]  [,2]
##   0 3.00 0.894
##   1 4.04 1.076
## 
##    Tools
## Y   [,1]  [,2]
##   0 4.50 0.548
##   1 4.69 0.549
## 
##    Balance
## Y   [,1]  [,2]
##   0 2.33 0.516
##   1 3.92 1.093

1. NB Classifier

##                                        0         1
##  [1,] 0.99947752698158065509659309100243 0.0005225
##  [2,] 0.00000015807905129121601064349362 0.9999998
##  [3,] 0.00000000218611495515560493550777 1.0000000
##  [4,] 0.00000000000000002550807076030363 1.0000000
##  [5,] 0.00000000000000000000000000000158 1.0000000
##  [6,] 0.00000000000000000000435030881643 1.0000000
##  [7,] 0.00000002405025404255966528033542 1.0000000
##  [8,] 0.99985383858522958178127737483010 0.0001462
##  [9,] 0.98140036646050565760646122726030 0.0185996
## [10,] 0.99665821512134511461766805950901 0.0033418
## [11,] 0.00495318824846445980580877943567 0.9950468
## [12,] 0.00000000000000000000000000012759 1.0000000
## [13,] 0.99997823791712880936444207691238 0.0000218
## [14,] 0.00000000000000000000231877539333 1.0000000
## [15,] 0.00000076891948281493156932761557 0.9999992
## [16,] 0.00000000000000006778464680884735 1.0000000
## [17,] 0.00024400403818247815186659455122 0.9997560
## [18,] 0.00000000014010973883911040397884 1.0000000
## [19,] 0.00000000000016781654725021414093 1.0000000
## [20,] 0.00000000000027955960123219522439 1.0000000
## [21,] 0.00000000998354847163728566401819 1.0000000
## [22,] 0.00000000000000682500200620396336 1.0000000
## [23,] 0.00000003831045303256961366035471 1.0000000
## [24,] 0.00000000000000000000006726183381 1.0000000
## [25,] 0.99488947869856703132285247193067 0.0051105
## [26,] 0.00000000000000000000000000000139 1.0000000
## [27,] 0.00000000970181048827475842489393 1.0000000
## [28,] 0.95961773225014790344999937588000 0.0403823
## [29,] 0.99967028128403245812449995355564 0.0003297
## [30,] 0.21313100547866201117663820241432 0.7868690
## [31,] 0.00000000000000000000000000013277 1.0000000
## [32,] 0.00000000000000000559402932185751 1.0000000
##  [1] 0 1 1 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1
## Levels: 0 1
Classification Result
Department Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction Low_Satisfaction High_Satisfaction pred.class
Administrative 16 2 3 2 2 4 5 2 0 0.999 0.001 0
Administrative 2 4 4 3 4 4 5 3 1 0.000 1.000 1
Administrative 14 4 3 2 2 5 5 5 1 0.000 1.000 1
Maintenance 17 5 4 3 5 5 5 3 1 0.000 1.000 1
Maintenance 15 5 5 5 5 5 5 5 1 0.000 1.000 1
Management 1 5 4 4 3 5 3 5 1 0.000 1.000 1
Management 3 3 4 3 3 4 5 5 1 0.000 1.000 1
Management 3 2 2 2 2 3 5 3 0 1.000 0.000 0
Production 16 2 3 2 4 4 4 2 1 0.981 0.019 0
Production 15 2 3 1 4 4 4 2 0 0.997 0.003 0
Production 13 3 3 3 4 4 4 3 1 0.005 0.995 1
Production 3 5 5 5 5 5 5 5 1 0.000 1.000 1
Production 6 2 2 1 3 3 4 2 0 1.000 0.000 0
Production 1 5 4 4 3 4 5 5 1 0.000 1.000 1
Production 3 3 4 3 4 5 5 4 1 0.000 1.000 1
Production 2 4 4 4 4 5 5 5 1 0.000 1.000 1
Production 3 3 4 3 3 2 4 4 1 0.000 1.000 1
Production 2 4 3 4 3 3 4 4 1 0.000 1.000 1
Production 2 4 5 4 4 4 4 4 1 0.000 1.000 1
Production 15 5 4 3 4 3 5 3 1 0.000 1.000 1
Production 5 4 5 3 2 3 5 4 1 0.000 1.000 1
Production 8 5 5 3 5 3 5 3 1 0.000 1.000 1
Production 17 4 3 4 3 3 5 2 1 0.000 1.000 1
Production 15 5 3 4 5 5 5 5 1 0.000 1.000 1
Production 5 2 4 2 2 2 5 3 0 0.995 0.005 0
QC 1 5 5 5 5 5 5 5 1 0.000 1.000 1
QC 11 3 4 4 4 5 5 2 1 0.000 1.000 1
SR 21 3 2 2 3 2 4 3 1 0.960 0.040 0
SR 8 3 2 2 2 2 4 2 0 1.000 0.000 0
SR 32 2 3 2 4 2 5 3 1 0.213 0.787 1
SR 2 5 5 5 5 5 5 5 1 0.000 1.000 1
SR 18 4 4 4 5 5 5 5 1 0.000 1.000 1
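
To show where these probabilities come from, the sketch below recomputes the posterior for the first row (Administrative, Satisfaction = 0) by hand in base R. It assumes the usual e1071 layout in which column `[,1]` of each numeric table is the class mean and `[,2]` the class standard deviation, and it uses the rounded values printed in the model output, so the result only approximates the reported 0.999.

```r
# Priors and P(Department = Administrative | class), copied from the model output.
prior      <- c("0" = 0.188, "1" = 0.812)
dept_admin <- c("0" = 0.1667, "1" = 0.0769)

# Per-class mean and standard deviation of each numeric predictor (class 0, class 1).
params <- list(
  Years         = list(mean = c(8.83, 9.31), sd = c(5.42, 8.20)),
  Ideas         = list(mean = c(2.17, 4.00), sd = c(0.408, 0.980)),
  Communication = list(mean = c(2.67, 3.92), sd = c(0.816, 0.845)),
  Recognition   = list(mean = c(1.67, 3.50), sd = c(0.516, 0.949)),
  Training      = list(mean = c(2.50, 3.88), sd = c(0.837, 0.952)),
  Conditions    = list(mean = c(3.00, 4.04), sd = c(0.894, 1.076)),
  Tools         = list(mean = c(4.50, 4.69), sd = c(0.548, 0.549)),
  Balance       = list(mean = c(2.33, 3.92), sd = c(0.516, 1.093))
)

# Numeric predictors of the first employee in the table.
x <- c(Years = 16, Ideas = 2, Communication = 3, Recognition = 2,
       Training = 2, Conditions = 4, Tools = 5, Balance = 2)

# Normal density of each observed value under each class (2 rows x 8 columns).
dens <- sapply(names(x), function(v) dnorm(x[[v]], params[[v]]$mean, params[[v]]$sd))

# Naive Bayes: prior times the product of the per-feature likelihoods, then normalise.
joint     <- prior * dept_admin * apply(dens, 1, prod)
posterior <- joint / sum(joint)
print(round(posterior, 4))
```

The class 0 posterior comes out very close to 1 mainly because the low Ideas, Recognition, Training, and Balance ratings are several standard deviations below the class 1 means.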

2. Confusion Matrix

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0  6  2
##          1  0 24
##                                         
##                Accuracy : 0.938         
##                  95% CI : (0.792, 0.992)
##     No Information Rate : 0.812         
##     P-Value [Acc > NIR] : 0.0453        
##                                         
##                   Kappa : 0.818         
##                                         
##  Mcnemar's Test P-Value : 0.4795        
##                                         
##             Sensitivity : 1.000         
##             Specificity : 0.923         
##          Pos Pred Value : 0.750         
##          Neg Pred Value : 1.000         
##              Prevalence : 0.188         
##          Detection Rate : 0.188         
##    Detection Prevalence : 0.250         
##       Balanced Accuracy : 0.962         
##                                         
##        'Positive' Class : 0             
## 

Output:

The Naive Bayes (NB) model was applied to the same classification task and returns both class labels and class probabilities. It achieved a higher accuracy of 93.8% and a higher Kappa score of 0.818, indicating stronger agreement than KNN. Note that the positive class in this confusion matrix is 0 (low satisfaction). The sensitivity of 100% therefore means that all 6 low satisfaction employees were correctly identified, and the specificity of 92.3% means that 24 of the 26 high satisfaction employees were correctly identified. The balanced accuracy is 96.2%.

Comparison of the KNN and Naive Bayes models:

While both models performed well, the Naive Bayes classifier outperformed KNN in overall accuracy (93.8% versus 90.6%), balanced accuracy (96.2% versus 75.0%), and Kappa (0.818 versus 0.619). KNN classified every high satisfaction employee correctly but detected only 3 of the 6 low satisfaction cases. Naive Bayes detected all 6 low satisfaction cases and misclassified only 2 of the 26 high satisfaction employees, giving fewer misclassifications overall (2 versus 3).

Therefore, the Naive Bayes classifier is the preferred model for this employee job satisfaction classification task.
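
The two confusion matrices in this report use different positive classes (1 for KNN, 0 for Naive Bayes), so their sensitivity and specificity lines are not directly comparable. The base R sketch below recomputes both from the printed cell counts with class 0 (low satisfaction) as the common positive class. The helper `metrics()` is an illustrative function, not part of any package.

```r
# Accuracy, sensitivity, and specificity from a 2 x 2 confusion matrix
# (rows = prediction, columns = reference) for a chosen positive class.
metrics <- function(cm, positive = "0") {
  negative <- setdiff(rownames(cm), positive)
  c(accuracy    = sum(diag(cm)) / sum(cm),
    sensitivity = cm[positive, positive] / sum(cm[, positive]),
    specificity = cm[negative, negative] / sum(cm[, negative]))
}

labels <- list(Prediction = c("0", "1"), Reference = c("0", "1"))
knn_cm <- matrix(c(3, 0, 3, 26), nrow = 2, byrow = TRUE, dimnames = labels)
nb_cm  <- matrix(c(6, 2, 0, 24), nrow = 2, byrow = TRUE, dimnames = labels)

comparison <- rbind(KNN = metrics(knn_cm), NaiveBayes = metrics(nb_cm))
print(round(comparison, 3))
```

With low satisfaction as the positive class, KNN has a sensitivity of 0.5 and Naive Bayes a sensitivity of 1, which is the difference the comparison above rests on.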