Assignment 1: MSCI 4230 – BUSINESS ANALYTICS IN PRACTICE

The objective of this assignment is to predict job satisfaction from the predictors in the dataset. I do this in three steps: visualizing the data, classifying satisfaction with k-nearest neighbors (KNN), and predicting satisfaction with Naive Bayes.

About the Data

The dataset contains information on employees from different departments and on factors that might affect their job satisfaction. For each employee it records years of experience and ratings for idea contribution, communication, recognition, training, working conditions, tools, and work-life balance, along with a job satisfaction score. The data covers six departments (Administrative, Maintenance, Management, Production, QC, and SR), which makes it possible to compare satisfaction across roles. Analyzing it can reveal patterns in what influences job satisfaction.

Data Preview

Department Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction
Administrative 16 2 3 2 2 4 5 2 3
Administrative 2 4 4 3 4 4 5 3 9
Administrative 14 4 3 2 2 5 5 5 6
Maintenance 17 5 4 3 5 5 5 3 8
Maintenance 15 5 5 5 5 5 5 5 9

This is a preview of the first five rows of the dataset, which mixes categorical and numerical data. The Department column contains text values such as “Administrative,” “Maintenance,” and “Production,” and is treated as a categorical variable. Years is a count of years of experience. The remaining columns (Ideas, Communication, Recognition, Training, Conditions, Tools, Balance, and Satisfaction) are stored as numbers but are actually ratings: the workplace factors are scored from 1 to 5 and Satisfaction from 1 to 10. Because these values are ordered rankings rather than continuous measurements, they could be treated as ordinal data.

Data Summary

Variable        Min.    1st Qu.   Median   Mean    3rd Qu.   Max.
Years           1.000   2.750     7.000    9.219   15.000    32.000
Ideas           2.000   3.000     4.000    3.656   5.000     5.000
Communication   2.000   3.000     4.000    3.688   4.000     5.000
Recognition     1.000   2.000     3.000    3.156   4.000     5.000
Training        2.000   3.000     4.000    3.625   4.250     5.000
Conditions      2.000   3.000     4.000    3.844   5.000     5.000
Tools           3.000   4.000     5.000    4.656   5.000     5.000
Balance         2.000   3.000     3.500    3.625   5.000     5.000
Satisfaction    3.000   5.000     7.000    6.844   8.250     10.000
Department      Length: 32, Class: character, Mode: character

The summary shows how employees rate the various work-related factors. The Department column holds 32 categorical entries spread across six departments. Years ranges from 1 to 32, with a mean of about 9.22 years of experience and a median of 7, so a few long-serving employees pull the mean above the median. The workplace factors are rated on a 1 to 5 scale. Their means range from about 3.2 to 4.7, which indicates moderately positive experiences overall. Satisfaction has a mean of 6.84, a median of 7, and a maximum of 10, indicating generally positive feedback. Tools has the highest average rating (4.66), followed by Conditions (3.84), suggesting that employees feel well equipped and reasonably satisfied with their working conditions. Recognition has the lowest average rating (3.16).
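R's `summary()` computes its quartiles with R's default quantile definition (type 7), which interpolates linearly between sorted values. As a sketch, the Years row of the summary table can be reproduced in Python from the 32 Years values listed in the KNN table later in this report. The helper name `r_style_summary` is hypothetical. `statistics.quantiles(..., method="inclusive")` uses the same interpolation rule as R's type 7.

```python
import statistics

# Years of experience for all 32 employees, in the row order of the KNN table.
YEARS = [16, 2, 14, 17, 15, 1, 3, 3, 16, 15, 13, 3, 6, 1, 3, 2,
         3, 2, 2, 15, 5, 8, 17, 15, 5, 1, 11, 21, 8, 32, 2, 18]


def r_style_summary(values):
    """Six-number summary in the layout of R's summary() for a numeric column."""
    # method="inclusive" interpolates linearly between order statistics,
    # which matches R's default quantile type 7 used by summary().
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


print(r_style_summary(YEARS))
```

This gives the same Years figures as the summary table: minimum 1, first quartile 2.75, median 7, mean 9.219, third quartile 15, and maximum 32.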

Quality of Data

The dataset has no missing values. It includes a reasonable set of predictors, but several factors that likely affect satisfaction are not included, such as health and safety, company values, relationships with managers, pay scale, career development, and job challenges. The sample is also small (32 employees), which limits how far the findings can be generalized.

A) Data Visualisation

1. Histogram of Job Satisfaction

This histogram shows the distribution of job satisfaction scores on a scale from 1 (least satisfied) to 10 (most satisfied). The observed scores range from 3 to 10. The distribution is left-skewed, which is consistent with the mean of 6.84 sitting slightly below the median of 7, and it shows that most employees report relatively high satisfaction. Scores cluster around 7 and 8, suggesting that a significant portion of employees are generally satisfied with their jobs. The thinner tail at the low end shows that very low satisfaction is uncommon in this dataset.

2. Boxplot of Job Satisfaction by Department

The box plot of job satisfaction by department shows how satisfaction varies across roles. QC and Maintenance reach the highest satisfaction scores overall, suggesting that employees in these departments tend to be more satisfied with their work environment. Management has the highest median satisfaction, suggesting that a typical management employee reports greater satisfaction than a typical employee in other departments. These comparisons rest on very few employees (two each in QC and Maintenance and three in Management), so they should be read with caution. Even so, the plot helps identify departmental trends that could guide targeted improvements.

3. Recognition vs. Satisfaction

As employees receive more recognition, their satisfaction tends to rise, which is expected.

4. Correlation Heatmap

This correlation heatmap shows the strength and direction of the relationships among the variables. A key takeaway is the positive relationship between satisfaction and recognition: as recognition increases, so does job satisfaction. By contrast, satisfaction and tools show little correlation, which suggests that tool availability is not strongly associated with job satisfaction in this data. The heatmap therefore helps identify which factors are most closely related to satisfaction.
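The heatmap was built from the numeric 1 to 10 Satisfaction scores, and this report reproduces only five of them (the preview rows). As a rough stand-in, this sketch codes the Low/Medium/High classes from the KNN table as 0/1/2. That coding is an assumption. The sketch then computes Pearson correlations of the coded classes with Recognition and Tools. The numbers will not equal the heatmap's values. The pattern is the same, though: Recognition comes out clearly positive (about 0.65) and Tools comes out close to zero.

```python
import math

# Columns copied from the "Job dataset for KNN" table, in row order.
RECOGNITION = [2, 3, 2, 3, 5, 4, 3, 2, 2, 1, 3, 5, 1, 4, 3, 4,
               3, 4, 4, 3, 3, 3, 4, 4, 2, 5, 4, 2, 2, 2, 5, 4]
TOOLS = [5, 5, 5, 5, 5, 3, 5, 5, 4, 4, 4, 5, 4, 5, 5, 5,
         4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 5, 5, 5]
SATISFACTION = "L H M H H H H L M M H H M H M H M M H M M H M H L H M M M M H H".split()

# Assumed ordinal coding of the satisfaction classes (not the original 1-10 scores).
CODE = {"L": 0, "M": 1, "H": 2}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


sat = [CODE[s] for s in SATISFACTION]
r_recognition = pearson(RECOGNITION, sat)
r_tools = pearson(TOOLS, sat)
print(f"Recognition: {r_recognition:.2f}  Tools: {r_tools:.2f}")
```

Tools varies very little (31 of the 32 employees rate it 4 or 5), so it has little room to track satisfaction. This is one plausible reason its correlation is weak.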

5. Scatter Plot: Years of Experience vs. Job Satisfaction

This scatter plot shows a negative linear relationship between years of experience and job satisfaction. This finding was surprising, as I initially assumed that gaining experience would lead to higher comfort and satisfaction in the workplace. However, the plot suggests the opposite: as employees gain more experience, their satisfaction tends to decrease. This could indicate that with more experience, employees develop higher expectations and may become dissatisfied with what previously satisfied them. This trend challenges the assumption that experience always leads to increased job satisfaction.

6. Average Satisfaction by Communication Level

Average satisfaction rises with communication level, which is expected: the better the communication, the more satisfied employees tend to be.

7. Job Satisfaction by Ideas Contribution Level

The more ideas an employee contributes, the more satisfied they tend to be, which is expected.

8. Tools vs. Satisfaction

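Satisfaction tends to increase with the Tools rating. This trend should be weighed against the correlation heatmap, which shows little correlation between tools and satisfaction. A likely reason is that Tools varies very little: all but one employee rate it 4 or 5.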

9. Satisfaction by Training Level

Employees who rate their training more highly tend to be more satisfied.

B) KNN to predict and classify Satisfaction

This analysis evaluates a KNN model’s performance in predicting job satisfaction levels (Low, Medium, High) based on various workplace factors. The model’s accuracy, precision, and balanced accuracy are assessed to determine its effectiveness, with a focus on identifying potential weaknesses in classifying Low satisfaction cases.

1. Dataset

Job dataset for KNN
Department Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction
Administrative 16 2 3 2 2 4 5 2 Low
Administrative 2 4 4 3 4 4 5 3 High
Administrative 14 4 3 2 2 5 5 5 Medium
Maintenance 17 5 4 3 5 5 5 3 High
Maintenance 15 5 5 5 5 5 5 5 High
Management 1 5 4 4 3 5 3 5 High
Management 3 3 4 3 3 4 5 5 High
Management 3 2 2 2 2 3 5 3 Low
Production 16 2 3 2 4 4 4 2 Medium
Production 15 2 3 1 4 4 4 2 Medium
Production 13 3 3 3 4 4 4 3 High
Production 3 5 5 5 5 5 5 5 High
Production 6 2 2 1 3 3 4 2 Medium
Production 1 5 4 4 3 4 5 5 High
Production 3 3 4 3 4 5 5 4 Medium
Production 2 4 4 4 4 5 5 5 High
Production 3 3 4 3 3 2 4 4 Medium
Production 2 4 3 4 3 3 4 4 Medium
Production 2 4 5 4 4 4 4 4 High
Production 15 5 4 3 4 3 5 3 Medium
Production 5 4 5 3 2 3 5 4 Medium
Production 8 5 5 3 5 3 5 3 High
Production 17 4 3 4 3 3 5 2 Medium
Production 15 5 3 4 5 5 5 5 High
Production 5 2 4 2 2 2 5 3 Low
QC 1 5 5 5 5 5 5 5 High
QC 11 3 4 4 4 5 5 2 Medium
SR 21 3 2 2 3 2 4 3 Medium
SR 8 3 2 2 2 2 4 2 Medium
SR 32 2 3 2 4 2 5 3 Medium
SR 2 5 5 5 5 5 5 5 High
SR 18 4 4 4 5 5 5 5 High

The Job dataset for KNN consists of data collected from employees across various departments, with the goal of predicting job satisfaction. The dataset includes columns for different features that may impact satisfaction, such as Years of experience, Ideas, Communication, Recognition, Training, Conditions, Tools, and Work-Life Balance. The Satisfaction column is the target variable, with three levels: Low, Medium, and High.

The dataset covers six departments (Administrative, Maintenance, Management, Production, QC, and SR), providing a diverse set of roles. It contains 32 rows of employee data. The Satisfaction classes are imbalanced: 3 employees are Low, 14 are Medium, and 15 are High.

This dataset is used to train a KNN model, with the goal of understanding how these factors influence job satisfaction across different departments.
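The core KNN idea can be sketched in a few lines of Python. For a new employee, the classifier finds the k training rows closest in predictor space and takes a majority vote of their Satisfaction classes. This sketch is not the exact model behind the results in this report. It assumes unscaled Euclidean distance over the eight numeric predictors, in line with "No pre-processing" in the R output. It also leaves out Department, because the report does not show how the original R model encoded it. The names `load` and `knn_predict` are hypothetical helpers.

```python
import math
from collections import Counter

# Rows from the "Job dataset for KNN" table with Department dropped (an assumption):
# Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction
ROWS = """
16 2 3 2 2 4 5 2 Low
2 4 4 3 4 4 5 3 High
14 4 3 2 2 5 5 5 Medium
17 5 4 3 5 5 5 3 High
15 5 5 5 5 5 5 5 High
1 5 4 4 3 5 3 5 High
3 3 4 3 3 4 5 5 High
3 2 2 2 2 3 5 3 Low
16 2 3 2 4 4 4 2 Medium
15 2 3 1 4 4 4 2 Medium
13 3 3 3 4 4 4 3 High
3 5 5 5 5 5 5 5 High
6 2 2 1 3 3 4 2 Medium
1 5 4 4 3 4 5 5 High
3 3 4 3 4 5 5 4 Medium
2 4 4 4 4 5 5 5 High
3 3 4 3 3 2 4 4 Medium
2 4 3 4 3 3 4 4 Medium
2 4 5 4 4 4 4 4 High
15 5 4 3 4 3 5 3 Medium
5 4 5 3 2 3 5 4 Medium
8 5 5 3 5 3 5 3 High
17 4 3 4 3 3 5 2 Medium
15 5 3 4 5 5 5 5 High
5 2 4 2 2 2 5 3 Low
1 5 5 5 5 5 5 5 High
11 3 4 4 4 5 5 2 Medium
21 3 2 2 3 2 4 3 Medium
8 3 2 2 2 2 4 2 Medium
32 2 3 2 4 2 5 3 Medium
2 5 5 5 5 5 5 5 High
18 4 4 4 5 5 5 5 High
"""


def load(text):
    """Parse each line into (tuple of 8 numeric predictors, satisfaction class)."""
    data = []
    for line in text.strip().splitlines():
        *values, label = line.split()
        data.append((tuple(int(v) for v in values), label))
    return data


def knn_predict(train, query, k=5):
    """Majority vote among the k training rows nearest to `query`."""
    # Unscaled Euclidean distance, in the spirit of "No pre-processing".
    ranked = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # On a tied vote, most_common() returns the tied class that entered the
    # Counter first, i.e. the one whose member is nearest to the query.
    return votes.most_common(1)[0][0]


data = load(ROWS)
print(knn_predict(data, (2, 5, 5, 5, 5, 5, 5, 5), k=5))
```

The query in the last line describes a new employee with 2 years of experience and the top rating on every factor. All five of its nearest neighbors are High, so the sketch predicts High. One design point stands out. Years runs from 1 to 32 while the ratings run from 1 to 5, so unscaled distances are dominated by Years. Standardizing the predictors, for example with caret's `preProcess = c("center", "scale")`, is a common alternative. The R implementation may also break tied votes differently from this sketch.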

2. Choosing K value

## k-Nearest Neighbors 
## 
## 32 samples
##  9 predictor
##  3 classes: 'Low', 'Medium', 'High' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 29, 29, 28, 29, 28, 30, ... 
## Resampling results across tuning parameters:
## 
##   k  Accuracy   Kappa    
##   5  0.7100000  0.4744781
##   7  0.6544444  0.3883502
##   9  0.5516667  0.1687205
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 5.

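Repeated 10-fold cross-validation selected k = 5. Of the three values tested, it had the highest mean accuracy (0.710) and the highest Kappa (0.474).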
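The selection rule ("Accuracy was used to select the optimal model using the largest value") can be illustrated with a deterministic stand-in. Instead of the repeated 10-fold cross-validation used in R, this sketch uses leave-one-out cross-validation (LOOCV): each employee is predicted from the other 31, and the k with the highest accuracy is kept. Its accuracies will not match the table above, because the resampling scheme, the predictors (Department is dropped) and the tie-breaking all differ. Treat it as an illustration of the procedure, not a reproduction. The helper names are hypothetical.

```python
import math
from collections import Counter

# Same rows as the KNN table, Department dropped (an assumption).
ROWS = """
16 2 3 2 2 4 5 2 Low
2 4 4 3 4 4 5 3 High
14 4 3 2 2 5 5 5 Medium
17 5 4 3 5 5 5 3 High
15 5 5 5 5 5 5 5 High
1 5 4 4 3 5 3 5 High
3 3 4 3 3 4 5 5 High
3 2 2 2 2 3 5 3 Low
16 2 3 2 4 4 4 2 Medium
15 2 3 1 4 4 4 2 Medium
13 3 3 3 4 4 4 3 High
3 5 5 5 5 5 5 5 High
6 2 2 1 3 3 4 2 Medium
1 5 4 4 3 4 5 5 High
3 3 4 3 4 5 5 4 Medium
2 4 4 4 4 5 5 5 High
3 3 4 3 3 2 4 4 Medium
2 4 3 4 3 3 4 4 Medium
2 4 5 4 4 4 4 4 High
15 5 4 3 4 3 5 3 Medium
5 4 5 3 2 3 5 4 Medium
8 5 5 3 5 3 5 3 High
17 4 3 4 3 3 5 2 Medium
15 5 3 4 5 5 5 5 High
5 2 4 2 2 2 5 3 Low
1 5 5 5 5 5 5 5 High
11 3 4 4 4 5 5 2 Medium
21 3 2 2 3 2 4 3 Medium
8 3 2 2 2 2 4 2 Medium
32 2 3 2 4 2 5 3 Medium
2 5 5 5 5 5 5 5 High
18 4 4 4 5 5 5 5 High
"""


def load(text):
    """Parse each line into (tuple of 8 numeric predictors, satisfaction class)."""
    return [(tuple(int(v) for v in line.split()[:-1]), line.split()[-1])
            for line in text.strip().splitlines()]


def knn_predict(train, query, k):
    """Majority vote among the k nearest rows (unscaled Euclidean distance)."""
    ranked = sorted(train, key=lambda row: math.dist(row[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def loocv_accuracy(data, k):
    """Hold out each employee in turn, predict it from the rest, and score the hits."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


data = load(ROWS)
scores = {k: loocv_accuracy(data, k) for k in (5, 7, 9)}
# Keep the k with the highest accuracy ("largest value"); ties go to the smaller k.
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```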

3. Plot

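The plot of cross-validated accuracy against k shows accuracy falling as k increases, with a visible bend (elbow) at k = 7. However, the tuning procedure picks the k with the highest accuracy rather than the elbow, so the final model still uses k = 5.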

4. Prediction

KNN classifier result
Department Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction prediction
Administrative 16 2 3 2 2 4 5 2 Low Medium
Administrative 2 4 4 3 4 4 5 3 High High
Administrative 14 4 3 2 2 5 5 5 Medium Medium
Maintenance 17 5 4 3 5 5 5 3 High High
Maintenance 15 5 5 5 5 5 5 5 High High
Management 1 5 4 4 3 5 3 5 High High
Management 3 3 4 3 3 4 5 5 High High
Management 3 2 2 2 2 3 5 3 Low Medium
Production 16 2 3 2 4 4 4 2 Medium Medium
Production 15 2 3 1 4 4 4 2 Medium Medium
Production 13 3 3 3 4 4 4 3 High Medium
Production 3 5 5 5 5 5 5 5 High High
Production 6 2 2 1 3 3 4 2 Medium Medium
Production 1 5 4 4 3 4 5 5 High High
Production 3 3 4 3 4 5 5 4 Medium High
Production 2 4 4 4 4 5 5 5 High High
Production 3 3 4 3 3 2 4 4 Medium Medium
Production 2 4 3 4 3 3 4 4 Medium High
Production 2 4 5 4 4 4 4 4 High High
Production 15 5 4 3 4 3 5 3 Medium High
Production 5 4 5 3 2 3 5 4 Medium Medium
Production 8 5 5 3 5 3 5 3 High Medium
Production 17 4 3 4 3 3 5 2 Medium Medium
Production 15 5 3 4 5 5 5 5 High High
Production 5 2 4 2 2 2 5 3 Low Medium
QC 1 5 5 5 5 5 5 5 High High
QC 11 3 4 4 4 5 5 2 Medium Medium
SR 21 3 2 2 3 2 4 3 Medium Medium
SR 8 3 2 2 2 2 4 2 Medium Medium
SR 32 2 3 2 4 2 5 3 Medium Medium
SR 2 5 5 5 5 5 5 5 High High
SR 18 4 4 4 5 5 5 5 High High

The model fails to predict Low satisfaction: all three Low employees are classified as Medium. Medium and High values are predicted more accurately, although five Medium or High employees are also misclassified.

5. Confusion Matrix

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Low Medium High
##     Low      0      0    0
##     Medium   3     11    2
##     High     0      3   13
## 
## Overall Statistics
##                                          
##                Accuracy : 0.75           
##                  95% CI : (0.566, 0.8854)
##     No Information Rate : 0.4688         
##     P-Value [Acc > NIR] : 0.001154       
##                                          
##                   Kappa : 0.5429         
##                                          
##  Mcnemar's Test P-Value : NA             
## 
## Statistics by Class:
## 
##                      Class: Low Class: Medium Class: High
## Sensitivity             0.00000        0.7857      0.8667
## Specificity             1.00000        0.7222      0.8235
## Pos Pred Value              NaN        0.6875      0.8125
## Neg Pred Value          0.90625        0.8125      0.8750
## Prevalence              0.09375        0.4375      0.4688
## Detection Rate          0.00000        0.3438      0.4062
## Detection Prevalence    0.00000        0.5000      0.5000
## Balanced Accuracy       0.50000        0.7540      0.8451

6. Output Discussion

The confusion matrix shows that the model is 75% accurate, classifying 24 of the 32 employees correctly. The Kappa of 0.5429 indicates moderate agreement between predicted and actual values beyond what chance would produce. These predictions are made on the same 32 employees used to fit the model, so they are likely optimistic. For comparison, the cross-validated accuracy for k = 5 was 71%.

Sensitivity: the Low class has a sensitivity of 0, so the model did not correctly identify any employee in this category. For Medium, a sensitivity of 0.7857 means 78.57% of Medium instances were correctly identified. For High, a sensitivity of 0.8667 means 86.67% of High instances were correctly identified.

Specificity: the model correctly identified all non-Low instances as not Low, but only because it never predicts Low. It correctly identified 72.22% of non-Medium instances as not Medium and 82.35% of non-High instances as not High.

Precision: when the model predicts Medium it is right 68.75% of the time, and when it predicts High it is right 81.25% of the time. It never predicts Low, so precision for Low is undefined (NaN).

Negative Predictive Value (NPV): when the model predicts a class other than Low, the employee is indeed not Low 90.63% of the time. The corresponding figures are 81.25% for Medium and 87.5% for High.

Balanced Accuracy: the model performs best for High (84.51%), followed by Medium (75.4%). Low has a balanced accuracy of 50%, no better than chance, which confirms that the model cannot detect Low satisfaction.

Overall, the model performs well for High and Medium satisfaction but fails completely on Low satisfaction. Only 3 of the 32 employees are in the Low class, so better class balance or feature adjustments would be needed to improve this.
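Every per-class statistic above can be recomputed directly from the confusion matrix. The sketch below is plain Python, not caret's `confusionMatrix`. It treats each class one-versus-rest and follows the matrix's layout: rows are predicted classes and columns are actual (reference) classes. It reproduces the reported figures, such as the Medium sensitivity of 0.7857 and the Kappa of 0.5429. The function names are hypothetical.

```python
import math

CLASSES = ["Low", "Medium", "High"]
# Rows = predicted class, columns = reference (actual) class, copied from the KNN output.
CM = [
    [0, 0, 0],
    [3, 11, 2],
    [0, 3, 13],
]


def class_stats(cm, i):
    """One-versus-rest statistics for class index i."""
    n = sum(map(sum, cm))
    tp = cm[i][i]
    predicted = sum(cm[i])               # row total: how often class i was predicted
    actual = sum(row[i] for row in cm)   # column total: how often class i occurred
    fp, fn = predicted - tp, actual - tp
    tn = n - tp - fp - fn
    sens = tp / actual
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "precision": tp / predicted if predicted else math.nan,  # NaN if never predicted
        "npv": tn / (tn + fn),
        "balanced_accuracy": (sens + spec) / 2,
    }


def cohen_kappa(cm):
    """Observed agreement corrected for agreement expected by chance."""
    n = sum(map(sum, cm))
    p_obs = sum(cm[i][i] for i in range(len(cm))) / n
    p_exp = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(len(cm))) / n**2
    return (p_obs - p_exp) / (1 - p_exp)


for i, name in enumerate(CLASSES):
    print(name, {key: round(val, 4) for key, val in class_stats(CM, i).items()})
print("kappa", round(cohen_kappa(CM), 4))
```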

C) Naive Bayes to predict Satisfaction

Naive Bayes is a probabilistic classifier that predicts job satisfaction by calculating the probability of each satisfaction level (Low, Medium, High) based on features like communication, recognition, and training. It assumes that the features are independent given the satisfaction level, making it a simple yet effective method for classification.
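Written out in symbols, the independence assumption gives the standard Naive Bayes rule. Here y is the satisfaction class and x_1, ..., x_p are the categorical predictors. P(y) is the a-priori probability and P(x_j | y) is a conditional probability, matching the two kinds of probabilities the R output prints:

```latex
P(y \mid x_1, \dots, x_p)
  = \frac{P(y) \prod_{j=1}^{p} P(x_j \mid y)}
         {\sum_{c \in \{\text{Low}, \text{Medium}, \text{High}\}} P(c) \prod_{j=1}^{p} P(x_j \mid c)},
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{j=1}^{p} P(x_j \mid y)
```

The denominator is the same for every class, so the predicted class can be found by comparing the numerators alone.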

1. Dataset

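For Naive Bayes, every variable is converted to a categorical one. Years is grouped into five-year bands (1-5, 6-10, 11-15, 16-20, 21+), and each rating is grouped into Low, Medium, or High.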

Job Dataset for Naive Bayes
Department Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction
Administrative 16-20 Low Medium Low Low High High Low Low
Administrative 1-5 Medium High Medium High High High Medium High
Administrative 11-15 Medium Medium Low Low High High High Medium
Maintenance 16-20 High High Medium High High High Medium High
Maintenance 11-15 High High High High High High High High
Management 1-5 High High High Medium High Medium High High
Management 1-5 Medium High Medium Medium High High High High
Management 1-5 Low Low Low Low Medium High Medium Low
Production 16-20 Low Medium Low High High High Low Medium
Production 11-15 Low Medium Low High High High Low Medium
Production 11-15 Medium Medium Medium High High High Medium High
Production 1-5 High High High High High High High High
Production 6-10 Low Low Low Medium Medium High Low Medium
Production 1-5 High High High Medium High High High High
Production 1-5 Medium High Medium High High High High Medium
Production 1-5 Medium High High High High High High High
Production 1-5 Medium High Medium Medium Low High High Medium
Production 1-5 Medium Medium High Medium Medium High High Medium
Production 1-5 Medium High High High High High High High
Production 11-15 High High Medium High Medium High Medium Medium
Production 1-5 Medium High Medium Low Medium High High Medium
Production 6-10 High High Medium High Medium High Medium High
Production 16-20 Medium Medium High Medium Medium High Low Medium
Production 11-15 High Medium High High High High High High
Production 1-5 Low High Low Low Low High Medium Low
QC 1-5 High High High High High High High High
QC 11-15 Medium High High High High High Low Medium
SR 21+ Medium Low Low Medium Low High Medium Medium
SR 6-10 Medium Low Low Low Low High Low Medium
SR 21+ Low Medium Low High Low High Medium Medium
SR 1-5 High High High High High High High High
SR 16-20 Medium High High High High High High High
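The Years bands in the table above can be reproduced with a small helper. This sketches one way to do the grouping; the R code actually used is not shown. The rating columns were also grouped into Low, Medium, and High, but their cut points are not stated and appear to differ between columns. For example, an Ideas rating of 4 maps to Medium while a Communication rating of 4 maps to High. For that reason the sketch covers Years only. `years_band` is a hypothetical name.

```python
def years_band(years: int) -> str:
    """Map years of experience to the five-year bands used in the Naive Bayes table."""
    if years < 1:
        raise ValueError("years must be at least 1")
    if years >= 21:
        return "21+"
    low = (years - 1) // 5 * 5 + 1  # lower edge of the band: 1, 6, 11 or 16
    return f"{low}-{low + 4}"


# First five employees: 16, 2, 14, 17 and 15 years of experience.
print([years_band(y) for y in (16, 2, 14, 17, 15)])
```

This prints the same bands as the first five rows of the table: 16-20, 1-5, 11-15, 16-20, 11-15.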

2. Two types of probabilities: a-priori and conditional

## 
## Naive Bayes Classifier for Discrete Predictors
## 
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
## 
## A-priori probabilities:
## Y
##     Low  Medium    High 
## 0.09375 0.43750 0.46875 
## 
## Conditional probabilities:
##         Department
## Y        Administrative Maintenance Management Production         QC         SR
##   Low        0.33333333  0.00000000 0.33333333 0.33333333 0.00000000 0.00000000
##   Medium     0.07142857  0.00000000 0.00000000 0.64285714 0.07142857 0.21428571
##   High       0.06666667  0.13333333 0.13333333 0.46666667 0.06666667 0.13333333
## 
##         Years
## Y               1-5       6-10      11-15      16-20        21+
##   Low    0.66666667 0.00000000 0.00000000 0.33333333 0.00000000
##   Medium 0.28571429 0.14285714 0.28571429 0.14285714 0.14285714
##   High   0.60000000 0.06666667 0.20000000 0.13333333 0.00000000
## 
##         Ideas
## Y               Low     Medium       High
##   Low    1.00000000 0.00000000 0.00000000
##   Medium 0.28571429 0.64285714 0.07142857
##   High   0.00000000 0.40000000 0.60000000
## 
##         Communication
## Y              Low    Medium      High
##   Low    0.3333333 0.3333333 0.3333333
##   Medium 0.2142857 0.4285714 0.3571429
##   High   0.0000000 0.1333333 0.8666667
## 
##         Recognition
## Y              Low    Medium      High
##   Low    1.0000000 0.0000000 0.0000000
##   Medium 0.5000000 0.2857143 0.2142857
##   High   0.0000000 0.3333333 0.6666667
## 
##         Training
## Y              Low    Medium      High
##   Low    1.0000000 0.0000000 0.0000000
##   Medium 0.2142857 0.3571429 0.4285714
##   High   0.0000000 0.2000000 0.8000000
## 
##         Conditions
## Y               Low     Medium       High
##   Low    0.33333333 0.33333333 0.33333333
##   Medium 0.28571429 0.35714286 0.35714286
##   High   0.00000000 0.06666667 0.93333333
## 
##         Tools
## Y               Low     Medium       High
##   Low    0.00000000 0.00000000 1.00000000
##   Medium 0.00000000 0.00000000 1.00000000
##   High   0.00000000 0.06666667 0.93333333
## 
##         Balance
## Y              Low    Medium      High
##   Low    0.3333333 0.6666667 0.0000000
##   Medium 0.4285714 0.2142857 0.3571429
##   High   0.0000000 0.2666667 0.7333333
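The sketch below shows how one row of the classification output arises from these tables. It recomputes the posterior for employee 1 (Administrative, 16-20 years, Ideas Low, Communication Medium, Recognition Low, Training Low, Conditions High, Tools High, Balance Low). The inputs are the priors and conditional probabilities printed above, written as exact fractions of the class counts (3 Low, 14 Medium, 15 High). Several High-class probabilities for this employee are zero. The printed High probability of 1.225914e-12 is reproduced only if each zero is replaced by 0.001, which matches the default `threshold` argument of e1071's `predict.naiveBayes`. This is an illustrative re-derivation, not the R implementation.

```python
import math

# Class priors from the "A-priori probabilities" output (3, 14 and 15 of 32 employees).
PRIOR = {"Low": 3 / 32, "Medium": 14 / 32, "High": 15 / 32}

# P(level | class) for employee 1's nine levels, in this order:
# Department=Administrative, Years=16-20, Ideas=Low, Communication=Medium,
# Recognition=Low, Training=Low, Conditions=High, Tools=High, Balance=Low.
LIKELIHOODS = {
    "Low": [1 / 3, 1 / 3, 1, 1 / 3, 1, 1, 1 / 3, 1, 1 / 3],
    "Medium": [1 / 14, 2 / 14, 4 / 14, 6 / 14, 7 / 14, 3 / 14, 5 / 14, 1, 6 / 14],
    "High": [1 / 15, 2 / 15, 0, 2 / 15, 0, 0, 14 / 15, 14 / 15, 0],
}

# Assumed stand-in for zero probabilities (the default `threshold` of e1071's predict).
THRESHOLD = 0.001


def posterior(prior, likelihoods, threshold=THRESHOLD):
    """Normalized P(class | features) under the naive independence assumption."""
    # Sum logs instead of multiplying, so tiny products do not underflow.
    scores = {
        cls: math.log(prior[cls]) + sum(math.log(p if p > 0 else threshold) for p in probs)
        for cls, probs in likelihoods.items()
    }
    top = max(scores.values())
    weights = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(weights.values())
    return {cls: w / total for cls, w in weights.items()}


post = posterior(PRIOR, LIKELIHOODS)
print({cls: f"{p:.6e}" for cls, p in post.items()})
print("predicted class:", max(post, key=post.get))
```

The result matches row 1 of the classifier output: about 0.9773 for Low, 0.0227 for Medium, and a negligible value for High, so the predicted class is Low.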

Next, the Naive Bayes classifier is used to classify each employee based on their predictors. We use the whole dataset, which is the same data the model was fitted on.

3. NB Classifier

##                Low       Medium         High
##  [1,] 9.772912e-01 2.270882e-02 1.225914e-12
##  [2,] 3.658135e-09 4.553691e-02 9.544631e-01
##  [3,] 1.032852e-07 9.999936e-01 6.334088e-06
##  [4,] 8.623011e-12 5.565793e-05 9.999443e-01
##  [5,] 4.703643e-18 1.686669e-05 9.999831e-01
##  [6,] 1.951173e-14 2.623751e-07 9.999997e-01
##  [7,] 4.178268e-12 6.742242e-04 9.993258e-01
##  [8,] 9.999593e-01 4.066221e-05 4.031836e-13
##  [9,] 2.385173e-03 9.976148e-01 1.675498e-08
## [10,] 3.586300e-06 9.999964e-01 1.259623e-08
## [11,] 6.576018e-12 5.893853e-01 4.106147e-01
## [12,] 9.944699e-14 1.031607e-03 9.989684e-01
## [13,] 1.721401e-05 9.999828e-01 2.699156e-12
## [14,] 3.968327e-13 3.430432e-03 9.965696e-01
## [15,] 2.879444e-13 3.584366e-02 9.641563e-01
## [16,] 1.472714e-13 1.374939e-02 9.862506e-01
## [17,] 1.192117e-11 9.893079e-01 1.069205e-02
## [18,] 8.947844e-12 8.353788e-01 1.646212e-01
## [19,] 1.472714e-13 1.374939e-02 9.862506e-01
## [20,] 1.931076e-11 1.602551e-01 8.397449e-01
## [21,] 1.598988e-08 9.952196e-01 4.780419e-03
## [22,] 5.363465e-11 2.225502e-01 7.774498e-01
## [23,] 2.975015e-09 9.999005e-01 9.951625e-05
## [24,] 2.843126e-15 2.359438e-02 9.764056e-01
## [25,] 9.663192e-01 3.368080e-02 1.772771e-11
## [26,] 2.088866e-15 8.025448e-04 9.991975e-01
## [27,] 1.164327e-13 9.662455e-01 3.375451e-02
## [28,] 3.442861e-10 1.000000e+00 6.169606e-11
## [29,] 1.434525e-07 9.999999e-01 6.426672e-14
## [30,] 3.227681e-07 9.999997e-01 7.712005e-11
## [31,] 1.044014e-15 1.203334e-03 9.987967e-01
## [32,] 3.403249e-15 3.530338e-02 9.646966e-01
##  [1] Low    High   Medium High   High   High   High   Low    Medium Medium
## [11] Medium High   Medium High   High   High   Medium Medium High   High  
## [21] Medium High   Medium High   Low    High   Medium Medium Medium Medium
## [31] High   High  
## Levels: Low Medium High

4. Display the result of classification

Classification Result
Department Years Ideas Communication Recognition Training Conditions Tools Balance Satisfaction Low Medium High pred.class
Administrative 16-20 Low Medium Low Low High High Low Low 0.9772912 0.0227088 0.0000000 Low
Administrative 1-5 Medium High Medium High High High Medium High 0.0000000 0.0455369 0.9544631 High
Administrative 11-15 Medium Medium Low Low High High High Medium 0.0000001 0.9999936 0.0000063 Medium
Maintenance 16-20 High High Medium High High High Medium High 0.0000000 0.0000557 0.9999443 High
Maintenance 11-15 High High High High High High High High 0.0000000 0.0000169 0.9999831 High
Management 1-5 High High High Medium High Medium High High 0.0000000 0.0000003 0.9999997 High
Management 1-5 Medium High Medium Medium High High High High 0.0000000 0.0006742 0.9993258 High
Management 1-5 Low Low Low Low Medium High Medium Low 0.9999593 0.0000407 0.0000000 Low
Production 16-20 Low Medium Low High High High Low Medium 0.0023852 0.9976148 0.0000000 Medium
Production 11-15 Low Medium Low High High High Low Medium 0.0000036 0.9999964 0.0000000 Medium
Production 11-15 Medium Medium Medium High High High Medium High 0.0000000 0.5893853 0.4106147 Medium
Production 1-5 High High High High High High High High 0.0000000 0.0010316 0.9989684 High
Production 6-10 Low Low Low Medium Medium High Low Medium 0.0000172 0.9999828 0.0000000 Medium
Production 1-5 High High High Medium High High High High 0.0000000 0.0034304 0.9965696 High
Production 1-5 Medium High Medium High High High High Medium 0.0000000 0.0358437 0.9641563 High
Production 1-5 Medium High High High High High High High 0.0000000 0.0137494 0.9862506 High
Production 1-5 Medium High Medium Medium Low High High Medium 0.0000000 0.9893079 0.0106921 Medium
Production 1-5 Medium Medium High Medium Medium High High Medium 0.0000000 0.8353788 0.1646212 Medium
Production 1-5 Medium High High High High High High High 0.0000000 0.0137494 0.9862506 High
Production 11-15 High High Medium High Medium High Medium Medium 0.0000000 0.1602551 0.8397449 High
Production 1-5 Medium High Medium Low Medium High High Medium 0.0000000 0.9952196 0.0047804 Medium
Production 6-10 High High Medium High Medium High Medium High 0.0000000 0.2225502 0.7774498 High
Production 16-20 Medium Medium High Medium Medium High Low Medium 0.0000000 0.9999005 0.0000995 Medium
Production 11-15 High Medium High High High High High High 0.0000000 0.0235944 0.9764056 High
Production 1-5 Low High Low Low Low High Medium Low 0.9663192 0.0336808 0.0000000 Low
QC 1-5 High High High High High High High High 0.0000000 0.0008025 0.9991975 High
QC 11-15 Medium High High High High High Low Medium 0.0000000 0.9662455 0.0337545 Medium
SR 21+ Medium Low Low Medium Low High Medium Medium 0.0000000 1.0000000 0.0000000 Medium
SR 6-10 Medium Low Low Low Low High Low Medium 0.0000001 0.9999999 0.0000000 Medium
SR 21+ Low Medium Low High Low High Medium Medium 0.0000003 0.9999997 0.0000000 Medium
SR 1-5 High High High High High High High High 0.0000000 0.0012033 0.9987967 High
SR 16-20 Medium High High High High High High High 0.0000000 0.0353034 0.9646966 High

5. Confusion Matrix and Statistics

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Low Medium High
##     Low      3      0    0
##     Medium   0     12    1
##     High     0      2   14
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9062          
##                  95% CI : (0.7498, 0.9802)
##     No Information Rate : 0.4688          
##     P-Value [Acc > NIR] : 2.331e-07       
##                                           
##                   Kappa : 0.8381          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: Low Class: Medium Class: High
## Sensitivity             1.00000        0.8571      0.9333
## Specificity             1.00000        0.9444      0.8824
## Pos Pred Value          1.00000        0.9231      0.8750
## Neg Pred Value          1.00000        0.8947      0.9375
## Prevalence              0.09375        0.4375      0.4688
## Detection Rate          0.09375        0.3750      0.4375
## Detection Prevalence    0.09375        0.4062      0.5000
## Balanced Accuracy       1.00000        0.9008      0.9078

6. Output Discussion

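The Naive Bayes model achieved 90.6% accuracy (29 of 32 employees) with a Kappa of 0.8381, indicating strong agreement between predictions and actual values. Unlike KNN, it classified all three Low satisfaction employees correctly (100% sensitivity and precision). Medium (85.71% sensitivity, 92.31% precision) and High (93.33% sensitivity, 87.50% precision) had only three misclassifications between them: two Medium employees were predicted as High and one High employee as Medium. Balanced accuracy remains high across all classes, which makes this model effective for predicting job satisfaction trends in this dataset. As with KNN, however, these figures come from the same data used to fit the model and are likely optimistic.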
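The reported Kappa can be checked by hand. Cohen's Kappa compares the observed agreement p_o (the diagonal of the confusion matrix) with the agreement p_e expected by chance. p_e is computed from the row totals (predicted: 3 Low, 13 Medium, 16 High) and the column totals (actual: 3 Low, 14 Medium, 15 High):

```latex
p_o = \frac{3 + 12 + 14}{32} = \frac{29}{32} = 0.90625,
\qquad
p_e = \frac{3 \cdot 3 + 13 \cdot 14 + 16 \cdot 15}{32^2} = \frac{431}{1024} \approx 0.4209,
\qquad
\kappa = \frac{p_o - p_e}{1 - p_e} \approx \frac{0.90625 - 0.4209}{1 - 0.4209} \approx 0.8381
```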