This report assesses a dataset designed for HR research into the factors that lead a person to leave their current job. Using variables that describe each candidate's credentials, demographics, and experience, the report tries to predict the probability that a candidate will look for a new job rather than stay with the company, and to interpret the factors that affect that decision.
The report evaluates three classification models: Naive Bayes, Random Forest, and Decision Tree. Each model uses the predictor variables to predict the target variable on the test data. Based on the evaluation of each model, the report then tries to identify the model that best supports the business decision.
After some pre-processing, the data is divided into two sets: train data and test data. The train data is used for modeling and model selection, while the test data is used to test the predictions of the selected models. The data comes from a CSV file with the following structure:
job <- read.csv("job.csv", stringsAsFactors = TRUE, na.strings=c("","","NA"))
str(job)
#> 'data.frame': 19158 obs. of 14 variables:
#> $ enrollee_id : int 8949 29725 11561 33241 666 21651 28806 402 27107 699 ...
#> $ city : Factor w/ 123 levels "city_1","city_10",..: 6 78 65 15 51 58 50 84 6 6 ...
#> $ city_development_index: num 0.92 0.776 0.624 0.789 0.767 0.764 0.92 0.762 0.92 0.92 ...
#> $ gender : Factor w/ 3 levels "Female","Male",..: 2 2 NA NA 2 NA 2 2 2 NA ...
#> $ relevent_experience : Factor w/ 2 levels "Has relevent experience",..: 1 2 2 2 1 1 1 1 1 1 ...
#> $ enrolled_university : Factor w/ 3 levels "Full time course",..: 2 2 1 NA 2 3 2 2 2 2 ...
#> $ education_level : Factor w/ 5 levels "Graduate","High School",..: 1 1 1 1 3 1 2 1 1 1 ...
#> $ major_discipline : Factor w/ 6 levels "Arts","Business Degree",..: 6 6 6 2 6 6 NA 6 6 6 ...
#> $ experience : Factor w/ 22 levels "<1",">20","1",..: 2 9 18 1 2 5 18 7 20 11 ...
#> $ company_size : Factor w/ 8 levels "<10","10/49",..: NA 6 NA NA 6 NA 6 1 6 5 ...
#> $ company_type : Factor w/ 6 levels "Early Stage Startup",..: NA 6 NA 6 2 NA 2 6 6 6 ...
#> $ last_new_job : Factor w/ 6 levels ">4","1","2","3",..: 2 1 6 6 5 2 2 1 2 1 ...
#> $ training_hours : int 36 47 83 52 8 24 24 18 46 123 ...
#> $ target : num 1 0 0 1 0 1 0 1 1 0 ...
Checking the data types and NA values of the data:
summary(job)
#> enrollee_id city city_development_index gender
#> Min. : 1 city_103:4355 Min. :0.4480 Female: 1238
#> 1st Qu.: 8554 city_21 :2702 1st Qu.:0.7400 Male :13221
#> Median :16983 city_16 :1533 Median :0.9030 Other : 191
#> Mean :16875 city_114:1336 Mean :0.8288 NA's : 4508
#> 3rd Qu.:25170 city_160: 845 3rd Qu.:0.9200
#> Max. :33380 city_136: 586 Max. :0.9490
#> (Other) :7801
#> relevent_experience enrolled_university
#> Has relevent experience:13792 Full time course: 3757
#> No relevent experience : 5366 no_enrollment :13817
#> Part time course: 1198
#> NA's : 386
#>
#>
#>
#> education_level major_discipline experience
#> Graduate :11598 Arts : 253 >20 : 3286
#> High School : 2017 Business Degree: 327 5 : 1430
#> Masters : 4361 Humanities : 669 4 : 1403
#> Phd : 414 No Major : 223 3 : 1354
#> Primary School: 308 Other : 381 6 : 1216
#> NA's : 460 STEM :14492 (Other):10404
#> NA's : 2813 NA's : 65
#> company_size company_type last_new_job training_hours
#> 50-99 :3083 Early Stage Startup: 603 >4 :3290 Min. : 1.00
#> 100-500 :2571 Funded Startup :1001 1 :8040 1st Qu.: 23.00
#> 10000+ :2019 NGO : 521 2 :2900 Median : 47.00
#> 10/49 :1471 Other : 121 3 :1024 Mean : 65.37
#> 1000-4999:1328 Public Sector : 955 4 :1029 3rd Qu.: 88.00
#> (Other) :2748 Pvt Ltd :9817 never:2452 Max. :336.00
#> NA's :5938 NA's :6140 NA's : 423
#> target
#> Min. :0.0000
#> 1st Qu.:0.0000
#> Median :0.0000
#> Mean :0.2493
#> 3rd Qu.:0.0000
#> Max. :1.0000
#>
glimpse(job)
#> Rows: 19,158
#> Columns: 14
#> $ enrollee_id <int> 8949, 29725, 11561, 33241, 666, 21651, 28806, 4~
#> $ city <fct> city_103, city_40, city_21, city_115, city_162,~
#> $ city_development_index <dbl> 0.920, 0.776, 0.624, 0.789, 0.767, 0.764, 0.920~
#> $ gender <fct> Male, Male, NA, NA, Male, NA, Male, Male, Male,~
#> $ relevent_experience <fct> Has relevent experience, No relevent experience~
#> $ enrolled_university <fct> no_enrollment, no_enrollment, Full time course,~
#> $ education_level <fct> Graduate, Graduate, Graduate, Graduate, Masters~
#> $ major_discipline <fct> STEM, STEM, STEM, Business Degree, STEM, STEM, ~
#> $ experience <fct> >20, 15, 5, <1, >20, 11, 5, 13, 7, 17, 2, 5, >2~
#> $ company_size <fct> NA, 50-99, NA, NA, 50-99, NA, 50-99, <10, 50-99~
#> $ company_type <fct> NA, Pvt Ltd, NA, Pvt Ltd, Funded Startup, NA, F~
#> $ last_new_job <fct> 1, >4, never, never, 4, 1, 1, >4, 1, >4, never,~
#> $ training_hours <int> 36, 47, 83, 52, 8, 24, 24, 18, 46, 123, 32, 108~
#> $ target <dbl> 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0,~
The following data cleansing steps are applied to prepare both the train and test data sets:
enrollee_id: will be omitted, as it is an identifier and is not used in the report.
target: will be converted to the factor data type.
job <- job %>%
dplyr::select(-enrollee_id) %>%
mutate(target = as.factor(target))
glimpse(job)
#> Rows: 19,158
#> Columns: 13
#> $ city <fct> city_103, city_40, city_21, city_115, city_162,~
#> $ city_development_index <dbl> 0.920, 0.776, 0.624, 0.789, 0.767, 0.764, 0.920~
#> $ gender <fct> Male, Male, NA, NA, Male, NA, Male, Male, Male,~
#> $ relevent_experience <fct> Has relevent experience, No relevent experience~
#> $ enrolled_university <fct> no_enrollment, no_enrollment, Full time course,~
#> $ education_level <fct> Graduate, Graduate, Graduate, Graduate, Masters~
#> $ major_discipline <fct> STEM, STEM, STEM, Business Degree, STEM, STEM, ~
#> $ experience <fct> >20, 15, 5, <1, >20, 11, 5, 13, 7, 17, 2, 5, >2~
#> $ company_size <fct> NA, 50-99, NA, NA, 50-99, NA, 50-99, <10, 50-99~
#> $ company_type <fct> NA, Pvt Ltd, NA, Pvt Ltd, Funded Startup, NA, F~
#> $ last_new_job <fct> 1, >4, never, never, 4, 1, 1, >4, 1, >4, never,~
#> $ training_hours <int> 36, 47, 83, 52, 8, 24, 24, 18, 46, 123, 32, 108~
#> $ target <fct> 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0,~
anyNA(job)
#> [1] TRUE
colSums(is.na(job))
#> city city_development_index gender
#> 0 0 4508
#> relevent_experience enrolled_university education_level
#> 0 386 460
#> major_discipline experience company_size
#> 2813 65 5938
#> company_type last_new_job training_hours
#> 6140 423 0
#> target
#> 0
Note:
The target column, as the target variable, has been converted to the factor data type.
The target proportions (the mean of target is 0.2493 in the summary above) show that the data is quite imbalanced, so the train data should ideally be balanced before it is used for modelling.
The preliminary check also shows NA values in these columns: gender, enrolled_university, education_level, major_discipline, experience, company_size, company_type, and last_new_job.
To get a better overall result, missing/NA values are replaced according to the column type:
Numeric columns with missing values: replaced by the column mean (using the mean() function). In this dataset no numeric column contains NA values.
Factor columns with missing values: replaced by the most frequent value in the column (using the custom Mode() function defined below, since base R's mode() returns the storage mode of an object rather than its most frequent value).
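The two replacement rules above can be illustrated on a small toy data frame. This is only a minimal sketch: the toy values and the helper names impute_mean() and impute_mode() are assumptions made for illustration and are not part of the report's own code.

```r
# Toy data frame (hypothetical values) with one numeric and one factor column
toy <- data.frame(
  hours  = c(10, NA, 30, 20),
  degree = factor(c("STEM", NA, "STEM", "Arts"))
)

# Hypothetical helper: replace NA in a numeric vector with the mean of the rest
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Hypothetical helper: replace NA in a factor with its most frequent level
# (ties are broken by taking the first level that reaches the maximum count)
impute_mode <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  x[is.na(x)] <- names(counts)[which.max(counts)]
  x
}

toy$hours  <- impute_mean(toy$hours)
toy$degree <- impute_mode(toy$degree)
print(toy)
```

Unlike this sketch, which always returns a single level, a mode function that returns every tied value would need extra care when it is assigned into a column.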
Creating Function for data cleansing:
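# Return the most frequent value(s) of x; NA when every value is equally frequent
Mode <- function(x) {
  a <- table(x)
  b <- max(a)
  if (all(a == b)) {
    mod <- NA
  } else if (is.numeric(x)) {
    mod <- as.numeric(names(a))[a == b]
  } else {
    mod <- names(a)[a == b]
  }
  return(mod)
}
job$gender[is.na(job$gender)] <- Mode(job$gender)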
job$enrolled_university[is.na(job$enrolled_university)] <- Mode(job$enrolled_university)
job$education_level[is.na(job$education_level)] <- Mode(job$education_level)
job$major_discipline[is.na(job$major_discipline)] <- Mode(job$major_discipline)
job$company_type[is.na(job$company_type)] <- Mode(job$company_type)
job$experience[is.na(job$experience)] <- Mode(job$experience)
job$company_size[is.na(job$company_size)] <- Mode(job$company_size)
job$last_new_job[is.na(job$last_new_job)] <- Mode(job$last_new_job)
summary(job)
#> city city_development_index gender
#> city_103:4355 Min. :0.4480 Female: 1238
#> city_21 :2702 1st Qu.:0.7400 Male :17729
#> city_16 :1533 Median :0.9030 Other : 191
#> city_114:1336 Mean :0.8288
#> city_160: 845 3rd Qu.:0.9200
#> city_136: 586 Max. :0.9490
#> (Other) :7801
#> relevent_experience enrolled_university
#> Has relevent experience:13792 Full time course: 3757
#> No relevent experience : 5366 no_enrollment :14203
#> Part time course: 1198
#>
#>
#>
#>
#> education_level major_discipline experience company_size
#> Graduate :12058 Arts : 253 >20 :3351 50-99 :9021
#> High School : 2017 Business Degree: 327 5 :1430 100-500 :2571
#> Masters : 4361 Humanities : 669 4 :1403 10000+ :2019
#> Phd : 414 No Major : 223 3 :1354 10/49 :1471
#> Primary School: 308 Other : 381 6 :1216 1000-4999:1328
#> STEM :17305 2 :1127 <10 :1308
#> (Other):9277 (Other) :1440
#> company_type last_new_job training_hours target
#> Early Stage Startup: 603 >4 :3290 Min. : 1.00 0:14381
#> Funded Startup : 1001 1 :8463 1st Qu.: 23.00 1: 4777
#> NGO : 521 2 :2900 Median : 47.00
#> Other : 121 3 :1024 Mean : 65.37
#> Public Sector : 955 4 :1029 3rd Qu.: 88.00
#> Pvt Ltd :15957 never:2452 Max. :336.00
#>
To validate the models on held-out data, the data is split into two parts: the train data (80% of the data) and the test data (20% of the data).
The train data is used to train each model, while the test data is used to measure model performance. Each model predicts the test data, and the predicted results are compared with the actual values in the test data to validate the model.
RNGkind(sample.kind = "Rounding")
set.seed(123)
index <- sample(nrow(job),
nrow(job)*0.8)
job_train <- job[index,]
job_test <- job[-index,]
head(job_train)
#> city city_development_index gender relevent_experience
#> 5510 city_152 0.698 Male Has relevent experience
#> 15102 city_45 0.890 Male Has relevent experience
#> 7835 city_75 0.939 Male No relevent experience
#> 16915 city_21 0.624 Male No relevent experience
#> 18014 city_173 0.878 Male Has relevent experience
#> 873 city_9 0.743 Female No relevent experience
#> enrolled_university education_level major_discipline experience
#> 5510 Full time course Graduate STEM 9
#> 15102 no_enrollment Graduate STEM 15
#> 7835 no_enrollment Masters STEM 15
#> 16915 Full time course High School STEM <1
#> 18014 no_enrollment High School STEM 6
#> 873 Full time course High School STEM 4
#> company_size company_type last_new_job training_hours target
#> 5510 <10 Pvt Ltd >4 9 0
#> 15102 500-999 Pvt Ltd never 74 0
#> 7835 10000+ Pvt Ltd >4 52 0
#> 16915 50-99 Pvt Ltd never 111 0
#> 18014 50-99 Pvt Ltd 1 85 0
#> 873 50-99 Pvt Ltd never 22 0
tail(job_test)
#> city city_development_index gender relevent_experience
#> 19136 city_65 0.802 Male Has relevent experience
#> 19142 city_23 0.899 Male Has relevent experience
#> 19149 city_21 0.624 Male Has relevent experience
#> 19150 city_103 0.920 Male Has relevent experience
#> 19152 city_149 0.689 Male No relevent experience
#> 19153 city_103 0.920 Female Has relevent experience
#> enrolled_university education_level major_discipline experience
#> 19136 no_enrollment Graduate STEM 8
#> 19142 no_enrollment Graduate STEM 17
#> 19149 no_enrollment Masters STEM 3
#> 19150 no_enrollment Masters STEM 9
#> 19152 Full time course Graduate STEM 2
#> 19153 no_enrollment Graduate Humanities 7
#> company_size company_type last_new_job training_hours target
#> 19136 50-99 Public Sector 2 136 0
#> 19142 10/49 Funded Startup 3 12 0
#> 19149 100-500 Pvt Ltd 3 40 1
#> 19150 50-99 Pvt Ltd 1 36 1
#> 19152 50-99 Pvt Ltd 1 60 0
#> 19153 10/49 Funded Startup 1 25 0
We will also check the proportion of target variable (looking for a job vs. not looking for a job).
prop.table(table(job_train$target))
#>
#> 0 1
#> 0.7508809 0.2491191
As the proportions show, the classes of the target variable are not balanced, but due to data and time constraints, the split is considered adequate to continue with the modeling.
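For reference, balancing by random downsampling could look like the sketch below. The report itself does not apply this step; the toy data frame and all object names here are hypothetical, and the sketch uses base R only (caret also provides a downSample() function for the same purpose).

```r
set.seed(123)

# Toy train set (hypothetical): 75 rows of class 0 and 25 rows of class 1
toy_train <- data.frame(
  x      = rnorm(100),
  target = factor(rep(c("0", "1"), times = c(75, 25)))
)

# Size of the smallest class
n_min <- min(table(toy_train$target))

# For each class, keep a random sample of n_min row indices
keep <- unlist(lapply(split(seq_len(nrow(toy_train)), toy_train$target),
                      function(idx) idx[sample.int(length(idx), n_min)]))

toy_balanced <- toy_train[keep, ]
prop.table(table(toy_balanced$target))
```

Downsampling discards majority-class rows, so it trades information for balance; whether that trade is worthwhile depends on how much the minority-class recall matters to the business question.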
model_naive <- naiveBayes(target~., data = job_train)
model_naive
#>
#> Naive Bayes Classifier for Discrete Predictors
#>
#> Call:
#> naiveBayes.default(x = X, y = Y, laplace = laplace)
#>
#> A-priori probabilities:
#> Y
#> 0 1
#> 0.7508809 0.2491191
#>
#> Conditional probabilities:
#> city
#> Y city_1 city_10 city_100 city_101 city_102
#> 0 0.00156412930 0.00547445255 0.01511991658 0.00208550574 0.01772679875
#> 1 0.00078575170 0.00235725511 0.01283394447 0.00916710320 0.01126244107
#> city
#> Y city_103 city_104 city_105 city_106 city_107
#> 0 0.23618352450 0.01885644769 0.00434480361 0.00034758429 0.00017379214
#> 1 0.19696176008 0.00654793085 0.00130958617 0.00052383447 0.00078575170
#> city
#> Y city_109 city_11 city_111 city_114 city_115
#> 0 0.00017379214 0.00695168578 0.00026068822 0.08507125478 0.00217240181
#> 1 0.00052383447 0.03195390257 0.00000000000 0.02671555788 0.00419067575
#> city
#> Y city_116 city_117 city_118 city_12 city_120
#> 0 0.00790754258 0.00069516858 0.00112964894 0.00078206465 0.00034758429
#> 1 0.00392875851 0.00104766894 0.00157150340 0.00052383447 0.00026191723
#> city
#> Y city_121 city_123 city_126 city_127 city_128
#> 0 0.00017379214 0.00408411540 0.00078206465 0.00069516858 0.00278067431
#> 1 0.00026191723 0.00392875851 0.00366684128 0.00026191723 0.01073860660
#> city
#> Y city_129 city_13 city_131 city_133 city_134
#> 0 0.00017379214 0.00286757039 0.00034758429 0.00060827251 0.00234619395
#> 1 0.00000000000 0.00104766894 0.00078575170 0.00052383447 0.00235725511
#> city
#> Y city_136 city_138 city_139 city_14 city_140
#> 0 0.03710462287 0.00799443865 0.00008689607 0.00139033716 0.00008689607
#> 1 0.01309586171 0.00183342064 0.00078575170 0.00130958617 0.00000000000
#> city
#> Y city_141 city_142 city_143 city_144 city_145
#> 0 0.00156412930 0.00260688217 0.00173792145 0.00121654501 0.00173792145
#> 1 0.00052383447 0.00288108958 0.00340492404 0.00183342064 0.00811943426
#> city
#> Y city_146 city_149 city_150 city_152 city_155
#> 0 0.00026068822 0.00538755648 0.00312825860 0.00269377824 0.00026068822
#> 1 0.00078575170 0.00392875851 0.00340492404 0.00235725511 0.00261917234
#> city
#> Y city_157 city_158 city_159 city_16 city_160
#> 0 0.00156412930 0.00182481752 0.00538755648 0.09289190129 0.04449078902
#> 1 0.00052383447 0.00314300681 0.00314300681 0.03771608172 0.04112100576
#> city
#> Y city_162 city_165 city_166 city_167 city_171
#> 0 0.00625651721 0.00408411540 0.00017379214 0.00060827251 0.00000000000
#> 1 0.00680984809 0.00392875851 0.00026191723 0.00078575170 0.00000000000
#> city
#> Y city_173 city_175 city_176 city_179 city_18
#> 0 0.00929787974 0.00060827251 0.00104275287 0.00008689607 0.00026068822
#> 1 0.00419067575 0.00052383447 0.00157150340 0.00078575170 0.00026191723
#> city
#> Y city_180 city_19 city_2 city_20 city_21
#> 0 0.00043448036 0.00530066041 0.00034758429 0.00182481752 0.07838025721
#> 1 0.00052383447 0.00969093766 0.00000000000 0.00078575170 0.33368255631
#> city
#> Y city_23 city_24 city_25 city_26 city_27
#> 0 0.01112269725 0.00356273896 0.00008689607 0.00156412930 0.00312825860
#> 1 0.00366684128 0.00340492404 0.00052383447 0.00104766894 0.00183342064
#> city
#> Y city_28 city_30 city_31 city_33 city_36
#> 0 0.01251303441 0.00104275287 0.00026068822 0.00043448036 0.01051442475
#> 1 0.00288108958 0.00026191723 0.00026191723 0.00235725511 0.00288108958
#> city
#> Y city_37 city_39 city_40 city_41 city_42
#> 0 0.00078206465 0.00034758429 0.00382342718 0.00460549183 0.00026068822
#> 1 0.00052383447 0.00000000000 0.00235725511 0.00340492404 0.00157150340
#> city
#> Y city_43 city_44 city_45 city_46 city_48
#> 0 0.00043448036 0.00078206465 0.00669099757 0.00686478971 0.00043448036
#> 1 0.00104766894 0.00157150340 0.00340492404 0.00707176532 0.00130958617
#> city
#> Y city_50 city_53 city_54 city_55 city_57
#> 0 0.00842891901 0.00104275287 0.00086896072 0.00078206465 0.00643030935
#> 1 0.00340492404 0.00104766894 0.00052383447 0.00078575170 0.00288108958
#> city
#> Y city_59 city_61 city_62 city_64 city_65
#> 0 0.00052137643 0.01216545012 0.00043448036 0.00634341328 0.01060132082
#> 1 0.00052383447 0.00419067575 0.00000000000 0.00314300681 0.00471451021
#> city
#> Y city_67 city_69 city_7 city_70 city_71
#> 0 0.02537365311 0.00121654501 0.00165102537 0.00199860966 0.01581508516
#> 1 0.01257202724 0.00026191723 0.00130958617 0.00392875851 0.00995285490
#> city
#> Y city_72 city_73 city_74 city_75 city_76
#> 0 0.00112964894 0.01364268335 0.00382342718 0.01946472019 0.00260688217
#> 1 0.00026191723 0.01571503405 0.01073860660 0.00654793085 0.00235725511
#> city
#> Y city_77 city_78 city_79 city_8 city_80
#> 0 0.00243309002 0.00121654501 0.00034758429 0.00034758429 0.00095585680
#> 1 0.00000000000 0.00314300681 0.00052383447 0.00000000000 0.00052383447
#> city
#> Y city_81 city_82 city_83 city_84 city_89
#> 0 0.00043448036 0.00026068822 0.00947167188 0.00139033716 0.00347584289
#> 1 0.00000000000 0.00000000000 0.00471451021 0.00157150340 0.00340492404
#> city
#> Y city_9 city_90 city_91 city_93 city_94
#> 0 0.00095585680 0.00895029545 0.00217240181 0.00156412930 0.00078206465
#> 1 0.00104766894 0.01309586171 0.00314300681 0.00104766894 0.00261917234
#> city
#> Y city_97 city_98 city_99
#> 0 0.00651720542 0.00495307612 0.00582203684
#> 1 0.00104766894 0.00130958617 0.00340492404
#>
#> city_development_index
#> Y [,1] [,2]
#> 0 0.8528382 0.1057588
#> 1 0.7562677 0.1436121
#>
#> gender
#> Y Female Male Other
#> 0 0.063694821 0.926485923 0.009819256
#> 1 0.070717653 0.918543740 0.010738607
#>
#> relevent_experience
#> Y Has relevent experience No relevent experience
#> 0 0.7539103 0.2460897
#> 1 0.6165532 0.3834468
#>
#> enrolled_university
#> Y Full time course no_enrollment Part time course
#> 0 0.16388599 0.77189781 0.06421620
#> 1 0.29701414 0.64012572 0.06286014
#>
#> education_level
#> Y Graduate High School Masters Phd Primary School
#> 0 0.605665624 0.113051790 0.239051095 0.024070212 0.018161279
#> 1 0.690937664 0.083551598 0.204819277 0.012310110 0.008381351
#>
#> major_discipline
#> Y Arts Business Degree Humanities No Major Other STEM
#> 0 0.01338200 0.01746611 0.03597497 0.01129649 0.01981230 0.90206813
#> 1 0.01126244 0.01754845 0.02985856 0.01100052 0.02147721 0.90885280
#>
#> experience
#> Y <1 >20 1 10 11 12
#> 0 0.018943344 0.195429267 0.023461940 0.054396941 0.035279805 0.027546055
#> 1 0.046883185 0.113933997 0.047668937 0.042430592 0.031168151 0.017810372
#> experience
#> Y 13 14 15 16 17 18
#> 0 0.022506083 0.032412235 0.037799791 0.030674314 0.019638512 0.017379214
#> 1 0.015976951 0.023834468 0.024358303 0.012833944 0.011786276 0.009690938
#> experience
#> Y 19 2 20 3 4 5
#> 0 0.017292318 0.051529371 0.007733750 0.061087939 0.065172054 0.071080987
#> 1 0.010476689 0.075432163 0.007071765 0.099266632 0.098218963 0.087218439
#> experience
#> Y 6 7 8 9
#> 0 0.060740355 0.052572124 0.042752868 0.054570733
#> 1 0.069408067 0.067050812 0.041121006 0.046359350
#>
#> company_size
#> Y <10 10/49 100-500 1000-4999 10000+ 50-99
#> 0 0.07559958 0.07881474 0.15059089 0.07864095 0.11478971 0.41979493
#> 1 0.04557360 0.07255107 0.08721844 0.03981142 0.08067051 0.62074384
#> company_size
#> Y 500-999 5000-9999
#> 0 0.04961766 0.03215155
#> 1 0.03195390 0.02147721
#>
#> company_type
#> Y Early Stage Startup Funded Startup NGO Other Public Sector
#> 0 0.033368092 0.060305874 0.029544665 0.006430309 0.051007994
#> 1 0.029072813 0.031953903 0.020167627 0.005500262 0.043740178
#> company_type
#> Y Pvt Ltd
#> 0 0.819343066
#> 1 0.869565217
#>
#> last_new_job
#> Y >4 1 2 3 4 never
#> 0 0.18300313 0.43248175 0.15154675 0.05622176 0.05848106 0.11826555
#> 1 0.12807753 0.47249869 0.15217391 0.04924044 0.04740702 0.15060241
#>
#> training_hours
#> Y [,1] [,2]
#> 0 65.93491 61.08793
#> 1 63.17575 57.32707
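The a-priori probabilities in the result match the class distribution of the train data (about 75% for class 0 and 25% for class 1), and the model estimates the conditional probabilities for each feature separately. For the numeric predictors (city_development_index and training_hours), the two columns are the per-class mean and standard deviation. Most factor levels have conditional probabilities greater than zero, but several city levels (for example city_111, city_129, and city_140 for class 1) have a conditional probability of exactly zero because of data scarcity. In the Naive Bayes product, a zero conditional probability would drive the posterior for that class to zero unless it is smoothed or thresholded, so Laplace smoothing (the laplace argument of naiveBayes()) is worth considering.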
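The effect of zero counts on conditional probabilities, and how add-one (Laplace) smoothing changes them, can be shown with a small base R sketch. The toy vectors below are hypothetical and only illustrate the arithmetic; they are not taken from the job data.

```r
# Toy data (hypothetical): one factor predictor and a binary target
city   <- factor(c("a", "a", "b", "c", "a", "b"), levels = c("a", "b", "c"))
target <- factor(c("0", "0", "0", "0", "1", "1"))

counts <- table(target, city)  # rows: class, columns: city level

# Unsmoothed conditional probabilities P(city | target)
p_raw <- counts / rowSums(counts)

# Laplace (add-one) smoothing: add 1 to every count and
# add the number of city levels to each class total
k <- nlevels(city)
p_laplace <- (counts + 1) / (rowSums(counts) + k)

print(round(p_raw, 3))
print(round(p_laplace, 3))
```

In the unsmoothed table, level "c" never occurs with class "1", so its probability is zero; after smoothing every cell is positive and each row still sums to one.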
pred_naive <- predict(model_naive, newdata = job_test)
pred_naive
#> [1] 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
#> [38] 0 0 0 1 0 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0
#> [75] 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0
#> [112] 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0
#> [149] 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0
#> [186] 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 1 0 1 1 0
#> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1 1
#> [260] 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1
#> [297] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
#> [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0
#> [371] 0 0 0 1 0 0 0 1 1 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 1 1 0 1 0 0
#> [408] 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0
#> [445] 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0
#> [482] 1 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 1 1 0 1 0 0 0 1 0 1
#> [519] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1
#> [556] 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
#> [593] 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 1 1 0 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1
#> [630] 0 1 0 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 1 0 0 0 1 0 0
#> [667] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 1 1
#> [704] 0 0 0 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
#> [741] 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 1 0 1 0 0 0 0
#> [778] 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0
#> [815] 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0
#> [852] 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0
#> [889] 0 1 0 0 0 0 1 1 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0
#> [926] 0 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
#> [963] 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1
#> [1000] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 1 1 1 0 0 0 0 0
#> [1037] 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 1 1 0 1 1 0 0 0 0 0 1 0 0 0
#> [1074] 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0
#> [1111] 0 0 0 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 1
#> [1148] 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 0 1 0 1
#> [1185] 0 0 0 0 0 1 0 1 1 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
#> [1222] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
#> [1259] 1 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
#> [1296] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 1 1 0 0 0 0
#> [1333] 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [1370] 0 1 1 0 0 1 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0
#> [1407] 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
#> [1444] 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0
#> [1481] 1 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 1 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 1 0 1
#> [1518] 1 0 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [1555] 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0
#> [1592] 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0
#> [1629] 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0
#> [1666] 0 1 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 1 0 1 0 0
#> [1703] 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0
#> [1740] 0 1 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0
#> [1777] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
#> [1814] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0
#> [1851] 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
#> [1888] 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0
#> [1925] 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [1962] 1 0 0 0 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
#> [1999] 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0
#> [2036] 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1
#> [2073] 1 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0
#> [2110] 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0
#> [2147] 0 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
#> [2184] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 1 0
#> [2221] 0 0 0 1 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0
#> [2258] 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0
#> [2295] 1 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0 1 1
#> [2332] 0 0 1 1 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0
#> [2369] 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0
#> [2406] 0 1 1 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1
#> [2443] 0 0 0 0 0 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0
#> [2480] 0 0 0 0 1 1 1 0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
#> [2517] 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0
#> [2554] 0 0 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1
#> [2591] 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1
#> [2628] 0 0 0 0 1 1 0 0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 0 1 0 0 0 1 0 1 1 0 1 1 0 0 0
#> [2665] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0
#> [2702] 0 1 1 1 0 0 1 0 1 0 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [2739] 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0
#> [2776] 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
#> [2813] 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1
#> [2850] 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0
#> [2887] 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 1 0
#> [2924] 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 1 0 1 0 0 1 1 1 0 1 0 0 1 1 0 1 0 0
#> [2961] 0 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 1 0
#> [2998] 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 1 0 1 1
#> [3035] 0 0 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0 1 0 0 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0
#> [3072] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [3109] 0 1 0 1 0 0 0 1 0 1 1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
#> [3146] 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0
#> [3183] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
#> [3220] 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0
#> [3257] 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 1 0 0
#> [3294] 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0
#> [3331] 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 1
#> [3368] 0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
#> [3405] 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1
#> [3442] 0 1 1 0 0 1 1 1 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0
#> [3479] 1 0 1 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0
#> [3516] 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 1 0 0
#> [3553] 0 0 0 1 0 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 1 1
#> [3590] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0
#> [3627] 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 1 1 0 0 0 0 0
#> [3664] 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
#> [3701] 0 0 0 1 0 0 1 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
#> [3738] 0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 0 1 1 0 0 0 0 0 1 0 1
#> [3775] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0
#> [3812] 0 0 1 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0
#> Levels: 0 1
plot(pred_naive)
confusionMatrix(data = pred_naive, reference = job_test$target, positive = "1")
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction 0 1
#> 0 2484 471
#> 1 389 488
#>
#> Accuracy : 0.7756
#> 95% CI : (0.762, 0.7887)
#> No Information Rate : 0.7497
#> P-Value [Acc > NIR] : 0.0001018
#>
#> Kappa : 0.3844
#>
#> Mcnemar's Test P-Value : 0.0057435
#>
#> Sensitivity : 0.5089
#> Specificity : 0.8646
#> Pos Pred Value : 0.5564
#> Neg Pred Value : 0.8406
#> Prevalence : 0.2503
#> Detection Rate : 0.1273
#> Detection Prevalence : 0.2289
#> Balanced Accuracy : 0.6867
#>
#> 'Positive' Class : 1
#>
From the result above, the model classifies 2484 out of 2873 “0” (not looking for a job change) cases correctly, and 488 out of 959 “1” (looking for a job change) cases correctly. This means the Naive Bayes model identifies about 86.5% of the “0” cases (specificity), but only about 50.9% of the “1” cases (sensitivity), for an overall accuracy of about 77.56%.
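The figures quoted above can be recomputed by hand from the four cells of the confusion matrix. The sketch below copies the counts from the output and uses base R only; the object names are chosen for illustration.

```r
# Confusion matrix counts copied from the output above
# (rows: prediction, columns: reference)
cm <- matrix(c(2484, 389, 471, 488), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

tp <- cm["1", "1"]  # predicted 1, actually 1
tn <- cm["0", "0"]  # predicted 0, actually 0
fp <- cm["1", "0"]  # predicted 1, actually 0
fn <- cm["0", "1"]  # predicted 0, actually 1

sensitivity <- tp / (tp + fn)     # share of actual "1" cases found
specificity <- tn / (tn + fp)     # share of actual "0" cases found
accuracy    <- (tp + tn) / sum(cm)
precision   <- tp / (tp + fp)     # "Pos Pred Value" in the output

round(c(sensitivity = sensitivity, specificity = specificity,
        accuracy = accuracy, precision = precision), 4)
```

Because the positive class is "1", sensitivity is the metric that tells how many job seekers the model actually catches, which is why it is reported separately from accuracy.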
The report also builds a Random Forest model as an alternative model for predicting the test data set.
Random Forest is a classification method based on the ensemble approach. It is built from many Decision Trees with different characteristics: each tree uses different observations and predictors, drawn by sampling.
Creating the Random Forest model with all available predictors, using repeated k-fold cross-validation (5 folds, 3 repeats):
set.seed(123)
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3)
# job_forest <- train(target ~., data = job_train, method = "rf", trControl= ctrl)
# saveRDS(object = job_forest, file = "job_forest.RDS")
Reading the model that has already been trained and saved to an RDS file:
job_forest <- readRDS("job_forest.RDS")
Checking the summary of the final model using job_forest$finalModel:
summary(job_forest$finalModel)
#> Length Class Mode
#> call 4 -none- call
#> type 1 -none- character
#> predicted 15326 factor numeric
#> err.rate 1500 -none- numeric
#> confusion 6 -none- numeric
#> votes 30652 matrix numeric
#> oob.times 15326 -none- numeric
#> classes 2 -none- character
#> importance 176 -none- numeric
#> importanceSD 0 -none- NULL
#> localImportance 0 -none- NULL
#> proximity 0 -none- NULL
#> ntree 1 -none- numeric
#> mtry 1 -none- numeric
#> forest 14 -none- list
#> y 15326 factor numeric
#> test 0 -none- NULL
#> inbag 0 -none- NULL
#> xNames 176 -none- character
#> problemType 1 -none- character
#> tuneValue 1 data.frame list
#> obsLevels 2 -none- character
#> param 0 -none- list
plot(job_forest)
In practice, the random forest already provides an out-of-bag (OOB) estimate, which serves as an approximately unbiased estimate of its accuracy on unseen data.
The out-of-bag estimate of the error rate is 23.54% (this figure is reported when job_forest$finalModel is printed; summary() above only lists its components). This means the model is expected to misclassify about 23.54% of unseen data.
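To see where out-of-bag rows come from, the sketch below draws one bootstrap sample of the same size as the train data and counts the rows that were never selected. It is a base R illustration of the sampling idea only, not the randomForest implementation, and the object names are hypothetical.

```r
set.seed(123)

n <- 15326                                  # number of train rows, from the summary above
in_bag <- sample.int(n, n, replace = TRUE)  # bootstrap sample drawn for one tree
oob    <- setdiff(seq_len(n), in_bag)       # rows this tree never saw

oob_fraction <- length(oob) / n
oob_fraction  # close to exp(-1), about 0.368
```

Each tree therefore leaves roughly a third of the rows unused, and those rows act as a built-in validation set for that tree; the OOB error aggregates these per-tree checks.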
After building the Random Forest model, we will predict the train and test data sets using the trained model.
pred_test_jf <- predict(object = job_forest,
newdata = job_test,
type = "raw")
pred_test_jf
#> [1] 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
#> [38] 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0
#> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 0 0 1 1 0 0 1 0 0 0 0 0
#> [112] 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
#> [149] 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
#> [186] 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0
#> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0
#> [260] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0
#> [297] 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1
#> [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0
#> [371] 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 1 1 0 1 1 0 1 0 0
#> [408] 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0
#> [445] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 1 0 0 0
#> [482] 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 1
#> [519] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1
#> [556] 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [593] 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1
#> [630] 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
#> [667] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
#> [704] 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0
#> [741] 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 1 1 0 1 0 0 0 0
#> [778] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
#> [815] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0
#> [852] 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
#> [889] 0 1 0 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
#> [926] 0 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
#> [963] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
#> [1000] 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 0 0 0 0 0
#> [1037] 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0
#> [1074] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1
#> [1111] 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1
#> [1148] 1 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1
#> [1185] 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0
#> [1222] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0
#> [1259] 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0
#> [1296] 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0
#> [1333] 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0
#> [1370] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0
#> [1407] 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
#> [1444] 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
#> [1481] 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1
#> [1518] 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0
#> [1555] 0 0 1 1 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0
#> [1592] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
#> [1629] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
#> [1666] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 1 0 0 0 0 0
#> [1703] 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0
#> [1740] 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0
#> [1777] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0
#> [1814] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [1851] 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
#> [1888] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0
#> [1925] 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [1962] 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
#> [1999] 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0
#> [2036] 0 1 1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
#> [2073] 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0
#> [2110] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
#> [2147] 1 0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
#> [2184] 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0
#> [2221] 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
#> [2258] 1 0 1 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
#> [2295] 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 1 1 0 0 1 0 0
#> [2332] 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 1 0 0 0 0 0 0 0 1 0
#> [2369] 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0
#> [2406] 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1
#> [2443] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [2480] 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0
#> [2517] 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0
#> [2554] 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1
#> [2591] 0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1
#> [2628] 0 0 0 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0 1 0 1 1 0 1 1 0 0 0
#> [2665] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 1 0
#> [2702] 0 0 0 1 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0
#> [2739] 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0
#> [2776] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0
#> [2813] 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
#> [2850] 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
#> [2887] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0
#> [2924] 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 0 1 0 1 0 0 0 1 0 0
#> [2961] 0 0 0 0 0 0 1 1 1 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
#> [2998] 1 0 0 0 0 0 1 1 0 0 1 0 0 1 0 1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
#> [3035] 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
#> [3072] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
#> [3109] 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0
#> [3146] 1 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
#> [3183] 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
#> [3220] 1 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0
#> [3257] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 1 0 1 1 0
#> [3294] 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
#> [3331] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1
#> [3368] 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [3405] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
#> [3442] 0 1 1 0 0 1 1 0 0 0 1 1 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
#> [3479] 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0
#> [3516] 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0
#> [3553] 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0
#> [3590] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [3627] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0
#> [3664] 1 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [3701] 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0
#> [3738] 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1
#> [3775] 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
#> [3812] 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0
#> Levels: 0 1
We will evaluate the performance of the Random Forest model on the test data using the confusionMatrix function.
# test data
confusionMatrix(data = pred_test_jf,
reference = job_test$target,
positive = "1")
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction 0 1
#> 0 2560 580
#> 1 313 379
#>
#> Accuracy : 0.767
#> 95% CI : (0.7532, 0.7803)
#> No Information Rate : 0.7497
#> P-Value [Acc > NIR] : 0.006971
#>
#> Kappa : 0.3155
#>
#> Mcnemar's Test P-Value : < 0.00000000000000022
#>
#> Sensitivity : 0.3952
#> Specificity : 0.8911
#> Pos Pred Value : 0.5477
#> Neg Pred Value : 0.8153
#> Prevalence : 0.2503
#> Detection Rate : 0.0989
#> Detection Prevalence : 0.1806
#> Balanced Accuracy : 0.6431
#>
#> 'Positive' Class : 1
#>
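The main statistics in this output can be reproduced by hand from the four cell counts. The sketch below uses only base R and the counts printed above; the variable names are chosen here for illustration.

```r
# Rebuild the Random Forest test confusion matrix from the printed counts.
# matrix() fills column by column: Reference "0" first, then Reference "1".
cm <- matrix(c(2560, 313, 580, 379), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

tp <- cm["1", "1"]  # predicted 1, actually 1
tn <- cm["0", "0"]  # predicted 0, actually 0
fp <- cm["1", "0"]  # predicted 1, actually 0
fn <- cm["0", "1"]  # predicted 0, actually 1

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)   # share of actual "1" cases found
specificity <- tn / (tn + fp)   # share of actual "0" cases found
precision   <- tp / (tp + fp)   # Pos Pred Value in the caret output

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, precision = precision), 4)
```

The low sensitivity relative to specificity shows that the model misses most of the employees who are looking for a job change, even though its overall accuracy looks acceptable.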
We will then check which variables are most important to the prediction using the varImp function and visualize the result.
varImp_jf <- varImp(job_forest)
varImp_jf
#> rf variable importance
#>
#> only 20 most important variables shown (out of 176)
#>
#> Overall
#> training_hours 100.000
#> city_development_index 67.950
#> citycity_21 16.975
#> company_size50-99 16.521
#> education_levelMasters 10.428
#> last_new_job1 10.393
#> enrolled_universityno_enrollment 9.673
#> relevent_experienceNo relevent experience 9.592
#> last_new_job2 7.666
#> education_levelHigh School 6.839
#> experience4 6.737
#> company_typePvt Ltd 6.585
#> experience5 6.472
#> genderMale 6.415
#> experience3 6.291
#> last_new_jobnever 6.247
#> experience>20 6.243
#> experience7 5.576
#> experience2 5.506
#> experience6 5.336
plot(varImp_jf)
From the result above, the five variables with the highest importance for the prediction are training_hours, city_development_index, citycity_21 (the city_21 level of city), company_size50-99, and education_levelMasters.
From the confusion matrix above, it can be concluded that the model correctly identifies 2560 out of 2873 “0” (Not looking for job change) cases and 379 out of 959 “1” (Looking for job change) cases. This means the Random Forest model predicts about 89.1% of the “0” cases correctly, but only about 39.5% of the “1” cases, resulting in an overall accuracy of about 76.7%.
The report will also generate predictions with a Decision Tree model as an alternative model for the test data set.
The Decision Tree model is selected because it is powerful, versatile, and highly interpretable: it expresses decisions as a set of simple rules.
When building a decision tree model, we can control how complex the rules become by pruning (limiting the formation of branches in the tree) to prevent overfitting. This report uses the ctree_control function, which offers the following parameters:
mincriterion: The value of 1 - p-value that must be exceeded for a split to be made. It works as a “regulator” for tree depth. The smaller the value, the more complex the resulting tree will be.
minsplit: Minimum number of observations in a node before splitting.
minbucket: Minimum number of observations in a terminal (leaf) node.
Creating the Decision Tree model with pruning:
library(party)
model_tree <- ctree(target ~.,
data = job_train,
control = ctree_control(mincriterion = 0.90))
plot(model_tree)
After building the Decision Tree model, we will use it to predict the train and test data sets.
ctree_test <- predict(model_tree, newdata = job_test)
ctree_train <- predict(model_tree, newdata = job_train)
We will evaluate the performance of the Decision Tree model on both data sets using the confusionMatrix function.
# confusion matrix
confusionMatrix(ctree_test, reference = job_test$target, positive = "1")
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction 0 1
#> 0 2495 477
#> 1 378 482
#>
#> Accuracy : 0.7769
#> 95% CI : (0.7634, 0.79)
#> No Information Rate : 0.7497
#> P-Value [Acc > NIR] : 0.00004697
#>
#> Kappa : 0.3843
#>
#> Mcnemar's Test P-Value : 0.0008037
#>
#> Sensitivity : 0.5026
#> Specificity : 0.8684
#> Pos Pred Value : 0.5605
#> Neg Pred Value : 0.8395
#> Prevalence : 0.2503
#> Detection Rate : 0.1258
#> Detection Prevalence : 0.2244
#> Balanced Accuracy : 0.6855
#>
#> 'Positive' Class : 1
#>
confusionMatrix(ctree_train, reference = job_train$target, positive = "1")
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction 0 1
#> 0 9997 1760
#> 1 1511 2058
#>
#> Accuracy : 0.7866
#> 95% CI : (0.78, 0.793)
#> No Information Rate : 0.7509
#> P-Value [Acc > NIR] : < 0.00000000000000022
#>
#> Kappa : 0.4168
#>
#> Mcnemar's Test P-Value : 0.0000145
#>
#> Sensitivity : 0.5390
#> Specificity : 0.8687
#> Pos Pred Value : 0.5766
#> Neg Pred Value : 0.8503
#> Prevalence : 0.2491
#> Detection Rate : 0.1343
#> Detection Prevalence : 0.2329
#> Balanced Accuracy : 0.7039
#>
#> 'Positive' Class : 1
#>
From the test data result above, it can be concluded that the model correctly identifies 2495 out of 2873 “0” (Not looking for job change) cases and 482 out of 959 “1” (Looking for job change) cases. This means the Decision Tree model predicts about 86.8% of the “0” cases correctly, but only about 50.3% of the “1” cases, resulting in an overall accuracy of about 77.7%.
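Since the tree was evaluated on both data sets, a quick check for overfitting is to compare the two accuracies. The sketch below recomputes them in base R from the counts printed in the two confusion matrices; `accuracy_from_counts` is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: accuracy from the four confusion matrix cells.
accuracy_from_counts <- function(tn, fn, fp, tp) {
  (tn + tp) / (tn + fn + fp + tp)
}

# Counts taken from the Decision Tree confusion matrices above
acc_test  <- accuracy_from_counts(tn = 2495, fn = 477,  fp = 378,  tp = 482)
acc_train <- accuracy_from_counts(tn = 9997, fn = 1760, fp = 1511, tp = 2058)

# A small gap suggests the pruned tree generalizes about as well as it fits
gap <- acc_train - acc_test
round(c(train = acc_train, test = acc_test, gap = gap), 4)
```

With a gap of about one percentage point, the pruned tree shows little sign of overfitting on this split.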
Below are the highlights that can be concluded from the comparison of the three models:
Comparing the three models, the Naive Bayes model has an overall accuracy of approximately 77.56%, the Random Forest model about 76.7%, and the Decision Tree model approximately 77.7%.
While the Decision Tree model is a fairly powerful classification model (it can handle predictors that are interrelated / dependent) and is easy to interpret, it also has limitations: it tends to overfit, and a small change in the data can lead to a large change in the structure of the optimal decision tree.
Although the accuracies of the three models are close to each other, the highest accuracy in this report belongs to the Decision Tree model, by a small margin. However, each model has its own strengths and limitations, so it is up to the business users to decide how these models will be used.
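Because only about a quarter of the test cases belong to class “1”, accuracy alone can hide differences between the models. As a rough sketch, the base R snippet below places the test set statistics of the two models evaluated in this part of the report side by side. The values are copied from the confusionMatrix outputs above; the Naive Bayes model is left out because its confusion matrix is not shown in this part.

```r
# Test set statistics copied from the confusionMatrix() outputs above
models <- data.frame(
  model       = c("Random Forest", "Decision Tree"),
  accuracy    = c(0.7670, 0.7769),
  sensitivity = c(0.3952, 0.5026),
  specificity = c(0.8911, 0.8684),
  stringsAsFactors = FALSE
)

# Balanced accuracy weighs both classes equally
models$balanced_accuracy <- (models$sensitivity + models$specificity) / 2

# Sort from best to worst balanced accuracy
models[order(-models$balanced_accuracy), ]

best <- models$model[which.max(models$balanced_accuracy)]
best
```

On these figures the Decision Tree also leads on sensitivity and balanced accuracy, which matters if the business goal is to identify employees who are looking for a job change.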