One of our clients, a telecommunications services company, asked us to analyze the recruitment and selection process they had carried out. Their assessment data looks like this:
REKOMENDASI_DIVISI | REKOMENDASI_PIC_FA | OVERSEAS | SURAT_PERINGATAN | PSIKOTEST | TEST_ONLINE | TEST_PRAKTEK | KEBUGARAN | REKOMENDASI_ATASAN | KINERJA | JENIS_KELAMIN | LEVEL_PENDIDIKAN | HASIL |
---|---|---|---|---|---|---|---|---|---|---|---|---|
NO | NO | NO | TIDAK ADA SP | CUKUP | 57 | 60 | Sangat Kurang | DIREKOMENDASIKAN | C3 | Laki-laki | SMK | TIDAK LULUS |
NO | NO | NO | TIDAK ADA SP | CUKUP | 67 | 59 | Sangat Kurang | DIREKOMENDASIKAN | C3 | Laki-laki | SMK | TIDAK LULUS |
NO | NO | NO | TIDAK ADA SP | BAIK | 90 | 82 | Sangat Kurang | DIREKOMENDASIKAN | C3 | Laki-laki | SMK | TIDAK LULUS |
NO | NO | NO | TIDAK ADA SP | CUKUP | 73 | 79 | Sangat Kurang | DIREKOMENDASIKAN | C4 | Laki-laki | SMK | TIDAK LULUS |
NO | NO | NO | TIDAK ADA SP | SANGAT BAIK | 67 | 85 | Sedang | DIREKOMENDASIKAN | C2 | Laki-laki | S1 | TIDAK LULUS |
NO | NO | NO | TIDAK ADA SP | BAIK | 83 | 74 | Sedang | DIREKOMENDASIKAN | C3 | Laki-laki | SMK | LULUS |
The variable names (twelve predictors plus the target, HASIL) are:
colnames(full)
## [1] "REKOMENDASI_DIVISI" "REKOMENDASI_PIC_FA" "OVERSEAS"
## [4] "SURAT_PERINGATAN" "PSIKOTEST" "TEST_ONLINE"
## [7] "TEST_PRAKTEK" "KEBUGARAN" "REKOMENDASI_ATASAN"
## [10] "KINERJA" "JENIS_KELAMIN" "LEVEL_PENDIDIKAN"
## [13] "HASIL"
Number of cases and variables:
dim(full)
## [1] 553 13
Prediction target:
summary(full$HASIL)
## LULUS TIDAK LULUS
## 273 280
The numbers of candidates who passed and failed are nearly equal (273 and 280 respondents), so the sample can be considered balanced.
The data are split 70% for training and 30% for testing (validation).
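A 70/30 split can be sketched in base R as follows. This is a minimal illustration, not the code behind the results here: `full_demo` is a made-up stand-in with the same class counts as `HASIL`, the seed is arbitrary, and sampling within each class (stratification) is an assumption about how the split was done.

```r
set.seed(2018)  # arbitrary seed, only so this sketch is reproducible

# Hypothetical stand-in for the real data: same class counts as HASIL (273 / 280)
full_demo <- data.frame(
  TEST_ONLINE = round(runif(553, 13, 97)),
  HASIL = factor(rep(c("LULUS", "TIDAK LULUS"), times = c(273, 280)))
)

# Draw 70% of the row indices within each class (stratified sampling)
idx_by_class <- split(seq_len(nrow(full_demo)), full_demo$HASIL)
train_idx <- unlist(lapply(idx_by_class, function(i) {
  sample(i, round(0.7 * length(i)))
}))

train <- full_demo[train_idx, ]
test  <- full_demo[-train_idx, ]

# Class counts in each part; both parts keep the near 50/50 balance
table(train$HASIL)
table(test$HASIL)
```

Stratifying keeps the pass/fail ratio the same in both parts, which matters less for a balanced sample like this one but costs nothing.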
Summary of every variable in the data:
## REKOMENDASI_DIVISI REKOMENDASI_PIC_FA OVERSEAS SURAT_PERINGATAN
## NO :497 NO :536 NO :538 ADA SP : 2
## YES: 56 YES: 17 YES: 15 TIDAK ADA SP:551
## PSIKOTEST TEST_ONLINE TEST_PRAKTEK
## BAIK : 81 Min. :13.0 Min. :19.00
## CUKUP :311 1st Qu.:60.0 1st Qu.:73.00
## KURANG :102 Median :70.0 Median :81.00
## SANGAT BAIK : 28 Mean :67.9 Mean :77.59
## SANGAT KURANG: 25 3rd Qu.:77.0 3rd Qu.:87.00
## TIDAK HADIR : 6 Max. :97.0 Max. :98.00
##
## KEBUGARAN REKOMENDASI_ATASAN KINERJA
## Baik : 40 DIREKOMENDASIKAN:533 C1\n: 36
## Baik Sekali : 8 Not Rec : 20 C2 :111
## Belum Test : 9 C3 :329
## Kurang :190 C4\n: 64
## Sangat Kurang :158 C5\n: 13
## Sedang :145
## baik sekali dan terlatih: 3
## JENIS_KELAMIN LEVEL_PENDIDIKAN HASIL
## Laki-laki:547 Diploma: 51 LULUS :273
## Perempuan: 6 S1 : 68 TIDAK LULUS:280
## S2 : 2
## SMK :432
Fitting a decision tree (rpart) gives the following AUC, shown as a fraction and as a percentage:
## note: only 4 possible values of the max tree depth from the initial fit.
## Truncating the grid to 4 .
## Tree AUC 0.8678862
## [1] 86.78862
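The figures reported for each model are AUC values, that is, the area under the ROC curve, rather than plain classification accuracy. As a hedged illustration of what that number measures, the sketch below computes AUC in base R from the rank-sum identity: the AUC is the share of (pass, fail) pairs in which the passing candidate gets the higher score. The function name `auc_rank` and the six scores are hypothetical and are not taken from the client data.

```r
# AUC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks
auc_rank <- function(score, is_positive) {
  r <- rank(score)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical predicted pass probabilities for six candidates
score  <- c(0.91, 0.80, 0.65, 0.40, 0.30, 0.08)
actual <- c("LULUS", "LULUS", "TIDAK LULUS", "LULUS", "TIDAK LULUS", "TIDAK LULUS")

# 8 of the 9 (pass, fail) pairs are ordered correctly, so the AUC is 8/9
auc_rank(score, actual == "LULUS")
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation of the two classes.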
Fitting a Random Forest gives the following variable importance scores and AUC:
## REKOMENDASI_DIVISI REKOMENDASI_PIC_FA OVERSEAS
## 3.4221371 1.2374188 0.7798658
## SURAT_PERINGATAN PSIKOTEST TEST_ONLINE
## 0.8293573 40.7390202 30.3778808
## TEST_PRAKTEK KEBUGARAN REKOMENDASI_ATASAN
## 20.3500226 37.3713312 3.4438925
## KINERJA JENIS_KELAMIN LEVEL_PENDIDIKAN
## 4.8206446 0.2007534 3.7037181
## Random Forest AUC 0.9256678
## [1] 92.56678
Fitting a logistic regression (GLM) gives the following AUC:
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## GLM AUC 0.914489
## [1] 91.4489
Fitting XGBoost produces the following cross-validation log, followed by the training log of the final model and its AUC:
## [1] train-auc:0.950399+0.006178 test-auc:0.908051+0.055735
## Multiple eval metrics are present. Will use test_auc for early stopping.
## Will train until test_auc hasn't improved in 100 rounds.
##
## [2] train-auc:0.960447+0.006939 test-auc:0.916025+0.051212
## ...
## [9] train-auc:0.981060+0.002109 test-auc:0.934473+0.045584
## ...
## [109] train-auc:0.999997+0.000010 test-auc:0.920150+0.048457
## Stopping. Best iteration:
## [9] train-auc:0.981060+0.002109 test-auc:0.934473+0.045584
## [1] train-error:0.087855
## [2] train-error:0.085271
## [3] train-error:0.077519
## [4] train-error:0.074935
## [5] train-error:0.072351
## [6] train-error:0.072351
## [7] train-error:0.072351
## [8] train-error:0.069767
## [9] train-error:0.069767
## XGB AUC 0.9110046
## [1] 91.10046
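The cross-validation log reports round 9 as the best iteration and stops at round 109, which is 100 rounds after the last improvement in test AUC. The sketch below illustrates that early-stopping rule in base R. The helper `early_stop` is hypothetical and is not part of xgboost; `test_auc` holds the mean test AUC of the first ten rounds as printed in the log.

```r
# Mean cross-validated test AUC of rounds 1 to 10, as printed in the log
test_auc <- c(0.908051, 0.916025, 0.918533, 0.925122, 0.927544,
              0.931116, 0.933700, 0.930402, 0.934473, 0.926993)

# Hypothetical helper: stop once `patience` rounds pass without a new best value
early_stop <- function(metric, patience) {
  best <- 1
  for (i in seq_along(metric)) {
    if (metric[i] > metric[best]) best <- i
    if (i - best >= patience) return(list(best = best, stopped_at = i))
  }
  list(best = best, stopped_at = length(metric))
}

which.max(test_auc)                 # best round among these ten
early_stop(test_auc, patience = 1)  # an impatient rule stops too early
```

With a patience of 1 the rule would stop at round 8 and keep round 7, missing the better round 9. A patience of 100, as in the log, is far more forgiving, at the cost of 100 extra rounds during which the training AUC keeps rising toward 1 while the test AUC drifts down.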
Once the models above have been fitted, the four can be compared as follows:
Interestingly, three of the four algorithms reach an AUC above 90%: Random Forest is highest at 92.57%, followed by logistic regression at 91.45% and XGBoost at 91.10%, while the rpart tree sits at 86.79% (not a bad figure for a prediction).
Next we take a closer look at one of the top-performing algorithms, XGBoost.
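The ranking of the four models can be checked directly from the AUC values printed by each fit. The snippet below is only a convenience sketch: the vector names are my own labels, and the numbers are copied from the model outputs.

```r
# AUC values as printed by each model fit
auc <- c("rpart tree"                = 0.8678862,
         "Random Forest"             = 0.9256678,
         "Logistic regression (GLM)" = 0.914489,
         "XGBoost"                   = 0.9110046)

# Sort from best to worst and express as percentages
ranking <- sort(auc, decreasing = TRUE)
round(100 * ranking, 2)

# Gap between the best model and XGBoost, in percentage points
round(100 * (ranking[1] - auc["XGBoost"]), 2)
```

The three best models are within about 1.5 percentage points of each other, so the choice among them rests as much on interpretability and tooling as on the AUC itself.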
The algorithm can rank how important each predictor is for the target. In this case we get the following picture:
From the chart above we gain the insight that the psychological test (PSIKOTEST) is by far the most important predictor, followed by fitness (KEBUGARAN), the online test (TEST_ONLINE), and the practical test (TEST_PRAKTEK). Interestingly, neither performance during the internship (KINERJA) nor education level (LEVEL_PENDIDIKAN) turns out to be a useful predictor in this case.
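The same ordering shows up in the variable importance scores printed for the Random Forest model. As a rough cross-check (using those Random Forest numbers, not the XGBoost importance chart discussed here), the sketch below ranks the predictors and expresses each score as a share of the total. The variable names `importance` and `share` are my own.

```r
# Variable importance scores as printed in the Random Forest output
importance <- c(
  REKOMENDASI_DIVISI = 3.4221371, REKOMENDASI_PIC_FA = 1.2374188,
  OVERSEAS = 0.7798658, SURAT_PERINGATAN = 0.8293573,
  PSIKOTEST = 40.7390202, TEST_ONLINE = 30.3778808,
  TEST_PRAKTEK = 20.3500226, KEBUGARAN = 37.3713312,
  REKOMENDASI_ATASAN = 3.4438925, KINERJA = 4.8206446,
  JENIS_KELAMIN = 0.2007534, LEVEL_PENDIDIKAN = 3.7037181
)

# Each predictor's share of the total importance, largest first
share <- sort(100 * importance / sum(importance), decreasing = TRUE)
round(share, 1)

# Combined share of the four leading predictors
sum(share[1:4])
```

The four test-based predictors account for most of the total importance, while KINERJA and LEVEL_PENDIDIKAN each contribute only a few percent.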
Earlier studies likened machine learning algorithms to a "black box": we could not see the computation going on inside while a prediction was being made.
Today there are explainer packages that can show, step by step, how a model arrives at a given prediction.
Suppose we have a new observation, for example row 22 of the test data:
## Creating the trees of the xgboost model...
## Getting the leaf nodes for the training set observations...
## Building the Explainer...
## STEP 1 of 2
## Recalculating the cover for each non-leaf...
## Finding the stats for the xgboost trees...
## STEP 2 of 2
## Getting breakdown for each leaf of each tree...
## DONE!
## Extracting the breakdown of each prediction...
## DONE!
## Breakdown Complete
## 3.409838e-07
The observation has the following values:
## REKOMENDASI_DIVISI REKOMENDASI_PIC_FA OVERSEAS SURAT_PERINGATAN
## 1: NO NO 1 TIDAK ADA SP
## PSIKOTEST TEST_ONLINE TEST_PRAKTEK KEBUGARAN REKOMENDASI_ATASAN KINERJA
## 1: CUKUP 77 81 4 1 C4\n
## JENIS_KELAMIN LEVEL_PENDIDIKAN
## 1: 1 4
Its breakdown plot is as follows:
## Prediction: 0.08228986
## Weight: -2.411634
## Breakdown
## intercept KEBUGARAN PSIKOTEST TEST_ONLINE
## 0.03921876 -0.82748244 -0.78412593 -0.48354184
## TEST_PRAKTEK LEVEL_PENDIDIKAN
## -0.39117309 0.03547085
The plot shows that this test observation is predicted as TIDAK LULUS (fail): its predicted probability of passing is only about 0.08.
Now let us try a second observation, again listed as row 22 of the test data:
## REKOMENDASI_DIVISI REKOMENDASI_PIC_FA OVERSEAS SURAT_PERINGATAN
## 1: NO NO 1 TIDAK ADA SP
## PSIKOTEST TEST_ONLINE TEST_PRAKTEK KEBUGARAN REKOMENDASI_ATASAN KINERJA
## 1: CUKUP 50 71 6 1 C4\n
## JENIS_KELAMIN LEVEL_PENDIDIKAN
## 1: 1 4
## Prediction: 0.9127843
## Weight: 2.348115
## Breakdown
## intercept TEST_ONLINE PSIKOTEST LEVEL_PENDIDIKAN
## 0.03921876 2.74987283 -0.60364736 0.24339123
## KEBUGARAN TEST_PRAKTEK
## -0.06692265 -0.01379771
The plot shows that this test observation is predicted as LULUS (pass), with a predicted probability of passing of about 0.91.
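The explainer output can be read as simple arithmetic: the contributions in each breakdown add up to the reported weight (a log-odds value), and passing that weight through the logistic function gives the reported prediction. The sketch below reproduces this for both observations in base R. The helper `explain` and the 0.5 cutoff for labelling a case as LULUS are my assumptions; the contribution values are copied from the two breakdowns.

```r
# Contributions (log-odds) copied from the two breakdowns
case_1 <- c(intercept = 0.03921876, KEBUGARAN = -0.82748244,
            PSIKOTEST = -0.78412593, TEST_ONLINE = -0.48354184,
            TEST_PRAKTEK = -0.39117309, LEVEL_PENDIDIKAN = 0.03547085)
case_2 <- c(intercept = 0.03921876, TEST_ONLINE = 2.74987283,
            PSIKOTEST = -0.60364736, LEVEL_PENDIDIKAN = 0.24339123,
            KEBUGARAN = -0.06692265, TEST_PRAKTEK = -0.01379771)

# Hypothetical helper: sum the contributions, then map log-odds to a probability
explain <- function(breakdown, cutoff = 0.5) {  # cutoff of 0.5 is an assumption
  weight <- sum(breakdown)   # total log-odds, the "Weight" in the output
  prob <- plogis(weight)     # logistic function: 1 / (1 + exp(-weight))
  list(weight = weight,
       probability = prob,
       label = if (prob >= cutoff) "LULUS" else "TIDAK LULUS")
}

explain(case_1)  # weight about -2.41, probability about 0.08
explain(case_2)  # weight about  2.35, probability about 0.91
```

In the first case KEBUGARAN and PSIKOTEST pull the log-odds down the most. In the second case a single large positive contribution from TEST_ONLINE outweighs the negative PSIKOTEST contribution.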
Hopefully this explanation sheds a little light on how algorithmic assessment works.
Depok, 2 January 2018