Classification with Machine Learning
Case Study
One investment instrument is the bond (a debt security), and one type of bond is the government bond (for example, Indonesian government securities). Investors in government bonds include both domestic and foreign investors. This case study focuses on foreign investment in government bonds. The data come from a securities (investment management) firm and are monthly, with the following details:
Y: inflow/outflow of foreign investor funds into government bonds. Inflow means foreign funds enter Indonesia; outflow means foreign investors withdraw their funds. A positive Y means more foreign money entered government bonds than left; a negative Y means more foreign money left than entered.
X: 24 variables, such as interest rates in Indonesia and the United States. The full variable descriptions are given in the Data Description section below.
What drives foreign investor funds to flow in or out?
Steps:
Explore the data
Build a tree model and ensemble tree models (at least one ensemble model), including a check of model performance and hyperparameter optimization
Provide interpretation and insights
Packages
library(tidyverse)
library(readxl)
library(rpart)
library(rpart.plot)
library(mlr3verse)
library(mlr3extralearners)
library(ggpubr)
library(mlr3learners)
library(mlr3tuning)
library(yardstick)
library(data.table)
library(GGally)
library(ggplot2)
library(plotly)
library(rsample) # initial split
library(partykit)
library(caret) # confusion matrix, cross-validation
library(randomForest) # random forest
library(gridExtra) # combining graphs
library(grid)
library(knitr)
library(cowplot)
library(formattable)
library(iml)
library(rio)
library(ggridges)
library(ROSE)
library(ROCit)
library(ipred)
library(xgboost)
library(Matrix)
library(magrittr)
library(ISLR)
library(gam) # generalized additive models
library(splines)
library(dplyr)
library(purrr)
library(DT)
library(kableExtra)
Data
Read Data
The following code loads the data from Excel into an R object.
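dataTelp <- read_excel("Foreign Fund Flows.xlsx")
# Recode the response: 1 = net inflow (Y > 0), 0 = net outflow (Y < 0)
dataTelp$cat.Y[dataTelp$Y > 0] <- 1
dataTelp$cat.Y[dataTelp$Y < 0] <- 0
dataTelp$cat.Y <- as.factor(dataTelp$cat.Y)
# Drop the first column (the original numeric Y) and store the factor as Y
dataTelp <- dataTelp[, -1]
dataTelp$Y <- dataTelp$cat.Y
# Drop the helper column cat.Y (now column 25), leaving X1 to X24 and Y
dataTelp <- dataTelp[, -25]
dataTelp
# A tibble: 96 x 25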
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3.71 9.72 6.01 2.6 3.71 4.90 2.33 1.3 0.25 6.5 6.25 5.14
2 3.67 9.82 6.15 2.1 3.8 4.45 2.13 1.48 0.25 6.5 6.25 4.55
3 3.72 9.33 5.61 2.3 3.43 4.48 2.24 1.6 0.25 6.5 6.25 5.12
4 3.83 8.80 4.97 2.2 3.91 3.26 2.4 1.29 0.25 6.5 6.25 4.54
5 3.40 8.95 5.54 2 4.17 3.37 1.99 1.32 0.25 6.5 6.25 4.08
6 3.19 8.51 5.32 1.1 5.06 1.36 1.82 1.15 0.25 6.5 6.25 2.29
7 2.98 8.22 5.23 1.2 6.23 0.203 1.8 1.14 0.25 6.5 6.25 1.22
8 2.69 8.04 5.35 1.1 6.45 -0.00259 1.52 0.95 0.25 6.5 6.25 0.9
9 2.66 7.97 5.31 1.1 5.81 0.602 1.78 0.75 0.25 6.5 6.25 1.54
10 2.51 7.28 4.76 1.2 5.67 0.295 2.13 0.5 0.25 6.5 6.25 1.78
# ... with 86 more rows, and 13 more variables: X13 <dbl>, X14 <dbl>,
# X15 <dbl>, X16 <dbl>, X17 <dbl>, X18 <dbl>, X19 <dbl>, X20 <dbl>,
# X21 <dbl>, X22 <dbl>, X23 <dbl>, X24 <dbl>, Y <fct>
The response variable Y is first recoded because the analysis uses a categorical response: category 1 (inflow: more foreign money entered than left, i.e. Y is positive) and category 0 (outflow: more foreign money left than entered, i.e. Y is negative). Because the response must be a factor, Y is then converted to the factor type.
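The recoding can be sketched on a small made-up vector. The values in `flow` are hypothetical and not taken from the dataset. The sketch also shows that a flow of exactly zero matches neither condition and stays `NA`, an edge case worth checking before modeling.

```r
# Hypothetical monthly net flows (illustrative values only)
flow <- c(2.5, -1.2, 0.8, 0, -3.1)

# Start from NA so that values matching neither rule remain missing
cat_y <- rep(NA_integer_, length(flow))
cat_y[flow > 0] <- 1L  # net inflow
cat_y[flow < 0] <- 0L  # net outflow

# Convert to a factor with explicit levels, as a classifier expects
cat_y <- factor(cat_y, levels = c(0, 1))
print(cat_y)
print(table(cat_y, useNA = "ifany"))
```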
Data Description
The variables in the dataset are described below.
| Variable | Description |
|---|---|
| X1 | Interest paid by the US government when borrowing at a 10-year tenor |
| X2 | Interest paid by the Indonesian government when borrowing at a 10-year tenor |
| X3 | Yield spread between UST10yr and GIDN10yr. The spread reflects the risk premium foreign investors face when investing in Indonesian government bonds. These risks include credit risk (the Indonesian government defaulting) and exchange-rate risk |
| X4 | General rise in the prices of goods and services in the US over a given period, measured by the change in the Consumer Price Index |
| X5 | General rise in the prices of goods and services in Indonesia over a given period, measured by the change in the Consumer Price Index |
| X6 | Real yield is the nominal yield minus inflation for the same period. The real yield spread is the difference between the US and Indonesian real yields at the 10-year tenor. It can be read as the risk premium for investing in Indonesian government bonds after removing inflation |
| X7 | Difference between the 10-year Treasury Bond and Treasury Inflation Protected Securities (TIPS); indicates the market's inflation expectation over the next 10 years |
| X8 | An investment instrument that guarantees a return above inflation if held to maturity |
| X9 | Policy rate of the US central bank; serves as the reference for other interest rates |
| X10 | Policy rate of Bank Indonesia; serves as the reference for other interest rates |
| X11 | Spread between the policy rates of the US central bank and Bank Indonesia |
| X12 | Spread between the policy rates of the US central bank and Bank Indonesia after removing inflation |
| X13 | A leading indicator of stock-market volatility and investor sentiment. The index measures the market's expectation of short-term volatility, derived from S&P 500 index option prices |
| X14 | An index that reflects and measures the strength of the US Dollar against several other major world currencies. A higher DXY index means the USD is generally strengthening against other currencies |
| X15 | An index that serves as a benchmark for tracking actively traded Asian currencies in aggregate against the US Dollar, calculated by JP Morgan |
| X16 | Balance sheet of the US central bank. A growing balance sheet indicates accommodative policy (liquidity injection into the system) |
| X17 | Reflects the performance of BBB-rated bonds issued by private companies in the US. When risk rises, the return on these bonds turns negative |
| X18 | Outstanding global bonds with negative yields. The figure is computed from a bond index compiled by JP Morgan |
| X19 | Measures US interest-rate volatility, computed by BoA ML from yield volatility in the US bond options market. A high MOVE index indicates rising volatility in the US bond market |
| X20 | Forward Rate Agreement and Overnight Index Swap market. The spread between the 3-month LIBOR rate and the overnight index rate. An indicator of the resilience of the US banking system; a rising FRA-OIS spread means bank funding costs tend to be expensive |
| X21 | Spread between the 3-month T-bill rate and the bank lending rate. A rising spread indicates increasing credit risk |
| X22 | Indonesia 5-year Credit Default Swap; reflects foreign investors' perception of Indonesia's credit risk. The higher the default risk, the higher the CDS premium |
| X23 | A binding contract in the foreign-exchange market that locks in the Rupiah/US Dollar exchange rate for a purchase or sale of currency one month ahead. When the risk of Rupiah depreciation rises, the hedging cost rises |
| X24 | Movement of the Rupiah exchange rate against the US Dollar |
| Y | Foreign capital flow (foreign investors), where code 1 means more inflow than outflow and code 0 means more outflow than inflow |
Data Exploration
Data Summary
The summary function is used to view a summary of the data.
summary(dataTelp)
X1 X2 X3 X4
Min. :1.485 Min. :5.211 Min. :3.300 Min. :-0.200
1st Qu.:1.952 1st Qu.:6.634 1st Qu.:4.418 1st Qu.: 1.100
Median :2.289 Median :7.629 Median :5.243 Median : 1.700
X5 X6 X7 X8
Min. :2.790 Min. :-1.8483 Min. :1.400 Min. :-0.7900
1st Qu.:3.808 1st Qu.: 0.0208 1st Qu.:1.788 1st Qu.: 0.1050
Median :4.530 Median : 1.9899 Median :2.050 Median : 0.4000
X9 X10 X11 X12
Min. :0.2500 Min. :4.250 Min. :2.750 Min. :-0.060
1st Qu.:0.2500 1st Qu.:5.750 1st Qu.:5.500 1st Qu.: 1.215
Median :0.2500 Median :6.500 Median :6.250 Median : 2.520
X13 X14 X15 X16
Min. :10.13 Min. : 74.24 Min. :103.4 Min. :2.247
1st Qu.:13.49 1st Qu.: 79.88 1st Qu.:108.1 1st Qu.:2.850
Median :15.77 Median : 82.39 Median :114.7 Median :4.067
X17 X18 X19 X20
Min. :1.280 Min. : 0.000000 Min. : 46.71 Min. :10.36
1st Qu.:1.772 1st Qu.: 0.004493 1st Qu.: 63.52 1st Qu.:14.69
Median :2.015 Median : 0.101453 Median : 74.41 Median :17.23
X21 X22 X23 X24 Y
Min. :0.120 Min. : 90.51 Min. :-17.320 Min. : 8529 0:24
1st Qu.:0.200 1st Qu.:141.57 1st Qu.: 3.572 1st Qu.: 9176 1:72
Median :0.245 Median :162.51 Median : 4.775 Median :11482
[ reached getOption("max.print") -- omitted 3 rows ]
The summary shows that class 0 of the response Y (more outflow than inflow) is less frequent than class 1 (more inflow than outflow): 24 of the 96 observations, or 25% of the data.
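The class shares can be checked with a few lines of base R. The sketch below rebuilds the response from the counts reported by `summary()` (24 and 72) rather than from the Excel file, and also computes the accuracy of a naive rule that always predicts the majority class, which is a useful reference point for the models that follow.

```r
# Class counts as reported by summary(): 24 outflow (0) and 72 inflow (1)
y <- factor(rep(c(0, 1), times = c(24, 72)), levels = c(0, 1))

counts <- table(y)
shares <- prop.table(counts)
print(shares)

# Accuracy of a naive rule that always predicts the majority class
baseline_acc <- max(shares)
print(baseline_acc)
```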
Plots of the Dataset
# Helper: horizontal boxplot of one predictor, split by the response class Y
box_by_class <- function(var, label) {
  ggplot(dataTelp, aes(x = .data[[var]], y = Y)) +
    geom_boxplot() +
    labs(x = label, y = "Class")
}
A <- box_by_class("X1", "x1 UST 10yr Index")
B <- box_by_class("X2", "x2 GIDN 10yr Index")
C <- box_by_class("X3", "x3 Nominal yield Spread")
D <- box_by_class("X4", "x4 US inflation")
E <- box_by_class("X5", "x5 Indonesia inflation")
G <- box_by_class("X6", "x6 Real yield spread")
H <- box_by_class("X7", "x7 US 10-Year Breakeven Inflation Rate")
I <- box_by_class("X8", "x8 10-Year Treasury Inflation-Indexed Security")
J <- box_by_class("X9", "x9 Fed Fund Rate")
K <- box_by_class("X10", "x10 BI rate")
L <- box_by_class("X11", "x11 Spread benchmark rate")
M <- box_by_class("X12", "x12 Spread real benchmark rate")
N <- box_by_class("X13", "x13 VIX")
O <- box_by_class("X14", "x14 DXY")
P <- box_by_class("X15", "x15 ADXY")
Q <- box_by_class("X16", "x16 Fed balance sheet (eop)")
R <- box_by_class("X17", "x17 ICE BofA BBB US Corporate Index Option-Adjusted Spread")
S <- box_by_class("X18", "x18 Negative bond yield outstanding (USD tn)")
U <- box_by_class("X19", "x19 MOVE Index")
V <- box_by_class("X20", "x20 FRA-OIS Spread")
W <- box_by_class("X21", "x21 TED spread (%)")
X <- box_by_class("X22", "x22 INDO CDS 5-yr")
Y <- box_by_class("X23", "x23 Hedging Forward 1month")
Z <- box_by_class("X24", "x24 USD IDR Currency")
ggarrange(B, C, G, H,
          ncol = 2, nrow = 2)
ggarrange(K, M, O, Q, Z,
          ncol = 3, nrow = 2)
ggarrange(A, D, I, L,
          ncol = 2, nrow = 2)
ggarrange(E, P, U,
          ncol = 2, nrow = 2)
ggarrange(N, R, V, W, X, Y,
          ncol = 3, nrow = 3)
ggarrange(J, S,
          ncol = 2, nrow = 1)
The plots show the relationship between each predictor x and the response y.
The response to be predicted is categorical: whether inflow exceeds outflow or the reverse. Several machine learning models can be used for this task. This study uses KNN, a classification tree, bagging, and random forest.
Modeling Steps
The modeling steps are illustrated below.
Data Preparation
Importing the Data into the mlr3 Ecosystem
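taskTelp = TaskClassif$new(id = "telp",
                           backend = dataTelp,
                           target = "Y",
                           positive = "1")
# Learners: KNN, classification tree, bagging, and random forest (all return class probabilities)
learner_knn <- lrn("classif.kknn", predict_type = "prob", k = 10, kernel = "rectangular")
learner_tree <- lrn("classif.rpart", cp = 0.001, minsplit = 12, predict_type = "prob")
# Bagging via ranger; note that strict bagging uses mtry equal to the number of predictors (24 here)
learner_bagging <- lrn(id = "bagging clf", "classif.ranger", mtry = 11, predict_type = "prob", importance = "impurity")
learner_rf <- lrn("classif.ranger", predict_type = "prob", importance = "impurity")
Split Data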
The data can be split in several ways. This analysis uses a holdout split with ratio 0.8: 80% of the observations go to the training set and the remaining 20% to the test set.
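As a rough illustration of what a holdout split does, the base R sketch below draws a random 80% of 96 row indices for training and keeps the rest for testing. It is not the mlr3 implementation; the rounding rule and the variable names are assumptions made for this example.

```r
set.seed(123)
n <- 96        # number of observations in the dataset
ratio <- 0.8   # share of rows used for training

# Randomly pick the training rows; the remaining rows form the test set
n_train <- round(ratio * n)
train_ids <- sort(sample(seq_len(n), size = n_train))
test_ids <- setdiff(seq_len(n), train_ids)

cat("train:", length(train_ids), "test:", length(test_ids), "\n")
```

With 96 rows this gives 77 training and 19 test observations; a test set of 19 rows agrees with the 19 predictions counted in the KNN confusion matrix later in this section.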
set.seed(123)
resampleTelp1 = rsmp("holdout", ratio = 0.8)
resampleTelp1$instantiate(task = taskTelp)
KNN
K-Nearest Neighbor (KNN) is a classification method that uses the neighborhood principle to predict the class of a new observation (Siringoringo 2018). The class of a new observation is the most frequent class among its k nearest neighbors (Karno 2016); if several classes are tied for the highest frequency among those neighbors, one of them is chosen at random.
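The voting rule can be made concrete with a minimal base R sketch. It assumes Euclidean distance, unweighted votes, and no feature scaling, so it is a simplification and not the `kknn` implementation used by the learner; the function name `knn_predict` and the toy data are hypothetical.

```r
# Predict the class of one new point with k-nearest neighbors (Euclidean distance)
knn_predict <- function(train_x, train_y, new_x, k) {
  # Distance from the new point to every training row
  dists <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  # Classes of the k closest training rows
  neighbors <- train_y[order(dists)[seq_len(k)]]
  votes <- table(neighbors)
  winners <- names(votes)[votes == max(votes)]
  # Break ties between equally frequent classes at random
  if (length(winners) > 1) winners <- sample(winners, 1)
  winners
}

# Toy data (hypothetical): two well-separated groups
train_x <- rbind(c(1, 1), c(1, 2), c(2, 1), c(8, 8), c(8, 9), c(9, 8))
train_y <- factor(c("0", "0", "0", "1", "1", "1"))

pred_a <- knn_predict(train_x, train_y, c(1.5, 1.5), k = 3)
pred_b <- knn_predict(train_x, train_y, c(8.5, 8.5), k = 3)
print(c(pred_a, pred_b))
```

Because predictors such as X24 (in the thousands) and X9 (below 1) live on very different scales, a distance-based method needs scaling in practice; the sketch omits it only to keep the voting logic visible.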
Parameters
learner_knn$param_set
<ParamSet>
id class lower upper nlevels default value
1: k ParamInt 1 Inf Inf 7 10
2: distance ParamDbl 0 Inf Inf 2
3: kernel ParamFct NA NA 10 optimal rectangular
4: scale ParamLgl NA NA 2 TRUE
5: ykernel ParamUty NA NA Inf
ParamSet & Setting
# Search space: number of neighbors k between 2 and 30
param_bound <- ParamSet$new(params = list(ParamInt$new("k", lower = 2, upper = 30)))
# Stop after at most 30 evaluations
terminate = trm("evals", n_evals = 30)
# Tune k on the holdout split, maximizing classification accuracy
tuning_setting <- TuningInstanceSingleCrit$new(task = taskTelp,
                                               learner = learner_knn,
                                               resampling = resampleTelp1,
                                               measure = msr("classif.acc"),
                                               search_space = param_bound,
                                               terminator = terminate
)
Triggering Tuning
tuner <- tnr("grid_search", resolution = 30)
tuner$optimize(tuning_setting)
INFO [14:27:04.425] [bbotk] Starting to optimize 1 parameter(s) with '<OptimizerGridSearch>' and '<TerminatorEvals> [n_evals=30]'
INFO [14:27:04.487] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:04.540] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:04.618] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:04.725] [mlr3] Finished benchmark
INFO [14:27:04.796] [bbotk] Result of batch 1:
INFO [14:27:04.799] [bbotk] k classif.acc uhash
INFO [14:27:04.799] [bbotk] 21 0.7894737 c8e00c79-df61-4716-bb08-6dbf013352fc
INFO [14:27:04.801] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:04.831] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:04.842] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:04.875] [mlr3] Finished benchmark
INFO [14:27:04.953] [bbotk] Result of batch 2:
INFO [14:27:04.956] [bbotk] k classif.acc uhash
INFO [14:27:04.956] [bbotk] 15 0.7368421 b536a22a-a18d-429d-8ed7-3029b498d9b7
INFO [14:27:04.958] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:04.986] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:04.997] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:05.030] [mlr3] Finished benchmark
INFO [14:27:05.106] [bbotk] Result of batch 3:
INFO [14:27:05.108] [bbotk] k classif.acc uhash
INFO [14:27:05.108] [bbotk] 4 0.6315789 77efe540-9d86-4529-9f96-59f2e493aea8
INFO [14:27:05.110] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:05.139] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:05.153] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:05.201] [mlr3] Finished benchmark
INFO [14:27:05.256] [bbotk] Result of batch 4:
INFO [14:27:05.258] [bbotk] k classif.acc uhash
INFO [14:27:05.258] [bbotk] 20 0.7894737 b2730125-fef3-42a7-91b2-800c68fdf9f5
INFO [14:27:05.260] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:05.284] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:05.293] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:05.320] [mlr3] Finished benchmark
INFO [14:27:05.380] [bbotk] Result of batch 5:
INFO [14:27:05.382] [bbotk] k classif.acc uhash
INFO [14:27:05.382] [bbotk] 10 0.7368421 e6dee8d6-41cd-45d3-9264-9f35a3c1dfef
INFO [14:27:05.384] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:05.406] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:05.417] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:05.444] [mlr3] Finished benchmark
INFO [14:27:05.504] [bbotk] Result of batch 6:
INFO [14:27:05.507] [bbotk] k classif.acc uhash
INFO [14:27:05.507] [bbotk] 18 0.7894737 2d5b74a0-d1bc-4f18-a4b3-134731bca721
INFO [14:27:05.511] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:05.537] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:05.547] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:05.576] [mlr3] Finished benchmark
INFO [14:27:05.639] [bbotk] Result of batch 7:
INFO [14:27:05.641] [bbotk] k classif.acc uhash
INFO [14:27:05.641] [bbotk] 6 0.5789474 558c1f2c-07b3-4689-b344-bc79e5cd8e1e
INFO [14:27:05.643] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:05.670] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:05.679] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:05.709] [mlr3] Finished benchmark
INFO [14:27:05.777] [bbotk] Result of batch 8:
INFO [14:27:05.779] [bbotk] k classif.acc uhash
INFO [14:27:05.779] [bbotk] 11 0.7368421 c853d328-90eb-43fa-b484-ef9670b7584f
INFO [14:27:05.781] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:05.806] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:05.816] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:05.843] [mlr3] Finished benchmark
INFO [14:27:05.901] [bbotk] Result of batch 9:
INFO [14:27:05.903] [bbotk] k classif.acc uhash
INFO [14:27:05.903] [bbotk] 24 0.7894737 42ba207b-5168-4057-b830-ba20e0082639
INFO [14:27:05.907] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:05.932] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:05.942] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:05.972] [mlr3] Finished benchmark
INFO [14:27:06.032] [bbotk] Result of batch 10:
INFO [14:27:06.033] [bbotk] k classif.acc uhash
INFO [14:27:06.033] [bbotk] 12 0.7368421 444436da-f6f3-426e-8685-56de792e79c9
INFO [14:27:06.036] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:06.058] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:06.069] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:06.100] [mlr3] Finished benchmark
INFO [14:27:06.169] [bbotk] Result of batch 11:
INFO [14:27:06.171] [bbotk] k classif.acc uhash
INFO [14:27:06.171] [bbotk] 28 0.7894737 7f240af5-9f51-4162-a9b6-2f84bff57c71
INFO [14:27:06.173] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:06.197] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:06.206] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:06.234] [mlr3] Finished benchmark
INFO [14:27:06.302] [bbotk] Result of batch 12:
INFO [14:27:06.303] [bbotk] k classif.acc uhash
INFO [14:27:06.303] [bbotk] 14 0.7368421 1ac6a640-054b-454d-a259-5f7c391bf5f1
INFO [14:27:06.305] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:06.328] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:06.336] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:06.364] [mlr3] Finished benchmark
INFO [14:27:06.423] [bbotk] Result of batch 13:
INFO [14:27:06.424] [bbotk] k classif.acc uhash
INFO [14:27:06.424] [bbotk] 22 0.7894737 a196874e-f62e-44d7-96bc-53d75cf3ffbe
INFO [14:27:06.426] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:06.450] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:06.458] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:06.489] [mlr3] Finished benchmark
INFO [14:27:06.558] [bbotk] Result of batch 14:
INFO [14:27:06.560] [bbotk] k classif.acc uhash
INFO [14:27:06.560] [bbotk] 3 0.6315789 cc3e3922-a788-4628-a8bd-ae64e82c3c8a
INFO [14:27:06.561] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:06.585] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:06.597] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:06.627] [mlr3] Finished benchmark
INFO [14:27:06.690] [bbotk] Result of batch 15:
INFO [14:27:06.691] [bbotk] k classif.acc uhash
INFO [14:27:06.691] [bbotk] 9 0.6842105 44d087d8-f197-47b7-a60d-8ff510da4be7
INFO [14:27:06.694] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:06.718] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:06.727] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:06.756] [mlr3] Finished benchmark
INFO [14:27:06.825] [bbotk] Result of batch 16:
INFO [14:27:06.826] [bbotk] k classif.acc uhash
INFO [14:27:06.826] [bbotk] 19 0.7894737 f33cb9c0-c75b-412a-a2a7-0be416eb210d
INFO [14:27:06.828] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:06.852] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:06.861] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:06.887] [mlr3] Finished benchmark
INFO [14:27:06.953] [bbotk] Result of batch 17:
INFO [14:27:06.955] [bbotk] k classif.acc uhash
INFO [14:27:06.955] [bbotk] 17 0.7368421 14b07d70-b12f-4177-a5c8-1e19dd97614a
INFO [14:27:06.957] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:06.981] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:06.989] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:07.019] [mlr3] Finished benchmark
INFO [14:27:07.078] [bbotk] Result of batch 18:
INFO [14:27:07.080] [bbotk] k classif.acc uhash
INFO [14:27:07.080] [bbotk] 13 0.7368421 f29735ae-5ba5-4270-b6e4-42764deb3b1e
INFO [14:27:07.082] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:07.105] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:07.115] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:07.148] [mlr3] Finished benchmark
INFO [14:27:07.212] [bbotk] Result of batch 19:
INFO [14:27:07.215] [bbotk] k classif.acc uhash
INFO [14:27:07.215] [bbotk] 25 0.7894737 6f87e8e4-2db3-47c2-9392-0b03a0d5dbe8
INFO [14:27:07.219] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:07.241] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:07.250] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:07.287] [mlr3] Finished benchmark
INFO [14:27:07.344] [bbotk] Result of batch 20:
INFO [14:27:07.346] [bbotk] k classif.acc uhash
INFO [14:27:07.346] [bbotk] 29 0.7894737 97d21aa2-bdf9-4050-a618-b092dda84a65
INFO [14:27:07.349] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:07.372] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:07.380] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:07.407] [mlr3] Finished benchmark
INFO [14:27:07.466] [bbotk] Result of batch 21:
INFO [14:27:07.468] [bbotk] k classif.acc uhash
INFO [14:27:07.468] [bbotk] 30 0.7894737 36a99c9a-5a81-46a3-b205-61af8801630a
INFO [14:27:07.470] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:07.493] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:07.502] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:07.538] [mlr3] Finished benchmark
INFO [14:27:07.600] [bbotk] Result of batch 22:
INFO [14:27:07.604] [bbotk] k classif.acc uhash
INFO [14:27:07.604] [bbotk] 8 0.6842105 4adc98cf-4b31-4188-a566-dbcd75414c27
INFO [14:27:07.606] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:07.637] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:07.648] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:07.676] [mlr3] Finished benchmark
INFO [14:27:07.738] [bbotk] Result of batch 23:
INFO [14:27:07.740] [bbotk] k classif.acc uhash
INFO [14:27:07.740] [bbotk] 23 0.7894737 9629e5b6-31cb-40a9-8b00-eaff4d6fb355
INFO [14:27:07.742] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:07.765] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:07.773] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:07.811] [mlr3] Finished benchmark
INFO [14:27:07.870] [bbotk] Result of batch 24:
INFO [14:27:07.872] [bbotk] k classif.acc uhash
INFO [14:27:07.872] [bbotk] 27 0.7894737 2b67c02e-24f7-4867-b070-ab6717c1ca51
INFO [14:27:07.874] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:07.895] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:07.909] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:07.942] [mlr3] Finished benchmark
INFO [14:27:08.001] [bbotk] Result of batch 25:
INFO [14:27:08.002] [bbotk] k classif.acc uhash
INFO [14:27:08.002] [bbotk] 2 0.6315789 9f47f46d-a257-449b-a290-53284ae83bfb
INFO [14:27:08.004] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:08.027] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:08.036] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:08.067] [mlr3] Finished benchmark
INFO [14:27:08.132] [bbotk] Result of batch 26:
INFO [14:27:08.135] [bbotk] k classif.acc uhash
INFO [14:27:08.135] [bbotk] 7 0.5789474 2826d83b-5265-46db-b032-4fa5115940c7
INFO [14:27:08.138] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:08.164] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:08.173] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:08.203] [mlr3] Finished benchmark
INFO [14:27:08.272] [bbotk] Result of batch 27:
INFO [14:27:08.274] [bbotk] k classif.acc uhash
INFO [14:27:08.274] [bbotk] 5 0.6315789 8c5a74f7-b490-474a-badc-1df62a09c42c
INFO [14:27:08.276] [bbotk] Evaluating 1 configuration(s)
INFO [14:27:08.297] [mlr3] Running benchmark with 1 resampling iterations
INFO [14:27:08.306] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:08.336] [mlr3] Finished benchmark
INFO [14:27:08.394] [bbotk] Result of batch 28:
INFO [14:27:08.397] [bbotk] k classif.acc uhash
INFO [14:27:08.397] [bbotk] 26 0.7894737 9068d7b0-ea58-4d43-a65e-ef60edfc5d61
INFO [14:27:08.405] [bbotk] Finished optimizing after 28 evaluation(s)
INFO [14:27:08.406] [bbotk] Result:
INFO [14:27:08.408] [bbotk] k learner_param_vals x_domain classif.acc
INFO [14:27:08.408] [bbotk] 21 <list[2]> <list[1]> 0.7894737
k learner_param_vals x_domain classif.acc
1: 21 <list[2]> <list[1]> 0.7894737
Tuning Result
tuning_setting$result
k learner_param_vals x_domain classif.acc
1: 21 <list[2]> <list[1]> 0.7894737
as.data.table(tuning_setting$archive)
k classif.acc uhash timestamp
1: 21 0.7894737 c8e00c79-df61-4716-bb08-6dbf013352fc 2021-06-10 14:27:04
2: 15 0.7368421 b536a22a-a18d-429d-8ed7-3029b498d9b7 2021-06-10 14:27:04
3: 4 0.6315789 77efe540-9d86-4529-9f96-59f2e493aea8 2021-06-10 14:27:05
4: 20 0.7894737 b2730125-fef3-42a7-91b2-800c68fdf9f5 2021-06-10 14:27:05
5: 10 0.7368421 e6dee8d6-41cd-45d3-9264-9f35a3c1dfef 2021-06-10 14:27:05
6: 18 0.7894737 2d5b74a0-d1bc-4f18-a4b3-134731bca721 2021-06-10 14:27:05
7: 6 0.5789474 558c1f2c-07b3-4689-b344-bc79e5cd8e1e 2021-06-10 14:27:05
8: 11 0.7368421 c853d328-90eb-43fa-b484-ef9670b7584f 2021-06-10 14:27:05
9: 24 0.7894737 42ba207b-5168-4057-b830-ba20e0082639 2021-06-10 14:27:05
10: 12 0.7368421 444436da-f6f3-426e-8685-56de792e79c9 2021-06-10 14:27:06
11: 28 0.7894737 7f240af5-9f51-4162-a9b6-2f84bff57c71 2021-06-10 14:27:06
12: 14 0.7368421 1ac6a640-054b-454d-a259-5f7c391bf5f1 2021-06-10 14:27:06
batch_nr x_domain_k
1: 1 21
2: 2 15
3: 3 4
4: 4 20
5: 5 10
6: 6 18
7: 7 6
8: 8 11
9: 9 24
10: 10 12
11: 11 28
12: 12 14
[ reached getOption("max.print") -- omitted 17 rows ]
tuning_result <- as.data.table(tuning_setting$archive) %>% select(k, classif.acc)
kbl(tuning_result, caption = "") %>% kable_styling()
| k | classif.acc |
|---|---|
| 21 | 0.7894737 |
| 15 | 0.7368421 |
| 4 | 0.6315789 |
| 20 | 0.7894737 |
| 10 | 0.7368421 |
| 18 | 0.7894737 |
| 6 | 0.5789474 |
| 11 | 0.7368421 |
| 24 | 0.7894737 |
| 12 | 0.7368421 |
| 28 | 0.7894737 |
| 14 | 0.7368421 |
| 22 | 0.7894737 |
| 3 | 0.6315789 |
| 9 | 0.6842105 |
| 19 | 0.7894737 |
| 17 | 0.7368421 |
| 13 | 0.7368421 |
| 25 | 0.7894737 |
| 29 | 0.7894737 |
| 30 | 0.7894737 |
| 8 | 0.6842105 |
| 23 | 0.7894737 |
| 27 | 0.7894737 |
| 2 | 0.6315789 |
| 7 | 0.5789474 |
| 5 | 0.6315789 |
| 26 | 0.7894737 |
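Many values of k share the highest accuracy in this table. The base R sketch below copies the k and accuracy columns from the table and lists every k that reaches the maximum; the object names are chosen for this example only.

```r
# Accuracy per k, copied from the tuning table above
tuning <- data.frame(
  k = c(21, 15, 4, 20, 10, 18, 6, 11, 24, 12, 28, 14, 22, 3,
        9, 19, 17, 13, 25, 29, 30, 8, 23, 27, 2, 7, 5, 26),
  acc = c(0.7894737, 0.7368421, 0.6315789, 0.7894737, 0.7368421, 0.7894737,
          0.5789474, 0.7368421, 0.7894737, 0.7368421, 0.7894737, 0.7368421,
          0.7894737, 0.6315789, 0.6842105, 0.7894737, 0.7368421, 0.7368421,
          0.7894737, 0.7894737, 0.7894737, 0.6842105, 0.7894737, 0.7894737,
          0.6315789, 0.5789474, 0.6315789, 0.7894737)
)

# Highest accuracy and all values of k that reach it
best_acc <- max(tuning$acc)
tied_k <- sort(tuning$k[tuning$acc == best_acc])
print(best_acc)
print(tied_k)
```

Every k from 18 to 30 reaches the same accuracy, so the reported optimum k = 21 is one of 13 tied values. On a 19-row test set each step in accuracy corresponds to a single observation, which limits how finely the candidates can be ranked.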
(p <- ggplot(tuning_result, aes(x = k, y = classif.acc)) +
    geom_line() + geom_point(size = 4, color = "blue"))
KNN Model
set.seed(123)
train_test_telp_knn = resample(task = taskTelp,
                               learner = learner_knn,
                               resampling = resampleTelp1,
                               store_models = TRUE)
INFO [14:27:09.134] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
Prediction
prediksi_test = as.data.table(train_test_telp_knn$prediction())
head(prediksi_test)
row_ids truth response prob.1 prob.0
1: 1 1 1 0.9 0.1
2: 2 1 1 0.8 0.2
3: 10 1 1 0.9 0.1
4: 11 0 1 0.9 0.1
5: 24 1 0 0.2 0.8
6: 28 1 0 0.5 0.5
Confusion Matrix
train_test_telp_knn$prediction()$confusion
        truth
response  1  0
       1 13  4
       0  2  0
accknn <- train_test_telp_knn$aggregate(list(msr("classif.acc"),
                                             msr("classif.specificity"),
                                             msr("classif.sensitivity")))
accknn
classif.acc classif.specificity classif.sensitivity
0.6842105 0.0000000 0.8666667
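KNN reaches a moderate accuracy of about 68% (0.684) on the test set, with a sensitivity of about 87%. However, its specificity is 0%: none of the four outflow observations in the test set is classified correctly.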
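The three measures follow directly from the confusion matrix. The sketch below recomputes them in base R from the four counts shown above, with class 1 (inflow) as the positive class; the variable names are chosen for this example.

```r
# Confusion matrix counts from the KNN test predictions (positive class = "1")
tp <- 13  # truth 1, predicted 1
fp <- 4   # truth 0, predicted 1
fn <- 2   # truth 1, predicted 0
tn <- 0   # truth 0, predicted 0

accuracy <- (tp + tn) / (tp + fp + fn + tn)
sensitivity <- tp / (tp + fn)  # share of actual inflow months detected
specificity <- tn / (tn + fp)  # share of actual outflow months detected

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity), 4))
```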
ROC Plot
autoplot(train_test_telp_knn, type = "roc")
Classification Tree
Classification and Regression Tree (CART) is a classification method based on a nonparametric statistical approach (Breiman 2001). CART produces a classification tree when the response variable is categorical and a regression tree when the response is numeric. This study produces a classification tree because its response is categorical. A classification tree is used to identify which variables discriminate between classes and to predict the class membership of an observation from its characteristics. Building the tree starts by partitioning the root node, which contains several classes, into child nodes. The partition is made by the predictor chosen as the best splitter, that is, the predictor whose split maximizes the decrease in heterogeneity within the child nodes relative to the parent node. Partitioning is repeated until a child node contains observations from a single category, a child node contains observations whose predictors are all identical, or the tree reaches its maximum depth (Breiman et al. 1993).
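The idea of choosing the split that maximizes the decrease in heterogeneity can be sketched in base R. The example assumes Gini impurity as the heterogeneity measure and a single numeric predictor; the functions `gini` and `gini_gain` and the toy data are hypothetical and much simpler than what `rpart` does internally.

```r
# Gini impurity of a vector of class labels
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Impurity decrease obtained by splitting on x < threshold
gini_gain <- function(x, y, threshold) {
  left <- y[x < threshold]
  right <- y[x >= threshold]
  w_left <- length(left) / length(y)
  gini(y) - (w_left * gini(left) + (1 - w_left) * gini(right))
}

# Toy data (hypothetical): low x values belong to class 0
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- factor(c(0, 0, 0, 1, 1, 1, 1, 1))

# Candidate thresholds: midpoints between consecutive sorted values
xs <- sort(unique(x))
candidates <- (head(xs, -1) + tail(xs, -1)) / 2
gains <- sapply(candidates, function(t) gini_gain(x, y, t))
best <- candidates[which.max(gains)]
print(best)
```

In this toy case the threshold 3.5 separates the two classes perfectly, so both child nodes are pure and splitting would stop there.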
ParamSet & Setting
param_bound <- ParamSet$new(params = list(
ParamDbl$new("cp", lower = 0.001, upper = 0.01),
ParamInt$new("minsplit", lower = 5, upper = 30)
))
terminate = trm("evals", n_evals = 30)
tuning_setting <- TuningInstanceSingleCrit$new(task = taskTelp,
learner = learner_tree,
resampling = resampleTelp1,
measure = msr("classif.acc"),
search_space = param_bound,
terminator = terminate
)
Trigger Tuning
tuner <- tnr("grid_search", resolution = 30)
tuner$optimize(tuning_setting)
INFO [14:27:10.127] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerGridSearch>' and '<TerminatorEvals> [n_evals=30]'
cp minsplit learner_param_vals x_domain classif.acc
1: 0.009689655 29 <list[3]> <list[2]> 0.7894737
Tuning Result
tuning_setting$result
cp minsplit learner_param_vals x_domain classif.acc
1: 0.009689655 29 <list[3]> <list[2]> 0.7894737
tuning_result <- as.data.table(tuning_setting$archive) %>% select(cp, minsplit, classif.acc)
# kbl() and kable_styling() are provided by the kableExtra package
kbl(tuning_result, caption = "") %>% kable_styling()
| cp | minsplit | classif.acc |
|---|---|---|
| 0.0096897 | 29 | 0.7894737 |
| 0.0078276 | 22 | 0.7894737 |
| 0.0096897 | 16 | 0.7368421 |
| 0.0075172 | 8 | 0.5789474 |
| 0.0022414 | 9 | 0.5789474 |
| 0.0059655 | 24 | 0.7894737 |
| 0.0072069 | 9 | 0.5789474 |
| 0.0050345 | 13 | 0.7368421 |
| 0.0084483 | 25 | 0.7894737 |
| 0.0047241 | 28 | 0.7894737 |
| 0.0044138 | 9 | 0.5789474 |
| 0.0044138 | 30 | 0.7894737 |
| 0.0016207 | 17 | 0.7894737 |
| 0.0010000 | 12 | 0.7894737 |
| 0.0072069 | 24 | 0.7894737 |
| 0.0044138 | 22 | 0.7894737 |
| 0.0084483 | 15 | 0.7368421 |
| 0.0078276 | 24 | 0.7894737 |
| 0.0056552 | 12 | 0.7894737 |
| 0.0022414 | 29 | 0.7894737 |
| 0.0059655 | 13 | 0.7368421 |
| 0.0093793 | 29 | 0.7894737 |
| 0.0013103 | 10 | 0.5789474 |
| 0.0034828 | 19 | 0.7894737 |
| 0.0072069 | 13 | 0.7368421 |
| 0.0010000 | 23 | 0.7894737 |
| 0.0090690 | 24 | 0.7894737 |
| 0.0100000 | 24 | 0.7894737 |
| 0.0050345 | 16 | 0.7368421 |
| 0.0037931 | 29 | 0.7894737 |
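The configuration cp = 0.001 and minsplit = 12 is chosen because it attains the highest accuracy in the archive (0.7895, tied with the tuner's reported result of cp = 0.0097 and minsplit = 29) and accuracy is judged to be relatively stable around that point.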
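The many ties in the archive have a simple explanation: the holdout test set has only 19 rows, so accuracy can only take values of the form k/19. The short check below is a sketch that assumes the tuning resampling uses the same 19-row test set as the confusion matrices in this report (13 + 4 + 2 + 0 in the KNN matrix).

```r
# Size of the holdout test set, read off the KNN confusion matrix (13 + 4 + 2 + 0).
n_test <- 19

# The three distinct accuracy values that occur in the tuning archive.
observed_acc <- c(0.7894737, 0.7368421, 0.5789474)

# Number of correctly classified test rows implied by each accuracy value.
correct <- round(observed_acc * n_test)
print(correct)

# Each archive value is reproduced by k / 19 up to printing precision.
print(round(correct / n_test, 7))
```

With steps of this size, one observation more or less changes the accuracy by about 5 percentage points, so differences between configurations should be read with caution.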
Classification Tree Model
learner_tree$train(task = taskTelp)
rpart.plot(learner_tree$model, roundint = FALSE, type = 5, tweak = 2)
The predicted class probabilities can be read from the tree by following each branch from the root down to a leaf.
The classification tree shows that X23 (hedging forward one month), X17 (ICE BofA BBB US Corporate Index Option-Adjusted Spread), X22 (INDO CDS 5 year), and X7 (US 10-Year Breakeven Inflation Rate, percent) appear as the variables presumed to influence foreign investor flows into government bonds.
In more detail, foreign investor flows into government bonds can be described as follows:
- Four groups are classified as Positive (more foreign investor funds flow into government bonds than flow out):
  - X23 (hedging forward one month) < 11 and X17 (ICE BofA BBB US Corporate Index Option-Adjusted Spread) < 2
  - X23 (hedging forward one month) < 11, X17 (ICE BofA BBB US Corporate Index Option-Adjusted Spread) >= 2, and X23 (hedging forward one month) < 3.5
  - X23 (hedging forward one month) < 11, X17 (ICE BofA BBB US Corporate Index Option-Adjusted Spread) >= 2, X23 (hedging forward one month) >= 3.5, and X22 (INDO CDS 5 year) >= 192
  - X23 (hedging forward one month) < 11, X17 (ICE BofA BBB US Corporate Index Option-Adjusted Spread) >= 2, X23 (hedging forward one month) >= 3.5, X22 (INDO CDS 5 year) < 192, and X7 (US 10-Year Breakeven Inflation Rate, percent) >= 2.4
- Two groups are classified as Negative (fewer foreign investor funds flow into government bonds than flow out):
  - X23 (hedging forward one month) >= 11
  - X23 (hedging forward one month) < 11, X17 (ICE BofA BBB US Corporate Index Option-Adjusted Spread) >= 2, X23 (hedging forward one month) >= 3.5, X22 (INDO CDS 5 year) < 192, and X7 (US 10-Year Breakeven Inflation Rate, percent) < 2.4
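The six groups listed above can be written as one nested rule. The function below is a sketch, not the fitted model: the name `classify_flow` is hypothetical, the thresholds are the rounded values read from the plot, and the example inputs are invented to exercise each branch.

```r
# Hypothetical helper that encodes the tree rules listed above.
# Thresholds are the rounded values shown in the plot, so results near a
# threshold may differ from learner_tree$predict().
classify_flow <- function(x23, x17, x22, x7) {
  if (x23 >= 11) return("Negative")   # hedging forward one month is high
  if (x17 < 2) return("Positive")     # BBB option-adjusted spread is low
  if (x23 < 3.5) return("Positive")   # hedging forward one month is very low
  if (x22 >= 192) return("Positive")  # INDO CDS 5 year
  if (x7 >= 2.4) "Positive" else "Negative"  # US 10-year breakeven inflation
}

# Invented example inputs, one per leaf.
print(classify_flow(x23 = 12, x17 = 1.0, x22 = 100, x7 = 2.0))
print(classify_flow(x23 = 5,  x17 = 1.5, x22 = 100, x7 = 2.0))
print(classify_flow(x23 = 3,  x17 = 2.5, x22 = 100, x7 = 2.0))
print(classify_flow(x23 = 5,  x17 = 2.5, x22 = 200, x7 = 2.0))
print(classify_flow(x23 = 5,  x17 = 2.5, x22 = 100, x7 = 2.5))
print(classify_flow(x23 = 5,  x17 = 2.5, x22 = 100, x7 = 2.0))
```

Writing the rules this way shows that X23 is tested twice on the longest path, first against 11 and then against 3.5.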
set.seed(123)
train_test_telp_tree = resample(task = taskTelp,
learner = learner_tree,
resampling = resampleTelp1,
store_models = TRUE
)
INFO [14:27:14.938] [mlr3] Applying learner 'classif.rpart' on task 'telp' (iter 1/1)
Prediction Results
Predictions are made for the test data.
prediksi_test = as.data.table(train_test_telp_tree$prediction())
head(prediksi_test)
row_ids truth response prob.1 prob.0
1: 1 1 1 0.9736842 0.02631579
2: 2 1 1 0.9736842 0.02631579
3: 10 1 1 0.9736842 0.02631579
4: 11 0 0 0.4285714 0.57142857
5: 24 1 0 0.2727273 0.72727273
6: 28 1 0 0.0000000 1.00000000
Confusion Matrix
train_test_telp_tree$prediction()$confusion
          truth
response  1  0
       1 13  2
       0  2  2
acctree<- train_test_telp_tree$aggregate(list(msr("classif.acc"),
msr("classif.specificity"),
msr("classif.sensitivity")
))
acctree
classif.acc classif.specificity classif.sensitivity
0.7894737 0.5000000 0.8666667
Compared with the two previous methods, the classification tree reaches an accuracy of about 78.9%, with a sensitivity of 86.67% and a specificity of 50%.
The confusion matrix shows that 50% of the observations that are actually coded 0 (negative: more foreign investor funds flow out of government bonds) are predicted as 1 (positive: more foreign investor funds flow into government bonds), so the model can still be improved.
ROC Plot
autoplot(train_test_telp_tree, type = "roc")
Bagging
Bagging (bootstrap aggregating) is a method for improving prediction accuracy that was first introduced by Leo Breiman in 1996; it reduces the variance of the predictions. Bagging consists of two main stages: bootstrap, which draws resamples from the available sample data, and aggregating, which combines several predictions into a single prediction (Sartono and Syafitri 2010). Bagging lowers the average prediction error by repeatedly building classification trees on different resampled data sets and then drawing the conclusion from them in aggregate.
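The two stages can be sketched in a few lines of base R. To keep the example self-contained, the base learner here is a one-split decision stump rather than a full classification tree, and the data are simulated; neither is what `learner_bagging` in this report uses, and the helper names are hypothetical.

```r
set.seed(123)

# Toy data (simulated for illustration only): one predictor, binary response.
n <- 60
x <- runif(n, 0, 10)
y <- ifelse(x + rnorm(n, sd = 1.5) > 5, 1, 0)

# Base learner: a decision stump that picks the threshold with the fewest
# training errors and predicts 1 when x is above it.
fit_stump <- function(x, y) {
  thresholds <- sort(unique(x))
  errors <- sapply(thresholds, function(t) mean((x > t) != (y == 1)))
  thresholds[which.min(errors)]
}
predict_stump <- function(threshold, x_new) as.integer(x_new > threshold)

# Bootstrap: refit the stump on B resamples drawn with replacement.
B <- 25
thresholds <- replicate(B, {
  idx <- sample(n, replace = TRUE)
  fit_stump(x[idx], y[idx])
})

# Aggregating: majority vote over the B stumps (rows = new points).
x_new <- c(1, 4.8, 9)
votes <- sapply(thresholds, predict_stump, x_new = x_new)
bagged_pred <- as.integer(rowMeans(votes) > 0.5)
print(bagged_pred)
```

Using an odd number of resamples avoids tied votes. The individual thresholds vary from resample to resample, and the vote averages that variation out.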
set.seed(123)
train_test_bagging = resample(task = taskTelp,
learner = learner_bagging,
resampling = resampleTelp1,
store_models = TRUE
)
INFO [14:27:15.741] [mlr3] Applying learner 'bagging clf' on task 'telp' (iter 1/1)
train_test_bagging$prediction()$confusion
          truth
response  1  0
       1 12  4
       0  3  0
accbagging <- train_test_bagging$aggregate(list(msr("classif.acc"),
msr("classif.specificity"),
msr("classif.sensitivity")))
accbagging
classif.acc classif.specificity classif.sensitivity
0.6315789 0.0000000 0.8000000
Bagging yields an accuracy of 63.16%, with a sensitivity of 80% and a specificity of 0%.
ROC Plot
autoplot(train_test_bagging, type = "roc")
Random Forest
Random forest extends CART by applying bootstrap aggregating (bagging) together with random feature selection (Breiman 2001). It is an ensemble tree method in which k trees are grown to form a forest, and the analysis is carried out on this collection of trees by combining the predictions of the k trees.
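Random feature selection means that each split considers only a random subset of the predictors. The sketch below illustrates the idea for the 24 predictors of this case study; the subset size `floor(sqrt(p))` is a common default for classification forests and is an assumption here, since the settings of `learner_rf` are not shown in this section.

```r
set.seed(123)

# The 24 predictors of the case study, X1 to X24.
predictors <- paste0("X", 1:24)

# Assumed subset size: floor(sqrt(p)), a common default for classification.
mtry <- floor(sqrt(length(predictors)))

# Candidate predictors for three example splits; each split draws a new subset.
candidates_per_split <- replicate(3, sample(predictors, mtry), simplify = FALSE)
print(mtry)
print(candidates_per_split)
```

Restricting the candidates in this way makes the individual trees less similar to each other, which is what distinguishes a random forest from plain bagging of trees.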
Model Interpretation
learner_rf$train(task=taskTelp)
learner_rf$model$variable.importance
X1 X10 X11 X12 X13 X14 X15 X16
0.9687638 0.3609887 0.4950794 1.4409431 1.6909262 0.8496355 0.7773331 0.8583195
X17 X18 X19 X2 X20 X21 X22 X23
1.7231522 0.8560242 1.3678156 0.7866991 1.5952220 1.1618655 1.1861362 3.8070474
X24 X3 X4 X5 X6 X7 X8 X9
0.8960112 0.9330833 0.8399451 1.3704469 1.1776198 0.8327396 0.7861040 0.1057250
The variable importance values reported for the random forest are impurity-based: they measure the decrease in Gini impurity attributed to each predictor. The larger the value, the more influential the predictor. To make the values easier to compare, they can be plotted.
#dataframe
importance <- data.frame(Predictors = names(learner_rf$model$variable.importance),
impurity = learner_rf$model$variable.importance)
rownames(importance) <- NULL
importance <- importance %>% arrange(desc(impurity))
#grafik
ggplot(importance,
aes(x=impurity,
y=reorder(Predictors,impurity))) +
geom_col(fill = "steelblue")+
geom_text(aes(label=round(impurity,2)),hjust=1.2)
The most influential variable is X23, the one-month hedging forward: a binding contract in the foreign exchange market that locks in the Rupiah to US Dollar exchange rate for buying or selling currency one month ahead. When the risk of Rupiah depreciation rises, the hedging cost rises as well. The least influential variable is X9, the policy rate of the US central bank, which serves as a benchmark for other interest rates.
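The ranking can also be checked without the plot by sorting the importance vector. The sketch below copies the values from the printed output above into a named vector (the object names are hypothetical) and reports the top five predictors together with each predictor's share of the total importance.

```r
# Importance values copied from the printed random forest output above.
importance_values <- c(
  X1 = 0.9687638, X10 = 0.3609887, X11 = 0.4950794, X12 = 1.4409431,
  X13 = 1.6909262, X14 = 0.8496355, X15 = 0.7773331, X16 = 0.8583195,
  X17 = 1.7231522, X18 = 0.8560242, X19 = 1.3678156, X2 = 0.7866991,
  X20 = 1.5952220, X21 = 1.1618655, X22 = 1.1861362, X23 = 3.8070474,
  X24 = 0.8960112, X3 = 0.9330833, X4 = 0.8399451, X5 = 1.3704469,
  X6 = 1.1776198, X7 = 0.8327396, X8 = 0.7861040, X9 = 0.1057250
)

# Sort from most to least important.
ranked <- sort(importance_values, decreasing = TRUE)
print(head(ranked, 5))

# Most and least important predictor.
print(names(ranked)[1])
print(names(ranked)[length(ranked)])

# Share of the total importance carried by each predictor.
share <- ranked / sum(ranked)
print(round(head(share, 5), 3))
```

X23 stands well apart from the rest: its value is more than twice that of the runner-up X17.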
set.seed(123)
train_test_telp_forest = resample(task = taskTelp,
learner = learner_rf,
resampling = resampleTelp1,
store_models = TRUE
)
INFO [14:27:17.207] [mlr3] Applying learner 'classif.ranger' on task 'telp' (iter 1/1)
Prediction Results
Predictions are made for the test data.
prediksi_test = as.data.table(train_test_telp_forest$prediction())
head(prediksi_test)
row_ids truth response prob.1 prob.0
1: 1 1 1 0.7979897 0.20201032
2: 2 1 1 0.7880190 0.21198095
3: 10 1 1 0.9148825 0.08511746
4: 11 0 1 0.8564714 0.14352857
5: 24 1 0 0.3153563 0.68464365
6: 28 1 0 0.4645659 0.53543413
Confusion Matrix
train_test_telp_forest$prediction()$confusion
          truth
response  1  0
       1 12  4
       0  3  0
accforest <- train_test_telp_forest$aggregate(list(msr("classif.acc"),
msr("classif.specificity"),
msr("classif.sensitivity")
))
accforest
classif.acc classif.specificity classif.sensitivity
0.6315789 0.0000000 0.8000000
The random forest yields an accuracy of 63.16%, with a sensitivity of 80% and a specificity of 0%.
ROC Plot
autoplot(train_test_telp_forest$prediction(), type = "roc")
Model Comparison
The models are compared below using two goodness-of-fit measures, accuracy and AUC.
accall <- list(knn = accknn, pohon.klasifikasi = acctree, random.forest = accforest, bagging = accbagging)
accall
$knn
classif.acc classif.specificity classif.sensitivity
0.6842105 0.0000000 0.8666667
$pohon.klasifikasi
classif.acc classif.specificity classif.sensitivity
0.7894737 0.5000000 0.8666667
$random.forest
classif.acc classif.specificity classif.sensitivity
0.6315789 0.0000000 0.8000000
$bagging
classif.acc classif.specificity classif.sensitivity
0.6315789 0.0000000 0.8000000
The accuracy values above show that the classification tree has the highest accuracy. Accuracy is the percentage of correct predictions out of all predictions made. The classification tree also has the highest AUC.
set.seed(123)
learner_telp <- list(learner_tree,
learner_rf,
learner_knn,
learner_bagging)
set.seed(123)
design.telp <- benchmark_grid(tasks = taskTelp,
learners = learner_telp,
resamplings = resampleTelp1
)
set.seed(123)
bmr = benchmark(design.telp, store_models = TRUE)
INFO [14:27:18.485] [mlr3] Running benchmark with 4 resampling iterations
INFO [14:27:18.494] [mlr3] Applying learner 'classif.kknn' on task 'telp' (iter 1/1)
INFO [14:27:18.525] [mlr3] Applying learner 'classif.ranger' on task 'telp' (iter 1/1)
INFO [14:27:18.593] [mlr3] Applying learner 'classif.rpart' on task 'telp' (iter 1/1)
INFO [14:27:18.612] [mlr3] Applying learner 'bagging clf' on task 'telp' (iter 1/1)
INFO [14:27:18.694] [mlr3] Finished benchmark
bmr$aggregate(list(msr("classif.auc")))
| nr | resample_result | task_id | learner_id | resampling_id | iters | classif.auc |
|---|---|---|---|---|---|---|
| 1 | <ResampleResult[20]> | telp | classif.rpart | holdout | 1 | 0.6166667 |
| 2 | <ResampleResult[20]> | telp | classif.ranger | holdout | 1 | 0.3000000 |
| 3 | <ResampleResult[20]> | telp | classif.kknn | holdout | 1 | 0.3416667 |
| 4 | <ResampleResult[20]> | telp | bagging clf | holdout | 1 | 0.3666667 |
autoplot(bmr, type = "roc")
AUC (Area Under the Curve) is the area under the ROC curve. The larger the area under the ROC curve, the better the predictive ability of the model.
The ROC plot and the AUC values above show that the classification tree has the largest AUC (0.617), whereas the other three models fall below 0.5, the level of a random classifier. The classification tree is therefore chosen as the best model for classifying or predicting whether foreign investor funds flow into or out of government bonds.
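AUC can also be read as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case, which gives a rank-based way to compute it. The sketch below uses a hypothetical helper `auc_rank` and, for illustration only, the six classification tree predictions printed earlier; it does not reproduce the AUC of the full 19-row test set.

```r
# Rank-based AUC (Mann-Whitney form); average ranks handle tied probabilities.
auc_rank <- function(truth, prob_pos) {
  r <- rank(prob_pos)
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# First six holdout predictions of the classification tree shown earlier
# (a subset only, so the result differs from the reported AUC).
truth  <- c(1, 1, 1, 0, 1, 1)
prob_1 <- c(0.9736842, 0.9736842, 0.9736842, 0.4285714, 0.2727273, 0.0000000)

auc_six <- auc_rank(truth, prob_1)
print(auc_six)
```

On these six rows the single negative case is outranked by three of the five positive cases, which gives 3/5. An AUC below 0.5 means that negative cases tend to receive higher scores than positive ones.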
Conclusion
Among the models built to predict whether foreign investor funds flow into or out of government bonds, the classification tree performs best.
The variables presumed to influence foreign investor flows into government bonds are X23 (hedging forward one month), X17 (ICE BofA BBB US Corporate Index Option-Adjusted Spread), X22 (INDO CDS 5 year), and X7 (US 10-Year Breakeven Inflation Rate, percent).
Four groups are classified as Positive (more foreign investor funds flow into government bonds than flow out):
- X23 (hedging forward one month) < 11 and X17 (ICE BofA BBB US Corporate Index Option-Adjusted Spread) < 2
- X23 (hedging forward one month) < 11, X17 (ICE BofA BBB US Corporate Index Option-Adjusted Spread) >= 2, and X23 (hedging forward one month) < 3.5
- X23 (hedging forward one month) < 11, X17 (ICE BofA BBB US Corporate Index Option-Adjusted Spread) >= 2, X23 (hedging forward one month) >= 3.5, and X22 (INDO CDS 5 year) >= 192
- X23 (hedging forward one month) < 11, X17 (ICE BofA BBB US Corporate Index Option-Adjusted Spread) >= 2, X23 (hedging forward one month) >= 3.5, X22 (INDO CDS 5 year) < 192, and X7 (US 10-Year Breakeven Inflation Rate, percent) >= 2.4
Two groups are classified as Negative (fewer foreign investor funds flow into government bonds than flow out):
- X23 (hedging forward one month) >= 11
- X23 (hedging forward one month) < 11, X17 (ICE BofA BBB US Corporate Index Option-Adjusted Spread) >= 2, X23 (hedging forward one month) >= 3.5, X22 (INDO CDS 5 year) < 192, and X7 (US 10-Year Breakeven Inflation Rate, percent) < 2.4
A suggestion for further research is to pay closer attention to the resampling method used. Judging from the confusion matrix, although the accuracy is high, the model's ability to predict the Negative category (fewer foreign investor funds flow into government bonds than flow out) tends to be low: the misclassification rate for the Negative category reaches 50%. Other models can be tried for imbalanced data such as this.
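One way to make the effect of class imbalance visible is to report balanced accuracy, the mean of sensitivity and specificity, next to plain accuracy. The sketch below is an illustrative addition rather than part of the original analysis; it reuses the metrics printed in the model comparison and the class counts of the 19-row test set (15 positive, 4 negative).

```r
# Metrics copied from the model comparison output in this report.
results <- data.frame(
  model       = c("knn", "classification tree", "random forest", "bagging"),
  accuracy    = c(0.6842105, 0.7894737, 0.6315789, 0.6315789),
  specificity = c(0.0, 0.5, 0.0, 0.0),
  sensitivity = c(0.8666667, 0.8666667, 0.8, 0.8)
)

# Balanced accuracy weights both classes equally.
results$balanced_accuracy <- (results$sensitivity + results$specificity) / 2

# Accuracy of always predicting the majority class "1" on the test set
# (15 positive and 4 negative observations in the confusion matrices).
majority_baseline <- 15 / 19

print(results[order(-results$balanced_accuracy), ])
print(round(majority_baseline, 4))
```

The majority-class baseline of 15/19 equals the accuracy of the classification tree, so accuracy alone does not separate the tree from a rule that always predicts Positive, while balanced accuracy does.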
References
Breiman L. 2001. Random Forests. Machine Learning. 45(1): 5–32.
Breiman L, Friedman JH, Olshen RA, Stone CJ. 1993. Classification and Regression Trees. New York (US): Chapman and Hall.
Dito, G.A. 2021. Neural Network dengan Keras di R. Retrieved from https://gerrydito.github.io/Neural-Network-with-Keras-in-R/
Dito, G.A. 2021. Statistical Machine Learning dengan mlr3. Retrieved from https://gerrydito.github.io/Statistical-Learning/
Dito, G.A. 2021. Tree Based Method dengan mlr3. Retrieved from https://gerrydito.github.io/Tree-Based-Methods/
Fawcett T. 2006. An Introduction to ROC Analysis. Pattern Recognition Letters. 27(8): 861–874.
Karno. 2016. Penentuan Parameter Pada Algoritma Klasifikasi K-Nearest Neighbor Berbasis Algoritma Genetika. Seminar Nasional Teknologi Informasi (SNTI) 8 UNTAR; 2016 Oktober 29; Jakarta, Indonesia. Jakarta (ID): UNTAR. 165–169.
Kohavi R. 1995. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2: 1137–1143.
Sartono, B. (March 6, 2020). AdaBoost. Retrieved from https://rpubs.com/bagusco/adaboost
Sartono, B. (March 28, 2020). Pohon Klasifikasi. Retrieved from https://rpubs.com/bagusco/pohon
Sartono, B. (April 23, 2020). Bagging. Retrieved from https://rpubs.com/bagusco/bagging
Sartono, B. (April 24, 2020). Random Forest. Retrieved from https://rpubs.com/bagusco/randomforest
Sartono, B. (September 29, 2020). Pohon Regresi. Retrieved from https://rpubs.com/bagusco/regressiontree
Sartono B, Syafitri UD. 2010. Metode pohon gabungan: solusi pilihan untuk mengatasi kelemahan pohon regresi dan klasifikasi tunggal. Forum Statistika dan Komputasi. 15(1): 1–7.
Sartono B. 2010. Pengenalan Algoritma Genetik Untuk Pemilihan Peubah Penjelas Dalam Model Regresi Menggunakan SAS/IML. Forum Statistika dan Komputasi. 15(2): 10–15.
Siringoringo R. 2018. Klasifikasi Data Tidak Seimbang Menggunakan Algoritma SMOTE dan K-Nearest Neighbor. Jurnal ISD. 3(1): 44–49.
Graduate student in Statistics and Data Science, IPB University, reniamelia@apps.ipb.ac.id