A Model for Predicting Customers with the Potential to Use the Channel - STA581 Practicum Assignment Report
Packages
First, load the required packages with the following syntax:
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.0 ✓ dplyr 1.0.5
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
##
## Attaching package: 'mlr3extralearners'
## The following objects are masked from 'package:mlr3verse':
##
## lrn, lrns
## Loading required package: rpart
Data
Next, load the required data.
Data Exploration
Naive Bayes, KNN, Logistic Regression, Random Forest, Neural Network, and Regression Tree Algorithms
Build the Models
Defining the Data-Splitting Scheme and Comparing the Models
learner_telkom <- list(learner_logreg, learner_knn,
                       learner_tree, learner_nb, learner_rf)
resample_telkom_cv = rsmp("cv", folds = 10)
# set the seed
set.seed(123)
resample_telkom_cv$instantiate(task = task_telkom)
design <- benchmark_grid(tasks = task_telkom,
                         learners = learner_telkom,
                         resamplings = resample_telkom_cv
)
bmr = benchmark(design, store_models = TRUE)
## INFO [00:51:20.312] [mlr3] Running benchmark with 50 resampling iterations
## INFO [00:59:53.719] [mlr3] Finished benchmark
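As a rough illustration of what `rsmp("cv", folds = 10)` sets up, the sketch below assigns rows to ten folds in base R and loops over them. The helper name `make_folds` is hypothetical, and the row count `n = 50000` is an inference from the 10,000-row holdout test set at ratio 0.8 reported later, not a figure stated in the report. mlr3 handles the splitting internally, so its actual fold assignment will differ.

```r
# Minimal sketch of k-fold assignment (not mlr3's internal implementation).
# make_folds() is a hypothetical helper; n = 50000 is an assumed row count.
make_folds <- function(n, k) {
  # Shuffle a balanced vector of fold labels 1..k
  sample(rep(seq_len(k), length.out = n))
}

set.seed(123)
n <- 50000
k <- 10
fold_id <- make_folds(n, k)

# Each fold serves once as the test set; the other k - 1 folds form the training set
fold_sizes <- integer(k)
for (i in seq_len(k)) {
  test_rows  <- which(fold_id == i)
  train_rows <- which(fold_id != i)
  fold_sizes[i] <- length(test_rows)
}
fold_sizes
```

With five learners and ten folds, this scheme yields the 50 resampling iterations reported in the benchmark log.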
result13 = bmr$aggregate(list(msr("classif.acc"),
                              msr("classif.specificity"),
                              msr("classif.sensitivity")
))
result13
Neural Network Algorithm
This method takes information from the data at the input layer and passes it on to the subsequent hidden layers, each of which can contain several nodes.
A neural network can be fitted with the nnet() function from the nnet package, or called through the mlr3extralearners package.
## Warning in .__LearnerClassifNnet__initialize(self = self, private = private, :
## classif.nnet is now deprecated from mlr3extralearners, for use in the future
## please load mlr3learners >= 0.4.3.
## # weights: 105
## initial value 24465.972780
## iter 10 value 23201.609044
## iter 20 value 23118.626467
## iter 30 value 23028.008545
## iter 40 value 22973.340079
## iter 50 value 22954.967535
## iter 60 value 22948.231434
## iter 70 value 22945.481027
## iter 80 value 22944.054527
## iter 90 value 22936.785718
## iter 100 value 22932.353770
## final value 22932.353770
## stopped after 100 iterations
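To make the flow from input layer to hidden layer concrete, here is a minimal forward pass for a single-hidden-layer network in base R. The layer sizes, the random weights, and the helper names `sigmoid` and `forward` are assumptions for illustration; this is not the fitted `nnet` model from the report. The weight count at the end follows the usual bookkeeping for a network with bias terms and one output unit.

```r
# Minimal sketch of a forward pass (hypothetical sizes and random weights).
sigmoid <- function(z) 1 / (1 + exp(-z))

forward <- function(x, W1, b1, W2, b2) {
  hidden <- sigmoid(W1 %*% x + b1)   # hidden-layer activations
  sigmoid(W2 %*% hidden + b2)        # output: probability of the positive class
}

set.seed(1)
p <- 4   # number of inputs (assumed)
h <- 3   # number of hidden nodes (assumed)
W1 <- matrix(rnorm(h * p), nrow = h)
b1 <- rnorm(h)
W2 <- matrix(rnorm(h), nrow = 1)
b2 <- rnorm(1)

x <- c(0.5, -1.2, 0.3, 2.0)
prob <- as.numeric(forward(x, W1, b1, W2, b2))

# Total number of weights, counting biases: (p + 1) * h + (h + 1)
n_weights <- (p + 1) * h + (h + 1)
```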
resample_diabetes1 = rsmp("holdout", ratio = 0.8)
set.seed(2020)
train_test_diabetes1 = resample(task = task_telkom,
                                learner = learner_nn,
                                resampling = resample_diabetes1,
                                store_models = TRUE
)
## INFO [01:00:09.485] [mlr3] Applying learner 'classif.nnet' on task 'telkom' (iter 1/1)
## # weights: 105
## initial value 27766.594626
## iter 10 value 18788.127187
## iter 20 value 18722.119419
## iter 30 value 18637.545617
## iter 40 value 18615.702061
## iter 50 value 18524.450777
## iter 60 value 18515.878138
## iter 70 value 18503.384498
## iter 80 value 18500.491407
## iter 90 value 18438.253895
## iter 100 value 18410.227125
## final value 18410.227125
## stopped after 100 iterations
##         truth
## response    1    0
##        1    0    0
##        0 1848 8152
train_test_diabetes1$aggregate(list(msr("classif.acc"),
                                    msr("classif.specificity"),
                                    msr("classif.sensitivity"),
                                    msr("classif.auc")
))
##         classif.acc classif.specificity classif.sensitivity         classif.auc
##            0.815200            1.000000            0.000000            0.644358
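The reported measures can be recomputed by hand from the confusion matrix. Doing so shows why an accuracy of about 82% coexists with zero sensitivity: the network assigns every test row to class 0, so its accuracy simply equals the share of class 0 in the test set. The sketch below assumes class 1 is the positive class, as the reported measure values imply.

```r
# Confusion matrix from the holdout result (rows = predicted, columns = truth).
# matrix() fills column by column: truth = 1 first, then truth = 0.
cm <- matrix(c(0, 1848, 0, 8152),
             nrow = 2,
             dimnames = list(response = c("1", "0"), truth = c("1", "0")))

tp <- cm["1", "1"]   # predicted 1, truly 1
fn <- cm["0", "1"]   # predicted 0, truly 1
fp <- cm["1", "0"]   # predicted 1, truly 0
tn <- cm["0", "0"]   # predicted 0, truly 0

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)

round(c(accuracy = accuracy,
        sensitivity = sensitivity,
        specificity = specificity), 4)
```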
The results above show that the random forest model achieves the highest accuracy, 82%, so hyperparameter tuning is applied to the random forest next.
Random Forest Tuning
Hyperparameter tuning aims to improve the model's predictive performance by finding the best hyperparameter values. Two random forest hyperparameters are tuned: mtry, the number of variables considered for splitting at each node, searched over the range 1 to 9; and max.depth, the maximum depth of each tree, searched over the range 3 to 15.
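Random search over these two ranges amounts to drawing candidate pairs uniformly at random and scoring each one. The sketch below illustrates only the sampling step in base R; `sample_config` is a hypothetical helper and not part of mlr3tuning, and the scoring of each candidate (cross-validated accuracy in this report) is left out.

```r
# Minimal sketch: draw random (mtry, max.depth) candidates from the stated ranges.
# sample_config() is a hypothetical helper, not an mlr3tuning function.
sample_config <- function(n_evals) {
  data.frame(
    mtry      = sample(1:9,  n_evals, replace = TRUE),
    max.depth = sample(3:15, n_evals, replace = TRUE)
  )
}

set.seed(42)
candidates <- sample_config(n_evals = 5)
candidates
```

The grid of possible pairs has 9 x 13 = 117 combinations, so a small evaluation budget explores only a small part of it.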
Build the Model
Defining the Resampling Method and the Hyperparameter Tuning
## Loading required package: paradox
##
## Attaching package: 'mlr3tuning'
## The following object is masked from 'package:e1071':
##
## tune
library(mlr3extralearners)
library(readr)
house_price1 <- read_csv("~/Desktop/SEMESTER 1 & 2 S2/PMS/house_price1.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_character(),
## Id = col_double(),
## MSSubClass = col_double(),
## LotFrontage = col_double(),
## LotArea = col_double(),
## OverallQual = col_double(),
## OverallCond = col_double(),
## YearBuilt = col_double(),
## YearRemodAdd = col_double(),
## MasVnrArea = col_double(),
## BsmtFinSF1 = col_double(),
## BsmtFinSF2 = col_double(),
## BsmtUnfSF = col_double(),
## TotalBsmtSF = col_double(),
## `1stFlrSF` = col_double(),
## `2ndFlrSF` = col_double(),
## LowQualFinSF = col_double(),
## GrLivArea = col_double(),
## BsmtFullBath = col_double(),
## BsmtHalfBath = col_double(),
## FullBath = col_double()
## # ... with 18 more columns
## )
## ℹ Use `spec()` for the full column specifications.
data_house = house_price1
data_house <- data_house %>%
  select(-Id) %>%
  select_if(is.numeric) %>%
  select(-contains("Bsmt")) %>%
  na.omit()
colnames(data_house)[colnames(data_house) == '1stFlrSF'] = 'X1stFlrSF'
colnames(data_house)[colnames(data_house) == '2ndFlrSF'] = 'X2ndFlrSF'
colnames(data_house)[colnames(data_house) == '3SsnPorch'] = 'X3SsnPorch'
task_house = TaskRegr$new(id = "house", backend = data_house, target = "SalePrice")
install_learners("regr.gbm")
model_gbm <- lrn("regr.gbm")
model_elastic <- lrn("regr.glmnet")
param_bound_gbm <- ParamSet$new(params = list(
  ParamInt$new("n.trees", lower = 100, upper = 400),
  ParamDbl$new("shrinkage", lower = 0.01, upper = 0.1)
))
param_bound_rf <- ParamSet$new(params = list(
  ParamInt$new("classif.ranger.max.depth", lower = 3, upper = 15),
  ParamInt$new("classif.ranger.mtry", lower = 1, upper = 9)
))
# tuning setup
Defining the Stopping Criterion and the Optimization Method
terminate = trm("evals", n_evals = 2) # number of evaluations
tuner <- tnr("random_search") # tuning method
# tuning instance for the random forest
tune_rf <- TuningInstanceSingleCrit$new(
  task = task_telkom,
  learner = model_rf_classif,
  measure = msr("classif.acc"),
  terminator = terminate,
  search_space = param_bound_rf,
  resampling = resample_cv)
tuner$optimize(inst = tune_rf)
## INFO [01:00:16.372] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=2]'
## INFO [01:00:16.391] [bbotk] Evaluating 1 configuration(s)
## INFO [01:00:16.456] [mlr3] Running benchmark with 10 resampling iterations
## INFO [01:04:18.120] [mlr3] Finished benchmark
## INFO [01:04:18.322] [bbotk] Result of batch 1:
## INFO [01:04:18.328] [bbotk] classif.ranger.max.depth classif.ranger.mtry classif.acc
## INFO [01:04:18.328] [bbotk] 10 3 0.70064
## INFO [01:04:18.328] [bbotk] uhash
## INFO [01:04:18.328] [bbotk] 1fce9655-9236-4d57-9bdd-caedf906c7ae
## INFO [01:04:18.334] [bbotk] Evaluating 1 configuration(s)
## INFO [01:04:18.387] [mlr3] Running benchmark with 10 resampling iterations
## INFO [01:07:55.727] [mlr3] Finished benchmark
## INFO [01:07:55.855] [bbotk] Result of batch 2:
## INFO [01:07:55.857] [bbotk] classif.ranger.max.depth classif.ranger.mtry classif.acc
## INFO [01:07:55.857] [bbotk] 8 4 0.68168
## INFO [01:07:55.857] [bbotk] uhash
## INFO [01:07:55.857] [bbotk] 8a644668-5d2b-4b15-a865-c2f65f0a76c0
## INFO [01:07:55.876] [bbotk] Finished optimizing after 2 evaluation(s)
## INFO [01:07:55.877] [bbotk] Result:
## INFO [01:07:55.879] [bbotk] classif.ranger.max.depth classif.ranger.mtry learner_param_vals x_domain
## INFO [01:07:55.879] [bbotk] 10 3 <list[4]> <list[2]>
## INFO [01:07:55.879] [bbotk] classif.acc
## INFO [01:07:55.879] [bbotk] 0.70064
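The two evaluations in the log above can be gathered into a small table to confirm which configuration the tuner returns. The values are copied from the log; the object names `archive` and `best` are chosen here for illustration and are not the objects mlr3tuning creates.

```r
# Evaluations copied from the tuning log above
archive <- data.frame(
  max.depth   = c(10L, 8L),
  mtry        = c(3L, 4L),
  classif.acc = c(0.70064, 0.68168)
)

# The tuner reports the configuration with the highest accuracy
best <- archive[which.max(archive$classif.acc), ]
best
```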
Running the Model with the Best Hyperparameters
model_rf_tuned <- po("classweights", minor_weight = 5) %>>% lrn("classif.ranger",
  predict_type = "prob",
  importance = "impurity",
  mtry = best_param_rf$classif.ranger.mtry,
  max.depth = best_param_rf$classif.ranger.max.depth
)
learner_result2 <- list(
  model_rf_tuned
)
design_classif13 <- benchmark_grid(tasks = task_telkom,
                                   learners = learner_result2,
                                   resamplings = resample_cv
)
bmr_classif13 = benchmark(design_classif13, store_models = TRUE)
## INFO [01:07:56.201] [mlr3] Running benchmark with 10 resampling iterations
## INFO [01:11:54.819] [mlr3] Finished benchmark
result_classif133 = bmr_classif13$aggregate(
  list(msr("classif.acc"),
       msr("classif.sensitivity"),
       msr("classif.specificity"))
)
The results above show that accuracy drops to 67% after tuning, so the random forest before tuning remains the better model. Note that the tuned pipeline also applies class weights (minor_weight = 5) and that the search evaluated only two configurations, so the drop cannot be attributed to mtry and max.depth alone.
Bagging and AdaBoost Algorithms
The next models tried are bagging and AdaBoost.
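For orientation, bagging fits the same base learner on bootstrap resamples of the training data and combines the predictions by majority vote. The sketch below does this with `rpart` trees on the built-in `iris` data. The dataset, the number of trees, and the helper names `fit_bagging` and `predict_bagging` are assumptions for illustration and are unrelated to the report's `bagging clf` learner.

```r
library(rpart)

# Minimal bagging sketch (hypothetical helpers; iris stands in for the real data)
fit_bagging <- function(formula, data, n_trees = 25) {
  lapply(seq_len(n_trees), function(i) {
    boot_rows <- sample(nrow(data), replace = TRUE)   # bootstrap resample
    rpart(formula, data = data[boot_rows, ], method = "class")
  })
}

predict_bagging <- function(models, newdata) {
  # One column of predicted labels per tree
  votes <- sapply(models, function(m) {
    as.character(predict(m, newdata, type = "class"))
  })
  # Majority vote across trees for each row
  apply(votes, 1, function(v) names(which.max(table(v))))
}

set.seed(123)
models <- fit_bagging(Species ~ ., iris)
pred   <- predict_bagging(models, iris)
acc    <- mean(pred == iris$Species)   # accuracy on the training data only
```

AdaBoost differs in that it fits learners sequentially and reweights the rows that earlier learners misclassified, rather than resampling independently.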
Build the Models
Defining the Data-Splitting Scheme
Model Comparison
design_classif1333 <- benchmark_grid(tasks = task_telkom,
                                     learners = learner_credit,
                                     resamplings = resample_cv_credit
)
bmr_classif1333 = benchmark(design_classif1333, store_models = TRUE)
## INFO [01:11:58.464] [mlr3] Running benchmark with 20 resampling iterations
## INFO [01:18:13.084] [mlr3] Finished benchmark
result_classif1333 = bmr_classif1333$aggregate(list(msr("classif.acc"),
                                                    msr("classif.specificity"),
                                                    msr("classif.sensitivity"))
)
result_classif1333
The results above show that the bagging model performs better, with an accuracy of 82%, which is higher than that of the random forest. Next, feature engineering in the form of discretization is applied to the frequency variables X3 and X10, using the bagging model, to see whether accuracy improves.
Bagging Algorithm with Discretization
With the bagging classifier identified as the best model, discretization is applied to the data to see whether model performance improves.
Discretization uses the equal-frequency method based on quantiles. It is applied to the variables that represent frequencies, X3 and X10. Each variable is split into 4 categories: very rare, rare, frequent, and very frequent.
Variable X3 is split into the intervals 0-10, 10-24, 24-50, and 50-1007; variable X10 is split the same way at its own quartile break points (0.0-3.3, 3.3-6.0, 6.0-9.7, and 9.7-30.7).
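Equal-frequency discretization can be sketched in base R with `quantile()` and `cut()`. In the sketch below, `quantile()` stands in for `classIntervals(..., style = 'quantile')`, and the simulated vector `x` is an assumption standing in for a frequency variable. The last step applies the X3 break points stated above to a few example values to show how boundary values are assigned.

```r
# Minimal sketch of equal-frequency (quantile) binning in base R.
# x is simulated, skewed data standing in for a frequency variable.
set.seed(1)
x <- rexp(1000, rate = 0.05)

# Quartile break points: minimum, Q1, median, Q3, maximum
brks <- quantile(x, probs = seq(0, 1, by = 0.25))

# include.lowest = TRUE keeps the minimum value inside the first bin
x_bin <- cut(x, breaks = brks, labels = 1:4, include.lowest = TRUE)
bin_counts <- table(x_bin)

# Applying the X3 break points from the text to a few example values.
# Intervals are right-closed, so 10 falls in bin 1 and 50 falls in bin 3.
x3_bin <- cut(c(0, 10, 11, 50, 51, 1007),
              breaks = c(0, 10, 24, 50, 1007),
              labels = 1:4, include.lowest = TRUE)
```

Each bin holds the same number of observations when the values are distinct; heavily tied values, which are common in count data, can make the bins uneven.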
Load the Data and Discretize
Tugas1 = Tugas
Tugas1$Y = as.factor(Tugas1$Y)
# equal frequency discretization
eqfreq <- classIntervals(Tugas1$X3, 4, style = 'quantile')
eqfreq$brks
## [1] 0 10 24 50 1007
Tugas1$X3 <- cut(Tugas1$X3, breaks = eqfreq$brks, labels = 1:4,
                 include.lowest = TRUE)
eqfreq <- classIntervals(Tugas1$X10, 4, style = 'quantile')
eqfreq$brks
## [1] 0.0 3.3 6.0 9.7 30.7
Build the Model
learner_credit <- list(model_bagging_classif
)
resample_cv_credit = rsmp("cv", folds = 10)
set.seed(123)
resample_cv_credit$instantiate(task = task_telkom1)
design_classif133333 <- benchmark_grid(tasks = task_telkom1,
                                       learners = learner_credit,
                                       resamplings = resample_cv_credit
)
bmr_classif133333 = benchmark(design_classif133333, store_models = TRUE)
## INFO [01:19:01.340] [mlr3] Running benchmark with 10 resampling iterations
## INFO [01:25:49.218] [mlr3] Finished benchmark
result_classif133333 = bmr_classif133333$aggregate(list(msr("classif.acc"),
                                                        msr("classif.specificity"),
                                                        msr("classif.sensitivity"))
)
result_classif133333
Variable Importance Visualization for the Bagging Model with Discretization
## Growing trees.. Progress: 54%. Estimated remaining time: 25 seconds.
## NULL
#learner_rf<- lrn("classif.ranger",importance="impurity",mtry=3,max.depth=13,min.node.size=1,num.trees=242)
importance <- data.frame(Predictors = names(model_bagging_classif$model$variable.importance),
                         impurity = model_bagging_classif$model$variable.importance
)
rownames(importance) <- NULL
importance %>% arrange(desc(impurity))
ggplot(importance,
       aes(x = impurity,
           y = reorder(Predictors, impurity))
) +
  geom_col(fill = "skyblue") +
  geom_text(aes(label = round(impurity, 2)), hjust = 1.2) +
  # the importance values are on the x axis, the predictors on the y axis
  labs(x = "Variable Importance Value", y = "Predictor")
Variable Importance Visualization for the Bagging Model without Discretization
The bagging model without discretization is selected as the best model because it achieves the highest accuracy.
## Growing trees.. Progress: 81%. Estimated remaining time: 7 seconds.
## NULL
#learner_rf<- lrn("classif.ranger",importance="impurity",mtry=3,max.depth=13,min.node.size=1,num.trees=242)
importance <- data.frame(Predictors = names(model_bagging_classif$model$variable.importance),
                         impurity = model_bagging_classif$model$variable.importance
)
rownames(importance) <- NULL
importance %>% arrange(desc(impurity))
ggplot(importance,
       aes(x = impurity,
           y = reorder(Predictors, impurity))
) +
  geom_col(fill = "skyblue") +
  geom_text(aes(label = round(impurity, 2)), hjust = 1.2) +
  # the importance values are on the x axis, the predictors on the y axis
  labs(x = "Variable Importance Value", y = "Predictor")