Memanggil Packages
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.5 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.0.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(mice)
##
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
##
## filter
## The following objects are masked from 'package:base':
##
## cbind, rbind
library(DataExplorer)
library(mlr3verse)
## Loading required package: mlr3
library(mlr3extralearners)
##
## Attaching package: 'mlr3extralearners'
## The following objects are masked from 'package:mlr3':
##
## lrn, lrns
Data yang digunakan adalah data Tingkat Resiko Investasi yang dilihat dari kemampuan membayar utang untuk setiap Negara. Terdapat 117 Negara yang memiliki rating yang berbeda. Peubah penjelas sebanyak 14 dan 1 peubah respon yaitu Risk.Level yang terbagi menjadi resiko tinggi dan resiko rendah. Ketika resikonya rendah maka banyak investor yang akan masuk, namun ketika resikonya tinggi maka tidak sembarang investor yang akan berinvestasi.
Import data ke R
mydata_investment <- read.csv("D:/3. KULIAH PPs STATISTIKA/SEMESTER 1/581-SAINS DATA (Jumat, 14.30)/tugas STA581 - investment risk level.csv", header= TRUE, sep=";",stringsAsFactors = TRUE)
glimpse(mydata_investment)
## Rows: 117
## Columns: 16
## $ Country <fct> AD, AE, AE-AZ, AE-RK, AM, AO, AR, AT, AU, AW, AZ, BD, BE, B~
## $ X1 <fct> "17,5", "18,2", "18,7", "", "14", "", "23,2527", "18,574", ~
## $ X2 <fct> "38674,616", "40105,1201", "76037,9968", "27882,8286", "425~
## $ X3 <fct> "172,754", "103,5228", "31,03626", "24,78532", "89,61882", ~
## $ X4 <fct> "0,68", "1,766", "2,63056", "1,29416", "1,44", "22,35646", ~
## $ X5 <fct> "1,2206", "0,8698", "1,4893", "1,753", "0,2562", "3,3422", ~
## $ X6 <fct> "1,7856", "2,65884", "1,85034", "2,23192", "4,748", "-0,878~
## $ X7 <fct> "-2,0843", "-0,7254", "-1,9008", "-1,1355", "2,3318", "-5,2~
## $ X8 <fct> "55", "102,5273835", "102,5273835", "102,5273835", "166,808~
## $ X9 <fct> "-26,52", "-13,5989", "-56,2416", "24,78532", "47,27262", "~
## $ X10 <fct> "2,8578619", "352,9105754", "199,9284217", "10,1088923", "1~
## $ X11 <fct> "8", "8,155", "8,155", "", "6,6", "10,3", "10,6", "2,019", ~
## $ X12 <fct> "23,0841", "24,85976", "20,3994", "21,69104", "19,403", "31~
## $ X13 <fct> "26,94344", "32,4774", "31,03926", "17,30888", "15,11172", ~
## $ X14 <fct> "3", "2,45", "", "", "18,5", "10,5", "11,05", "6", "5,4478"~
## $ Risk.Level <fct> low, low, low, low, high, high, high, low, low, high, high,~
Dengan Keterengan peubah predictor sebagai berikut:
| Variabel | Keterangan |
|---|---|
| X1 | capital adequacy ratio/Rasio Kecukupan Modal (%) average from last 5 years |
| X2 | GDP / Produk Domestik Bruto per capita (USD) |
| X3 | Gross External Debt/Utang Luas Negeri (% of GDP) average from last 5 years |
| X4 | growth of consumer price (%) average from last 5 years |
| X5 | growth of population (%) average from last 5 years |
| X6 | growth of Real GDP (%) average from last 5 years |
| X7 | growth of Real GDP per cap. (%) average from last 5 years |
| X8 | Loan-deposit ratio (%) average from last 5 years |
| X9 | Net External Debt (% of GDP) average from last 5 years |
| X10 | Nominal GDP (USD bn) |
| X11 | Non-performing loans (% of gross loans) average from last 5 years |
| X12 | percentage of gross domestic investment to GDP (%) average from last 5 years |
| X13 | percentage of gross domestic saving to GDP (%) average from last 5 years |
| X14 | unemployment rate (% labour force) average from last 5 years |
| Y | Klasifikasi |
Selanjutnya Akan dilakukan cleaning data dengan menghilangkan variabel Country pada data karena variabel tersebut tidak akan digunakan dalam pemodelan.
mydata <- mydata_investment %>% select(-Country)
glimpse(mydata)
## Rows: 117
## Columns: 15
## $ X1 <fct> "17,5", "18,2", "18,7", "", "14", "", "23,2527", "18,574", ~
## $ X2 <fct> "38674,616", "40105,1201", "76037,9968", "27882,8286", "425~
## $ X3 <fct> "172,754", "103,5228", "31,03626", "24,78532", "89,61882", ~
## $ X4 <fct> "0,68", "1,766", "2,63056", "1,29416", "1,44", "22,35646", ~
## $ X5 <fct> "1,2206", "0,8698", "1,4893", "1,753", "0,2562", "3,3422", ~
## $ X6 <fct> "1,7856", "2,65884", "1,85034", "2,23192", "4,748", "-0,878~
## $ X7 <fct> "-2,0843", "-0,7254", "-1,9008", "-1,1355", "2,3318", "-5,2~
## $ X8 <fct> "55", "102,5273835", "102,5273835", "102,5273835", "166,808~
## $ X9 <fct> "-26,52", "-13,5989", "-56,2416", "24,78532", "47,27262", "~
## $ X10 <fct> "2,8578619", "352,9105754", "199,9284217", "10,1088923", "1~
## $ X11 <fct> "8", "8,155", "8,155", "", "6,6", "10,3", "10,6", "2,019", ~
## $ X12 <fct> "23,0841", "24,85976", "20,3994", "21,69104", "19,403", "31~
## $ X13 <fct> "26,94344", "32,4774", "31,03926", "17,30888", "15,11172", ~
## $ X14 <fct> "3", "2,45", "", "", "18,5", "10,5", "11,05", "6", "5,4478"~
## $ Risk.Level <fct> low, low, low, low, high, high, high, low, low, high, high,~
summary(mydata)
## X1 X2 X3 X4 X5
## :13 1010,6177 : 1 102,7306: 1 -0,15102: 1 -0,0317: 1
## 15,2 : 2 10274,3779: 1 1026,495: 1 0,00116 : 1 -0,0396: 1
## 16 : 2 10589,0517: 1 103,0604: 1 0,1051 : 1 -0,0948: 1
## 17,2 : 2 11288,8489: 1 103,0684: 1 0,1424 : 1 -0,1929: 1
## 19,1 : 2 11363,6075: 1 103,5228: 1 0,2252 : 1 -0,2217: 1
## 20,1 : 2 11954,589 : 1 103,9071: 1 0,26994 : 1 -0,2417: 1
## (Other):94 (Other) :111 (Other) :111 (Other) :111 (Other):111
## X6 X7 X8 X9 X10
## -0,2368 : 1 -0,0467: 1 : 9 -0,57864: 1 1,1708787 : 1
## -0,46082: 1 -0,1226: 1 102,5273835: 3 -12,2529: 1 1,4908268 : 1
## -0,878 : 1 -0,1248: 1 100,1929751: 1 -12,9597: 1 1,703701 : 1
## -1,66952: 1 -0,1381: 1 101,298342 : 1 -13,2673: 1 1,8445126 : 1
## -5,135 : 1 -0,1394: 1 102,2867095: 1 -13,5989: 1 10,1088923: 1
## 0,1 : 1 -0,3001: 1 104,1374705: 1 -145,438: 1 10,3320543: 1
## (Other) :111 (Other):111 (Other) :101 (Other) :111 (Other) :111
## X11 X12 X13 X14 Risk.Level
## :21 12,66678: 1 10,94992: 1 :12 high:64
## 1 : 2 14,07878: 1 11,0075 : 1 4 : 5 low :53
## 3,2 : 2 15,49512: 1 11,1102 : 1 6,5 : 5
## 5 : 2 15,78406: 1 11,3842 : 1 4,5 : 4
## 8,155 : 2 16,1042 : 1 11,70278: 1 5 : 4
## 0,3357 : 1 16,44666: 1 12,78448: 1 6 : 3
## (Other):87 (Other) :111 (Other) :111 (Other):84
Missing Data
summary(is.na(mydata))
## X1 X2 X3 X4
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:117 FALSE:117 FALSE:117 FALSE:117
## X5 X6 X7 X8
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:117 FALSE:117 FALSE:117 FALSE:117
## X9 X10 X11 X12
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:117 FALSE:117 FALSE:117 FALSE:117
## X13 X14 Risk.Level
## Mode :logical Mode :logical Mode :logical
## FALSE:117 FALSE:117 FALSE:117
p <- function(x){sum(is.na(x))/length(x)*100}
apply(mydata, 2, p)
## X1 X2 X3 X4 X5 X6 X7
## 0 0 0 0 0 0 0
## X8 X9 X10 X11 X12 X13 X14
## 0 0 0 0 0 0 0
## Risk.Level
## 0
Penanganan Missing Value
Penanganan terhadap missing value menggunakan nilai rata-rata.
data_baru<-mydata
data_baru$X1[is.na(data_baru$X1)]<-mean(data_baru$X1, na.rm=TRUE)
## Warning in mean.default(data_baru$X1, na.rm = TRUE): argument is not numeric or
## logical: returning NA
data_baru$X8[is.na(data_baru$X8)]<-mean(data_baru$X8, na.rm=TRUE)
## Warning in mean.default(data_baru$X8, na.rm = TRUE): argument is not numeric or
## logical: returning NA
data_baru$X11[is.na(data_baru$X11)]<-mean(data_baru$X11, na.rm=TRUE)
## Warning in mean.default(data_baru$X11, na.rm = TRUE): argument is not numeric or
## logical: returning NA
data_baru$X14[is.na(data_baru$X14)]<-mean(data_baru$X14, na.rm=TRUE)
## Warning in mean.default(data_baru$X14, na.rm = TRUE): argument is not numeric or
## logical: returning NA
summary(data_baru)
## X1 X2 X3 X4 X5
## :13 1010,6177 : 1 102,7306: 1 -0,15102: 1 -0,0317: 1
## 15,2 : 2 10274,3779: 1 1026,495: 1 0,00116 : 1 -0,0396: 1
## 16 : 2 10589,0517: 1 103,0604: 1 0,1051 : 1 -0,0948: 1
## 17,2 : 2 11288,8489: 1 103,0684: 1 0,1424 : 1 -0,1929: 1
## 19,1 : 2 11363,6075: 1 103,5228: 1 0,2252 : 1 -0,2217: 1
## 20,1 : 2 11954,589 : 1 103,9071: 1 0,26994 : 1 -0,2417: 1
## (Other):94 (Other) :111 (Other) :111 (Other) :111 (Other):111
## X6 X7 X8 X9 X10
## -0,2368 : 1 -0,0467: 1 : 9 -0,57864: 1 1,1708787 : 1
## -0,46082: 1 -0,1226: 1 102,5273835: 3 -12,2529: 1 1,4908268 : 1
## -0,878 : 1 -0,1248: 1 100,1929751: 1 -12,9597: 1 1,703701 : 1
## -1,66952: 1 -0,1381: 1 101,298342 : 1 -13,2673: 1 1,8445126 : 1
## -5,135 : 1 -0,1394: 1 102,2867095: 1 -13,5989: 1 10,1088923: 1
## 0,1 : 1 -0,3001: 1 104,1374705: 1 -145,438: 1 10,3320543: 1
## (Other) :111 (Other):111 (Other) :101 (Other) :111 (Other) :111
## X11 X12 X13 X14 Risk.Level
## :21 12,66678: 1 10,94992: 1 :12 high:64
## 1 : 2 14,07878: 1 11,0075 : 1 4 : 5 low :53
## 3,2 : 2 15,49512: 1 11,1102 : 1 6,5 : 5
## 5 : 2 15,78406: 1 11,3842 : 1 4,5 : 4
## 8,155 : 2 16,1042 : 1 11,70278: 1 5 : 4
## 0,3357 : 1 16,44666: 1 12,78448: 1 6 : 3
## (Other):87 (Other) :111 (Other) :111 (Other):84
Import data ke ekosistem mlr3
task_investment = TaskClassif$new(id="investment",
backend = mydata,
target = "Risk.Level",
positive ="low")
#Menentukan cara pembagian data
resample_cv_investment = rsmp("cv", folds = 5)
set.seed(123)
resample_cv_investment$instantiate(task = task_investment)
str(resample_cv_investment$train_set(1))
## int [1:93] 5 6 16 17 20 22 23 32 38 49 ...
str(resample_cv_investment$test_set(1))
## int [1:24] 1 3 9 13 21 26 28 30 33 35 ...
Menentukan Model Ada Boost
install_learners("classif.AdaBoostM1")
# Adaptive Boost
model_ada_classif <- lrn("classif.AdaBoostM1",predict_type="prob")
model_ada_classif
## <LearnerClassifAdaBoostM1:classif.AdaBoostM1>
## * Model: -
## * Parameters: list()
## * Packages: mlr3extralearners, RWeka
## * Predict Type: prob
## * Feature types: numeric, factor, ordered
## * Properties: multiclass, twoclass
train_aboost = resample(task = task_investment,
learner = model_ada_classif,
resampling = resample_cv_investment,
store_models = TRUE
)
## INFO [18:51:00.972] [mlr3] Applying learner 'classif.AdaBoostM1' on task 'investment' (iter 2/5)
## INFO [18:51:01.687] [mlr3] Applying learner 'classif.AdaBoostM1' on task 'investment' (iter 5/5)
## INFO [18:51:02.261] [mlr3] Applying learner 'classif.AdaBoostM1' on task 'investment' (iter 4/5)
## INFO [18:51:02.823] [mlr3] Applying learner 'classif.AdaBoostM1' on task 'investment' (iter 1/5)
## INFO [18:51:03.394] [mlr3] Applying learner 'classif.AdaBoostM1' on task 'investment' (iter 3/5)
prediksi_test_ab = as.data.table(train_aboost$prediction())
prediksi_test_ab
## row_ids truth response prob.low prob.high
## 1: 1 low high 0.4751744 0.5248256
## 2: 3 low low 0.5652740 0.4347260
## 3: 9 low high 0.4751744 0.5248256
## 4: 13 low high 0.4751744 0.5248256
## 5: 21 high high 0.4751744 0.5248256
## ---
## 113: 99 low high 0.4369636 0.5630364
## 114: 104 low high 0.4369636 0.5630364
## 115: 105 high high 0.4369636 0.5630364
## 116: 107 low high 0.4369636 0.5630364
## 117: 108 high high 0.4369636 0.5630364
Dari output diatas dapat dilihat bahwa terdapat beberapa negara yang dapat diprediksi dengan tepat, namun ada juga negara yang hasil prediksinya kurang tepat. Adapun untuk melihat seberapa banyak prediksi baik yang tepat maupun yang tidak tepat dapat dilihat menggunakan confusion matrix. ## Confusion Matrix
train_aboost$prediction()$confusion
## truth
## response low high
## low 1 1
## high 52 63
Berdasarkan output confusion matrix pada model Klasifikasi Pohon diatas dapat dilihat bahwa terdapat sebanyak 63 negara yang diprediksi tepat memiliki resiko tinggi dan sebanyak 1 negara diprediksi tepat tidak memiliki potensial. NKemudian tidak terdapat kesalahan dalam prediksi, dan sebanyak 52 negara diprediksi berpotensi yang seharusnya tidak berpotensi.
akurasi_aboost <- train_aboost$aggregate(list(msr("classif.acc"),
msr("classif.specificity"),
msr("classif.sensitivity")
))
akurasi_aboost
## classif.acc classif.specificity classif.sensitivity
## 0.5471014 0.9857143 0.0200000
Jika dilihat berdasarkan akurasinya ternyata model Klasifikasi Pohon memiliki akurasi yang tergolong rendah yaitu sebesar 54.47%.
autoplot(train_aboost, type = "roc")
Berdasarkan nilai Akurasi yang diperoleh dapat disimpulkan bahwa model yang dihasilkan memiliki akurasi yang rendah. Untuk itu diperlukan model klasifikasi yang lain yang sekiranya dapat menghasilkan prediksi yang lebih baik.