Investment risk refers to the potential for losing money, or for other negative outcomes, associated with an investment. This risk can stem from many sources, such as stock-market volatility, currency fluctuations, global economic conditions, and political and geographic factors.
The following variables are used in the analysis:
Variable codes:
X1 capital adequacy ratio (%), 5-year average
X2 GDP per capita (USD)
X3 gross external debt (% of GDP), 5-year average
X4 consumer price growth (%), 5-year average
X5 population growth (%), 5-year average
X6 real GDP growth (%), 5-year average
X7 real GDP per capita growth (%), 5-year average
X8 loan-to-deposit ratio (%), 5-year average
X9 net external debt (% of GDP), 5-year average
X10 nominal GDP (billion USD)
X11 non-performing loans (% of gross loans), 5-year average
X12 gross domestic investment (% of GDP), 5-year average
X13 domestic savings
X14 unemployment rate
Analysis goal: build a classifier that predicts a country's investment risk level (Risk.Level: low or high).
# Packages, loaded in the original order so that function masking is unchanged.
# ggplot2 and dplyr are attached by tidyverse, so they are not loaded again.
# skimr only needs to be installed; it is called as skimr::skim().
library(rstatix)
library(tidyverse)          # attaches dplyr, ggplot2, readr, tidyr, purrr, ...
library(randomForest)
library(mlr3verse)          # mlr3 tasks, learners, resampling and measures
library(mlr3extralearners)  # additional learners for mlr3
library(rpart.plot)
library(cowplot)            # plot_grid() to arrange the boxplots
library(randomForestSRC)
library(ranger)             # random forest engine behind the "classif.ranger" learner
library(caret)
invest <- read.csv("investasi.csv", stringsAsFactors = TRUE)
testing <- read.csv("testing.csv", stringsAsFactors = TRUE)
invest
## Country X1 X2 X3 X4 X5 X6 X7
## 1 AD 17.5000 38674.6160 172.75400 0.68000 1.2206 1.78560 -2.0843
## 2 AE 18.2000 40105.1201 103.52280 1.76600 0.8698 2.65884 -0.7254
## 3 AE-AZ 18.7000 76037.9968 31.03626 2.63056 1.4893 1.85034 -1.9008
## 4 AE-RK NA 27882.8286 24.78532 1.29416 1.7530 2.23192 -1.1355
## 5 AM 14.0000 4251.3977 89.61882 1.44000 0.2562 4.74800 2.3318
## 6 AO NA 2033.8999 57.05566 22.35646 3.3422 -0.87800 -5.2032
## 7 AR 23.2527 9203.4287 43.25546 36.70346 0.9657 -0.23680 -3.7297
## 8 AT 18.5740 53174.2385 159.39690 1.52348 0.7259 1.88048 -0.3001
## 9 AU 15.7000 63972.3400 121.98890 1.65124 1.4790 2.44592 0.0306
## 10 AW 33.5000 24642.7034 92.84624 1.21694 0.7972 2.06486 -4.7211
## 11 AZ 25.3012 5083.2568 43.35272 6.85276 1.0510 0.39070 -1.7366
## 12 BD 4.2000 2323.5586 19.74352 5.81200 1.0568 7.39000 6.0712
## 13 BE 19.3188 49537.5785 256.72570 1.64000 0.5259 1.70044 -0.4905
## 14 BG 22.7406 11288.8489 70.29080 0.77854 -0.7095 3.61996 2.7008
## 15 BH 20.0000 22003.1172 214.20080 1.82200 4.4021 2.80902 -3.2531
## 16 BJ 10.5000 1420.6492 49.56500 0.22520 2.7684 4.87736 2.2133
## 17 BO 12.2800 3372.3576 32.27932 2.92384 1.4362 3.95132 -0.0467
## 18 BR 19.1400 7372.9153 35.67380 5.72400 0.7789 -0.46082 -1.3423
## 19 BY NA 6453.9238 69.54962 8.39000 0.0197 0.10000 0.6603
## 20 CA 16.0956 51704.8992 117.76010 1.67406 1.1918 1.79828 -0.5879
## 21 CG 22.3000 2378.0444 70.34826 2.03374 2.5889 -5.13500 -8.1338
## 22 CH 19.3000 89770.8521 275.61690 0.00116 0.8402 1.88522 0.1733
## 23 CI NA 2594.7038 35.68100 0.75212 2.5779 7.29640 3.3099
## 24 CL 14.2800 15986.3031 65.82414 2.97824 1.2484 1.97106 -0.8925
## 25 CM 9.1000 1690.8639 39.74812 1.54052 2.6442 4.35350 0.7177
## 26 CN 14.7045 12226.6610 13.63044 2.00000 0.4575 6.64354 5.2661
## 27 CO 17.2000 5859.6535 47.58910 4.70940 1.3766 2.44972 -0.8876
## 28 CR 13.2840 11954.5890 44.98044 1.34600 0.9961 3.24766 0.7026
## 29 CV 19.4200 3466.2500 115.31060 0.37920 1.1636 3.92138 -0.4036
## 30 CY 16.0000 30630.2905 1026.49500 -0.15102 0.1535 4.62534 2.8078
## 31 CZ 21.3796 27044.7929 79.21186 1.57500 0.2067 3.72134 1.3165
## 32 DE 18.5800 50891.5812 165.29300 1.20768 0.4835 1.62892 -0.1226
## 33 DK 22.6000 67565.6556 155.84030 0.54000 0.3613 2.68708 1.3106
## 34 DO 18.6500 8172.2496 42.49502 2.22228 0.9213 6.05690 2.4036
## 35 EC 13.4000 5830.4242 49.66284 1.23328 1.7062 0.50846 -2.7675
## 36 EE 25.3120 26427.0455 84.20358 2.03974 0.2136 3.94866 2.6995
## 37 EG 20.1000 3756.4221 31.92960 16.16054 2.0540 4.44796 2.2412
## 38 ES 16.9822 30488.0476 176.49180 0.71830 0.3949 2.84406 -0.4857
## 39 ET NA 891.6737 33.04222 10.37682 2.6572 9.06000 5.5428
## 40 FI 20.1000 53937.2819 257.89490 0.67100 0.2165 1.82644 0.9465
## 41 FR 19.6501 44939.9046 247.44200 0.99026 0.2532 1.63728 -0.4225
## 42 GA NA 7803.8309 39.13694 2.80396 2.7047 2.25300 -1.5934
## 43 GB 21.6000 46723.9041 406.04150 1.53050 0.6090 1.70254 -1.3484
## 44 GE 17.6000 4422.7082 104.45110 3.93800 -0.0317 4.12018 2.3147
## 45 GH 15.0000 2353.8541 59.62672 12.94200 2.2431 5.29400 2.6949
## 46 GR 16.6643 19404.1830 239.31940 0.26994 -0.2217 0.75896 -0.5866
## 47 GT 16.1000 4478.2807 34.07538 3.74200 1.9677 3.40778 0.3178
## 48 HK 20.7000 50214.6484 446.31540 2.43600 0.8510 1.99164 -0.5657
## 49 HR 25.5000 16617.8744 89.53644 0.55302 -0.7565 3.00696 1.6712
## 50 HU 18.2802 18224.0904 115.04970 1.84622 -0.2417 4.07978 2.5449
## 51 ID 23.9000 4223.4646 35.44648 3.94398 1.1454 5.03546 2.5001
## 52 IE 25.4692 91715.2029 815.34610 0.32334 1.1977 10.07624 4.5268
## 53 IL NA 50813.0432 26.75248 0.14240 1.6423 3.36048 0.7084
## 54 IN 13.6000 2218.5362 20.80796 4.24752 1.0491 6.72450 2.6258
## 55 IQ NA 4270.7893 37.55918 0.44192 2.4876 3.80000 -1.3546
## 56 IS 24.8200 66458.8741 108.86200 0.41926 2.0438 4.63556 0.3121
## 57 IT 16.0000 34641.2557 129.37690 0.65218 -0.0396 0.98170 -0.8850
## 58 JM 14.3000 4938.6880 90.75014 3.60208 0.4806 1.18000 -1.4606
## 59 JO 17.9300 4433.1037 70.49410 1.37566 1.9443 2.02956 -0.6875
## 60 JP 17.3000 40838.2838 75.76394 0.51938 -0.1929 0.91240 -0.1394
## 61 KE 18.4444 2025.2907 52.90428 6.28796 2.8000 5.62820 1.9242
## 62 KR 14.8000 35337.0758 26.36510 1.09614 0.3751 2.77248 1.6542
## 63 KW NA 30276.8754 47.57912 1.88400 1.9857 0.13772 -3.7375
## 64 KZ 26.9700 10589.0517 56.21532 7.95000 1.3350 3.00000 0.9054
## 65 LK 16.5000 3822.1732 59.36342 4.21800 0.8755 3.67800 1.0805
## 66 LS 22.9520 1010.6177 61.18192 5.00340 0.7958 0.39388 -3.2384
## 67 LS 11.0000 2637.6890 85.47152 1.81194 1.5534 6.57820 3.6709
## 68 LT 21.8073 22636.1217 78.06336 1.69862 -0.8862 3.42028 3.7273
## 69 LU 23.9000 124340.3835 6908.35200 1.17428 2.0218 3.22626 0.0792
## 70 LV 24.9680 19638.1070 134.37960 1.70144 -0.7032 3.13634 2.3134
## 71 MA 15.2000 3301.6090 45.30954 1.18800 1.2641 3.09514 -0.4997
## 72 MK 16.6966 6571.8228 72.20372 0.62200 0.0389 2.77890 1.0689
## 73 MN NA 4393.3037 218.85680 4.98958 1.8006 4.25820 0.9093
## 74 MO 14.5000 52074.0604 194.62240 2.78282 1.2126 -1.66952 -9.8453
## 75 MT 23.9600 31441.2784 761.28590 1.32032 3.1949 6.53774 -0.1381
## 76 MV 47.5000 8656.5566 35.18078 0.88000 3.5095 6.30000 -3.6495
## 77 MX 17.7000 9729.2631 37.27282 4.02524 1.1350 2.01104 -1.4444
## 78 MY 18.3000 11363.6075 65.23842 1.91000 1.3474 4.87800 1.3926
## 79 MZ 26.0000 434.4606 356.19670 9.04260 2.9384 3.92880 -0.4276
## 80 <NA> 15.2000 5051.3480 60.48620 4.85712 1.8806 0.75446 -3.5756
## 81 NG 15.4000 2149.7791 24.81464 12.94034 2.6197 1.19458 -2.3145
## 82 NI 21.7500 1986.7204 83.97706 4.34306 1.0414 1.37996 -1.0152
## 83 NL 18.9026 57230.6715 512.18330 1.17730 0.5927 2.21984 0.4875
## 84 NO 23.1000 82858.2833 155.89490 2.61900 0.8375 1.46654 0.1530
## 85 NZ NA 48925.4692 103.06840 1.20152 2.0008 3.39204 0.0639
## 86 OM 19.1000 14957.4883 88.59674 0.76000 2.3197 1.99976 -2.3013
## 87 PA 16.2500 13872.7952 156.66240 0.42400 1.6872 4.58302 -1.8406
## 88 PE 15.5888 6528.2061 35.38870 2.69720 1.0511 3.17022 -0.7479
## 89 PH 14.9396 3658.9600 32.33956 2.49528 1.4217 6.56308 1.9917
## 90 PK 17.2000 1406.1297 29.25594 4.73824 1.8710 4.29018 1.5142
## 91 PL 20.1490 17732.6481 67.90290 0.80874 -0.0948 4.34840 3.1314
## 92 PT 16.7000 25282.8171 203.15930 0.83600 -0.3333 2.53122 0.9934
## 93 PY 19.1000 5054.1153 43.10920 3.52000 1.2931 2.96754 0.8830
## 94 QA 18.8000 55338.4835 108.79850 0.82252 2.3456 1.66590 -2.3610
## 95 RO 23.2000 14981.8900 52.57054 1.51808 -0.5878 4.71770 3.9390
## 96 RS 21.8000 8768.7320 86.54676 1.90000 -0.5088 3.17400 3.1268
## 97 RU 12.7000 10274.3779 33.40582 6.72076 0.1073 0.97740 0.6743
## 98 RW 23.3000 825.5581 52.98370 4.20636 2.6417 7.36908 2.2852
## 99 SA NA 21664.6362 48.19680 0.76200 2.5009 1.56022 -2.5833
## 100 SC 19.0200 13490.5158 109.22200 1.18702 1.0858 3.51306 -1.1137
## X8 X9 X10 X11 X12 X13 X14
## 1 55.00000 -26.52000 2.857862 8.0000 23.08410 26.94344 3.0000
## 2 102.52738 -13.59890 352.910575 8.1550 24.85976 32.47740 2.4500
## 3 102.52738 -56.24160 199.928422 8.1550 20.39940 31.03926 NA
## 4 102.52738 24.78532 10.108892 NA 21.69104 17.30888 NA
## 5 166.80851 47.27262 12.645460 6.6000 19.40300 15.11172 18.5000
## 6 34.81845 15.44938 62.485865 10.3000 31.12380 20.57210 10.5000
## 7 NA -5.01348 375.190755 10.6000 16.71368 13.81918 11.0500
## 8 116.41876 15.36980 429.980978 2.0190 24.78244 26.89982 6.0000
## 9 191.74943 57.95768 1359.132847 0.9600 24.28828 22.49670 5.4478
## 10 80.54508 28.09668 2.383969 5.0000 21.13634 24.49756 8.0000
## 11 110.63987 -174.36800 42.607177 NA 23.63816 29.44668 7.0000
## 12 78.40700 4.91586 347.147671 7.7000 32.70006 32.19390 5.0000
## 13 90.42874 -18.98450 514.176961 NA 24.55938 24.73148 6.0000
## 14 72.99709 -12.95970 69.103768 5.7984 20.46028 23.25440 5.3000
## 15 NA -51.11920 34.539229 NA 31.14198 28.74462 4.0000
## 16 81.54536 20.59038 15.355253 17.0000 23.40036 19.12786 2.3000
## 17 101.29834 -20.17800 37.238307 NA 20.84156 16.34698 8.5000
## 18 76.95112 10.01940 1444.733210 NA 15.49512 13.50872 13.9500
## 19 149.03716 42.60472 60.258857 4.8292 28.15432 18.39389 4.5000
## 20 106.86400 45.53354 1721.506090 0.5333 23.26748 20.61472 7.4503
## 21 84.09314 54.86688 9.707663 24.4000 46.82746 35.13040 NA
## 22 82.50000 -152.44500 749.017673 0.7500 24.41888 32.83370 3.1728
## 23 83.29876 7.37624 61.348608 8.8000 20.82476 19.42264 12.0000
## 24 116.51738 14.00910 252.940034 1.7229 22.48684 20.37026 10.0000
## 25 87.68676 26.04546 40.349134 20.0000 28.82414 25.75278 NA
## 26 94.22575 -26.98080 14866.703370 1.8396 43.17774 44.69396 4.9000
## 27 117.32727 9.92820 271.346897 3.1800 22.22970 18.09570 13.0000
## 28 125.26214 1.74676 61.520675 2.7000 17.93502 15.88014 15.0000
## 29 67.45486 51.37952 1.703701 9.5200 35.67148 32.54966 15.8000
## 30 68.15254 456.48640 23.804052 NA 17.90342 14.38300 7.8000
## 31 76.50765 -16.61460 243.530380 2.6562 27.97028 30.60572 3.4000
## 32 138.35724 -15.64810 3793.593164 NA 20.71144 28.03106 4.8177
## 33 359.13886 -5.59082 355.184032 1.8000 22.06322 29.24874 5.4000
## 34 80.10580 20.97700 79.001191 1.8500 23.77748 23.65308 5.7000
## 35 99.71869 8.90856 98.808010 NA 26.15664 26.23654 7.2000
## 36 104.68417 -13.26730 30.960228 0.3791 26.20832 28.47148 6.4000
## 37 53.29829 11.90716 361.845786 3.9000 15.78406 11.70278 7.0000
## 38 117.48908 83.40662 1278.325953 2.8512 19.68004 22.06638 16.1639
## 39 NA 27.57822 107.795527 11.1000 39.33340 31.44122 18.0000
## 40 175.16013 68.78926 270.625631 1.4000 23.67910 22.42012 7.8000
## 41 139.57196 36.80590 2609.943503 2.8415 23.40664 22.34918 9.3242
## 42 72.26748 29.34950 15.062255 11.2000 32.40134 32.38396 19.0000
## 43 91.33396 31.42504 2707.744043 1.2157 17.99490 14.01296 5.5665
## 44 145.18138 62.38444 15.891616 2.3000 27.43172 19.16908 20.0000
## 45 50.13182 47.67084 67.471195 15.0000 26.24074 23.15904 NA
## 46 89.84441 134.16510 188.985393 26.9780 12.66678 10.94992 18.3000
## 47 72.98978 2.81678 77.604632 1.8301 14.07878 15.21612 4.0000
## 48 68.82672 -283.26700 349.444713 0.9024 21.19750 24.59920 6.5000
## 49 82.18115 32.12062 56.170837 7.1776 21.94236 25.00206 7.5000
## 50 79.03623 11.68846 155.013041 0.9250 24.36458 27.06952 4.4000
## 51 96.57359 9.73972 1062.299663 3.0600 33.99662 32.80276 6.5000
## 52 84.42461 -345.29200 417.683180 3.5406 34.89774 43.30982 6.5000
## 53 85.93030 -47.40690 403.526464 1.4760 21.12638 24.84940 4.5000
## 54 73.69553 0.68776 2660.261329 9.5000 31.20112 30.23156 7.5000
## 55 NA 1.49128 165.493039 NA 19.14184 22.59400 13.5000
## 56 160.95762 28.71610 21.714538 2.9022 20.96484 26.28896 7.8000
## 57 111.19617 50.94154 1880.708359 NA 17.86622 20.23166 10.8839
## 58 88.86704 38.11946 13.812422 2.8000 22.55048 21.29292 8.5000
## 59 93.88143 10.63962 43.697563 5.4000 18.98000 11.38420 22.0000
## 60 70.95343 -44.36040 5043.573440 1.0734 25.31864 27.87758 2.7383
## 61 87.40996 39.30272 100.470001 14.1390 17.13230 11.00750 11.4537
## 62 123.99717 -26.86070 1631.134780 1.0000 30.95906 36.10606 4.0000
## 63 NA -271.87800 105.949023 NA 26.68898 32.14778 NA
## 64 83.52300 -38.63250 171.239891 7.9000 26.82074 26.28400 5.1000
## 65 89.13270 46.95208 80.676726 5.3000 29.69044 27.95642 5.0000
## 66 59.50145 12.39604 1.844513 4.1991 28.89906 16.64574 24.6500
## 67 NA 72.94226 19.129116 3.2000 33.45200 30.12460 NA
## 68 68.23179 16.85362 55.761983 0.9915 19.48272 20.31900 9.0000
## 69 42.65388 -1955.72000 73.055370 1.0280 18.38978 34.25134 6.8000
## 70 77.92536 22.27304 33.430044 3.5221 22.67616 23.11992 8.1000
## 71 102.28671 15.76728 112.869983 8.3515 32.29572 29.09784 11.5000
## 72 94.67722 23.32864 12.263700 3.2613 32.41148 31.85910 16.3000
## 73 77.34337 180.83030 13.269000 11.6788 32.50458 24.22642 7.3000
## 74 81.91411 -213.14400 24.333081 0.3357 19.61144 55.08932 2.4000
## 75 67.94252 -212.97000 14.474956 3.6629 22.98726 29.32880 4.3000
## 76 78.58290 -0.57864 3.767023 8.3000 20.09944 25.59820 6.5000
## 77 99.03247 8.52064 1073.915464 2.4292 22.74344 21.51140 4.0000
## 78 113.38897 -16.25250 336.664465 1.6600 24.38356 27.92792 4.0000
## 79 49.29687 282.81000 14.374968 11.8000 42.66772 15.14566 NA
## 80 91.29415 19.01378 10.710329 6.4000 20.90098 13.32220 23.0000
## 81 64.18761 -2.27422 401.028628 6.0000 18.31216 18.87146 22.0000
## 82 84.09234 50.63570 12.621466 4.1000 29.94454 26.86952 4.5000
## 83 161.74143 12.94086 910.005594 NA 21.16826 30.30356 4.6000
## 84 207.31980 -35.66660 362.571122 NA 28.22544 32.62790 4.2000
## 85 138.38958 51.69554 208.833638 NA 23.66948 21.26348 4.8000
## 86 148.12624 20.96404 64.648375 4.2000 26.17416 14.67168 NA
## 87 121.21618 39.83066 52.938074 2.1500 41.14624 36.51452 12.0000
## 88 116.71299 -20.74610 204.753978 4.1276 21.80948 20.42292 9.7000
## 89 74.25542 -12.25290 363.429119 1.6736 24.97170 24.32906 8.5000
## 90 61.34609 20.46164 262.232162 9.1000 16.10420 12.78448 6.5000
## 91 93.36164 27.71838 594.155788 3.7123 20.15800 20.38070 6.2000
## 92 113.24087 86.35888 230.736935 6.2000 17.22924 18.41932 7.1000
## 93 104.13747 9.42000 35.304238 4.9000 21.36214 23.25368 5.5000
## 94 163.71672 1.28492 146.400550 2.0000 42.35736 46.79924 0.1200
## 95 74.41554 18.74188 248.716040 4.0580 23.69546 20.98926 5.0000
## 96 90.19331 38.24144 52.960139 5.0000 20.81872 17.03374 9.7000
## 97 122.72076 -30.73920 1471.003881 NA 22.78718 27.47832 5.4000
## 98 111.94390 36.62538 10.332054 4.5000 22.69110 11.11020 NA
## 99 NA -62.91130 700.117867 NA 29.55294 29.68834 NA
## 100 45.41537 31.39798 1.170879 3.8700 34.72552 17.38966 4.5000
## Risk.Level
## 1 low
## 2 low
## 3 low
## 4 low
## 5 high
## 6 high
## 7 high
## 8 low
## 9 low
## 10 high
## 11 high
## 12 high
## 13 low
## 14 low
## 15 high
## 16 high
## 17 high
## 18 high
## 19 high
## 20 low
## 21 high
## 22 low
## 23 high
## 24 low
## 25 high
## 26 low
## 27 high
## 28 high
## 29 high
## 30 high
## 31 low
## 32 low
## 33 low
## 34 high
## 35 high
## 36 low
## 37 high
## 38 low
## 39 high
## 40 low
## 41 low
## 42 high
## 43 low
## 44 high
## 45 high
## 46 high
## 47 high
## 48 low
## 49 high
## 50 low
## 51 low
## 52 low
## 53 low
## 54 high
## 55 high
## 56 low
## 57 high
## 58 high
## 59 high
## 60 low
## 61 high
## 62 low
## 63 low
## 64 low
## 65 high
## 66 high
## 67 high
## 68 low
## 69 low
## 70 low
## 71 high
## 72 high
## 73 high
## 74 low
## 75 low
## 76 high
## 77 high
## 78 low
## 79 high
## 80 high
## 81 high
## 82 high
## 83 low
## 84 low
## 85 low
## 86 high
## 87 high
## 88 low
## 89 low
## 90 high
## 91 low
## 92 low
## 93 high
## 94 low
## 95 high
## 96 high
## 97 low
## 98 high
## 99 low
## 100 high
Country is only an identifier and is not needed for the analysis; in the listing above it even contains a missing code (row 80) and a duplicated code (LS, rows 66 and 67). We therefore drop the Country column:
invest1 <- invest %>% select(-Country)
head(invest1)
## X1 X2 X3 X4 X5 X6 X7 X8 X9
## 1 17.5 38674.616 172.75400 0.68000 1.2206 1.78560 -2.0843 55.00000 -26.52000
## 2 18.2 40105.120 103.52280 1.76600 0.8698 2.65884 -0.7254 102.52738 -13.59890
## 3 18.7 76037.997 31.03626 2.63056 1.4893 1.85034 -1.9008 102.52738 -56.24160
## 4 NA 27882.829 24.78532 1.29416 1.7530 2.23192 -1.1355 102.52738 24.78532
## 5 14.0 4251.398 89.61882 1.44000 0.2562 4.74800 2.3318 166.80851 47.27262
## 6 NA 2033.900 57.05566 22.35646 3.3422 -0.87800 -5.2032 34.81845 15.44938
## X10 X11 X12 X13 X14 Risk.Level
## 1 2.857862 8.000 23.08410 26.94344 3.00 low
## 2 352.910575 8.155 24.85976 32.47740 2.45 low
## 3 199.928422 8.155 20.39940 31.03926 NA low
## 4 10.108892 NA 21.69104 17.30888 NA low
## 5 12.645460 6.600 19.40300 15.11172 18.50 high
## 6 62.485865 10.300 31.12380 20.57210 10.50 high
skimr::skim(invest)
| Data summary | |
|---|---|
| Name | invest |
| Number of rows | 100 |
| Number of columns | 16 |
| Column type frequency: | |
| factor | 2 |
| numeric | 14 |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| Country | 1 | 0.99 | FALSE | 98 | LS: 2, AD: 1, AE: 1, AE-: 1 |
| Risk.Level | 0 | 1.00 | FALSE | 2 | hig: 54, low: 46 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 12 | 0.88 | 18.97 | 5.42 | 4.20 | 15.93 | 18.58 | 21.80 | 47.50 | ▁▇▃▁▁ |
| X2 | 0 | 1.00 | 22641.57 | 24846.18 | 434.46 | 4265.94 | 11659.10 | 34815.21 | 124340.38 | ▇▂▂▁▁ |
| X3 | 0 | 1.00 | 191.93 | 697.33 | 13.63 | 42.96 | 70.42 | 130.63 | 6908.35 | ▇▁▁▁▁ |
| X4 | 0 | 1.00 | 3.26 | 4.86 | -0.15 | 0.87 | 1.70 | 3.94 | 36.70 | ▇▁▁▁▁ |
| X5 | 0 | 1.00 | 1.20 | 1.06 | -0.89 | 0.44 | 1.14 | 1.95 | 4.40 | ▃▇▅▃▁ |
| X6 | 0 | 1.00 | 3.08 | 2.26 | -5.14 | 1.76 | 2.98 | 4.30 | 10.08 | ▁▂▇▃▁ |
| X7 | 0 | 1.00 | 0.11 | 2.58 | -9.85 | -1.19 | 0.07 | 1.94 | 6.07 | ▁▁▆▇▂ |
| X8 | 7 | 0.93 | 99.94 | 42.61 | 34.82 | 76.95 | 90.19 | 113.39 | 359.14 | ▇▅▁▁▁ |
| X9 | 0 | 1.00 | -13.58 | 217.07 | -1955.72 | -14.11 | 12.67 | 36.67 | 456.49 | ▁▁▁▂▇ |
| X10 | 0 | 1.00 | 582.32 | 1659.10 | 1.17 | 32.81 | 106.87 | 366.37 | 14866.70 | ▇▁▁▁▁ |
| X11 | 17 | 0.83 | 5.53 | 5.14 | 0.34 | 1.93 | 3.90 | 7.95 | 26.98 | ▇▃▁▁▁ |
| X12 | 0 | 1.00 | 24.96 | 6.73 | 12.67 | 20.79 | 23.40 | 28.38 | 46.83 | ▃▇▃▁▁ |
| X13 | 0 | 1.00 | 24.48 | 8.07 | 10.95 | 19.06 | 24.28 | 29.36 | 55.09 | ▅▇▅▁▁ |
| X14 | 11 | 0.89 | 8.44 | 5.29 | 0.12 | 4.82 | 6.80 | 10.50 | 24.65 | ▆▇▂▂▁ |
The output above shows 47 missing values among the predictors: 12 in X1, 7 in X8, 17 in X11 and 11 in X14. (Country also has one missing value, but that column is dropped from the analysis.)
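The per-column missing counts from the skim output can also be obtained directly with base R. A minimal sketch on a small hypothetical data frame (the values are made up for illustration; on the real data you would pass `invest1` before imputation):

```r
# Hypothetical toy data frame standing in for invest1 (values are illustrative only)
toy <- data.frame(
  X1  = c(17.5, NA, 18.7, NA),
  X2  = c(38674.6, 40105.1, 76038.0, 27882.8),
  X14 = c(3.0, 2.45, NA, NA)
)

# is.na() on a data frame returns a logical matrix; colSums() counts TRUEs per column
na_per_col <- colSums(is.na(toy))
print(na_per_col)

# Keep only the columns that actually contain missing values
print(na_per_col[na_per_col > 0])

# Total number of missing cells is the sum of the per-column counts
total_na <- sum(na_per_col)
print(total_na)
```

Applied to `invest1` before imputation, the same two lines should give 12, 7, 17 and 11 for X1, X8, X11 and X14, i.e. 47 in total.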
Below are boxplots of each predictor, split by risk level:
# Helper: boxplot of one predictor for each Risk.Level
box_by_risk <- function(var) {
  ggplot(data = invest, mapping = aes(x = .data[[var]], y = Risk.Level)) +
    geom_boxplot() +
    labs(x = var)
}
x1_y <- box_by_risk("X1")
x2_y <- box_by_risk("X2")
x3_y <- box_by_risk("X3")
x4_y <- box_by_risk("X4")
x5_y <- box_by_risk("X5")
x6_y <- box_by_risk("X6")
x7_y <- box_by_risk("X7")
x8_y <- box_by_risk("X8")
x9_y <- box_by_risk("X9")
x10_y <- box_by_risk("X10")
x11_y <- box_by_risk("X11")
x12_y <- box_by_risk("X12")
x13_y <- box_by_risk("X13")
x14_y <- box_by_risk("X14")
plot_grid(x1_y, x2_y, x3_y, x4_y)
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
plot_grid(x5_y, x6_y, x7_y, x8_y)
## Warning: Removed 7 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
plot_grid(x9_y, x10_y, x11_y, x12_y)
## Warning: Removed 17 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
plot_grid(x13_y, x14_y)
## Warning: Removed 11 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
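To handle the missing values, each missing entry is imputed with the median of its variable. The median is preferable to the mean here because the affected predictors (X1, X8, X11, X14) are right-skewed, as the skim histograms and the boxplots above show.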
# Replace the missing values of X1, X8, X11 and X14 with each column's median
for (v in c("X1", "X8", "X11", "X14")) {
  invest1[[v]][is.na(invest1[[v]])] <- median(invest1[[v]], na.rm = TRUE)
}
# check for missing data
sum(is.na(invest1))
## [1] 0
frinvest <- as.data.frame(table(invest1$Risk.Level))
frinvest
## Var1 Freq
## 1 high 54
## 2 low 46
Using a bar chart:
frinvest$persen <- frinvest$Freq/sum(frinvest$Freq)
ggplot(data = frinvest, mapping = aes(x = Var1, y = Freq)) +
geom_col(aes(fill =Var1), alpha = 0.7) +
labs(title = "Investment Risk Level",
x = "Risk Level",
y = "Frequency") +
geom_text(aes(label = paste0(round(persen*100, 2), "%")), vjust = -0.25) +
theme(legend.position = "none")
According to the bar chart, 54 countries (54%) fall into the high investment-risk level and 46 countries (46%) into the low level. High-risk countries therefore slightly outnumber low-risk ones, although the two classes are fairly balanced.
Preprocessing data
task_risk = TaskClassif$new(id="country",backend = invest1,target = "Risk.Level",positive ="low")
The data are split into two parts: 80% for training and 20% for testing.
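Conceptually, a holdout split with `ratio = 0.8` keeps a random 80% of the 100 rows for training and the remaining 20% for testing, which is why the prediction table further down has 20 rows. A minimal base-R sketch of that idea; the helper `holdout_split()` is hypothetical and does not reproduce the exact rows mlr3 selects:

```r
# Conceptual 80/20 holdout split (hypothetical helper, not mlr3's implementation)
holdout_split <- function(n, ratio = 0.8, seed = 234) {
  set.seed(seed)                                   # note: resets the global RNG state
  train_ids <- sort(sample(seq_len(n), size = round(ratio * n)))
  test_ids  <- setdiff(seq_len(n), train_ids)      # every row not used for training
  list(train = train_ids, test = test_ids)
}

parts <- holdout_split(100)
length(parts$train)  # rows used for training
length(parts$test)   # rows held out for testing
```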
set.seed(234)
resample_holdout =rsmp("holdout", ratio = 0.8)
resample_holdout$instantiate(task=task_risk)
model_rf <- lrn("classif.ranger", predict_type="prob", importance ="impurity")
model_rf$train(task=task_risk)
importance<- data.frame(Predictors = names(model_rf$model$variable.importance), impurity = model_rf$model$variable.importance)
ggplot(importance,
aes(x=impurity,
y=reorder(Predictors,impurity))
) +
geom_col(fill = "steelblue")+
geom_text(aes(label=round(impurity,2)),hjust=1.2)
According to the plot above, X2 (GDP per capita, USD) is the most important predictor of investment risk, since it yields the largest total impurity decrease, whereas X6 (real GDP growth, 5-year average) is the least important of all the predictors. Impurity importance measures how useful a variable is for the forest's splits; it does not measure a causal effect.
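For classification, ranger's `impurity` importance is based on the Gini index: each split reduces node impurity, and these reductions are accumulated per variable over all trees. A minimal sketch of the Gini impurity and of the decrease produced by a single split; the helper names are hypothetical, the split counts are made up, and ranger's exact weighting and scaling may differ:

```r
# Gini impurity of a node from its class counts: 1 - sum(p_k^2)
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Impurity decrease of a binary split: parent impurity minus the
# size-weighted impurities of the two child nodes
gini_decrease <- function(parent, left, right) {
  n <- sum(parent)
  gini(parent) - (sum(left) / n) * gini(left) - (sum(right) / n) * gini(right)
}

# Root node with the class counts of this data set: 54 high, 46 low
root_gini <- gini(c(high = 54, low = 46))

# Hypothetical split of those 100 countries into two child nodes
split_gain <- gini_decrease(parent = c(54, 46), left = c(44, 6), right = c(10, 40))
print(c(root = root_gini, decrease = split_gain))
```

The root impurity is 1 - (0.54^2 + 0.46^2) = 0.4968, and the hypothetical split lowers it by 0.2312. A variable that repeatedly produces large decreases like this ends up high in the importance plot.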
set.seed(234)
train_test_rf = resample(task = task_risk, learner = model_rf, resampling = resample_holdout, store_models = TRUE)
## INFO [08:53:28.486] [mlr3] Applying learner 'classif.ranger' on task 'country' (iter 1/1)
The log line above simply reports that mlr3 is fitting the random forest learner (classif.ranger) on the task with id 'country' for the single holdout iteration (iter 1/1); the target being predicted is Risk.Level.
data_testing = as.data.table(train_test_rf$prediction())
data_testing
## row_ids truth response prob.low prob.high
## <int> <fctr> <fctr> <num> <num>
## 1: 9 low low 0.90590317 0.09409683
## 2: 10 high low 0.55254365 0.44745635
## 3: 12 high high 0.28936905 0.71063095
## 4: 22 low low 0.95048651 0.04951349
## 5: 23 high high 0.08841111 0.91158889
## 6: 27 high high 0.23626587 0.76373413
## 7: 32 low low 0.97720714 0.02279286
## 8: 33 low low 0.94433651 0.05566349
## 9: 36 low low 0.80659444 0.19340556
## 10: 37 high high 0.20504206 0.79495794
## 11: 44 high high 0.18224603 0.81775397
## 12: 47 high high 0.27397063 0.72602937
## 13: 59 high high 0.06550238 0.93449762
## 14: 62 low low 0.86840476 0.13159524
## 15: 64 low high 0.49597540 0.50402460
## 16: 71 high high 0.10524603 0.89475397
## 17: 73 high high 0.07424444 0.92575556
## 18: 81 high high 0.16464048 0.83535952
## 19: 94 low low 0.74354524 0.25645476
## 20: 95 high low 0.65104444 0.34895556
## row_ids truth response prob.low prob.high
train_test_rf$prediction()$confusion
## truth
## response low high
## low 7 2
## high 1 10
From the confusion matrix above (rows are the predictions, columns are the true classes, and "low" is the positive class, as set in the task):
- Predicted low, truly low (true positive) = 7: seven low-risk countries are correctly predicted as low risk.
- Predicted low, truly high (false positive) = 2: two high-risk countries are wrongly predicted as low risk.
- Predicted high, truly low (false negative) = 1: one low-risk country is wrongly predicted as high risk.
- Predicted high, truly high (true negative) = 10: ten high-risk countries are correctly predicted as high risk.
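The confusion matrix can be rebuilt from the `truth` and `response` columns of `data_testing`. The sketch below copies those 20 labels by hand from the printed output (in row order 9, 10, 12, ..., 95), so it runs without the fitted model:

```r
# Truth and predicted labels of the 20 held-out countries, copied from data_testing
truth <- c("low", "high", "high", "low", "high", "high", "low", "low", "low", "high",
           "high", "high", "high", "low", "low", "high", "high", "high", "low", "high")
response <- c("low", "low", "high", "low", "high", "high", "low", "low", "low", "high",
              "high", "high", "high", "low", "high", "high", "high", "high", "low", "low")

lv <- c("low", "high")  # "low" is the positive class, as in the task definition
cm <- table(response = factor(response, levels = lv),
            truth    = factor(truth, levels = lv))
print(cm)

TP <- cm["low", "low"]    # predicted low, truly low
FP <- cm["low", "high"]   # predicted low, truly high
FN <- cm["high", "low"]   # predicted high, truly low
TN <- cm["high", "high"]  # predicted high, truly high
```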
akurasi_rf <- train_test_rf$aggregate(list(msr("classif.acc"),msr("classif.specificity"), msr("classif.sensitivity")))
akurasi_rf
## classif.acc classif.specificity classif.sensitivity
## 0.8500000 0.8333333 0.8750000
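The accuracy is fairly high, at 85% (17 of the 20 test countries are classified correctly). Specificity is 83.3% (10 of 12 high-risk countries recognized) and sensitivity is 87.5% (7 of 8 low-risk countries recognized). With only 20 test countries, these estimates are still quite uncertain.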
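These three measures follow directly from the four confusion-matrix cells. A minimal sketch (the helper `metrics_from_counts()` is hypothetical) that reproduces the values reported by mlr3:

```r
# Accuracy, sensitivity and specificity from confusion-matrix counts,
# with "low" as the positive class
metrics_from_counts <- function(TP, FP, FN, TN) {
  c(accuracy    = (TP + TN) / (TP + FP + FN + TN),
    sensitivity = TP / (TP + FN),   # share of truly-low countries predicted low
    specificity = TN / (TN + FP))   # share of truly-high countries predicted high
}

m <- metrics_from_counts(TP = 7, FP = 2, FN = 1, TN = 10)
print(round(m, 4))
```

This gives 17/20 = 0.85, 7/8 = 0.875 and 10/12 = 0.833, matching `classif.acc`, `classif.sensitivity` and `classif.specificity` above.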
Next, suppose we want to predict the investment risk level of the countries in the testing data. Here is a summary of the testing data:
summary(testing)
## Country X1 X2 X3
## SE : 1 Min. :11.90 Min. : 786.9 Min. : 30.05
## SG : 1 1st Qu.:15.76 1st Qu.: 3955.1 1st Qu.: 48.51
## SI : 1 Median :17.52 Median : 8653.0 Median : 65.56
## SK : 1 Mean :17.42 Mean :22330.4 Mean : 92.59
## SM : 1 3rd Qu.:19.70 3rd Qu.:31854.3 3rd Qu.:103.06
## SV : 1 Max. :23.20 Max. :69324.7 Max. :409.70
## (Other):11 NA's :1
## X4 X5 X6 X7
## Min. : 0.1051 Min. :-0.3906 Min. :0.340 Min. :-2.3230
## 1st Qu.: 0.8435 1st Qu.: 0.3153 1st Qu.:1.754 1st Qu.:-0.1248
## Median : 1.6200 Median : 0.6255 Median :2.539 Median : 0.4867
## Mean : 4.4949 Mean : 0.8249 Mean :2.994 Mean : 0.8826
## 3rd Qu.: 5.5560 3rd Qu.: 1.1173 3rd Qu.:3.553 3rd Qu.: 1.8906
## Max. :19.1730 Max. : 3.6551 Max. :6.946 Max. : 5.2762
##
## X8 X9 X10 X11
## Min. : 49.06 Min. :-200.98 Min. : 1.491 Min. : 0.500
## 1st Qu.: 72.28 1st Qu.: -42.56 1st Qu.: 52.762 1st Qu.: 1.571
## Median : 88.89 Median : 15.04 Median : 155.582 Median : 2.530
## Mean : 96.54 Mean : -18.76 Mean : 1463.386 Mean :11.258
## 3rd Qu.:109.52 3rd Qu.: 28.57 3rd Qu.: 501.644 3rd Qu.: 3.336
## Max. :185.64 Max. : 64.46 Max. :20935.000 Max. :63.500
## NA's :2 NA's :4
## X12 X13 X14
## Min. :16.45 Min. : 8.882 Min. : 2.000
## 1st Qu.:17.79 1st Qu.:17.208 1st Qu.: 4.675
## Median :22.03 Median :23.211 Median : 7.150
## Mean :21.91 Mean :23.693 Mean : 8.963
## 3rd Qu.:24.86 3rd Qu.:27.953 3rd Qu.: 9.700
## Max. :31.60 Max. :47.254 Max. :33.700
## NA's :1
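Because the testing data still contain NAs (1 in X1, 2 in X8, 4 in X11 and 1 in X14), we first clean them by filling each missing value with the column mean. Note that this differs from the training data, which were imputed with the median, and that the means here are computed from the testing data itself.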
# Fill missing values in the testing data with each column's mean
newtesting <- testing %>%
  mutate(across(c(X1, X8, X11, X14),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
summary(newtesting)
## Country X1 X2 X3
## SE : 1 Min. :11.90 Min. : 786.9 Min. : 30.05
## SG : 1 1st Qu.:16.30 1st Qu.: 3955.1 1st Qu.: 48.51
## SI : 1 Median :17.42 Median : 8653.0 Median : 65.56
## SK : 1 Mean :17.42 Mean :22330.4 Mean : 92.59
## SM : 1 3rd Qu.:19.67 3rd Qu.:31854.3 3rd Qu.:103.06
## SV : 1 Max. :23.20 Max. :69324.7 Max. :409.70
## (Other):11
## X4 X5 X6 X7
## Min. : 0.1051 Min. :-0.3906 Min. :0.340 Min. :-2.3230
## 1st Qu.: 0.8435 1st Qu.: 0.3153 1st Qu.:1.754 1st Qu.:-0.1248
## Median : 1.6200 Median : 0.6255 Median :2.539 Median : 0.4867
## Mean : 4.4949 Mean : 0.8249 Mean :2.994 Mean : 0.8826
## 3rd Qu.: 5.5560 3rd Qu.: 1.1173 3rd Qu.:3.553 3rd Qu.: 1.8906
## Max. :19.1730 Max. : 3.6551 Max. :6.946 Max. : 5.2762
##
## X8 X9 X10 X11
## Min. : 49.06 Min. :-200.98 Min. : 1.491 Min. : 0.50
## 1st Qu.: 72.31 1st Qu.: -42.56 1st Qu.: 52.762 1st Qu.: 1.69
## Median : 94.00 Median : 15.04 Median : 155.582 Median : 3.20
## Mean : 96.54 Mean : -18.76 Mean : 1463.386 Mean :11.26
## 3rd Qu.:107.25 3rd Qu.: 28.57 3rd Qu.: 501.644 3rd Qu.:11.26
## Max. :185.64 Max. : 64.46 Max. :20935.000 Max. :63.50
##
## X12 X13 X14
## Min. :16.45 Min. : 8.882 Min. : 2.000
## 1st Qu.:17.79 1st Qu.:17.208 1st Qu.: 5.000
## Median :22.03 Median :23.211 Median : 7.300
## Mean :21.91 Mean :23.693 Mean : 8.963
## 3rd Qu.:24.86 3rd Qu.:27.953 3rd Qu.: 9.500
## Max. :31.60 Max. :47.254 Max. :33.700
##
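The testing data above were filled with their own means, whereas the training data used medians. A common alternative, shown here only as a sketch, is to fill new data with statistics computed on the training data, so both sets are treated the same way. `impute_with_reference()` is a hypothetical helper and the toy values are made up:

```r
# Fill NAs in new data using a statistic computed on the training data
# (hypothetical helper; `fun` must accept an na.rm argument, e.g. median or mean)
impute_with_reference <- function(new_df, train_df, cols, fun = median) {
  for (v in cols) {
    fill <- fun(train_df[[v]], na.rm = TRUE)   # statistic from the training column
    new_df[[v]][is.na(new_df[[v]])] <- fill    # only missing cells are replaced
  }
  new_df
}

# Toy example with made-up values
train_toy <- data.frame(X1 = c(10, 20, 30, NA), X14 = c(1, 2, 100, 4))
new_toy   <- data.frame(X1 = c(NA, 15), X14 = c(5, NA))
filled <- impute_with_reference(new_toy, train_toy, cols = c("X1", "X14"))
print(filled)
```

With the objects in this analysis, the call would be, for example, `impute_with_reference(testing, invest, c("X1", "X8", "X11", "X14"))`.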
Next, we predict the risk level for the cleaned testing data:
risklevel_baru <- data.frame(newtesting)
prediksi <- predict(model_rf, newdata = risklevel_baru)
print(prediksi)
## [1] low low low low low high low high high low high high low high high
## [16] low high
## Levels: low high
The output lists the predicted risk level for each of the 17 countries in the testing data, in row order: 9 are predicted as low risk and 8 as high risk. The last line, `Levels: low high`, only lists the possible classes of the factor; it is not itself a prediction.
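Summarizing the predicted classes makes the result easier to read. A minimal sketch that copies the 17 printed labels by hand, so it runs without the fitted model:

```r
# The 17 predicted labels printed above, in the row order of newtesting
pred <- factor(c("low", "low", "low", "low", "low", "high", "low", "high", "high", "low",
                 "high", "high", "low", "high", "high", "low", "high"),
               levels = c("low", "high"))

# Number of countries predicted in each risk level
print(table(pred))
```

With the real objects, pairing each label with its country, e.g. `data.frame(Country = newtesting$Country, Risk.Level = prediksi)`, shows which countries fall into each level.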