Predicting Investment Risk Level

Data Description

Investment risk refers to the potential loss of money, or other adverse outcomes, associated with an investment. It can arise from many sources, such as stock market volatility, exchange rate fluctuations, global economic conditions, politics, and geography.

The following variables are used in the analysis:

Variable codes:

  • X1 capital adequacy ratio (%), average over the last 5 years

  • X2 GDP per capita (USD)

  • X3 gross external debt (% of GDP), average over the last 5 years

  • X4 consumer price growth (%), average over the last 5 years

  • X5 population growth (%), average over the last 5 years

  • X6 real GDP growth (%), average over the last 5 years

  • X7 real GDP per capita growth (%), average over the last 5 years

  • X8 loan-to-deposit ratio (%), average over the last 5 years

  • X9 net external debt (% of GDP), average over the last 5 years

  • X10 nominal GDP (billion USD)

  • X11 non-performing loans (% of gross loans), average over the last 5 years

  • X12 gross domestic investment as a share of GDP (%), average over the last 5 years

  • X13 domestic savings

  • X14 unemployment rate

Analysis objective: build a classification model that predicts the investment risk level.

Loading Packages and Data

library(rstatix)
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks rstatix::filter(), stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(randomForest)
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
## 
## The following object is masked from 'package:ggplot2':
## 
##     margin
library(mlr3verse)
## Loading required package: mlr3
library(mlr3extralearners)
library(rpart.plot)
## Loading required package: rpart
library(cowplot)
## 
## Attaching package: 'cowplot'
## 
## The following object is masked from 'package:lubridate':
## 
##     stamp
library(randomForestSRC)
## 
##  randomForestSRC 3.3.1 
##  
##  Type rfsrc.news() to see new features, changes, and bug fixes. 
##  
## 
## 
## Attaching package: 'randomForestSRC'
## 
## The following object is masked from 'package:mlr3verse':
## 
##     tune
## 
## The following object is masked from 'package:purrr':
## 
##     partial
library(graphics)
library(dplyr)
library(ranger)
## 
## Attaching package: 'ranger'
## 
## The following object is masked from 'package:randomForest':
## 
##     importance
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift
invest <- read.csv("investasi.csv", stringsAsFactors = TRUE)
testing <- read.csv("testing.csv", stringsAsFactors = TRUE)
invest
##     Country      X1          X2         X3       X4      X5       X6      X7
## 1        AD 17.5000  38674.6160  172.75400  0.68000  1.2206  1.78560 -2.0843
## 2        AE 18.2000  40105.1201  103.52280  1.76600  0.8698  2.65884 -0.7254
## 3     AE-AZ 18.7000  76037.9968   31.03626  2.63056  1.4893  1.85034 -1.9008
## 4     AE-RK      NA  27882.8286   24.78532  1.29416  1.7530  2.23192 -1.1355
## 5        AM 14.0000   4251.3977   89.61882  1.44000  0.2562  4.74800  2.3318
## 6        AO      NA   2033.8999   57.05566 22.35646  3.3422 -0.87800 -5.2032
## 7        AR 23.2527   9203.4287   43.25546 36.70346  0.9657 -0.23680 -3.7297
## 8        AT 18.5740  53174.2385  159.39690  1.52348  0.7259  1.88048 -0.3001
## 9        AU 15.7000  63972.3400  121.98890  1.65124  1.4790  2.44592  0.0306
## 10       AW 33.5000  24642.7034   92.84624  1.21694  0.7972  2.06486 -4.7211
## 11       AZ 25.3012   5083.2568   43.35272  6.85276  1.0510  0.39070 -1.7366
## 12       BD  4.2000   2323.5586   19.74352  5.81200  1.0568  7.39000  6.0712
## 13       BE 19.3188  49537.5785  256.72570  1.64000  0.5259  1.70044 -0.4905
## 14       BG 22.7406  11288.8489   70.29080  0.77854 -0.7095  3.61996  2.7008
## 15       BH 20.0000  22003.1172  214.20080  1.82200  4.4021  2.80902 -3.2531
## 16       BJ 10.5000   1420.6492   49.56500  0.22520  2.7684  4.87736  2.2133
## 17       BO 12.2800   3372.3576   32.27932  2.92384  1.4362  3.95132 -0.0467
## 18       BR 19.1400   7372.9153   35.67380  5.72400  0.7789 -0.46082 -1.3423
## 19       BY      NA   6453.9238   69.54962  8.39000  0.0197  0.10000  0.6603
## 20       CA 16.0956  51704.8992  117.76010  1.67406  1.1918  1.79828 -0.5879
## 21       CG 22.3000   2378.0444   70.34826  2.03374  2.5889 -5.13500 -8.1338
## 22       CH 19.3000  89770.8521  275.61690  0.00116  0.8402  1.88522  0.1733
## 23       CI      NA   2594.7038   35.68100  0.75212  2.5779  7.29640  3.3099
## 24       CL 14.2800  15986.3031   65.82414  2.97824  1.2484  1.97106 -0.8925
## 25       CM  9.1000   1690.8639   39.74812  1.54052  2.6442  4.35350  0.7177
## 26       CN 14.7045  12226.6610   13.63044  2.00000  0.4575  6.64354  5.2661
## 27       CO 17.2000   5859.6535   47.58910  4.70940  1.3766  2.44972 -0.8876
## 28       CR 13.2840  11954.5890   44.98044  1.34600  0.9961  3.24766  0.7026
## 29       CV 19.4200   3466.2500  115.31060  0.37920  1.1636  3.92138 -0.4036
## 30       CY 16.0000  30630.2905 1026.49500 -0.15102  0.1535  4.62534  2.8078
## 31       CZ 21.3796  27044.7929   79.21186  1.57500  0.2067  3.72134  1.3165
## 32       DE 18.5800  50891.5812  165.29300  1.20768  0.4835  1.62892 -0.1226
## 33       DK 22.6000  67565.6556  155.84030  0.54000  0.3613  2.68708  1.3106
## 34       DO 18.6500   8172.2496   42.49502  2.22228  0.9213  6.05690  2.4036
## 35       EC 13.4000   5830.4242   49.66284  1.23328  1.7062  0.50846 -2.7675
## 36       EE 25.3120  26427.0455   84.20358  2.03974  0.2136  3.94866  2.6995
## 37       EG 20.1000   3756.4221   31.92960 16.16054  2.0540  4.44796  2.2412
## 38       ES 16.9822  30488.0476  176.49180  0.71830  0.3949  2.84406 -0.4857
## 39       ET      NA    891.6737   33.04222 10.37682  2.6572  9.06000  5.5428
## 40       FI 20.1000  53937.2819  257.89490  0.67100  0.2165  1.82644  0.9465
## 41       FR 19.6501  44939.9046  247.44200  0.99026  0.2532  1.63728 -0.4225
## 42       GA      NA   7803.8309   39.13694  2.80396  2.7047  2.25300 -1.5934
## 43       GB 21.6000  46723.9041  406.04150  1.53050  0.6090  1.70254 -1.3484
## 44       GE 17.6000   4422.7082  104.45110  3.93800 -0.0317  4.12018  2.3147
## 45       GH 15.0000   2353.8541   59.62672 12.94200  2.2431  5.29400  2.6949
## 46       GR 16.6643  19404.1830  239.31940  0.26994 -0.2217  0.75896 -0.5866
## 47       GT 16.1000   4478.2807   34.07538  3.74200  1.9677  3.40778  0.3178
## 48       HK 20.7000  50214.6484  446.31540  2.43600  0.8510  1.99164 -0.5657
## 49       HR 25.5000  16617.8744   89.53644  0.55302 -0.7565  3.00696  1.6712
## 50       HU 18.2802  18224.0904  115.04970  1.84622 -0.2417  4.07978  2.5449
## 51       ID 23.9000   4223.4646   35.44648  3.94398  1.1454  5.03546  2.5001
## 52       IE 25.4692  91715.2029  815.34610  0.32334  1.1977 10.07624  4.5268
## 53       IL      NA  50813.0432   26.75248  0.14240  1.6423  3.36048  0.7084
## 54       IN 13.6000   2218.5362   20.80796  4.24752  1.0491  6.72450  2.6258
## 55       IQ      NA   4270.7893   37.55918  0.44192  2.4876  3.80000 -1.3546
## 56       IS 24.8200  66458.8741  108.86200  0.41926  2.0438  4.63556  0.3121
## 57       IT 16.0000  34641.2557  129.37690  0.65218 -0.0396  0.98170 -0.8850
## 58       JM 14.3000   4938.6880   90.75014  3.60208  0.4806  1.18000 -1.4606
## 59       JO 17.9300   4433.1037   70.49410  1.37566  1.9443  2.02956 -0.6875
## 60       JP 17.3000  40838.2838   75.76394  0.51938 -0.1929  0.91240 -0.1394
## 61       KE 18.4444   2025.2907   52.90428  6.28796  2.8000  5.62820  1.9242
## 62       KR 14.8000  35337.0758   26.36510  1.09614  0.3751  2.77248  1.6542
## 63       KW      NA  30276.8754   47.57912  1.88400  1.9857  0.13772 -3.7375
## 64       KZ 26.9700  10589.0517   56.21532  7.95000  1.3350  3.00000  0.9054
## 65       LK 16.5000   3822.1732   59.36342  4.21800  0.8755  3.67800  1.0805
## 66       LS 22.9520   1010.6177   61.18192  5.00340  0.7958  0.39388 -3.2384
## 67       LS 11.0000   2637.6890   85.47152  1.81194  1.5534  6.57820  3.6709
## 68       LT 21.8073  22636.1217   78.06336  1.69862 -0.8862  3.42028  3.7273
## 69       LU 23.9000 124340.3835 6908.35200  1.17428  2.0218  3.22626  0.0792
## 70       LV 24.9680  19638.1070  134.37960  1.70144 -0.7032  3.13634  2.3134
## 71       MA 15.2000   3301.6090   45.30954  1.18800  1.2641  3.09514 -0.4997
## 72       MK 16.6966   6571.8228   72.20372  0.62200  0.0389  2.77890  1.0689
## 73       MN      NA   4393.3037  218.85680  4.98958  1.8006  4.25820  0.9093
## 74       MO 14.5000  52074.0604  194.62240  2.78282  1.2126 -1.66952 -9.8453
## 75       MT 23.9600  31441.2784  761.28590  1.32032  3.1949  6.53774 -0.1381
## 76       MV 47.5000   8656.5566   35.18078  0.88000  3.5095  6.30000 -3.6495
## 77       MX 17.7000   9729.2631   37.27282  4.02524  1.1350  2.01104 -1.4444
## 78       MY 18.3000  11363.6075   65.23842  1.91000  1.3474  4.87800  1.3926
## 79       MZ 26.0000    434.4606  356.19670  9.04260  2.9384  3.92880 -0.4276
## 80     <NA> 15.2000   5051.3480   60.48620  4.85712  1.8806  0.75446 -3.5756
## 81       NG 15.4000   2149.7791   24.81464 12.94034  2.6197  1.19458 -2.3145
## 82       NI 21.7500   1986.7204   83.97706  4.34306  1.0414  1.37996 -1.0152
## 83       NL 18.9026  57230.6715  512.18330  1.17730  0.5927  2.21984  0.4875
## 84       NO 23.1000  82858.2833  155.89490  2.61900  0.8375  1.46654  0.1530
## 85       NZ      NA  48925.4692  103.06840  1.20152  2.0008  3.39204  0.0639
## 86       OM 19.1000  14957.4883   88.59674  0.76000  2.3197  1.99976 -2.3013
## 87       PA 16.2500  13872.7952  156.66240  0.42400  1.6872  4.58302 -1.8406
## 88       PE 15.5888   6528.2061   35.38870  2.69720  1.0511  3.17022 -0.7479
## 89       PH 14.9396   3658.9600   32.33956  2.49528  1.4217  6.56308  1.9917
## 90       PK 17.2000   1406.1297   29.25594  4.73824  1.8710  4.29018  1.5142
## 91       PL 20.1490  17732.6481   67.90290  0.80874 -0.0948  4.34840  3.1314
## 92       PT 16.7000  25282.8171  203.15930  0.83600 -0.3333  2.53122  0.9934
## 93       PY 19.1000   5054.1153   43.10920  3.52000  1.2931  2.96754  0.8830
## 94       QA 18.8000  55338.4835  108.79850  0.82252  2.3456  1.66590 -2.3610
## 95       RO 23.2000  14981.8900   52.57054  1.51808 -0.5878  4.71770  3.9390
## 96       RS 21.8000   8768.7320   86.54676  1.90000 -0.5088  3.17400  3.1268
## 97       RU 12.7000  10274.3779   33.40582  6.72076  0.1073  0.97740  0.6743
## 98       RW 23.3000    825.5581   52.98370  4.20636  2.6417  7.36908  2.2852
## 99       SA      NA  21664.6362   48.19680  0.76200  2.5009  1.56022 -2.5833
## 100      SC 19.0200  13490.5158  109.22200  1.18702  1.0858  3.51306 -1.1137
##            X8          X9          X10     X11      X12      X13     X14
## 1    55.00000   -26.52000     2.857862  8.0000 23.08410 26.94344  3.0000
## 2   102.52738   -13.59890   352.910575  8.1550 24.85976 32.47740  2.4500
## 3   102.52738   -56.24160   199.928422  8.1550 20.39940 31.03926      NA
## 4   102.52738    24.78532    10.108892      NA 21.69104 17.30888      NA
## 5   166.80851    47.27262    12.645460  6.6000 19.40300 15.11172 18.5000
## 6    34.81845    15.44938    62.485865 10.3000 31.12380 20.57210 10.5000
## 7          NA    -5.01348   375.190755 10.6000 16.71368 13.81918 11.0500
## 8   116.41876    15.36980   429.980978  2.0190 24.78244 26.89982  6.0000
## 9   191.74943    57.95768  1359.132847  0.9600 24.28828 22.49670  5.4478
## 10   80.54508    28.09668     2.383969  5.0000 21.13634 24.49756  8.0000
## 11  110.63987  -174.36800    42.607177      NA 23.63816 29.44668  7.0000
## 12   78.40700     4.91586   347.147671  7.7000 32.70006 32.19390  5.0000
## 13   90.42874   -18.98450   514.176961      NA 24.55938 24.73148  6.0000
## 14   72.99709   -12.95970    69.103768  5.7984 20.46028 23.25440  5.3000
## 15         NA   -51.11920    34.539229      NA 31.14198 28.74462  4.0000
## 16   81.54536    20.59038    15.355253 17.0000 23.40036 19.12786  2.3000
## 17  101.29834   -20.17800    37.238307      NA 20.84156 16.34698  8.5000
## 18   76.95112    10.01940  1444.733210      NA 15.49512 13.50872 13.9500
## 19  149.03716    42.60472    60.258857  4.8292 28.15432 18.39389  4.5000
## 20  106.86400    45.53354  1721.506090  0.5333 23.26748 20.61472  7.4503
## 21   84.09314    54.86688     9.707663 24.4000 46.82746 35.13040      NA
## 22   82.50000  -152.44500   749.017673  0.7500 24.41888 32.83370  3.1728
## 23   83.29876     7.37624    61.348608  8.8000 20.82476 19.42264 12.0000
## 24  116.51738    14.00910   252.940034  1.7229 22.48684 20.37026 10.0000
## 25   87.68676    26.04546    40.349134 20.0000 28.82414 25.75278      NA
## 26   94.22575   -26.98080 14866.703370  1.8396 43.17774 44.69396  4.9000
## 27  117.32727     9.92820   271.346897  3.1800 22.22970 18.09570 13.0000
## 28  125.26214     1.74676    61.520675  2.7000 17.93502 15.88014 15.0000
## 29   67.45486    51.37952     1.703701  9.5200 35.67148 32.54966 15.8000
## 30   68.15254   456.48640    23.804052      NA 17.90342 14.38300  7.8000
## 31   76.50765   -16.61460   243.530380  2.6562 27.97028 30.60572  3.4000
## 32  138.35724   -15.64810  3793.593164      NA 20.71144 28.03106  4.8177
## 33  359.13886    -5.59082   355.184032  1.8000 22.06322 29.24874  5.4000
## 34   80.10580    20.97700    79.001191  1.8500 23.77748 23.65308  5.7000
## 35   99.71869     8.90856    98.808010      NA 26.15664 26.23654  7.2000
## 36  104.68417   -13.26730    30.960228  0.3791 26.20832 28.47148  6.4000
## 37   53.29829    11.90716   361.845786  3.9000 15.78406 11.70278  7.0000
## 38  117.48908    83.40662  1278.325953  2.8512 19.68004 22.06638 16.1639
## 39         NA    27.57822   107.795527 11.1000 39.33340 31.44122 18.0000
## 40  175.16013    68.78926   270.625631  1.4000 23.67910 22.42012  7.8000
## 41  139.57196    36.80590  2609.943503  2.8415 23.40664 22.34918  9.3242
## 42   72.26748    29.34950    15.062255 11.2000 32.40134 32.38396 19.0000
## 43   91.33396    31.42504  2707.744043  1.2157 17.99490 14.01296  5.5665
## 44  145.18138    62.38444    15.891616  2.3000 27.43172 19.16908 20.0000
## 45   50.13182    47.67084    67.471195 15.0000 26.24074 23.15904      NA
## 46   89.84441   134.16510   188.985393 26.9780 12.66678 10.94992 18.3000
## 47   72.98978     2.81678    77.604632  1.8301 14.07878 15.21612  4.0000
## 48   68.82672  -283.26700   349.444713  0.9024 21.19750 24.59920  6.5000
## 49   82.18115    32.12062    56.170837  7.1776 21.94236 25.00206  7.5000
## 50   79.03623    11.68846   155.013041  0.9250 24.36458 27.06952  4.4000
## 51   96.57359     9.73972  1062.299663  3.0600 33.99662 32.80276  6.5000
## 52   84.42461  -345.29200   417.683180  3.5406 34.89774 43.30982  6.5000
## 53   85.93030   -47.40690   403.526464  1.4760 21.12638 24.84940  4.5000
## 54   73.69553     0.68776  2660.261329  9.5000 31.20112 30.23156  7.5000
## 55         NA     1.49128   165.493039      NA 19.14184 22.59400 13.5000
## 56  160.95762    28.71610    21.714538  2.9022 20.96484 26.28896  7.8000
## 57  111.19617    50.94154  1880.708359      NA 17.86622 20.23166 10.8839
## 58   88.86704    38.11946    13.812422  2.8000 22.55048 21.29292  8.5000
## 59   93.88143    10.63962    43.697563  5.4000 18.98000 11.38420 22.0000
## 60   70.95343   -44.36040  5043.573440  1.0734 25.31864 27.87758  2.7383
## 61   87.40996    39.30272   100.470001 14.1390 17.13230 11.00750 11.4537
## 62  123.99717   -26.86070  1631.134780  1.0000 30.95906 36.10606  4.0000
## 63         NA  -271.87800   105.949023      NA 26.68898 32.14778      NA
## 64   83.52300   -38.63250   171.239891  7.9000 26.82074 26.28400  5.1000
## 65   89.13270    46.95208    80.676726  5.3000 29.69044 27.95642  5.0000
## 66   59.50145    12.39604     1.844513  4.1991 28.89906 16.64574 24.6500
## 67         NA    72.94226    19.129116  3.2000 33.45200 30.12460      NA
## 68   68.23179    16.85362    55.761983  0.9915 19.48272 20.31900  9.0000
## 69   42.65388 -1955.72000    73.055370  1.0280 18.38978 34.25134  6.8000
## 70   77.92536    22.27304    33.430044  3.5221 22.67616 23.11992  8.1000
## 71  102.28671    15.76728   112.869983  8.3515 32.29572 29.09784 11.5000
## 72   94.67722    23.32864    12.263700  3.2613 32.41148 31.85910 16.3000
## 73   77.34337   180.83030    13.269000 11.6788 32.50458 24.22642  7.3000
## 74   81.91411  -213.14400    24.333081  0.3357 19.61144 55.08932  2.4000
## 75   67.94252  -212.97000    14.474956  3.6629 22.98726 29.32880  4.3000
## 76   78.58290    -0.57864     3.767023  8.3000 20.09944 25.59820  6.5000
## 77   99.03247     8.52064  1073.915464  2.4292 22.74344 21.51140  4.0000
## 78  113.38897   -16.25250   336.664465  1.6600 24.38356 27.92792  4.0000
## 79   49.29687   282.81000    14.374968 11.8000 42.66772 15.14566      NA
## 80   91.29415    19.01378    10.710329  6.4000 20.90098 13.32220 23.0000
## 81   64.18761    -2.27422   401.028628  6.0000 18.31216 18.87146 22.0000
## 82   84.09234    50.63570    12.621466  4.1000 29.94454 26.86952  4.5000
## 83  161.74143    12.94086   910.005594      NA 21.16826 30.30356  4.6000
## 84  207.31980   -35.66660   362.571122      NA 28.22544 32.62790  4.2000
## 85  138.38958    51.69554   208.833638      NA 23.66948 21.26348  4.8000
## 86  148.12624    20.96404    64.648375  4.2000 26.17416 14.67168      NA
## 87  121.21618    39.83066    52.938074  2.1500 41.14624 36.51452 12.0000
## 88  116.71299   -20.74610   204.753978  4.1276 21.80948 20.42292  9.7000
## 89   74.25542   -12.25290   363.429119  1.6736 24.97170 24.32906  8.5000
## 90   61.34609    20.46164   262.232162  9.1000 16.10420 12.78448  6.5000
## 91   93.36164    27.71838   594.155788  3.7123 20.15800 20.38070  6.2000
## 92  113.24087    86.35888   230.736935  6.2000 17.22924 18.41932  7.1000
## 93  104.13747     9.42000    35.304238  4.9000 21.36214 23.25368  5.5000
## 94  163.71672     1.28492   146.400550  2.0000 42.35736 46.79924  0.1200
## 95   74.41554    18.74188   248.716040  4.0580 23.69546 20.98926  5.0000
## 96   90.19331    38.24144    52.960139  5.0000 20.81872 17.03374  9.7000
## 97  122.72076   -30.73920  1471.003881      NA 22.78718 27.47832  5.4000
## 98  111.94390    36.62538    10.332054  4.5000 22.69110 11.11020      NA
## 99         NA   -62.91130   700.117867      NA 29.55294 29.68834      NA
## 100  45.41537    31.39798     1.170879  3.8700 34.72552 17.38966  4.5000
##     Risk.Level
## 1          low
## 2          low
## 3          low
## 4          low
## 5         high
## 6         high
## 7         high
## 8          low
## 9          low
## 10        high
## 11        high
## 12        high
## 13         low
## 14         low
## 15        high
## 16        high
## 17        high
## 18        high
## 19        high
## 20         low
## 21        high
## 22         low
## 23        high
## 24         low
## 25        high
## 26         low
## 27        high
## 28        high
## 29        high
## 30        high
## 31         low
## 32         low
## 33         low
## 34        high
## 35        high
## 36         low
## 37        high
## 38         low
## 39        high
## 40         low
## 41         low
## 42        high
## 43         low
## 44        high
## 45        high
## 46        high
## 47        high
## 48         low
## 49        high
## 50         low
## 51         low
## 52         low
## 53         low
## 54        high
## 55        high
## 56         low
## 57        high
## 58        high
## 59        high
## 60         low
## 61        high
## 62         low
## 63         low
## 64         low
## 65        high
## 66        high
## 67        high
## 68         low
## 69         low
## 70         low
## 71        high
## 72        high
## 73        high
## 74         low
## 75         low
## 76        high
## 77        high
## 78         low
## 79        high
## 80        high
## 81        high
## 82        high
## 83         low
## 84         low
## 85         low
## 86        high
## 87        high
## 88         low
## 89         low
## 90        high
## 91         low
## 92         low
## 93        high
## 94         low
## 95        high
## 96        high
## 97         low
## 98        high
## 99         low
## 100       high

In the data above, one column, Country, is only an identifier and is not needed for the analysis.

Next, drop the Country column.

invest1 <- invest %>% select(-Country)
head(invest1)
##     X1        X2        X3       X4     X5       X6      X7        X8        X9
## 1 17.5 38674.616 172.75400  0.68000 1.2206  1.78560 -2.0843  55.00000 -26.52000
## 2 18.2 40105.120 103.52280  1.76600 0.8698  2.65884 -0.7254 102.52738 -13.59890
## 3 18.7 76037.997  31.03626  2.63056 1.4893  1.85034 -1.9008 102.52738 -56.24160
## 4   NA 27882.829  24.78532  1.29416 1.7530  2.23192 -1.1355 102.52738  24.78532
## 5 14.0  4251.398  89.61882  1.44000 0.2562  4.74800  2.3318 166.80851  47.27262
## 6   NA  2033.900  57.05566 22.35646 3.3422 -0.87800 -5.2032  34.81845  15.44938
##          X10    X11      X12      X13   X14 Risk.Level
## 1   2.857862  8.000 23.08410 26.94344  3.00        low
## 2 352.910575  8.155 24.85976 32.47740  2.45        low
## 3 199.928422  8.155 20.39940 31.03926    NA        low
## 4  10.108892     NA 21.69104 17.30888    NA        low
## 5  12.645460  6.600 19.40300 15.11172 18.50       high
## 6  62.485865 10.300 31.12380 20.57210 10.50       high

Data Exploration

skimr::skim(invest)
Data summary
Name invest
Number of rows 100
Number of columns 16
_______________________
Column type frequency:
factor 2
numeric 14
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
Country 1 0.99 FALSE 98 LS: 2, AD: 1, AE: 1, AE-: 1
Risk.Level 0 1.00 FALSE 2 hig: 54, low: 46

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
X1 12 0.88 18.97 5.42 4.20 15.93 18.58 21.80 47.50 ▁▇▃▁▁
X2 0 1.00 22641.57 24846.18 434.46 4265.94 11659.10 34815.21 124340.38 ▇▂▂▁▁
X3 0 1.00 191.93 697.33 13.63 42.96 70.42 130.63 6908.35 ▇▁▁▁▁
X4 0 1.00 3.26 4.86 -0.15 0.87 1.70 3.94 36.70 ▇▁▁▁▁
X5 0 1.00 1.20 1.06 -0.89 0.44 1.14 1.95 4.40 ▃▇▅▃▁
X6 0 1.00 3.08 2.26 -5.14 1.76 2.98 4.30 10.08 ▁▂▇▃▁
X7 0 1.00 0.11 2.58 -9.85 -1.19 0.07 1.94 6.07 ▁▁▆▇▂
X8 7 0.93 99.94 42.61 34.82 76.95 90.19 113.39 359.14 ▇▅▁▁▁
X9 0 1.00 -13.58 217.07 -1955.72 -14.11 12.67 36.67 456.49 ▁▁▁▂▇
X10 0 1.00 582.32 1659.10 1.17 32.81 106.87 366.37 14866.70 ▇▁▁▁▁
X11 17 0.83 5.53 5.14 0.34 1.93 3.90 7.95 26.98 ▇▃▁▁▁
X12 0 1.00 24.96 6.73 12.67 20.79 23.40 28.38 46.83 ▃▇▃▁▁
X13 0 1.00 24.48 8.07 10.95 19.06 24.28 29.36 55.09 ▅▇▅▁▁
X14 11 0.89 8.44 5.29 0.12 4.82 6.80 10.50 24.65 ▆▇▂▂▁

The output above shows 47 missing values across four predictors (the Country column has one more, but it is dropped from the analysis):

  • 12 in X1

  • 7 in X8

  • 17 in X11

  • 11 in X14
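The per-variable counts can also be computed directly instead of being read off the skim table. The following is a minimal sketch that uses a small made-up data frame, `toy` (hypothetical, not the investment data), in place of `invest1`:

```r
# Hypothetical toy data standing in for the real data frame
toy <- data.frame(
  X1 = c(17.5, NA, 18.7, NA),
  X2 = c(38674.6, 40105.1, 76038.0, 27882.8),
  X8 = c(55.0, 102.5, NA, 102.5)
)

# Missing values per column, then the overall total
na_per_column <- colSums(is.na(toy))
na_total <- sum(na_per_column)

print(na_per_column)
print(na_total)
```

Applied to the real data, `colSums(is.na(invest1))` should give the per-variable counts listed above.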

Boxplot

Below are boxplots of each variable, grouped by risk level.

# Helper: horizontal boxplot of one predictor, grouped by risk level
boxplot_by_risk <- function(var) {
  ggplot(data = invest, mapping = aes(x = .data[[var]], y = Risk.Level)) +
    geom_boxplot() +
    labs(x = var)
}

x1_y  <- boxplot_by_risk("X1")
x2_y  <- boxplot_by_risk("X2")
x3_y  <- boxplot_by_risk("X3")
x4_y  <- boxplot_by_risk("X4")
x5_y  <- boxplot_by_risk("X5")
x6_y  <- boxplot_by_risk("X6")
x7_y  <- boxplot_by_risk("X7")
x8_y  <- boxplot_by_risk("X8")
x9_y  <- boxplot_by_risk("X9")
x10_y <- boxplot_by_risk("X10")
x11_y <- boxplot_by_risk("X11")
x12_y <- boxplot_by_risk("X12")
x13_y <- boxplot_by_risk("X13")
x14_y <- boxplot_by_risk("X14")
plot_grid(x1_y, x2_y, x3_y, x4_y)
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

plot_grid(x5_y, x6_y, x7_y, x8_y)
## Warning: Removed 7 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

plot_grid(x9_y, x10_y, x11_y, x12_y) 
## Warning: Removed 17 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

plot_grid(x13_y, x14_y)
## Warning: Removed 11 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

Handling Missing Data

Missing values can be imputed by replacing them with the median of the corresponding variable.

# Replace the missing values in each affected column with that column's median
for (col in c("X1", "X8", "X11", "X14")) {
  invest1[[col]][is.na(invest1[[col]])] <- median(invest1[[col]], na.rm = TRUE)
}
# check for remaining missing values
sum(is.na(invest1))
## [1] 0
frinvest <- as.data.frame(table(invest1$Risk.Level))
frinvest
##   Var1 Freq
## 1 high   54
## 2  low   46

Data Visualization

Bar Chart

frinvest$persen <- frinvest$Freq/sum(frinvest$Freq)

ggplot(data = frinvest, mapping = aes(x = Var1, y = Freq)) + 
  geom_col(aes(fill =Var1), alpha = 0.7) +
  labs(title = "Investment Risk Level",
       x = "Risk Level",
       y = "Frekuensi") +
  geom_text(aes(label = paste0(round(persen*100, 2), "%")), vjust = -0.25) +
  theme(legend.position = "none")

The bar chart shows that 54 of the 100 countries (54%) are at a high investment risk level and 46 (46%) are at a low level. The two classes are therefore roughly balanced, with high-risk countries slightly outnumbering low-risk ones.
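The class shares also give a naive baseline for the accuracy reported later: a classifier that always predicts the majority class would be right for 54% of these countries. A minimal sketch, using only the two counts from the frequency table (the names `risk_counts`, `risk_share`, and `baseline_acc` are illustrative):

```r
# Class counts as reported in the frequency table above
risk_counts <- c(high = 54, low = 46)

# Share of each class
risk_share <- risk_counts / sum(risk_counts)
print(round(100 * risk_share, 2))

# Accuracy of always predicting the majority class (a naive baseline)
baseline_acc <- max(risk_share)
print(baseline_acc)
```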

Random Forest

Data Preprocessing

task_risk = TaskClassif$new(id="country",backend = invest1,target = "Risk.Level",positive ="low")

The data are split into two parts: 80% for training and 20% for testing.

set.seed(234)
resample_holdout =rsmp("holdout", ratio = 0.8)
resample_holdout$instantiate(task=task_risk)
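Conceptually, a holdout split draws a random 80% of the row indices for training and keeps the rest for testing. The base R sketch below illustrates that idea only. It is not how mlr3 implements it, and even with the same seed it will not reproduce the split mlr3 draws (the names `train_ids` and `test_ids` are illustrative):

```r
set.seed(234)  # illustrative seed; does not reproduce the mlr3 split

n <- 100                    # number of rows in the data
n_train <- round(0.8 * n)   # 80% of the rows go to training

# Draw the training rows at random; the remaining rows form the test set
train_ids <- sort(sample(seq_len(n), size = n_train))
test_ids  <- setdiff(seq_len(n), train_ids)

print(length(train_ids))
print(length(test_ids))
```

In mlr3, the row ids that were actually drawn can be inspected with `resample_holdout$train_set(1)` and `resample_holdout$test_set(1)`.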

Random Forest Model

model_rf <- lrn("classif.ranger", predict_type="prob", importance ="impurity")

Model Interpretation

model_rf$train(task=task_risk)

importance<- data.frame(Predictors = names(model_rf$model$variable.importance), impurity = model_rf$model$variable.importance)
ggplot(importance,
       aes(x=impurity,
           y=reorder(Predictors,impurity))
       ) +
  geom_col(fill = "steelblue")+
  geom_text(aes(label=round(impurity,2)),hjust=1.2)

Based on the output above, X2 (GDP per capita, USD) has the highest impurity importance and is therefore the most influential predictor of investment risk in this model, while X6 (real GDP growth (%), average over the last 5 years) has the lowest importance among all predictors.
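To give a feel for what an impurity-based score measures, the sketch below computes the Gini impurity decrease of a single split on made-up labels. It is an illustration, not ranger's implementation: it assumes, as the ranger documentation states for classification, that the "impurity" importance is Gini-based, with the per-split decreases accumulated for each variable over the forest. The function `gini` and the vectors `labels` and `go_left` are hypothetical.

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Hypothetical node with 10 observations and one candidate split
labels  <- c("low", "low", "low", "low", "high",
             "high", "high", "high", "high", "high")
go_left <- c(TRUE, TRUE, TRUE, TRUE, TRUE,
             FALSE, FALSE, FALSE, FALSE, FALSE)

parent <- gini(labels)
left   <- gini(labels[go_left])
right  <- gini(labels[!go_left])

# Impurity decrease: parent impurity minus size-weighted child impurity
decrease <- parent - (mean(go_left) * left + mean(!go_left) * right)
print(decrease)
```

A variable that often produces large decreases ranks high. This reflects how useful the variable was for splitting in the fitted forest, not a causal effect on risk.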

Training

set.seed(234)
train_test_rf = resample(task = task_risk, learner = model_rf, resampling = resample_holdout, store_models = TRUE)
## INFO  [08:53:28.486] [mlr3] Applying learner 'classif.ranger' on task 'country' (iter 1/1)

The log line above simply reports that mlr3 is applying the learner 'classif.ranger' (the random forest) to the task 'country', which classifies countries by the attributes in the dataset. "iter 1/1" means that the holdout resampling consists of a single train/test iteration.

Testing

data_testing = as.data.table(train_test_rf$prediction())
data_testing
##     row_ids  truth response   prob.low  prob.high
##       <int> <fctr>   <fctr>      <num>      <num>
##  1:       9    low      low 0.90590317 0.09409683
##  2:      10   high      low 0.55254365 0.44745635
##  3:      12   high     high 0.28936905 0.71063095
##  4:      22    low      low 0.95048651 0.04951349
##  5:      23   high     high 0.08841111 0.91158889
##  6:      27   high     high 0.23626587 0.76373413
##  7:      32    low      low 0.97720714 0.02279286
##  8:      33    low      low 0.94433651 0.05566349
##  9:      36    low      low 0.80659444 0.19340556
## 10:      37   high     high 0.20504206 0.79495794
## 11:      44   high     high 0.18224603 0.81775397
## 12:      47   high     high 0.27397063 0.72602937
## 13:      59   high     high 0.06550238 0.93449762
## 14:      62    low      low 0.86840476 0.13159524
## 15:      64    low     high 0.49597540 0.50402460
## 16:      71   high     high 0.10524603 0.89475397
## 17:      73   high     high 0.07424444 0.92575556
## 18:      81   high     high 0.16464048 0.83535952
## 19:      94    low      low 0.74354524 0.25645476
## 20:      95   high      low 0.65104444 0.34895556
##     row_ids  truth response   prob.low  prob.high

Model Evaluation

Confusion Matrix

train_test_rf$prediction()$confusion
##         truth
## response low high
##     low    7    2
##     high   1   10

The confusion matrix above has the predictions (response) in rows and the true classes (truth) in columns, with low as the positive class. It reads as follows:

  • response low, truth low (true positive) = 7: seven low-risk countries were correctly predicted as low risk

  • response low, truth high (false positive) = 2: two high-risk countries were wrongly predicted as low risk

  • response high, truth low (false negative) = 1: one low-risk country was wrongly predicted as high risk

  • response high, truth high (true negative) = 10: ten high-risk countries were correctly predicted as high risk
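The same matrix can be rebuilt by cross-tabulating the truth and response columns of the 20 held-out predictions listed in the Testing section. A minimal sketch with those values typed in by hand (the names `lv` and `conf_mat` are illustrative):

```r
# Truth and response copied from the 20 held-out predictions shown above
truth <- c("low", "high", "high", "low", "high", "high", "low", "low", "low", "high",
           "high", "high", "high", "low", "low", "high", "high", "high", "low", "high")
response <- c("low", "low", "high", "low", "high", "high", "low", "low", "low", "high",
              "high", "high", "high", "low", "high", "high", "high", "high", "low", "low")

# Fix the level order so that the positive class ("low") comes first
lv <- c("low", "high")
conf_mat <- table(response = factor(response, levels = lv),
                  truth    = factor(truth, levels = lv))
print(conf_mat)
```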

Accuracy

akurasi_rf <- train_test_rf$aggregate(list(msr("classif.acc"),msr("classif.specificity"), msr("classif.sensitivity")))
akurasi_rf
##         classif.acc classif.specificity classif.sensitivity 
##           0.8500000           0.8333333           0.8750000

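The output above shows a fairly high accuracy of 85% (17 of the 20 held-out countries classified correctly), with a sensitivity of 87.5% and a specificity of about 83.3%.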
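These three measures follow directly from the four cells of the confusion matrix. A short worked sketch, taking low as the positive class as set in the task definition (the variable names are illustrative):

```r
# Cell counts from the confusion matrix, with "low" as the positive class
tp <- 7    # predicted low,  truly low
fp <- 2    # predicted low,  truly high
fn <- 1    # predicted high, truly low
tn <- 10   # predicted high, truly high

accuracy    <- (tp + tn) / (tp + fp + fn + tn)
sensitivity <- tp / (tp + fn)   # share of truly low-risk countries that were found
specificity <- tn / (tn + fp)   # share of truly high-risk countries that were found

print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

With only 20 held-out rows, a single misclassification shifts the accuracy by 5 percentage points, so these figures are rough estimates.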

Predicting New Data

Suppose we want to predict the investment risk level for the countries in the testing data. A summary of the testing data follows.

summary(testing)
##     Country         X1              X2                X3        
##  SE     : 1   Min.   :11.90   Min.   :  786.9   Min.   : 30.05  
##  SG     : 1   1st Qu.:15.76   1st Qu.: 3955.1   1st Qu.: 48.51  
##  SI     : 1   Median :17.52   Median : 8653.0   Median : 65.56  
##  SK     : 1   Mean   :17.42   Mean   :22330.4   Mean   : 92.59  
##  SM     : 1   3rd Qu.:19.70   3rd Qu.:31854.3   3rd Qu.:103.06  
##  SV     : 1   Max.   :23.20   Max.   :69324.7   Max.   :409.70  
##  (Other):11   NA's   :1                                         
##        X4                X5                X6              X7         
##  Min.   : 0.1051   Min.   :-0.3906   Min.   :0.340   Min.   :-2.3230  
##  1st Qu.: 0.8435   1st Qu.: 0.3153   1st Qu.:1.754   1st Qu.:-0.1248  
##  Median : 1.6200   Median : 0.6255   Median :2.539   Median : 0.4867  
##  Mean   : 4.4949   Mean   : 0.8249   Mean   :2.994   Mean   : 0.8826  
##  3rd Qu.: 5.5560   3rd Qu.: 1.1173   3rd Qu.:3.553   3rd Qu.: 1.8906  
##  Max.   :19.1730   Max.   : 3.6551   Max.   :6.946   Max.   : 5.2762  
##                                                                       
##        X8               X9               X10                 X11        
##  Min.   : 49.06   Min.   :-200.98   Min.   :    1.491   Min.   : 0.500  
##  1st Qu.: 72.28   1st Qu.: -42.56   1st Qu.:   52.762   1st Qu.: 1.571  
##  Median : 88.89   Median :  15.04   Median :  155.582   Median : 2.530  
##  Mean   : 96.54   Mean   : -18.76   Mean   : 1463.386   Mean   :11.258  
##  3rd Qu.:109.52   3rd Qu.:  28.57   3rd Qu.:  501.644   3rd Qu.: 3.336  
##  Max.   :185.64   Max.   :  64.46   Max.   :20935.000   Max.   :63.500  
##  NA's   :2                                              NA's   :4       
##       X12             X13              X14        
##  Min.   :16.45   Min.   : 8.882   Min.   : 2.000  
##  1st Qu.:17.79   1st Qu.:17.208   1st Qu.: 4.675  
##  Median :22.03   Median :23.211   Median : 7.150  
##  Mean   :21.91   Mean   :23.693   Mean   : 8.963  
##  3rd Qu.:24.86   3rd Qu.:27.953   3rd Qu.: 9.700  
##  Max.   :31.60   Max.   :47.254   Max.   :33.700  
##                                   NA's   :1

Because the data still contain NA values (in X1, X8, X11, and X14), we first clean the data by replacing each missing value with the mean of its column.

# Replace missing values in X1, X8, X11, and X14 with the column mean
newtesting <- testing %>%
  mutate(across(c(X1, X8, X11, X14),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
summary(newtesting)
##     Country         X1              X2                X3        
##  SE     : 1   Min.   :11.90   Min.   :  786.9   Min.   : 30.05  
##  SG     : 1   1st Qu.:16.30   1st Qu.: 3955.1   1st Qu.: 48.51  
##  SI     : 1   Median :17.42   Median : 8653.0   Median : 65.56  
##  SK     : 1   Mean   :17.42   Mean   :22330.4   Mean   : 92.59  
##  SM     : 1   3rd Qu.:19.67   3rd Qu.:31854.3   3rd Qu.:103.06  
##  SV     : 1   Max.   :23.20   Max.   :69324.7   Max.   :409.70  
##  (Other):11                                                     
##        X4                X5                X6              X7         
##  Min.   : 0.1051   Min.   :-0.3906   Min.   :0.340   Min.   :-2.3230  
##  1st Qu.: 0.8435   1st Qu.: 0.3153   1st Qu.:1.754   1st Qu.:-0.1248  
##  Median : 1.6200   Median : 0.6255   Median :2.539   Median : 0.4867  
##  Mean   : 4.4949   Mean   : 0.8249   Mean   :2.994   Mean   : 0.8826  
##  3rd Qu.: 5.5560   3rd Qu.: 1.1173   3rd Qu.:3.553   3rd Qu.: 1.8906  
##  Max.   :19.1730   Max.   : 3.6551   Max.   :6.946   Max.   : 5.2762  
##                                                                       
##        X8               X9               X10                 X11       
##  Min.   : 49.06   Min.   :-200.98   Min.   :    1.491   Min.   : 0.50  
##  1st Qu.: 72.31   1st Qu.: -42.56   1st Qu.:   52.762   1st Qu.: 1.69  
##  Median : 94.00   Median :  15.04   Median :  155.582   Median : 3.20  
##  Mean   : 96.54   Mean   : -18.76   Mean   : 1463.386   Mean   :11.26  
##  3rd Qu.:107.25   3rd Qu.:  28.57   3rd Qu.:  501.644   3rd Qu.:11.26  
##  Max.   :185.64   Max.   :  64.46   Max.   :20935.000   Max.   :63.50  
##                                                                        
##       X12             X13              X14        
##  Min.   :16.45   Min.   : 8.882   Min.   : 2.000  
##  1st Qu.:17.79   1st Qu.:17.208   1st Qu.: 5.000  
##  Median :22.03   Median :23.211   Median : 7.300  
##  Mean   :21.91   Mean   :23.693   Mean   : 8.963  
##  3rd Qu.:24.86   3rd Qu.:27.953   3rd Qu.: 9.500  
##  Max.   :31.60   Max.   :47.254   Max.   :33.700  
## 
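As a quick sanity check on the imputation step, the sketch below applies the same mean replacement to a small made-up data frame and confirms that no missing values remain. The data frame toy and the helper impute_mean are hypothetical names used only for this illustration. They are not part of the original analysis, and the values are not taken from the testing data.

```r
# Hypothetical toy data, not taken from the testing set
toy <- data.frame(X1 = c(10, NA, 20),
                  X8 = c(NA, 80, 100))

# Illustrative helper: replace NA with the mean of the observed values
impute_mean <- function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x)

toy_clean <- as.data.frame(lapply(toy, impute_mean))

colSums(is.na(toy_clean))  # remaining NA count per column
toy_clean
```

Mean imputation leaves each column mean unchanged, which is why the Mean rows for X1, X8, X11, and X14 agree between the two summaries above while the quartiles shift.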

Next, we predict on the cleaned testing data.

risklevel_baru <- data.frame(newtesting)
prediksi <- predict(model_rf, newdata = risklevel_baru)
print(prediksi)
##  [1] low  low  low  low  low  high low  high high low  high high low  high high
## [16] low  high
## Levels: low high

The output above gives one predicted risk level for each of the 17 countries in the testing data, in row order: 9 countries are predicted to have a low investment risk level and 8 a high one. The final line, "Levels: low high", is not a prediction. It only lists the categories that the predicted factor can take.
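To summarise the predictions, the predicted classes can be tabulated. The sketch below re-enters the 17 predicted labels from the printed output by hand so that it runs on its own. In the analysis itself the object prediksi would be used directly. The name pred is introduced only for this illustration.

```r
# Predicted labels copied from the printed output above
pred <- factor(c("low", "low", "low", "low", "low", "high", "low", "high",
                 "high", "low", "high", "high", "low", "high", "high",
                 "low", "high"),
               levels = c("low", "high"))

length(pred)   # number of countries predicted
table(pred)    # count of countries per predicted risk level
levels(pred)   # the categories shown on the "Levels:" line
```

In the same way, a call such as table(prediksi) on the original object would give the count per risk level, assuming prediksi is a factor as the printed output suggests.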