Analisis Prediksi: Rating Skema Pinjaman dengan Metode Artificial Neural Network

Packages

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.1
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2
## Warning: package 'tibble' was built under R version 4.2.1
## Warning: package 'tidyr' was built under R version 4.2.1
## Warning: package 'readr' was built under R version 4.2.1
## Warning: package 'purrr' was built under R version 4.2.1
## Warning: package 'dplyr' was built under R version 4.2.1
## Warning: package 'stringr' was built under R version 4.2.1
## Warning: package 'forcats' was built under R version 4.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(knitr)
## Warning: package 'knitr' was built under R version 4.2.1
library(ggplot2)
library(tidyr)
library(e1071)
## Warning: package 'e1071' was built under R version 4.2.1
library(ROCR)
## Warning: package 'ROCR' was built under R version 4.2.1
library(rpart)
library(keras)
## Warning: package 'keras' was built under R version 4.2.2
library(caret)
## Warning: package 'caret' was built under R version 4.2.1
## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift
library(reticulate)
## Warning: package 'reticulate' was built under R version 4.2.2
library(neuralnet)
## Warning: package 'neuralnet' was built under R version 4.2.2
## 
## Attaching package: 'neuralnet'
## 
## The following object is masked from 'package:ROCR':
## 
##     prediction
## 
## The following object is masked from 'package:dplyr':
## 
##     compute
library(MASS)
## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select
library(tensorflow)
## Warning: package 'tensorflow' was built under R version 4.2.2
## 
## Attaching package: 'tensorflow'
## 
## The following object is masked from 'package:caret':
## 
##     train
library(rpart)
library(e1071)
library(DataExplorer)
## Warning: package 'DataExplorer' was built under R version 4.2.2
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine

Data Understanding

dtann <- read.csv("C:/Users/LENOVO/Documents/Departemen Statistika/SEM 6/TPM/data ann.csv")
head(dtann)
##   besar.pinjaman lama.pembayaran bunga pembayaran.per.bulan banyak.cash.back
## 1             70               4     1                  130             10.0
## 2            120               3     5                   15              2.0
## 3             70               4     1                  260              9.0
## 4             50               4     0                  140             14.0
## 5            110               2     2                  180              1.5
## 6            110               2     0                  125              1.0
##     rating
## 1 68.40297
## 2 33.98368
## 3 59.42551
## 4 93.70491
## 5 29.50954
## 6 33.17409

Tipe dan struktur data

str(dtann)
## 'data.frame':    75 obs. of  6 variables:
##  $ besar.pinjaman      : int  70 120 70 50 110 110 130 90 90 120 ...
##  $ lama.pembayaran     : int  4 3 4 4 2 2 3 2 3 1 ...
##  $ bunga               : int  1 5 1 0 2 0 2 1 0 2 ...
##  $ pembayaran.per.bulan: int  130 15 260 140 180 125 210 200 210 220 ...
##  $ banyak.cash.back    : num  10 2 9 14 1.5 1 2 4 5 0 ...
##  $ rating              : num  68.4 34 59.4 93.7 29.5 ...

Data terdiri dari 75 observasi dan 6 peubah. Perhatikan bahwa semua peubah bebas dan peubah respon bersifat numerik. Peubah yang digunakan dalam analisis ini adalah:

  1. Besar pinjaman (juta rupiah)
  2. Lama pembayaran (tahun)
  3. Banyaknya cashback
  4. Tambahan bunga yang ditetapkan (%)
  5. Pembayaran per bulan (10.000)

Mengubah tipe data integer menjadi numeric

dtann <- dtann %>% mutate(across(where(is.integer), as.numeric))
str(dtann)
## 'data.frame':    75 obs. of  6 variables:
##  $ besar.pinjaman      : num  70 120 70 50 110 110 130 90 90 120 ...
##  $ lama.pembayaran     : num  4 3 4 4 2 2 3 2 3 1 ...
##  $ bunga               : num  1 5 1 0 2 0 2 1 0 2 ...
##  $ pembayaran.per.bulan: num  130 15 260 140 180 125 210 200 210 220 ...
##  $ banyak.cash.back    : num  10 2 9 14 1.5 1 2 4 5 0 ...
##  $ rating              : num  68.4 34 59.4 93.7 29.5 ...

Pengecekan missing value

colSums(is.na(dtann))
##       besar.pinjaman      lama.pembayaran                bunga 
##                    0                    0                    0 
## pembayaran.per.bulan     banyak.cash.back               rating 
##                    0                    0                    0

Setelah dilakukan pengecekan, tidak terdapat missing value pada semua peubah.

Scaling data

Scaling data dilakukan sebagai standarisasi data untuk membulatkan bilangan desimal ke titik desimal terdekat.

set.seed(123)

# Scale data
scale01 <- function(x){
  (x-min(x)) / (max(x)-min(x))
}

dtann <- dtann %>% mutate_all(scale01)
head(dtann)
##   besar.pinjaman lama.pembayaran bunga pembayaran.per.bulan banyak.cash.back
## 1      0.1818182             0.6   0.2             0.406250       0.71428571
## 2      0.6363636             0.4   1.0             0.046875       0.14285714
## 3      0.1818182             0.6   0.2             0.812500       0.64285714
## 4      0.0000000             0.6   0.0             0.437500       1.00000000
## 5      0.5454545             0.2   0.4             0.562500       0.10714286
## 6      0.5454545             0.2   0.0             0.390625       0.07142857
##      rating
## 1 0.6655928
## 2 0.2106846
## 3 0.5469406
## 4 1.0000000
## 5 0.1515514
## 6 0.1999845

Splitting Data

set.seed(345)
dtann.train <- sample_frac(tbl=dtann, replace=FALSE, size=0.80)
head(dtann.train)
##   besar.pinjaman lama.pembayaran bunga pembayaran.per.bulan banyak.cash.back
## 1      0.4545455             0.2   0.2             0.437500       0.14285714
## 2      0.7272727             0.4   0.4             0.531250       0.10714286
## 3      0.5454545             0.0   0.0             0.750000       0.00000000
## 4      0.5454545             0.0   0.2             0.421875       0.00000000
## 5      0.5454545             1.0   0.0             0.718750       0.07142857
## 6      0.5454545             0.4   0.6             0.437500       0.28571429
##      rating
## 1 0.2396623
## 2 0.1639923
## 3 0.3166195
## 4 0.1319408
## 5 0.4637525
## 6 0.2961315
dtann.test <- anti_join(dtann, dtann.train)
## Joining, by = c("besar.pinjaman", "lama.pembayaran", "bunga",
## "pembayaran.per.bulan", "banyak.cash.back", "rating")
dtann.test
##    besar.pinjaman lama.pembayaran bunga pembayaran.per.bulan banyak.cash.back
## 1       0.0000000             0.6   0.0             0.437500       1.00000000
## 2       0.3636364             0.2   0.2             0.625000       0.28571429
## 3       0.5454545             0.4   0.4             0.437500       0.14285714
## 4       0.5454545             0.2   0.0             0.875000       0.00000000
## 5       0.4545455             0.4   0.2             0.437500       0.21428571
## 6       0.6363636             0.0   0.4             0.687500       0.07142857
## 7       0.5454545             0.4   0.2             0.781250       0.10714286
## 8       0.5454545             0.2   0.2             0.562500       0.00000000
## 9       0.9090909             0.6   0.6             0.468750       0.21428571
## 10      0.0000000             0.0   0.0             0.000000       0.00000000
## 11      0.4545455             0.6   0.2             0.421875       0.14285714
## 12      0.2727273             0.2   0.0             0.000000       0.21428571
## 13      0.5454545             0.2   0.2             0.218750       0.07142857
## 14      0.5454545             0.2   0.2             0.781250       0.00000000
## 15      0.4545455             0.4   0.2             0.625000       0.21428571
##        rating
## 1  1.00000000
## 2  0.41073956
## 3  0.29548966
## 4  0.30929858
## 5  0.44981653
## 6  0.05059922
## 7  0.17220475
## 8  0.11487480
## 9  0.21274750
## 10 0.56452680
## 11 0.41591549
## 12 0.66338444
## 13 0.17429082
## 14 0.27838685
## 15 0.44341036

Splitting data menjadi 2, data training dan data testing. Data training dilakukan untuk pemodelan, sedangkan data testing dilakukan untuk peramalan dan prediksi. Data training dengan proporsi 80% dan data testing sebesar 20%.

Exploratory Data Analysis

Plot Korelasi

plot_correlation(dtann,type = "c")

Sebaran Data

p1 <- ggplot(dtann, aes(y=dtann$besar.pinjaman))+
 geom_boxplot(fill="yellow", alpha=0.6) + coord_flip() + theme_classic()+
 ggtitle ("Sebaran besar pinjaman")+
 ylab ("pinjaman") +
 theme(plot.title = element_text(size = 12L, face = "bold", hjust = 0.5),
 plot.subtitle = element_text(size = 11L, face = "plain", hjust =
0.5))
p2 <- ggplot(dtann, aes(y=dtann$lama.pembayaran))+
 geom_boxplot(fill="yellow", alpha=0.6) + coord_flip() + theme_classic()+
 ggtitle ("Sebaran lama pembayaran")+
 ylab ("lama pembayaran") +
 theme(plot.title = element_text(size = 12L, face = "bold", hjust = 0.5),
 plot.subtitle = element_text(size = 11L, face = "plain", hjust =
0.5))
p3 <- ggplot(dtann, aes(y=dtann$bunga))+
 geom_boxplot(fill="yellow", alpha=0.6) + coord_flip() + theme_classic()+
 ggtitle ("Sebaran bunga")+
 ylab ("bunga") +
 theme(plot.title = element_text(size = 12L, face = "bold", hjust = 0.5),
 plot.subtitle = element_text(size = 11L, face = "plain", hjust =
0.5))
p4 <- ggplot(dtann, aes(y=dtann$pembayaran.per.bulan))+
 geom_boxplot(fill="yellow", alpha=0.6) + coord_flip() + theme_classic()+
 ggtitle ("Sebaran pembayran")+
 ylab ("pembayaran") +
 theme(plot.title = element_text(size = 12L, face = "bold", hjust = 0.5),
 plot.subtitle = element_text(size = 11L, face = "plain", hjust =
0.5))
p5 <- ggplot(dtann, aes(y=dtann$banyak.cash.back))+
 geom_boxplot(fill="yellow", alpha=0.6) + coord_flip() + theme_classic()+
 ggtitle ("Sebaran cashback")+
 ylab ("cashback") +
 theme(plot.title = element_text(size = 12L, face = "bold", hjust = 0.5),
 plot.subtitle = element_text(size = 11L, face = "plain", hjust =
0.5))
p6 <- ggplot(dtann, aes(y=dtann$rating))+
 geom_boxplot(fill="red", alpha=0.6) + coord_flip() + theme_classic()+
 ggtitle ("Sebaran rating")+
 ylab ("rating") +
 theme(plot.title = element_text(size = 12L, face = "bold", hjust = 0.5),
 plot.subtitle = element_text(size = 11L, face = "plain", hjust =
0.5))
grid.arrange(p1,p2,p3,p4,p5,p6)
## Warning: Use of `dtann$besar.pinjaman` is discouraged.
## ℹ Use `besar.pinjaman` instead.
## Warning: Use of `dtann$lama.pembayaran` is discouraged.
## ℹ Use `lama.pembayaran` instead.
## Warning: Use of `dtann$bunga` is discouraged.
## ℹ Use `bunga` instead.
## Warning: Use of `dtann$pembayaran.per.bulan` is discouraged.
## ℹ Use `pembayaran.per.bulan` instead.
## Warning: Use of `dtann$banyak.cash.back` is discouraged.
## ℹ Use `banyak.cash.back` instead.
## Warning: Use of `dtann$rating` is discouraged.
## ℹ Use `rating` instead.

Pemodelan Neural Network

set.seed(789)
nn1 <- neuralnet(rating ~ besar.pinjaman + lama.pembayaran +
 bunga + pembayaran.per.bulan + banyak.cash.back, data =
dtann.train, hidden=c(3,2), linear.output = TRUE)
plot(nn1, rep = "best")

Prediksi Data

pred.test1 <- compute(nn1, dtann.test[, 1:5])$net.result
pred.test1
##             [,1]
##  [1,] 0.89387430
##  [2,] 0.32592300
##  [3,] 0.28281304
##  [4,] 0.27901778
##  [5,] 0.38552836
##  [6,] 0.09932890
##  [7,] 0.31177372
##  [8,] 0.21751673
##  [9,] 0.25543110
## [10,] 0.04308592
## [11,] 0.44426668
## [12,] 0.62486216
## [13,] 0.22228547
## [14,] 0.23004865
## [15,] 0.36060843

Evaluasi Prediksi

ktrain1 <- nn1 %>% predict(dtann.train[, 1:5])
ktest1 <- nn1 %>% predict(dtann.test[, 1:5])
postResample(ktrain1[,1], dtann.train$rating)
##       RMSE   Rsquared        MAE 
## 0.05443114 0.89282651 0.04374419
postResample(ktest1[,1], dtann.test$rating)
##       RMSE   Rsquared        MAE 
## 0.15134134 0.61492382 0.09328408

Berdasarkan nilai RMSE dan R-square, data training lebih baik digunakan untuk prediksi daripada data testing. Artinya, nilai yang diprediksi dekat dengan nilai amatan aslinya dan akan lebih akurat jika datanya cukup banyak.