Analisis Prediksi: Rating Skema Pinjaman dengan Metode Artificial Neural Network
Packages
library(tidyverse)## Warning: package 'tidyverse' was built under R version 4.2.1
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## Warning: package 'tibble' was built under R version 4.2.1
## Warning: package 'tidyr' was built under R version 4.2.1
## Warning: package 'readr' was built under R version 4.2.1
## Warning: package 'purrr' was built under R version 4.2.1
## Warning: package 'dplyr' was built under R version 4.2.1
## Warning: package 'stringr' was built under R version 4.2.1
## Warning: package 'forcats' was built under R version 4.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(knitr)## Warning: package 'knitr' was built under R version 4.2.1
library(ggplot2)
library(tidyr)
library(e1071)## Warning: package 'e1071' was built under R version 4.2.1
library(ROCR)## Warning: package 'ROCR' was built under R version 4.2.1
library(rpart)
library(keras)## Warning: package 'keras' was built under R version 4.2.2
library(caret)## Warning: package 'caret' was built under R version 4.2.1
## Loading required package: lattice
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
library(reticulate)## Warning: package 'reticulate' was built under R version 4.2.2
library(neuralnet)## Warning: package 'neuralnet' was built under R version 4.2.2
##
## Attaching package: 'neuralnet'
##
## The following object is masked from 'package:ROCR':
##
## prediction
##
## The following object is masked from 'package:dplyr':
##
## compute
library(MASS)##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
library(tensorflow)## Warning: package 'tensorflow' was built under R version 4.2.2
##
## Attaching package: 'tensorflow'
##
## The following object is masked from 'package:caret':
##
## train
library(rpart)
library(e1071)
library(DataExplorer)## Warning: package 'DataExplorer' was built under R version 4.2.2
library(gridExtra)##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
Data Understanding
dtann <- read.csv("C:/Users/LENOVO/Documents/Departemen Statistika/SEM 6/TPM/data ann.csv")
head(dtann)## besar.pinjaman lama.pembayaran bunga pembayaran.per.bulan banyak.cash.back
## 1 70 4 1 130 10.0
## 2 120 3 5 15 2.0
## 3 70 4 1 260 9.0
## 4 50 4 0 140 14.0
## 5 110 2 2 180 1.5
## 6 110 2 0 125 1.0
## rating
## 1 68.40297
## 2 33.98368
## 3 59.42551
## 4 93.70491
## 5 29.50954
## 6 33.17409
Tipe dan struktur data
str(dtann)## 'data.frame': 75 obs. of 6 variables:
## $ besar.pinjaman : int 70 120 70 50 110 110 130 90 90 120 ...
## $ lama.pembayaran : int 4 3 4 4 2 2 3 2 3 1 ...
## $ bunga : int 1 5 1 0 2 0 2 1 0 2 ...
## $ pembayaran.per.bulan: int 130 15 260 140 180 125 210 200 210 220 ...
## $ banyak.cash.back : num 10 2 9 14 1.5 1 2 4 5 0 ...
## $ rating : num 68.4 34 59.4 93.7 29.5 ...
Data terdiri dari 75 observasi dan 6 peubah. Perhatikan bahwa semua peubah bebas dan peubah respon bersifat numerik. Peubah yang digunakan dalam analisis ini adalah:
- Besar pinjaman (juta rupiah)
- Lama pembayaran (tahun)
- Banyaknya cashback
- Tambahan bunga yang ditetapkan (%)
- Pembayaran per bulan (10.000)
Mengubah tipe data integer menjadi numeric
dtann <- dtann %>% mutate(across(where(is.integer), as.numeric))
str(dtann)## 'data.frame': 75 obs. of 6 variables:
## $ besar.pinjaman : num 70 120 70 50 110 110 130 90 90 120 ...
## $ lama.pembayaran : num 4 3 4 4 2 2 3 2 3 1 ...
## $ bunga : num 1 5 1 0 2 0 2 1 0 2 ...
## $ pembayaran.per.bulan: num 130 15 260 140 180 125 210 200 210 220 ...
## $ banyak.cash.back : num 10 2 9 14 1.5 1 2 4 5 0 ...
## $ rating : num 68.4 34 59.4 93.7 29.5 ...
Pengecekan missing value
colSums(is.na(dtann))## besar.pinjaman lama.pembayaran bunga
## 0 0 0
## pembayaran.per.bulan banyak.cash.back rating
## 0 0 0
Setelah dilakukan pengecekan, tidak terdapat missing value pada semua peubah.
Scaling data
Scaling data dilakukan sebagai standarisasi data untuk membulatkan bilangan desimal ke titik desimal terdekat.
set.seed(123)
# Scale data
scale01 <- function(x){
(x-min(x)) / (max(x)-min(x))
}
dtann <- dtann %>% mutate_all(scale01)
head(dtann)## besar.pinjaman lama.pembayaran bunga pembayaran.per.bulan banyak.cash.back
## 1 0.1818182 0.6 0.2 0.406250 0.71428571
## 2 0.6363636 0.4 1.0 0.046875 0.14285714
## 3 0.1818182 0.6 0.2 0.812500 0.64285714
## 4 0.0000000 0.6 0.0 0.437500 1.00000000
## 5 0.5454545 0.2 0.4 0.562500 0.10714286
## 6 0.5454545 0.2 0.0 0.390625 0.07142857
## rating
## 1 0.6655928
## 2 0.2106846
## 3 0.5469406
## 4 1.0000000
## 5 0.1515514
## 6 0.1999845
Splitting Data
set.seed(345)
dtann.train <- sample_frac(tbl=dtann, replace=FALSE, size=0.80)
head(dtann.train)## besar.pinjaman lama.pembayaran bunga pembayaran.per.bulan banyak.cash.back
## 1 0.4545455 0.2 0.2 0.437500 0.14285714
## 2 0.7272727 0.4 0.4 0.531250 0.10714286
## 3 0.5454545 0.0 0.0 0.750000 0.00000000
## 4 0.5454545 0.0 0.2 0.421875 0.00000000
## 5 0.5454545 1.0 0.0 0.718750 0.07142857
## 6 0.5454545 0.4 0.6 0.437500 0.28571429
## rating
## 1 0.2396623
## 2 0.1639923
## 3 0.3166195
## 4 0.1319408
## 5 0.4637525
## 6 0.2961315
dtann.test <- anti_join(dtann, dtann.train)## Joining, by = c("besar.pinjaman", "lama.pembayaran", "bunga",
## "pembayaran.per.bulan", "banyak.cash.back", "rating")
dtann.test## besar.pinjaman lama.pembayaran bunga pembayaran.per.bulan banyak.cash.back
## 1 0.0000000 0.6 0.0 0.437500 1.00000000
## 2 0.3636364 0.2 0.2 0.625000 0.28571429
## 3 0.5454545 0.4 0.4 0.437500 0.14285714
## 4 0.5454545 0.2 0.0 0.875000 0.00000000
## 5 0.4545455 0.4 0.2 0.437500 0.21428571
## 6 0.6363636 0.0 0.4 0.687500 0.07142857
## 7 0.5454545 0.4 0.2 0.781250 0.10714286
## 8 0.5454545 0.2 0.2 0.562500 0.00000000
## 9 0.9090909 0.6 0.6 0.468750 0.21428571
## 10 0.0000000 0.0 0.0 0.000000 0.00000000
## 11 0.4545455 0.6 0.2 0.421875 0.14285714
## 12 0.2727273 0.2 0.0 0.000000 0.21428571
## 13 0.5454545 0.2 0.2 0.218750 0.07142857
## 14 0.5454545 0.2 0.2 0.781250 0.00000000
## 15 0.4545455 0.4 0.2 0.625000 0.21428571
## rating
## 1 1.00000000
## 2 0.41073956
## 3 0.29548966
## 4 0.30929858
## 5 0.44981653
## 6 0.05059922
## 7 0.17220475
## 8 0.11487480
## 9 0.21274750
## 10 0.56452680
## 11 0.41591549
## 12 0.66338444
## 13 0.17429082
## 14 0.27838685
## 15 0.44341036
Splitting data menjadi 2, data training dan data testing. Data training dilakukan untuk pemodelan, sedangkan data testing dilakukan untuk peramalan dan prediksi. Data training dengan proporsi 80% dan data testing sebesar 20%.
Exploratory Data Analysis
Plot Korelasi
plot_correlation(dtann,type = "c")Sebaran Data
p1 <- ggplot(dtann, aes(y=dtann$besar.pinjaman))+
geom_boxplot(fill="yellow", alpha=0.6) + coord_flip() + theme_classic()+
ggtitle ("Sebaran besar pinjaman")+
ylab ("pinjaman") +
theme(plot.title = element_text(size = 12L, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 11L, face = "plain", hjust =
0.5))
p2 <- ggplot(dtann, aes(y=dtann$lama.pembayaran))+
geom_boxplot(fill="yellow", alpha=0.6) + coord_flip() + theme_classic()+
ggtitle ("Sebaran lama pembayaran")+
ylab ("lama pembayaran") +
theme(plot.title = element_text(size = 12L, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 11L, face = "plain", hjust =
0.5))
p3 <- ggplot(dtann, aes(y=dtann$bunga))+
geom_boxplot(fill="yellow", alpha=0.6) + coord_flip() + theme_classic()+
ggtitle ("Sebaran bunga")+
ylab ("bunga") +
theme(plot.title = element_text(size = 12L, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 11L, face = "plain", hjust =
0.5))
p4 <- ggplot(dtann, aes(y=dtann$pembayaran.per.bulan))+
geom_boxplot(fill="yellow", alpha=0.6) + coord_flip() + theme_classic()+
ggtitle ("Sebaran pembayran")+
ylab ("pembayaran") +
theme(plot.title = element_text(size = 12L, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 11L, face = "plain", hjust =
0.5))
p5 <- ggplot(dtann, aes(y=dtann$banyak.cash.back))+
geom_boxplot(fill="yellow", alpha=0.6) + coord_flip() + theme_classic()+
ggtitle ("Sebaran cashback")+
ylab ("cashback") +
theme(plot.title = element_text(size = 12L, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 11L, face = "plain", hjust =
0.5))
p6 <- ggplot(dtann, aes(y=dtann$rating))+
geom_boxplot(fill="red", alpha=0.6) + coord_flip() + theme_classic()+
ggtitle ("Sebaran rating")+
ylab ("rating") +
theme(plot.title = element_text(size = 12L, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 11L, face = "plain", hjust =
0.5))
grid.arrange(p1,p2,p3,p4,p5,p6)## Warning: Use of `dtann$besar.pinjaman` is discouraged.
## ℹ Use `besar.pinjaman` instead.
## Warning: Use of `dtann$lama.pembayaran` is discouraged.
## ℹ Use `lama.pembayaran` instead.
## Warning: Use of `dtann$bunga` is discouraged.
## ℹ Use `bunga` instead.
## Warning: Use of `dtann$pembayaran.per.bulan` is discouraged.
## ℹ Use `pembayaran.per.bulan` instead.
## Warning: Use of `dtann$banyak.cash.back` is discouraged.
## ℹ Use `banyak.cash.back` instead.
## Warning: Use of `dtann$rating` is discouraged.
## ℹ Use `rating` instead.
Pemodelan Neural Network
set.seed(789)
nn1 <- neuralnet(rating ~ besar.pinjaman + lama.pembayaran +
bunga + pembayaran.per.bulan + banyak.cash.back, data =
dtann.train, hidden=c(3,2), linear.output = TRUE)
plot(nn1, rep = "best")Prediksi Data
pred.test1 <- compute(nn1, dtann.test[, 1:5])$net.result
pred.test1## [,1]
## [1,] 0.89387430
## [2,] 0.32592300
## [3,] 0.28281304
## [4,] 0.27901778
## [5,] 0.38552836
## [6,] 0.09932890
## [7,] 0.31177372
## [8,] 0.21751673
## [9,] 0.25543110
## [10,] 0.04308592
## [11,] 0.44426668
## [12,] 0.62486216
## [13,] 0.22228547
## [14,] 0.23004865
## [15,] 0.36060843
Evaluasi Prediksi
ktrain1 <- nn1 %>% predict(dtann.train[, 1:5])
ktest1 <- nn1 %>% predict(dtann.test[, 1:5])
postResample(ktrain1[,1], dtann.train$rating)## RMSE Rsquared MAE
## 0.05443114 0.89282651 0.04374419
postResample(ktest1[,1], dtann.test$rating)## RMSE Rsquared MAE
## 0.15134134 0.61492382 0.09328408
Berdasarkan nilai RMSE dan R-square, data training lebih baik digunakan untuk prediksi daripada data testing. Artinya, nilai yang diprediksi dekat dengan nilai amatan aslinya dan akan lebih akurat jika datanya cukup banyak.