Introduction

Business problem

Bank portugal saat ini sedang mengalami penurunan revenue, mereka ingin tahu tindakan apa yang harus di ambil, setelah diselidiki, kami menemukan bahwa akar masalahnya adalah banyak nasabah tidak melakukan penyetoran sesering sebelumnya. selain itu, bank memiliki peluang lebih besar untuk membujuk nasabah yang membeli produk lain Seperti asuransi atau dana pensiun untuk lebih meningkatkan pendapatannya, oleh karena itu bank portugal ingin mengindentifikasi klien yang ada yang memiliki peluang lebih tinggi untuk berlangganan deposito berjangka dan memfokuskan upaya pemasaran pada klien tersebut.

Bank Portugal telah mengumpulkan sejumlah besar data yang mencakup profil pelanggan dari mereka yang telah berlangganan deposito berjangka dan mereka yang tidak berlangganan deposito berjangka. Data tersebut mencakup kolom-kolom berikut.

age (numeric)
job: type of job (categorical: ‘admin.’,’blue-collar’,’ entrepreneur’,’ housemaid’,’ management’,’ retired’,’ self-employed’,’ services’,’ student’,’technician’,’ unemployed’,’ unknown’)
marital: marital status (categorical: ‘divorced’,’ married’,’ single’,’ unknown’; note: ‘divorced’ means divorced or widowed)
education (categorical): ‘basic.4y’,’basic.6y’,’basic.9y’,’ high.school’,’ illiterate’,’professional.course’,’ university. degree’,’ unknown’)
default: has credit in default? (categorical: ‘no’,’yes’,’ unknown’)
housing: has a housing loan? (categorical: ‘no’,’yes’,’ unknown’)
loan: has a personal loan? (categorical: ‘no’,’yes’,’ unknown’) The following are related to the last contact of the current campaign
contact: contact communication type (categorical: ‘cellular’,’telephone’)
month: last contact month of the year (categorical: ‘Jan’, ‘Feb’, ‘mar’, …, ‘Nov’, ‘Dec’)
day_of_week: last contact day of the week (categorical: ‘mon’,’tue’,’wed’,’thu’,’fri’)
duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y=’no’). Yet, the duration is not known before a call is performed. Also, after the end of the call y is known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.

Other attributes:

campaign: number of contacts performed during this campaign and for this client (numeric, includes the last contact)
days: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means the client was not previously contacted)
previous: number of contacts performed before this campaign and for this client (numeric)
outcome: outcome of the previous marketing campaign (categorical: ‘failure’,’ nonexistent’,’ success’)

The following are social and economic context attributes

emp.var.rate: employment variation rate — quarterly indicator (numeric) 17 — cons.price.idx: consumer price index — monthly indicator (numeric)
cons.conf.idx: consumer confidence index — monthly indicator (numeric)
euribor3m: Euribor 3 month rate — daily indicator (numeric)
nr.employed: number of employees — quarterly indicator (numeric) Output variable (desired target):
y has the client subscribed to a term deposit? (binary: ‘yes’,’ no’)

Library and Setup

# Data visualization
library(ggplot2)
library(ggthemr)
library(gridExtra)
library(hrbrthemes)
# Data wrangling
library(tidyverse)
library(Amelia)

Data Explanatory

Kita akan memasukan data ke object dt

dt <- read.csv("data_input/bank-additional-full.csv", sep= ";")

Inspect data

head(dt, 10)

tail(dt, 10)

Check row & column

dim(dt)

## [1] 41188    21

Dari hasil inspection di atas, kita dapat description sederhana dari data bank portugal mempunyai 41188x21 row dan column, lalu kita akan check struktur data.

Check Struktur Data

str(dt)

## 'data.frame':    41188 obs. of  21 variables:
##  $ age           : int  56 57 37 40 56 45 59 41 24 25 ...
##  $ job           : chr  "housemaid" "services" "services" "admin." ...
##  $ marital       : chr  "married" "married" "married" "married" ...
##  $ education     : chr  "basic.4y" "high.school" "high.school" "basic.6y" ...
##  $ default       : chr  "no" "unknown" "no" "no" ...
##  $ housing       : chr  "no" "no" "yes" "no" ...
##  $ loan          : chr  "no" "no" "no" "no" ...
##  $ contact       : chr  "telephone" "telephone" "telephone" "telephone" ...
##  $ month         : chr  "may" "may" "may" "may" ...
##  $ day_of_week   : chr  "mon" "mon" "mon" "mon" ...
##  $ duration      : int  261 149 226 151 307 198 139 217 380 50 ...
##  $ campaign      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pdays         : int  999 999 999 999 999 999 999 999 999 999 ...
##  $ previous      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ poutcome      : chr  "nonexistent" "nonexistent" "nonexistent" "nonexistent" ...
##  $ emp.var.rate  : num  1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
##  $ cons.price.idx: num  94 94 94 94 94 ...
##  $ cons.conf.idx : num  -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 ...
##  $ euribor3m     : num  4.86 4.86 4.86 4.86 4.86 ...
##  $ nr.employed   : num  5191 5191 5191 5191 5191 ...
##  $ y             : chr  "no" "no" "no" "no" ...

hasil dari struktur data di atas kita bisa lihat bahwa ada beberapa data yang masih belum sesuai type nya, pada peroses selanjutnya kita akan menyesuaikan type data.

Adjuted type data

dt[, c("marital","default", "housing", "loan", "contact", "day_of_week", "month", "education", "y" )] <- lapply(dt[,c("marital", "default", "housing", "loan", "contact", "day_of_week", "month", "education", "y")], as.factor) 

str(dt)

## 'data.frame':    41188 obs. of  21 variables:
##  $ age           : int  56 57 37 40 56 45 59 41 24 25 ...
##  $ job           : chr  "housemaid" "services" "services" "admin." ...
##  $ marital       : Factor w/ 4 levels "divorced","married",..: 2 2 2 2 2 2 2 2 3 3 ...
##  $ education     : Factor w/ 8 levels "basic.4y","basic.6y",..: 1 4 4 2 4 3 6 8 6 4 ...
##  $ default       : Factor w/ 3 levels "no","unknown",..: 1 2 1 1 1 2 1 2 1 1 ...
##  $ housing       : Factor w/ 3 levels "no","unknown",..: 1 1 3 1 1 1 1 1 3 3 ...
##  $ loan          : Factor w/ 3 levels "no","unknown",..: 1 1 1 1 3 1 1 1 1 1 ...
##  $ contact       : Factor w/ 2 levels "cellular","telephone": 2 2 2 2 2 2 2 2 2 2 ...
##  $ month         : Factor w/ 10 levels "apr","aug","dec",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ day_of_week   : Factor w/ 5 levels "fri","mon","thu",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ duration      : int  261 149 226 151 307 198 139 217 380 50 ...
##  $ campaign      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pdays         : int  999 999 999 999 999 999 999 999 999 999 ...
##  $ previous      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ poutcome      : chr  "nonexistent" "nonexistent" "nonexistent" "nonexistent" ...
##  $ emp.var.rate  : num  1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
##  $ cons.price.idx: num  94 94 94 94 94 ...
##  $ cons.conf.idx : num  -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 ...
##  $ euribor3m     : num  4.86 4.86 4.86 4.86 4.86 ...
##  $ nr.employed   : num  5191 5191 5191 5191 5191 ...
##  $ y             : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...

Check missing value

anyNA(dt)

## [1] FALSE

dt %>% missmap()

Jika kita lihat data bank portugal tidak ada missing sehingga kita tidak perlu lagi adjust data,

Rename kolom y menjadi deposit

dt <- dt %>% 
  rename("deposit" = "y")

Summary statistics

summary(dt)

##       age            job                marital     
##  Min.   :17.00   Length:41188       divorced: 4612  
##  1st Qu.:32.00   Class :character   married :24928  
##  Median :38.00   Mode  :character   single  :11568  
##  Mean   :40.02                      unknown :   80  
##  3rd Qu.:47.00                                      
##  Max.   :98.00                                      
##                                                     
##                education        default         housing           loan      
##  university.degree  :12168   no     :32588   no     :18622   no     :33950  
##  high.school        : 9515   unknown: 8597   unknown:  990   unknown:  990  
##  basic.9y           : 6045   yes    :    3   yes    :21576   yes    : 6248  
##  professional.course: 5243                                                  
##  basic.4y           : 4176                                                  
##  basic.6y           : 2292                                                  
##  (Other)            : 1749                                                  
##       contact          month       day_of_week    duration     
##  cellular :26144   may    :13769   fri:7827    Min.   :   0.0  
##  telephone:15044   jul    : 7174   mon:8514    1st Qu.: 102.0  
##                    aug    : 6178   thu:8623    Median : 180.0  
##                    jun    : 5318   tue:8090    Mean   : 258.3  
##                    nov    : 4101   wed:8134    3rd Qu.: 319.0  
##                    apr    : 2632               Max.   :4918.0  
##                    (Other): 2016                               
##     campaign          pdays          previous       poutcome        
##  Min.   : 1.000   Min.   :  0.0   Min.   :0.000   Length:41188      
##  1st Qu.: 1.000   1st Qu.:999.0   1st Qu.:0.000   Class :character  
##  Median : 2.000   Median :999.0   Median :0.000   Mode  :character  
##  Mean   : 2.568   Mean   :962.5   Mean   :0.173                     
##  3rd Qu.: 3.000   3rd Qu.:999.0   3rd Qu.:0.000                     
##  Max.   :56.000   Max.   :999.0   Max.   :7.000                     
##                                                                     
##   emp.var.rate      cons.price.idx  cons.conf.idx     euribor3m    
##  Min.   :-3.40000   Min.   :92.20   Min.   :-50.8   Min.   :0.634  
##  1st Qu.:-1.80000   1st Qu.:93.08   1st Qu.:-42.7   1st Qu.:1.344  
##  Median : 1.10000   Median :93.75   Median :-41.8   Median :4.857  
##  Mean   : 0.08189   Mean   :93.58   Mean   :-40.5   Mean   :3.621  
##  3rd Qu.: 1.40000   3rd Qu.:93.99   3rd Qu.:-36.4   3rd Qu.:4.961  
##  Max.   : 1.40000   Max.   :94.77   Max.   :-26.9   Max.   :5.045  
##                                                                    
##   nr.employed   deposit    
##  Min.   :4964   no :36548  
##  1st Qu.:5099   yes: 4640  
##  Median :5191              
##  Mean   :5167              
##  3rd Qu.:5228              
##  Max.   :5228              
##

age memiliki rentang nilai max yaitu 98 dan min 17 sedangkan untuk median 38 dan mean 40
Duration range nya antara 0-4918, dan memiliki nilain mean 258.3 dan median 180.0, semakin tinggi duration semakin berpotensi nasabah membuka deposito
campaign nilai min 1 dan max 56, dan memiliki nilai mean 2.6 dan median 2
previous jumlah call sebelumnya max 7 nilainya sangat kecil sekali jika dibandingankan jumlah data customer yang ada.
deposit memiliki nilai gap nya cukup besar hampir 88.7% target customer memutuskan untuk tidak membuka deposito berjangka.
CPI memiliki nilai yang cukup stabil di setiap range ini mengindikasikan inflasi cukup terjaga.

Data Visualization & Analysis case study

kita ingin mengetahui corelation deposito by duration

ggplot(data = dt, mapping = aes(x =deposit, y = duration))+
  geom_boxplot(aes(fill = deposit))+
  labs(title = "Deposit by Duration", x = "deposit", y = "Duration")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_hline(yintercept = mean(dt$duration), col = "blue", linetype = 5)

Insight :

Seperti kita lihat dari boxplot di atas bahwa semakin tinggi duration call maka semakin tinggi kemungkinan nasabah untuk membuka deposit.
berdasarkan hasil summary mean dari duration adalah 258 sehingga dapat disimpulkan jika duration call dibawah rata2 tersebut, kemungkinan peluang seseorang membuka deposito sangat kecil.

kita ingin mengetahui corelation antara duration dan campaign

ggthemr_reset()
ggplot(data = dt, mapping = aes(x = campaign, y = duration, col = deposit))+
  geom_point()+
  geom_hline(yintercept = mean(dt$duration), col = "blue", linetype = 5)+
  labs(title = "Duration vs Campaign")+
  theme(plot.title = element_text(hjust = 0.5))

Insight :

semakin lama durasi panggilan, peluang nya lebih tinggi untuk membuka deposito.

kita ingin mengetahui nilai rata2 dari duration call dan campaign berdasarkan contact yang di gunakan

agg <- aggregate.data.frame(list(duration = dt$duration, campaign = dt$campaign),by = list(contact = dt$contact), FUN = mean)
dim(agg)

## [1] 2 3

agg.pi <- pivot_longer(agg, 
                       cols = c("duration", "campaign"),
                       names_to = "stat",
                       values_to = "value")


agg.pi

ggthemr_reset()
ggplot(agg.pi, mapping = aes(x = contact, y = value))+
  geom_col(aes(fill = stat), position = "dodge")+
  coord_flip()+
  labs(title = "Average duration & campaign for each contact", x = " contact", y = "value", fill = "variabel")+
  geom_text(aes(label = round(value, digits = 1)), hjust = -0.1, size = 3)+
  theme(plot.title = element_text(hjust = 0.5))

Insight :

campaign dengan telephone memiliki nilai rata-rata minimum lebih tinggi dibandingkan cellular mesikpun duration nya telephone rendah, ini bisa terjadi karena cost lebih mahal.
Cellular memiliki nilai rata-rata minimum campaign lebih rendah dibandikan dengan telephon tapi durationnya lebih tinggi hal ini bisa terjadi karena dari cost lebih murah hampir semua orang memilikinya dan flexibel menggunakannya.

Feature engineering

kita akan mengelompokan usia dan membuat boxplot education by duration

p <- function(x){if(x<30){x <- "<30"} else if(x>=30 & x<=39){x <- "30-39"}else if (x>=40 & x<=49){x <-"40-49"}else if(x>=50 & x <=59){x <- "50-59"}else(x <- "60+")}


dt$ageg <- as.factor(sapply(dt$age, p))
head(dt)

ggthemr_reset()
ggplot(dt, aes(x = duration, y = education))+
  geom_jitter(aes(col = dt$ageg))+
  geom_boxplot(alpha = 0.5)+
  scale_x_continuous(trans = "log10")+
  labs(title = "Entire education by duration ", x = "Duration", y = "Education", col = "age segment")+
  theme(plot.title = element_text(hjust = 0.5))

## Warning: Use of `dt$ageg` is discouraged. Use `ageg` instead.

## Warning: Transformation introduced infinite values in continuous x-axis

## Warning: Transformation introduced infinite values in continuous x-axis

## Warning: Removed 4 rows containing non-finite values (stat_boxplot).

Insight :

based on the box plots above, kita dapat melihat bahwa terdapat sedikit perbedaan antara education, mediannya hampir konsisten di setiap box ukuran box pun hampir serupa dan memiliki outlier di ujung minimum dan maximum, namun untuk university degree dan high school lebih padat dari uknown, profesional course, basic 4y dan basic 6y, tapi untuk basic.9y memiliki nilai minumum outiler yang cukup tinggi dan di rentang usia 40-49 artinya hanya sedikit duration call yang terjadi di segment usia dan basic.9y, 1 box di bawah boxplots yang lainna yaitu illiterate jelas ini mengidikasikan segement tersebut tidak begitu menjadi prioritas.

kita akan melihat bagaimana campaign banyak terjadi di bulan apa

ggplot(dt, aes(x = month , y = campaign))+
  geom_col(aes(fill = deposit))+
  labs(title = "Campaign vs Month", x = "Month", y = "campaign")+
  theme(plot.title = element_text(hjust = 0.5))

Insight :

campaign sebagian besar terjadi pada bulan (juli, juni, mei)
campaign dilakukan juga pada akhir tahun

Kita ingin mencari segment pekerjaan dengan pesentase tertinggi

ggthemr("grape", type = "inner")
total_deposit <- dt %>% 
  select(job) %>% 
  group_by(job) %>%
  summarise(n=n()) %>% 
  mutate(percent=round(prop.table(n),2) * 100) %>%
  
  ggplot(mapping = aes(x =reorder(job, percent), y = percent))+
  geom_bar(stat = "identity")+
  coord_flip()+
  scale_fill_brewer(palette = "Dark2")+
  theme(legend.position = "none")+
  xlab("Jobs")+
  geom_label(aes(label = paste0(percent, "%")),
             position = position_stack(vjust = 1 ),
             colour = "black",
             fontface = "bold")

## `summarise()` ungrouping output (override with `.groups` argument)

total_deposit

Insight :

berdasarkan graph di atas bisa kita lihat bahwa jenis pekerjaan admin memiliki persentase tertinggi yaitu 25% di ikuti oleh blue collar dan technician.

kita ingin mencari bagaimana interaction job berdasarkan deposito

job_by_deposit <- dt %>% 
  select(job, deposit) %>% 
  dplyr::group_by(job, deposit) %>% 
  summarise(n=n()) %>% 
  mutate(percent=round(prop.table(n),2) * 100) %>% 
  ungroup() %>% 

ggplot(aes(x = reorder(job, -n), y = n, fill = deposit)) +
  geom_bar(stat="identity", width = 0.9, position = position_dodge(width = 0.8)) +
  geom_text(aes(label = sprintf("%.f%%", percent)),
            position = position_dodge(width = 0.8),
            vjust = -0.3, size = 3.5,
            colour="black", fontface="bold") +
  labs(x="", y="") + 
  labs(title="Jobs By Deposit") + 
  theme(plot.title=element_text(hjust=0.5))

## `summarise()` regrouping output by 'job' (override with `.groups` argument)

job_by_deposit

Insight :

Jika kita lihat graph di atas admin memiliki proporsi tidak membuka deposito tertinggi 87% di ikuti oleh blue-collar sebesar 93% dan 7% yang membeli, technician 89% penolakan dan 11 % yang memutuskan untuk membuka deposito
Dari semua segment pekerjaan belum menemukan break event point mayoritas customer dari berbagai segment pekerjaan masih banyak yang tidak mau membuka deposito, yang menarik dari graph diatas adalah pada retired kontribusi nya cukup tinggi sebesar 25%, ini mengindikasikan bahwa pensiunan sangat tertarik untuk membuka deposito hal ini dikarenakan low risk dan return yang stabil meski rata-rata yield nya kecil.

Kita ingin mencari bagaimana interaction month berdasarkan deposito

monht_deposit <-  dt %>% 
  select(month, deposit) %>%
  group_by(month, deposit) %>% 
  summarise(n = n()) %>% 
  mutate(percent = round(prop.table(n), digits = 2)*100) %>% 
  ungroup() %>%
  
  ggplot(aes(x =reorder(month, -n), y = n, fill = deposit))+
  geom_bar(stat = "identity", width = 0.9, position = position_dodge(width = 0.8))+
  theme(axis.title.x = element_text(size = 12, face = "bold"))+
  theme(axis.title.y = element_text(size = 12, face = "bold"))+
  xlab("Jobs")+
  ylab("Deposit")+
  geom_text(aes(label = sprintf("%.f%%", percent)),
            position = position_dodge(width = 0.8),
            vjust = -0.3, size = 3.5,
            colour = "black", fontface = "bold")+
  labs(x = "", y = "")+
  labs(title = "Month by deposit")+
  theme(plot.title = element_text(hjust = 0.5))

## `summarise()` regrouping output by 'month' (override with `.groups` argument)

monht_deposit

Insight :

Jika kita lihat graph di atas pada bulan may hanya 6% yang membuka produk, padahal jika kita refer to graph sebelumnya Campaign vs Month di bulan mei paling banyak melakukn campaign dan diikuti oleh bulan juni, juli, agustus, tapi justru di bulan ini terjadi banyak menolak untuk membuka deposito.
Jika kita amati di bulan oct, sep, mar, jumlah nya cukup proposional jumlah tertinggi melakukan pembukaan deposito terjadi di bulan maret sebesar 51%.

Kita ingin mengetahui interaction loan housing terhadap deposito

ggthemr("grape", type = "inner")
loan_by_deposit <- dt %>% 
  select(loan, deposit) %>% 
  group_by(loan, deposit) %>% 
  summarise(n=n()) %>% 
  mutate(percent = round(prop.table(n),digits = 3)*100) %>% 
  ungroup() %>% 
  
  
  ggplot(aes(x = loan, y = n, fill = deposit))+
  geom_bar(stat = "identity", width = 0.7, position = position_dodge(width = 0.8))+
  theme(axis.title.x = element_text(size = 12, face = "bold"))+
  geom_text(aes(label = sprintf("%.1f%%", percent)),
            position = position_dodge(width = 0.8),
            vjust = -0.3, size = 3.5,
            colour = "black", fontface = "bold")+
  labs(x = "", y = "")+
  labs(title = "Loan by deposit")+
  theme(plot.title = element_text(hjust = 0.5))

## `summarise()` regrouping output by 'loan' (override with `.groups` argument)

housing_by_deposit <- dt %>% 
  select(housing, deposit) %>% 
  group_by(housing, deposit) %>% 
  summarise(n = n()) %>% 
  mutate(percent = round(prop.table(n) ,digits = 2)*100) %>% 
  ungroup() %>% 
  
  ggplot(aes(x = housing, y = n, fill = deposit))+
  geom_bar(stat = "identity", width = 0.7, position = position_dodge(width = 0.8))+
  theme(axis.title.x = element_text(size = 15, face = "bold"))+
  geom_text(aes(label = sprintf("%.1f%%", percent)),
            position = position_dodge(width = 0.8),
            vjust = -0.3, size  = 3.5,
            colour = "black", fontface = "bold")+
  labs(title = "Housing by deposit", x = "", y ="")+
  theme(plot.title = element_text(hjust = 0.5))

## `summarise()` regrouping output by 'housing' (override with `.groups` argument)

grid.arrange(housing_by_deposit,loan_by_deposit, layout_matrix = rbind(c(1,2)))

Insgiht :

Jika kita lihat graph di atas orang-orang yang tidak memiliki pinjaman perumahan cenderung peluangnnya lebih besar untuk membuka deposito.

Kita ingin melihat bagaimana distribusi usia memiliki interaction dengan deposito

ggthemr::ggthemr("flat dark", type = "inner")

age_dis <- dt %>% 
  select(age, deposit) %>% 
  
  ggplot(aes(x = age, fill = deposit))+
  geom_histogram(alpha = 0.9, col = "grey")+
  scale_x_continuous(breaks = seq(min(18), max(95), by = 10))+
  theme(legend.position = "right")+
  labs(x = "", y = "")+
  labs(title = "Distribution Age")+
  theme(plot.title = element_text(hjust = 0.5))

age_dis

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Insight

Usia yang membuka deposito paling tinggi dilakukan pada range usia 28-40 dan di rentang usia tersebut pula terjadi banyak penolakan pembukaan deposito.
jika kita amati graph yang menarik adalah di range usia 60-80 tidak semasif di range usia 28-40, tapi penolakan pembukaan deposito justru hanya sedikit di usia tersebut ini mengindikasikan di usia tersebut orang cenderung lebih tertarik untuk membuka deposito, kemungkinan hal ini terjadi karena cash flow yang rutin dari deposito cocok untuk usia pensiun yaitu 60-80.

Final Conclusion

Dari semua graph dan analysis di atas, kami mempunyai beberapa kesimpulan.

Mayoritas customer yang di hubungi masih menolak untuk membuka deposito
bank portugal terlalu fokus pada segment pasar dengan range usia 28-40 dan dan pekerjaan admin, blue collar, technician, padahal jika kita lihat persentase penolakan banyak terjadi di segment tersebut dan justru di range usai 60-80 dan retired memiliki persentase penolak yang rendah dan pembukaan deposit cukup tinggi hal ini bisa terjadi karena karakterisik dari produk deposito cocok untuk segment tersebut yaitu low risk, cash flow yang rutin, dan yield yang stabil.
housing loan, duration, salah satu indikator penting customer akan memutuskan untuk membuka deposito.

Data Visualization & Analysis

Irsad Ramadhan

12/25/2020

Introduction

Business problem

Library and Setup

Data Explanatory

Inspect data

Check row & column

Check Struktur Data

Adjuted type data

Check missing value

Summary statistics

Data Visualization & Analysis case study

Feature engineering

Final Conclusion