Purpose: analyze Bank ABC customer data and present ideas drawn from the analysis.
Let's load the data using the dplyr, tidyverse, readxl, and tidyr packages, which make the data easier to work with.
The data used in this project is ABC bank customer data from the Greater Jakarta area of Indonesia. dim(), names(), summary(), and rmarkdown::paged_table() are used to get an overview of the data.
## [1] 6940 8
## [1] "id" "city" "gender" "name" "date" "debit" "credit"
## [8] "deposit"
## id city gender name
## Min. : 33 Length:6940 Length:6940 Length:6940
## 1st Qu.: 187465 Class :character Class :character Class :character
## Median : 445320 Mode :character Mode :character Mode :character
## Mean : 484521
## 3rd Qu.: 792419
## Max. :1048546
## date debit credit
## Min. :1987-01-01 00:00:00 Length:6940 Min. : 1.5
## 1st Qu.:1991-10-01 18:00:00 Class :character 1st Qu.:100.7
## Median :1996-07-01 12:00:00 Mode :character Median :185.2
## Mean :1996-07-01 12:00:00 Mean :194.4
## 3rd Qu.:2001-04-01 06:00:00 3rd Qu.:270.0
## Max. :2005-12-31 00:00:00 Max. :665.9
## deposit
## Min. : 61.6
## 1st Qu.:196.5
## Median :245.6
## Mean :252.3
## 3rd Qu.:301.4
## Max. :624.8
The data has eight variables and 6940 observations. Two variables are not yet stored in a suitable type: debit is read as character rather than numeric, and date is stored as a date-time rather than a plain date.
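As a minimal sketch of how column types can be checked, the example below uses a small made-up data frame. The name `toy` and its values are hypothetical stand-ins for the bank data, chosen so that debit and date have the same types as in the summary above.

```r
# Hypothetical toy data: debit stored as text, date stored as a date-time.
toy <- data.frame(
  id    = c(33, 187465, 445320),
  debit = c("100.7", NA, "185.2"),
  date  = as.POSIXct(c("1987-01-01", "1996-07-01", "2005-12-31"), tz = "UTC"),
  stringsAsFactors = FALSE
)

dim(toy)    # number of rows and columns
names(toy)  # column names

# First class of each column; shows which columns still need converting.
col_types <- sapply(toy, function(x) class(x)[1])
print(col_types)
```

Here debit is reported as character and date as POSIXct, which is why both are converted before the analysis.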
To check missing values:
## [1] "city" "debit"
There are missing values in the city and debit variables.
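One way such a check could be written in base R is sketched below. The data frame `toy` is hypothetical and only mirrors the situation described here, with missing values in city and debit.

```r
# Hypothetical toy data; the real data set has 6940 rows and 8 columns.
toy <- data.frame(
  id    = c(1, 2, 3, 4),
  city  = c("Jakarta", NA, "Depok", "Bogor"),
  debit = c(120.5, 98.0, NA, 143.2),
  stringsAsFactors = FALSE
)

# Count the missing values in each column.
na_counts <- colSums(is.na(toy))

# Keep only the names of the columns that contain at least one NA.
cols_with_na <- names(na_counts[na_counts > 0])
print(cols_with_na)
```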
Fill the missing debit values with the mean of debit, convert the date variable with as.Date(), and save the result to a CSV file.
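The cleaning steps can be sketched as follows. This is only an illustration on made-up data: `toy`, its values, and the temporary output file are assumptions, and the sketch assumes that debit can be converted to numbers directly.

```r
# Hypothetical toy data mimicking the raw types: debit as text, date as date-time.
toy <- data.frame(
  debit = c("100", NA, "200"),
  date  = as.POSIXct(c("1987-01-01", "1996-07-01", "2005-12-31"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Convert debit to numeric, then replace each NA with the mean of the observed values.
toy$debit <- as.numeric(toy$debit)
toy$debit[is.na(toy$debit)] <- mean(toy$debit, na.rm = TRUE)

# as.Date() (capital D) drops the time-of-day part of the date-time.
toy$date <- as.Date(toy$date, tz = "UTC")

# Save the cleaned data; a temporary file stands in for the real output path.
out_file <- tempfile(fileext = ".csv")
write.csv(toy, out_file, row.names = FALSE)
```

Mean imputation keeps the overall mean of debit unchanged but reduces its variance, which is worth keeping in mind when reading the later plots.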
Import the cleaned Bank ABC customer data.
| Variable | Description |
|---|---|
| Id | ID of the Bank ABC customer |
| City | City of origin of the Bank ABC customer |
| Gender | Gender of the Bank ABC customer |
| Name | Name of the Bank ABC customer |
| Date | Date of joining Bank ABC |
| Debit | The amount of debit |
| Credit | The amount of credit |
| Deposit | The amount of deposit |
First, we need to count how many male and female customers there are.
Total of Female and Male
## [1] 3095
## [1] 3844
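A minimal sketch of how such counts can be obtained in base R is shown below. The vector `gender` and the labels "Male" and "Female" are assumptions made for illustration. The actual coding of the gender column is not shown in this report.

```r
# Hypothetical gender column; the labels are assumed, not taken from the data.
gender <- c("Male", "Female", "Female", "Male", "Female")

# table() counts every category at once.
counts <- table(gender)
print(counts)

# A single category can also be counted with a logical comparison.
n_male <- sum(gender == "Male")
```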
Female and Male Histogram
From the histogram, there are more female customers than male customers.
Density Female and Male Plot
The density plot shows the center and spread of the female and male customer counts.
Female and Male Boxplot
A boxplot shows the location and the range of the data. The table below lists, for each city and gender, the median number of Bank ABC customers and the highest and lowest numbers across the years observed.

| City | Gender | Median | Highest | Lowest |
|---|---|---|---|---|
| Bekasi | Male | 103 | 174 | 30 |
| Bogor | Male | 167 | 173 | 75 |
| Depok | Male | 138 | 206 | 101 |
| Jakarta | Male | 148 | 187 | 29 |
| Tangerang | Male | 108.5 | 175 | 52 |
| Bekasi | Female | 188 | 262 | 23 |
| Bogor | Female | 113 | 192 | 75 |
| Depok | Female | 145.5 | 193 | 112 |
| Jakarta | Female | 179 | 237 | 56 |
| Tangerang | Female | 160 | 221 | 80 |
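The boxplot figures above can be reproduced numerically. The sketch below assumes that the boxplots summarize the number of customers per year in each city; `toy` and its values are made up for illustration.

```r
# Hypothetical toy data: one row per customer, with city and year of joining.
toy <- data.frame(
  city = c(rep("Bekasi", 6), rep("Bogor", 7)),
  year = c(1990, 1990, 1990, 1991, 1991, 1992,
           1990, 1991, 1991, 1991, 1991, 1992, 1992)
)

# Cross-tabulate: rows are cities, columns are years, cells are customer counts.
counts <- table(city = toy$city, year = toy$year)

# Summarize the yearly counts of each city (row-wise).
stats <- data.frame(
  city    = rownames(counts),
  median  = apply(counts, 1, median),
  highest = apply(counts, 1, max),
  lowest  = apply(counts, 1, min),
  row.names = NULL
)
print(stats)
```

Running the same steps separately on the male and the female rows would give one set of figures per gender.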
Female and Male Violin Plot
The shape of the violin shows the density of the data points for a particular variable. The wider the violin at a given value, the more densely the data are concentrated around that value; the narrower the violin, the sparser the data there. For both male and female customers, Jakarta shows a higher density around the median than any other city.
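The width of a violin at a given value is proportional to a kernel density estimate at that value. The sketch below illustrates this with base R's density() on made-up yearly counts. The vector `counts` and the helper `dens_at` are hypothetical and not part of the original analysis.

```r
# Hypothetical yearly customer counts: most values cluster near 150.
counts <- c(140, 145, 150, 150, 155, 160, 60, 230)

# Kernel density estimate with the default Gaussian kernel.
d <- density(counts)

# Helper (illustrative): interpolate the estimated density at a value v.
dens_at <- function(v) approx(d$x, d$y, xout = v)$y

# The density near the cluster is higher than near the isolated low value,
# so a violin drawn from these data would be wider at 150 than at 60.
wider_at_cluster <- dens_at(150) > dens_at(60)
print(wider_at_cluster)
```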
Female and Male Ridgeline Plot
Ridgeline plots make it possible to compare the distribution of a numerical variable across several groups. In each city, the distribution of male and female customers is concentrated at around 100 to 200 people. The best distribution for men is in the city of Jakarta, while the best distribution for women is in the city of Depok.
Female and Male Time Series Plot
Bank ABC experienced a large drop in male and female customers in 1990 and 2002. More precisely, the drops occurred in Jakarta (1990) and Bekasi (2002).
Create a new dataset by grouping the data by year and city and summing the debit, credit, and deposit in each group.
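The grouping step can be sketched with dplyr as follows. The data frame `toy`, its values, and the result name `totals` are hypothetical, and extracting the year from the date column is an assumption about how the year was obtained.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the bank data.
toy <- data.frame(
  date    = as.Date(c("1990-03-01", "1990-08-15", "1990-05-20", "2002-01-10")),
  city    = c("Jakarta", "Jakarta", "Bekasi", "Bekasi"),
  debit   = c(100, 150, 80, 60),
  credit  = c(200, 100, 90, 40),
  deposit = c(250, 300, 120, 110)
)

totals <- toy %>%
  # Derive the year from the joining date.
  mutate(year = as.integer(format(date, "%Y"))) %>%
  # One group per year and city.
  group_by(year, city) %>%
  # Total debit, credit, and deposit within each group.
  summarise(
    debit   = sum(debit),
    credit  = sum(credit),
    deposit = sum(deposit),
    .groups = "drop"
  )
print(totals)
```

The result has one row per year and city, with the two grouping columns first and the three totals after them.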
Debit, Credit, and Deposit Histogram
The distribution of total Bank ABC customer debits peaks at around IDR 100,000. The distribution of total customer credit peaks at around IDR 70,000, and the distribution of total customer deposits peaks at around IDR 80,000 to IDR 100,000.
Debit, Credit, and Deposit Density Plots
The density plots show the center and spread of total debits, credits, and deposits.
Debit, Credit, and Deposits Boxplots
The highest total debits are in Depok and Jakarta. The highest total credit is in Jakarta. The highest total deposits are in Depok. Since both the highest total debits and the highest total deposits are found in Depok, this suggests that Depok customers hold more money at the bank than customers in other areas.
Debit, Credit, and Deposit Violin Plots
The shape of the violin shows the density of the data points for a particular variable. The wider the violin at a given value, the more densely the data are concentrated around that value; the narrower the violin, the sparser the data there. Total debits in Bekasi are highly concentrated around the median. Total credit in Jakarta is highly concentrated around the median, and so are total deposits in Jakarta.
Debit, Credit, and Deposit Ridgeline Plots
Ridgeline plots are used to plot the densities of total debits, credits, and deposits. Total debits are most likely to be around IDR 100,000, total credit around IDR 50,000 to IDR 100,000, and total deposits around IDR 65,000.
Debit, Credit, and Deposit Time Series Plot
In 1990 and 2002, customers withdrew their money from the bank and paid off their credit, so the amount of credit at the bank decreased.
Correlation analysis is used to examine the relationship between total debits, credits, and deposits.
library(corrplot)      # corrplot()
library(RColorBrewer)  # brewer.pal()
# Drop the first two columns so that only the numeric columns are correlated
Cab <- Ca[, c(-1, -2)]
corM <- cor(Cab)
corrplot(corM, method = "number", tl.col = "black", number.cex = 0.8, type = "upper", order = "hclust", col = brewer.pal(4, name = "RdBu"))
From the plot above, we can see a very strong positive correlation between every pair of variables: 0.91 between debit and credit, 0.93 between deposit and credit, and 0.96 between deposit and debit. On a scatter plot, each pair would show points lying close to an upward-sloping line. These results imply a strong linear relationship, which means that the variables tend to move in the same direction. For example, when debit increases, credit tends to increase as well. The correlation coefficient does not say by how much: a correlation of 0.91 does not mean that credit rises by IDR 1 for every IDR 1 increase in debit.
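To illustrate the difference between a correlation and the size of a change, the sketch below uses made-up numbers in which credit is an exact linear function of debit. The vectors and the slope of 3 are assumptions chosen only for the illustration.

```r
# Hypothetical values: credit rises by 3 for every 1 unit of debit.
debit  <- c(10, 20, 30, 40, 50)
credit <- 3 * debit + 5

# Pearson correlation: strength and direction of the linear relationship.
r <- cor(debit, credit)

# Slope of the least-squares line: change in credit per unit change in debit.
slope <- cov(debit, credit) / var(debit)

print(c(correlation = r, slope = slope))
```

The correlation is 1 (up to floating-point rounding) while the slope is 3, so a perfect correlation is compatible with any positive rate of change.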
We assume that a city's assets are its debits plus its deposits, and use this to find out which city holds the most and the least assets at Bank ABC.
library(dplyr)  # filter(), mutate()
# For each city: keep its rows, add Asset = debit + deposit, then total the assets
jkt <- filter(jawa, city == "Jakarta")
Assetjkt <- mutate(jkt, Asset = debit + deposit)
Assettotaljkt <- sum(Assetjkt$Asset, na.rm = TRUE)
bgr <- filter(jawa, city == "Bogor")
Assetbgr <- mutate(bgr, Asset = debit + deposit)
Assettotalbgr <- sum(Assetbgr$Asset, na.rm = TRUE)
tgr <- filter(jawa, city == "Tangerang")
Assettgr <- mutate(tgr, Asset = debit + deposit)
Assettotaltgr <- sum(Assettgr$Asset, na.rm = TRUE)
depok <- filter(jawa, city == "Depok")
Assetdepok <- mutate(depok, Asset = debit + deposit)
Assettotaldepok <- sum(Assetdepok$Asset, na.rm = TRUE)
bkasi <- filter(jawa, city == "Bekasi")
Assetbkasi <- mutate(bkasi, Asset = debit + deposit)
Assettotalbkasi <- sum(Assetbkasi$Asset, na.rm = TRUE)
# Collect the five city totals in one data frame and print it
Alloftheasset <- data.frame(Assettotaljkt, Assettotalbgr, Assettotaltgr, Assettotaldepok, Assettotalbkasi)
Alloftheasset
The city with the most assets is Jakarta, with IDR 1,242,037, while the city with the least is Bogor, with IDR 479,429.9.
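The same per-city totals could also be computed in a single grouped pipeline. The following is a sketch under the same assumption (assets as debit plus deposit), using a made-up stand-in for the data. The names `jawa_toy` and `asset_by_city` are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the customer data; one debit value is missing.
jawa_toy <- data.frame(
  city    = c("Jakarta", "Jakarta", "Bogor", "Depok", "Bogor"),
  debit   = c(100, 200, 50, 80, NA),
  deposit = c(300, 250, 120, 100, 60)
)

asset_by_city <- jawa_toy %>%
  # Assumed definition: asset = debit + deposit (NA if either part is missing).
  mutate(asset = debit + deposit) %>%
  group_by(city) %>%
  # Rows with a missing asset are skipped, as with na.rm = TRUE in sum().
  summarise(total_asset = sum(asset, na.rm = TRUE), .groups = "drop") %>%
  # Largest total first, so the first and last rows answer "most" and "least".
  arrange(desc(total_asset))
print(asset_by_city)
```

Sorting the result puts the city with the most assets in the first row and the city with the least in the last row, which avoids comparing five separate variables by eye.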
There is a very strong positive correlation between debit and credit, deposit with credit, and deposit with debit.
There is a notable difference between the number of male and female ABC Bank customers.
There is still a risk of data entry errors.
Since Bank ABC has 3095 male customers and 3844 female customers, we can set up a project called “Female Service Program”. Its purpose is to keep existing female customers loyal and to attract new female customers to ABC bank. This program can take the form of debit or credit cards specifically for women, lucky draws on Mother’s Day, parking lots for women, and so on.
ABC Bank can also offer a lottery prize to customers whose savings exceed a predetermined amount. The purpose is to encourage customers to increase their savings and to keep using ABC bank services.
Observations can be carried out on the branches of the ABC bank itself in relation to the large differences in the number of assets in each region. For example, why are the assets in Bogor smaller than in Jakarta? Is it because the service there is unsatisfactory?
Data Awareness is a program for employees who work with data, encouraging them to fill in all mandatory fields correctly and completely. The program aims to reduce the risk of data entry errors.