---
title: "Exploratory and Predictive Analysis of Bank Customer Data"
output:
  flexdashboard::flex_dashboard:
    vertical_layout: scroll
    theme: yeti
    source_code: embed
---
```{r setup, include=FALSE}
# Install any missing packages (every package loaded below is listed here)
packages <- c(
  "flexdashboard",
  "tidyverse",
  "highcharter",
  "viridis",
  "DT",
  "gapminder",
  "jsonlite",
  "plotly",
  "lubridate",
  "scales",
  "zoo"
)
installed <- packages %in% rownames(installed.packages())
if (any(!installed)) {
  install.packages(packages[!installed])
}

# Load libraries
library(flexdashboard)
library(tidyverse)   # also attaches ggplot2, dplyr and tidyr
library(highcharter)
library(viridis)
library(DT)
library(gapminder)
library(jsonlite)
library(plotly)
library(lubridate)
library(scales)
library(ggplot2)
library(zoo)
```
Members {data-orientation=rows}
=======================================================================
Group 8:

* Joans Henky Servatius Simanullang (52240017)
* Isnaini Nur Hasanah (52240005)

**DATA SCIENCE STUDY PROGRAM**

**FACULTY OF DIGITAL, DESIGN, AND BUSINESS**

**INSTITUT TEKNOLOGI SAINS BANDUNG**

Objectives {data-orientation=rows}
=======================================================================
### A. Dataset Understanding & Exploratory Data Analysis (EDA)

**(Weight: ±25%)**

Students are required to:

- Describe the dataset context and analytical objectives.
- Explain the data structure and variable types.
- Present key descriptive statistics.
- Identify and discuss:
    - missing values,
    - outliers,
    - data distributions.
- Provide at least **five (5) relevant data visualizations**.
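
The missing-value and outlier checks listed above can be sketched in base R as follows. This is a minimal illustration on a small hypothetical data frame (`toy`, with made-up values), not the project's data, and it uses the common 1.5 × IQR (Tukey) rule to flag outliers, which is one convention among several. The helper `flag_iqr_outliers()` is a hypothetical name introduced for this example.

```r
# Hypothetical toy data standing in for the bank dataset (values are made up)
toy <- data.frame(
  income_monthly = c(4200, 3900, NA, 4100, 25000, 3800),
  loan_amount    = c(10000, 12000, 9000, NA, 11000, 10500)
)

# Count missing values per column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# Flag values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] (Tukey's rule); NAs are never flagged
flag_iqr_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  !is.na(x) & (x < lower | x > upper)
}

income_outliers <- flag_iqr_outliers(toy$income_monthly)
print(toy$income_monthly[income_outliers])
```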
---
### B. Relationship and Pattern Analysis

**(Weight: ±20%)**

Students are required to:

- Analyze relationships among key variables.
- Apply appropriate analytical techniques (e.g., correlation, regression, cross-tabulation).
- Identify potential data issues (e.g., multicollinearity, heterogeneity).
- Interpret analytical results clearly and logically.
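
Multicollinearity among predictors is commonly screened with the variance inflation factor, VIF_j = 1 / (1 - R²_j), where R²_j is the R² from regressing predictor j on all the other predictors. The sketch below computes it in base R on simulated data. The column names mirror the dataset, but the values and the built-in income/balance dependence are hypothetical. `compute_vif()` is an illustrative helper (the `car` package's `vif()` is a ready-made alternative).

```r
set.seed(42)
n <- 200
# Simulated predictors: account_balance is deliberately built to track income
income_monthly   <- rnorm(n, mean = 5000, sd = 1000)
employment_years <- rnorm(n, mean = 8, sd = 3)
account_balance  <- 2 * income_monthly + rnorm(n, sd = 300)
X <- data.frame(income_monthly, employment_years, account_balance)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others
compute_vif <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    fit <- lm(reformulate(others, response = v), data = X)
    1 / (1 - summary(fit)$r.squared)
  })
}

vif_values <- compute_vif(X)
# Rule of thumb (a convention, not a hard threshold): VIF above about 5-10 signals collinearity
print(round(vif_values, 2))
```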
---
### C. Advanced Analysis (Context-Dependent)

**(Weight: ±20%)**

Students are required to apply an advanced analytical approach that is appropriate to the dataset, such as:

- Time series analysis (if time-related variables exist),
- Clustering or segmentation,
- Risk or anomaly detection,
- Classification or forecasting.
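
As one concrete instance of the clustering/segmentation option, the sketch below runs k-means (`stats::kmeans`) on standardised features. The data are simulated as two well-separated hypothetical customer groups, so k = 2 is chosen to match the simulation. On real data, k would have to be selected, for example with an elbow or silhouette analysis.

```r
set.seed(123)
# Two simulated customer groups (hypothetical values)
group_a <- data.frame(income_monthly  = rnorm(50, 3000, 300),
                      account_balance = rnorm(50, 5000, 800))
group_b <- data.frame(income_monthly  = rnorm(50, 9000, 300),
                      account_balance = rnorm(50, 40000, 800))
customers <- rbind(group_a, group_b)

# Standardise so both features contribute on a comparable scale
scaled <- scale(customers)

# k-means with several random starts to reduce sensitivity to initialisation
km <- kmeans(scaled, centers = 2, nstart = 25)

# Cluster sizes and per-cluster means on the original scale
print(table(km$cluster))
print(aggregate(customers, by = list(cluster = km$cluster), FUN = mean))
```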
---
### D. Analytical / Predictive Modeling

**(Weight: ±25%)**

Students are required to:

- Develop at least **one analytical or predictive model**.
- Explain model selection and underlying assumptions.
- Evaluate model performance using appropriate metrics.
- Discuss model limitations and potential improvements.
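
The sketch below shows one way to develop and evaluate a single predictive model. It fits a linear regression of loan amount on income and account balance using a random 80% training split, then scores the model on the held-out 20% with RMSE and R². The data and the data-generating coefficients are simulated assumptions for illustration only. The 80/20 split and the choice of metrics are common conventions rather than requirements of the brief.

```r
set.seed(7)
n <- 500
sim <- data.frame(
  income_monthly  = rnorm(n, 6000, 1500),
  account_balance = rnorm(n, 20000, 5000)
)
# Hypothetical data-generating process with Gaussian noise
sim$loan_amount <- 3 * sim$income_monthly + 0.2 * sim$account_balance + rnorm(n, sd = 2000)

# Random 80/20 train-test split
train_idx <- sample(seq_len(n), size = 0.8 * n)
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

model <- lm(loan_amount ~ income_monthly + account_balance, data = train)

# Evaluate on the held-out test set only
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$loan_amount - pred)^2))
r2   <- 1 - sum((test$loan_amount - pred)^2) /
  sum((test$loan_amount - mean(test$loan_amount))^2)

print(coef(model))
cat("Test RMSE:", round(rmse, 1), " Test R^2:", round(r2, 3), "\n")
```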
---
### E. Insights, Conclusions, and Recommendations

**(Weight: ±10%)**

Students are required to:

- Summarize key findings from the analysis.
- Present data-driven insights.
- Provide logical and actionable recommendations aligned with the dataset context.
---
Dataset {data-orientation=rows}
=======================================================================
### Table {data-height=520}
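```{r}
# Load the raw data and parse the date column (assumes dates stored as "YYYY-MM-DD")
df <- read.csv("bank_dataset.csv")
df$date <- as.Date(df$date)

# Interactive table with export buttons and virtual scrolling
datatable(
  df,
  extensions = c("Buttons", "Scroller"),
  options = list(
    pageLength = 10,
    scrollX = TRUE,
    scrollY = 400,
    scroller = TRUE,
    dom = "Bfrtip",  # B = buttons, f = search box, r = processing, t = table, i = info, p = paging
    buttons = c("copy", "csv", "excel", "print"),
    lengthMenu = c(5, 10, 20, 50)  # no effect unless "l" is added to dom
  ),
  rownames = FALSE
)
```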
EDA {data-orientation=rows}
=======================================================================
Row {.tabset .tabset-fade data-height=520}
-----------------------------------------------------------------------
### Heatmap {data-width=600 data-height=510}
```{r}
# Keep only the variables used in the analysis (drops irrelevant or problematic columns)
df_clean <- df %>%
  select(
    date,
    customer_age,
    employment_years,
    income_monthly,
    loan_amount,
    account_balance,
    monthly_transactions,
    avg_transaction_value,
    account_type
  )

# Remove rows with missing values in the key numeric variables
df_clean <- df_clean %>%
  drop_na(income_monthly, loan_amount, account_balance)

# Treat account_type as a categorical variable
df_clean$account_type <- as.factor(df_clean$account_type)
```
```{r}
# Pearson correlations among the key financial variables
corr <- cor(
  df_clean %>% select(income_monthly, loan_amount, account_balance)
)

plot_ly(
  x = colnames(corr),
  y = rownames(corr),
  z = corr,
  type = "heatmap",
  colors = colorRamp(c("skyblue", "blue", "darkblue")),
  zmin = -1,  # fix the colour scale to the full correlation range
  zmax = 1
) %>%
  layout(
    title = "Correlation Heatmap of Key Financial Variables"
  )
```
### Scatter Plot {data-width=600 data-height=510}
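```{r}
# Simple linear trend of loan amount on monthly income
trend_fit <- lm(loan_amount ~ income_monthly, data = df_clean)
trend_df <- data.frame(
  income_monthly = df_clean$income_monthly,
  loan_trend = fitted(trend_fit)
)

# Bubble size encodes account balance
plot_ly(
  data = df_clean,
  x = ~income_monthly,
  y = ~loan_amount,
  size = ~account_balance,
  type = "scatter",
  mode = "markers",
  sizes = c(5, 40),
  marker = list(opacity = 0.6),
  name = "Customers"
) %>%
  add_lines(
    data = trend_df,
    x = ~income_monthly,
    y = ~loan_trend,
    inherit = FALSE,  # do not inherit the marker size mapping from plot_ly()
    line = list(color = "black"),
    name = "Trend Line"
  ) %>%
  layout(
    title = "Monthly Income vs Loan Amount",
    xaxis = list(title = "Monthly Income"),
    yaxis = list(title = "Loan Amount")
  )
```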
### Boxplot {data-width=600 data-height=510}
```{r}
plot_ly(
  df_clean,
  x = ~account_type,
  y = ~account_balance,
  type = "box"
) %>%
  layout(
    title = "Account Balance Distribution by Account Type",
    xaxis = list(title = "Account Type"),
    yaxis = list(title = "Account Balance")
  )
```
### Line Chart {data-width=600 data-height=510}
```{r}
# Aggregate the total loan amount per calendar quarter
loan_quarterly <- df_clean %>%
  mutate(quarter = floor_date(date, unit = "quarter")) %>%
  group_by(quarter) %>%
  summarise(total_loan = sum(loan_amount, na.rm = TRUE)) %>%
  arrange(quarter)

# Trailing 4-quarter moving average (first 3 quarters are NA;
# assumes no quarters are missing from the data)
loan_quarterly <- loan_quarterly %>%
  mutate(
    ma_4q = zoo::rollmean(total_loan, k = 4, fill = NA, align = "right")
  )

plot_ly(loan_quarterly, x = ~quarter) %>%
  add_lines(
    y = ~total_loan,
    name = "Quarterly Total Loan",
    opacity = 0.45
  ) %>%
  add_lines(
    y = ~ma_4q,
    name = "4-Quarter Moving Average",
    line = list(width = 3)
  ) %>%
  layout(
    title = "Quarterly Total Loan Amount with Moving Average",
    xaxis = list(title = "Time"),
    yaxis = list(title = "Total Loan Amount"),
    hovermode = "x unified"
  )
```
### Histogram {data-width=600 data-height=510}
```{r}
# Kernel density estimate of monthly income (computed once, reused below)
income_density <- density(df_clean$income_monthly)

plot_ly() %>%
  add_histogram(
    data = df_clean,
    x = ~income_monthly,
    histnorm = "probability density",  # scale bars to match the density curve
    nbinsx = 30,
    name = "Histogram",
    opacity = 0.6
  ) %>%
  add_lines(
    x = income_density$x,
    y = income_density$y,
    name = "Density Curve",
    line = list(width = 2)
  ) %>%
  layout(
    title = "Histogram of Monthly Income",
    xaxis = list(title = "Monthly Income"),
    yaxis = list(title = "Density"),
    barmode = "overlay"
  )
```
Regression {data-orientation=rows}
=======================================================================
---
Classification {data-orientation=rows}
=======================================================================
---
Clustering {data-orientation=rows}
=======================================================================
---
Time Series {data-orientation=rows}
=======================================================================
---
Insights {data-orientation=rows}
=======================================================================