---
title: "Exploratory and Predictive Analysis of Bank Customer Data"
output: 
  flexdashboard::flex_dashboard:
    vertical_layout: scroll
    theme: yeti
    source_code: embed
---

  
```{r setup, include=FALSE}
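packages <- c(
  "flexdashboard",
  "tidyverse",
  "highcharter",
  "viridis",
  "DT",
  "gapminder",
  "jsonlite",
  # Also loaded below, so they need to be installed as well
  "plotly",
  "lubridate",
  "scales",
  "zoo"
)

# Install any packages that are not yet available
installed <- packages %in% rownames(installed.packages())
if (any(!installed)) {
  install.packages(packages[!installed])
}

# Load libraries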
library(flexdashboard)
library(tidyverse)
library(highcharter)
library(viridis)
library(DT)
library(gapminder)
library(jsonlite)
library(plotly)
library(lubridate)
library(scales)
library(ggplot2)
library(zoo)
```

Members {data-orientation=rows}
=======================================================================

Group 8:

* Joans Henky Servatius Simanullang (52240017)

* Isnaini Nur Hasanah (52240005)

**DATA SCIENCE STUDY PROGRAM**

**FACULTY OF DIGITAL, DESIGN, AND BUSINESS**

**INSTITUT TEKNOLOGI SAINS BANDUNG**

Objectives {data-orientation=rows}
=======================================================================

### A. Dataset Understanding & Exploratory Data Analysis (EDA)  
**(Weight: ±25%)**

Students are required to:

- Describe the dataset context and analytical objectives.
- Explain the data structure and variable types.
- Present key descriptive statistics.
- Identify and discuss:
  - missing values,
  - outliers,
  - data distributions.
- Provide at least **five (5) relevant data visualizations**.
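
As an illustration only, the checks listed above (descriptive statistics, missing values, outliers) could be sketched in base R as follows. The data frame `toy` is simulated and merely stands in for `bank_dataset.csv`; the helper `iqr_outliers()` is a hypothetical name, and the 1.5 × IQR rule is one common convention for flagging outliers, not a requirement of the assignment.

```r
# Simulated stand-in for bank_dataset.csv (values are illustrative only)
set.seed(42)
toy <- data.frame(
  income_monthly  = c(rlnorm(98, meanlog = 8, sdlog = 0.4), NA, 50000),
  account_balance = rlnorm(100, meanlog = 9, sdlog = 0.6)
)

# Descriptive statistics for every column
summary(toy)

# Number of missing values per column
missing_counts <- colSums(is.na(toy))

# Hypothetical helper: flag values outside the 1.5 * IQR fences
iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  # Missing values are never flagged as outliers
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

# Number of flagged outliers per column
outlier_counts <- sapply(toy, function(col) sum(iqr_outliers(col)))
```

On the real data, the same calls would be applied to the numeric columns of the loaded data frame instead of `toy`.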

---

### B. Relationship and Pattern Analysis  
**(Weight: ±20%)**

Students are required to:

- Analyze relationships among key variables.
- Apply appropriate analytical techniques (e.g., correlation, regression, cross-tabulation).
- Identify potential data issues (e.g., multicollinearity, heterogeneity).
- Interpret analytical results clearly and logically.
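
A minimal sketch of the techniques named above (correlation, regression, and a multicollinearity check), again on simulated data rather than the real dataset. The variable relationships in `toy` are assumptions made up for the example, and `vif_manual()` is a hypothetical helper that computes the variance inflation factor as 1 / (1 - R²) from regressing each predictor on the others.

```r
# Simulated data; the relationships between variables are assumed for illustration
set.seed(1)
n <- 200
toy <- data.frame(income_monthly = rnorm(n, 5000, 1200))
toy$account_balance <- 2 * toy$income_monthly + rnorm(n, 0, 1500)
toy$loan_amount <- 3 * toy$income_monthly +
  0.5 * toy$account_balance + rnorm(n, 0, 4000)

# Pairwise Pearson correlations
cor_matrix <- cor(toy, method = "pearson")

# Multiple linear regression of loan amount on the two predictors
fit <- lm(loan_amount ~ income_monthly + account_balance, data = toy)
summary(fit)

# Hypothetical helper: VIF of each predictor, 1 / (1 - R^2)
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}
vifs <- vif_manual(toy, c("income_monthly", "account_balance"))
```

A VIF well above roughly 5 to 10 is often read as a sign of problematic multicollinearity, although the threshold is a rule of thumb rather than a formal test.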

---

### C. Advanced Analysis (Context-Dependent)  
**(Weight: ±20%)**

Students are required to apply an advanced analytical approach that is appropriate to the dataset, such as:

- Time series analysis (if time-related variables exist),
- Clustering or segmentation,
- Risk or anomaly detection,
- Classification or forecasting.
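
Of the options above, clustering is perhaps the simplest to sketch. The example below assumes k-means on two standardized variables with two segments; both the simulated `toy` data and the choice of `centers = 2` are assumptions for illustration, and the real analysis would need to justify the number of clusters (for example with an elbow plot).

```r
# Simulated customers drawn from two clearly separated groups (illustrative only)
set.seed(123)
toy <- data.frame(
  income_monthly  = c(rnorm(50, 3000, 300), rnorm(50, 9000, 600)),
  account_balance = c(rnorm(50, 5000, 800), rnorm(50, 40000, 3000))
)

# Standardize so that both variables contribute equally to the distances
scaled <- scale(toy)

# k-means with several random starts to reduce sensitivity to initialization
km <- kmeans(scaled, centers = 2, nstart = 25)

# Attach the segment labels and profile each segment by its mean values
toy$segment <- factor(km$cluster)
segment_profile <- aggregate(
  cbind(income_monthly, account_balance) ~ segment,
  data = toy,
  FUN = mean
)
```

The `segment_profile` table gives one row per segment, which is usually the starting point for naming and interpreting the segments.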

---

### D. Analytical / Predictive Modeling  
**(Weight: ±25%)**

Students are required to:

- Develop at least **one analytical or predictive model**.
- Explain model selection and underlying assumptions.
- Evaluate model performance using appropriate metrics.
- Discuss model limitations and potential improvements.
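
The modeling steps above could look roughly like the following sketch, which assumes a linear regression evaluated on a held-out test set. The simulated `toy` data, the 80/20 split, and the choice of RMSE, MAE, and R² as metrics are all assumptions for illustration; a classification task would call for different metrics.

```r
# Simulated data; the true relationship is assumed for illustration
set.seed(2024)
n <- 300
toy <- data.frame(
  income_monthly   = rnorm(n, 5000, 1200),
  employment_years = runif(n, 0, 30)
)
toy$loan_amount <- 2.5 * toy$income_monthly +
  400 * toy$employment_years + rnorm(n, 0, 3000)

# 80/20 train-test split
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Fit on the training set, predict on the unseen test set
model <- lm(loan_amount ~ income_monthly + employment_years, data = train)
pred  <- predict(model, newdata = test)

# Out-of-sample performance metrics
errors <- test$loan_amount - pred
rmse <- sqrt(mean(errors^2))
mae  <- mean(abs(errors))
r2   <- 1 - sum(errors^2) / sum((test$loan_amount - mean(test$loan_amount))^2)
```

Reporting the metrics on the test set rather than the training set gives a more honest view of how the model would perform on new customers.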

---

### E. Insights, Conclusions, and Recommendations  
**(Weight: ±10%)**

Students are required to:

- Summarize key findings from the analysis.
- Present data-driven insights.
- Provide logical and actionable recommendations aligned with the dataset context.

---

Dataset {data-orientation=rows}
=======================================================================
  
### Table {data-height=520}

```{r}
df <- read.csv("bank_dataset.csv")
df$date <- as.Date(df$date)
datatable(
  df,
  extensions = c("Buttons", "Scroller"),
  options = list(
    pageLength = 10,
    scrollX = TRUE,
    scrollY = 400,
    scroller = TRUE,
    dom = "Bfrtip",
    buttons = c("copy", "csv", "excel", "print"),
    lengthMenu = c(5, 10, 20, 50)
  ),
  rownames = FALSE
)
```
  

EDA {data-orientation=rows}
=======================================================================

Column {.tabset .tabset-fade data-height=520}
-----------------------------------------------------------------------
  
### Heatmap {data-width=600 data-height=510}

```{r}
# Keep only the variables used in the analysis
df_clean <- df %>%
  select(
    date,
    customer_age,
    employment_years,
    income_monthly,
    loan_amount,
    account_balance,
    monthly_transactions,
    avg_transaction_value,
    account_type
  )

# Remove rows with missing key numeric values
df_clean <- df_clean %>%
  drop_na(income_monthly, loan_amount, account_balance)

# Store account_type as a factor
df_clean$account_type <- as.factor(df_clean$account_type)
```

```{r}
corr <- cor(df_clean %>% 
              select(income_monthly, loan_amount, account_balance))

plot_ly(
  x = colnames(corr),
  y = rownames(corr),
  z = corr,
  type = "heatmap",
  colors = colorRamp(c("skyblue", "blue", "darkblue"))
) %>%
  layout(
    title = "Correlation Heatmap of Key Financial Variables"
  )

```

### Scatter Plot {data-width=600 data-height=510}
  
```{r}
plot_ly(
  data = df_clean,
  x = ~income_monthly,
  y = ~loan_amount,
  size = ~account_balance,
  type = "scatter",
  mode = "markers",
  sizes = c(5, 40),
  marker = list(opacity = 0.6),
  name = "Customers"
) %>%
  add_lines(
    x = ~income_monthly,
    y = ~fitted(lm(loan_amount ~ income_monthly, data = df_clean)),
    line = list(color = "black"),
    name = "Trend Line"
  ) %>%
  layout(
    title = "Monthly Income vs Loan Amount",
    xaxis = list(title = "Monthly Income"),
    yaxis = list(title = "Loan Amount")
  )
```

### Boxplot {data-width=600 data-height=510}
  
```{r}
plot_ly(
  df_clean,
  x = ~account_type,
  y = ~account_balance,
  type = "box"
) %>%
  layout(
    title = "Account Balance Distribution by Account Type",
    xaxis = list(title = "Account Type"),
    yaxis = list(title = "Account Balance")
  )
```

### Line Chart {data-width=600 data-height=510}
  
```{r}
loan_quarterly <- df_clean %>%
  mutate(quarter = floor_date(as.Date(date), "quarter")) %>%
  group_by(quarter) %>%
  summarise(
    total_loan = sum(loan_amount, na.rm = TRUE)
  ) %>%
  arrange(quarter)

loan_quarterly <- loan_quarterly %>%
  mutate(
    ma_4q = zoo::rollmean(total_loan, k = 4, fill = NA, align = "right")
  )

plot_ly(loan_quarterly, x = ~quarter) %>%
  add_lines(
    y = ~total_loan,
    name = "Quarterly Total Loan",
    opacity = 0.45
  ) %>%
  add_lines(
    y = ~ma_4q,
    name = "4-Quarter Moving Average",
    line = list(width = 3)
  ) %>%
  layout(
    title = "Quarterly Total Loan Amount with Moving Average",
    xaxis = list(title = "Time"),
    yaxis = list(title = "Total Loan Amount"),
    hovermode = "x unified"
  )
```

### Histogram {data-width=600 data-height=510}
  
```{r}
plot_ly() %>%
  add_histogram(
    data = df_clean,
    x = ~income_monthly,
    histnorm = "probability density",
    nbinsx = 30,
    name = "Histogram",
    opacity = 0.6
  ) %>%
  add_lines(
    x = density(df_clean$income_monthly)$x,
    y = density(df_clean$income_monthly)$y,
    name = "Density Curve",
    line = list(width = 2)
  ) %>%
  layout(
    title = "Histogram of Monthly Income",
    xaxis = list(title = "Monthly Income"),
    yaxis = list(title = "Density"),
    barmode = "overlay"
  )
```

Regression {data-orientation=rows}
=======================================================================

---

Classification {data-orientation=rows}
=======================================================================

---

Clustering {data-orientation=rows}
=======================================================================

---

Time Series {data-orientation=rows}
=======================================================================

---

Insights {data-orientation=rows}
=======================================================================