# Load required packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(DT)
library(flexdashboard)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
dataset <- readRDS("dataset.rds")
We applied simple random sampling using sample_frac(0.2) to create a 20% sample of the original dataset. This method ensures that every record has an equal chance of being selected, helping retain the key characteristics and variability of the full dataset while keeping the size manageable for R Markdown and RPubs.
set.seed(123)
sample_data <- dataset |>
sample_frac(0.2)
We focused on the analysis on data from the year 2024. Below, We filtered the sample accordingly and confirmed that all months are represented.
sample_data <- sample_data |>
filter(year(As_Of_Date) == 2024) |>
mutate(Date = As_Of_Date,
Month = month(Date, label = TRUE, abbr = FALSE))
# Check month coverage
unique_months <- sort(unique(sample_data$Month))
unique_months
## [1] January February March April May June July
## [8] August September October November December
## 12 Levels: January < February < March < April < May < June < ... < December
DT::datatable(head(sample_data), options = list(pageLength = 5))
loan_months <- sample_data |>
group_by(Month) |>
summarise(Total_Loan = sum(Loan_amount, na.rm = TRUE)) |>
mutate(Month = factor(Month, levels = month.name))
ggplot(loan_months, aes(x = Month, y = Total_Loan)) +
geom_col(fill = "skyblue") +
labs(title = "Monthly Loan Amounts in 2024",
x = "Month", y = "Total Loan Amount") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Insight:
Loan amounts vary month to month, with May having the highest total loan amount of approximately ₱2.6 billion, and January having the lowest at around ₱1.9 billion. This can help assess seasonal borrowing patterns.
branch_summary <- sample_data |>
group_by(Branch_name) |>
summarise(
Members = n_distinct(Identifier),
Borrowers = sum(Total_borrower, na.rm = TRUE),
Total_Income = sum(Monthly_income, na.rm = TRUE),
Total_Loan = sum(Loan_amount, na.rm = TRUE)
) |>
arrange(desc(Total_Loan))
DT::datatable(branch_summary, options = list(pageLength = 10))
Insight: The branch with the highest loan distribution is Rosario, with a total of ₱191,048,000 in loans. It also has 8,835 borrowers and 9 members, indicating a highly active lending operation. Following closely is San Carlos City, which has ₱173,541,000 in loans and 8,601 borrowers, also with 9 members.
In contrast, Tagum City 2 has the lowest loan distribution among the top 10, at ₱116,255,000, despite having a similar number of borrowers (5,397) and members (5). This suggests a more conservative lending profile or smaller loan sizes per borrower.
This comparison highlights Rosario and San Carlos City as high-performing branches in terms of both outreach and loan volume, while branches like Tagum City 2 may benefit from strategies to boost loan disbursement or expand their membership base.
Based on the initial EDA, the final dashboard could explore:
-How Portfolio at Risk varies across branches and time.
-The relationship between deposit balance and loan amount.
-Seasonal trends in income and borrowing behavior.
-Member-to-borrower conversion rates over time.
This Phase 1 EDA provided an overview of key microfinance indicators using a representative sample of 2024 data. The findings guide the development of a more interactive and detailed dashboard.
saveRDS(sample_data, "sampled_data_2024.rds")