Microfinance Exploratory Data Analysis (EDA) Report

1. Setup and Data Loading

# Load required packages

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(plotly)         
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(DT)
library(flexdashboard) 
library(lubridate)   
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
dataset <- readRDS("dataset.rds")

2. Sampling Methodology and Justification

We applied simple random sampling using sample_frac(0.2) to create a 20% sample of the original dataset. This method ensures that every record has an equal chance of being selected, helping retain the key characteristics and variability of the full dataset while keeping the size manageable for R Markdown and RPubs.

set.seed(123)
sample_data <- dataset |>
  sample_frac(0.2)

3. Filtering for 2024 and Ensuring All Months Are Present

We focused on the analysis on data from the year 2024. Below, We filtered the sample accordingly and confirmed that all months are represented.

sample_data <- sample_data |>
  filter(year(As_Of_Date) == 2024) |>
  mutate(Date = As_Of_Date,
         Month = month(Date, label = TRUE, abbr = FALSE))

# Check month coverage
unique_months <- sort(unique(sample_data$Month))
unique_months
##  [1] January   February  March     April     May       June      July     
##  [8] August    September October   November  December 
## 12 Levels: January < February < March < April < May < June < ... < December
DT::datatable(head(sample_data), options = list(pageLength = 5))

4. Initial Business Questions

4.1 What is the total loan amount disbursed each month in 2024?

loan_months <- sample_data |>
  group_by(Month) |>
  summarise(Total_Loan = sum(Loan_amount, na.rm = TRUE)) |>
  mutate(Month = factor(Month, levels = month.name))

ggplot(loan_months, aes(x = Month, y = Total_Loan)) +
  geom_col(fill = "skyblue") +
  labs(title = "Monthly Loan Amounts in 2024",
       x = "Month", y = "Total Loan Amount") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Insight:

Loan amounts vary month to month, with May having the highest total loan amount of approximately ₱2.6 billion, and January having the lowest at around ₱1.9 billion. This can help assess seasonal borrowing patterns.

4.2 How do branches compare in terms of members, borrowers, income, and loans?

branch_summary <- sample_data |>
  group_by(Branch_name) |>
  summarise(
    Members = n_distinct(Identifier),
    Borrowers = sum(Total_borrower, na.rm = TRUE),
    Total_Income = sum(Monthly_income, na.rm = TRUE),
    Total_Loan = sum(Loan_amount, na.rm = TRUE)
  ) |>
  arrange(desc(Total_Loan))

DT::datatable(branch_summary, options = list(pageLength = 10))

Insight: The branch with the highest loan distribution is Rosario, with a total of ₱191,048,000 in loans. It also has 8,835 borrowers and 9 members, indicating a highly active lending operation. Following closely is San Carlos City, which has ₱173,541,000 in loans and 8,601 borrowers, also with 9 members.

In contrast, Tagum City 2 has the lowest loan distribution among the top 10, at ₱116,255,000, despite having a similar number of borrowers (5,397) and members (5). This suggests a more conservative lending profile or smaller loan sizes per borrower.

This comparison highlights Rosario and San Carlos City as high-performing branches in terms of both outreach and loan volume, while branches like Tagum City 2 may benefit from strategies to boost loan disbursement or expand their membership base.

5. Further Questions for Dashboard Development

Based on the initial EDA, the final dashboard could explore:

-How Portfolio at Risk varies across branches and time.

-The relationship between deposit balance and loan amount.

-Seasonal trends in income and borrowing behavior.

-Member-to-borrower conversion rates over time.

6. Conclusion

This Phase 1 EDA provided an overview of key microfinance indicators using a representative sample of 2024 data. The findings guide the development of a more interactive and detailed dashboard.

7. Appendix: Save Sampled Data

saveRDS(sample_data, "sampled_data_2024.rds")