Dataset Description

loan_data <- read.csv("D:/R work folder/R project/Loan_Data.csv")

str(loan_data)
## 'data.frame':    289 obs. of  13 variables:
##  $ Date             : chr  "10/31/2022" "7/27/2022" "5/11/2022" "2/23/2022" ...
##  $ Loan_ID          : chr  "LP001015" "LP001022" "LP001031" "LP001051" ...
##  $ Gender           : chr  "Male" "Male" "Male" "Male" ...
##  $ Married          : chr  "Yes" "Yes" "Yes" "No" ...
##  $ Dependents       : chr  "0" "1" "2" "0" ...
##  $ Education        : chr  "Graduate" "Graduate" "Graduate" "Not Graduate" ...
##  $ Self_Employed    : chr  "No" "No" "No" "No" ...
##  $ ApplicantIncome  : int  5720 3076 5000 3276 2165 2226 3881 2400 3091 4666 ...
##  $ CoapplicantIncome: int  0 1500 1800 0 3422 0 0 2400 0 0 ...
##  $ LoanAmount       : int  110 126 208 78 152 59 147 123 90 124 ...
##  $ Loan_Amount_Term : int  360 360 360 360 360 360 360 360 360 360 ...
##  $ Credit_History   : int  1 1 1 1 1 1 0 1 1 1 ...
##  $ Property_Area    : chr  "Urban" "Urban" "Urban" "Urban" ...
head(loan_data)
##         Date  Loan_ID Gender Married Dependents    Education Self_Employed
## 1 10/31/2022 LP001015   Male     Yes          0     Graduate            No
## 2  7/27/2022 LP001022   Male     Yes          1     Graduate            No
## 3  5/11/2022 LP001031   Male     Yes          2     Graduate            No
## 4  2/23/2022 LP001051   Male      No          0 Not Graduate            No
## 5  4/30/2022 LP001054   Male     Yes          0 Not Graduate           Yes
## 6  6/10/2022 LP001055 Female      No          1 Not Graduate            No
##   ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History
## 1            5720                 0        110              360              1
## 2            3076              1500        126              360              1
## 3            5000              1800        208              360              1
## 4            3276                 0         78              360              1
## 5            2165              3422        152              360              1
## 6            2226                 0         59              360              1
##   Property_Area
## 1         Urban
## 2         Urban
## 3         Urban
## 4         Urban
## 5         Urban
## 6     Semiurban

Dataset Variables

The dataset contains information about loan applications. It includes 13 columns that provide details about the applicants and their loan requests. The key variables in this dataset are as follows:

  • Date: The date the loan application was made (in MM/DD/YYYY format).
  • Loan_ID: A unique identifier for each loan application.
  • Gender: The gender of the applicant (Male/Female).
  • Married: Marital status of the applicant (Yes/No).
  • Dependents: The number of dependents the applicant has.
  • Education: The education level of the applicant (Graduate/Not Graduate).
  • Self_Employed: Whether the applicant is self-employed (Yes/No).
  • ApplicantIncome: The income of the applicant.
  • CoapplicantIncome: The income of the coapplicant (if any).
  • LoanAmount: The amount of loan requested by the applicant.
  • Loan_Amount_Term: The duration of the loan in months.
  • Credit_History: A binary variable indicating whether the applicant has a credit history (1 for yes, 0 for no).
  • Property_Area: The area type of the property (Urban/Semiurban/Rural).
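
Because `read.csv()` imports the categorical fields as character vectors, a type-conversion step can make the roles listed above explicit. The following is a minimal sketch and not part of the original analysis; `sample_loans` and `categorical_cols` are hypothetical names, and the data frame holds only the six rows printed by `head(loan_data)`.

```r
# Hypothetical sample: the six rows printed by head(loan_data)
sample_loans <- data.frame(
  Date = c("10/31/2022", "7/27/2022", "5/11/2022",
           "2/23/2022", "4/30/2022", "6/10/2022"),
  Gender = c("Male", "Male", "Male", "Male", "Male", "Female"),
  Married = c("Yes", "Yes", "Yes", "No", "Yes", "No"),
  Education = c("Graduate", "Graduate", "Graduate",
                "Not Graduate", "Not Graduate", "Not Graduate"),
  Credit_History = c(1L, 1L, 1L, 1L, 1L, 1L),
  Property_Area = c("Urban", "Urban", "Urban", "Urban", "Urban", "Semiurban"),
  stringsAsFactors = FALSE
)

# Parse the month/day/year strings into Date objects
sample_loans$Date <- as.Date(sample_loans$Date, format = "%m/%d/%Y")

# Treat the categorical text columns as factors
categorical_cols <- c("Gender", "Married", "Education", "Property_Area")
sample_loans[categorical_cols] <- lapply(sample_loans[categorical_cols], factor)

# Count each Credit_History value, including missing entries if there are any
credit_counts <- table(sample_loans$Credit_History, useNA = "ifany")

str(sample_loans)
print(credit_counts)
```

Running the same `table()` call on the full `loan_data` would show how many applications carry each Credit_History code and whether any are missing.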

Dataset Overview

The dataset makes it possible to analyze relationships between an applicant's financial background (income, loan amount, credit history) and the loan terms. The goal of this project is to analyze how applicant characteristics influence loan eligibility.

Static Visualization

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.3
ggplot(loan_data, aes(x = ApplicantIncome, y = LoanAmount, color = as.character(Credit_History))) +
  geom_point(alpha = 0.5) + 
  geom_smooth(method = "lm", se = FALSE, color = "red") + 
  labs(title = "Loan Amount vs Applicant Income",
       subtitle = "Colored by Credit History",
       x = "Applicant Income",
       y = "Loan Amount",
       color = "Credit History") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Description of Static Visualization

The scatter plot shows Applicant Income on the x-axis and Loan Amount on the y-axis. Point colors differ by the applicant's Credit History: with ggplot2's default palette, the first level (Credit History = 0, no credit history) is drawn in red and the second level (Credit History = 1, has a credit history) in teal. The red regression line shows the linear relationship between Applicant Income and Loan Amount.
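
Color assignment on a discrete ggplot2 scale follows the order of the levels of the mapped variable, so giving the levels explicit labels can make the legend easier to read than the raw 0/1 codes. The following is a minimal sketch using the first ten values printed by `str(loan_data)`; `credit_label`, `sample_df`, and the label texts "No history" and "Has history" are hypothetical names introduced here.

```r
# First ten values of each column, as printed by str(loan_data)
applicant_income <- c(5720, 3076, 5000, 3276, 2165, 2226, 3881, 2400, 3091, 4666)
loan_amount      <- c(110, 126, 208, 78, 152, 59, 147, 123, 90, 124)
credit_history   <- c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1)

# as.character() yields "0" and "1"; sorted, "0" comes first
print(sort(unique(as.character(credit_history))))

# Replace the numeric codes with explicit, ordered labels
credit_label <- factor(credit_history,
                       levels = c(0, 1),
                       labels = c("No history", "Has history"))
print(table(credit_label))

# Build the plot only if ggplot2 is installed; call print(p) to display it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  sample_df <- data.frame(applicant_income, loan_amount, credit_label)
  p <- ggplot(sample_df, aes(x = applicant_income, y = loan_amount,
                             color = credit_label)) +
    geom_point(alpha = 0.5) +
    labs(x = "Applicant Income", y = "Loan Amount", color = "Credit History") +
    theme_minimal()
}
```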

Interactive Chart

library(plotly)
## Warning: package 'plotly' was built under R version 4.4.3
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
interactive_plot <- ggplot(loan_data, aes(x = ApplicantIncome, y = LoanAmount, color = as.character(Credit_History))) +
  geom_point(alpha = 0.5) + 
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Loan Amount vs Applicant Income", 
       subtitle = "Colored by Credit History",
       x = "Applicant Income",
       y = "Loan Amount",
       color = "Credit History")

ggplotly(interactive_plot)
## `geom_smooth()` using formula = 'y ~ x'

Description of Interactive Chart

The regression line in the interactive chart suggests that there is no strong trend between Applicant Income and Loan Amount.
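
The strength of the trend can also be checked numerically rather than read off the chart. The following is a minimal sketch using only the first ten `ApplicantIncome` and `LoanAmount` values printed by `str(loan_data)`. Ten rows are far too few to support a conclusion, so the same calls would need to be run on the full `loan_data` to back up the statement above. The names `fit`, `slope`, and `trend_test` are introduced here for illustration.

```r
# First ten values of each column, as printed by str(loan_data)
applicant_income <- c(5720, 3076, 5000, 3276, 2165, 2226, 3881, 2400, 3091, 4666)
loan_amount      <- c(110, 126, 208, 78, 152, 59, 147, 123, 90, 124)

# Same model that geom_smooth(method = "lm") fits with formula y ~ x
fit <- lm(loan_amount ~ applicant_income)
slope <- coef(fit)[["applicant_income"]]

# Pearson correlation test: estimate and p-value for a linear association
trend_test <- cor.test(applicant_income, loan_amount)

cat("slope:", slope, "\n")
cat("correlation:", trend_test$estimate, "p-value:", trend_test$p.value, "\n")
```

A p-value well above the chosen threshold on the full data would be consistent with the weak trend seen in the chart.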

Animation

The animation shows how the relationship between Applicant Income and Loan Amount changes over time. It is driven by the Date column of the dataset, so each frame shows the loan applications as the date advances.

library(ggplot2)
library(gganimate)
## Warning: package 'gganimate' was built under R version 4.4.3
library(gifski)
## Warning: package 'gifski' was built under R version 4.4.3
library(av)
## Warning: package 'av' was built under R version 4.4.3
loan_data$Date <- as.Date(loan_data$Date, format = "%m/%d/%Y")


animated_plot <- ggplot(loan_data, aes(x = ApplicantIncome, y = LoanAmount, color = as.character(Credit_History), size = LoanAmount)) +
  geom_point(alpha = 0.7) +  
  labs(title = 'Date: {frame_time}', 
       x = "Applicant Income",
       y = "Loan Amount",
       color = "Credit History",
       size = "Loan Amount") +
  transition_time(Date) +  
  ease_aes('linear') +
  scale_size_continuous(range = c(1, 10)) +  
  scale_color_manual(values = c("0" = "red", "1" = "green"))  # explicit mapping: 0 = red, 1 = green


anim_save("loan_animation_fixed.gif", animated_plot, renderer = gifski_renderer())


animate(animated_plot, nframes = 100, fps = 10, renderer = av_renderer())

Description of the Animation

In the animation, point size reflects the Loan Amount, with larger loans drawn as larger points. As the animation moves through the dates, it shows how the distribution of Loan Amount and Applicant Income changes. Point color indicates Credit History: because the manual color values are assigned in level order, red marks applicants with no credit history (0) and green marks applicants with a credit history (1). The animation suggests that applicants with no credit history mostly requested larger loan amounts, though such requests are infrequent. Applicants with no credit history also appear to request loans more often than applicants with a credit history. Moreover, loan requests are very frequent during a certain period, which may reflect an economic shift and a change in demand.
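
The observation about busy periods can be checked by counting applications per calendar month. The following is a minimal sketch using only the six dates printed by `head(loan_data)`; `sample_dates`, `month_label`, `monthly_counts`, and `busiest` are hypothetical names, and on the full data the input would be `loan_data$Date` after the `as.Date()` conversion.

```r
# The six dates printed by head(loan_data), parsed as month/day/year
sample_dates <- as.Date(
  c("10/31/2022", "7/27/2022", "5/11/2022",
    "2/23/2022", "4/30/2022", "6/10/2022"),
  format = "%m/%d/%Y"
)

# Label each application with its year and month, e.g. "2022-10"
month_label <- format(sample_dates, "%Y-%m")

# Count applications per month; "YYYY-MM" labels sort chronologically
monthly_counts <- table(month_label)
print(monthly_counts)

# Month or months with the highest number of applications
busiest <- names(monthly_counts)[monthly_counts == max(monthly_counts)]
print(busiest)
```

On this small sample every month contains a single application; a clear peak on the full data would support the claim of a period with unusually frequent requests.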