Lending Club connects people who need money (borrowers) with people who have money (investors). In this analysis, we want to learn more about the borrowers, in particular the purposes of their loans.
library(glue)
library(ggrepel)
## Loading required package: ggplot2
library(ggridges)
library(ggthemes)
library(leaflet)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(scales)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## v purrr 0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x lubridate::as.difftime() masks base::as.difftime()
## x readr::col_factor() masks scales::col_factor()
## x dplyr::collapse() masks glue::collapse()
## x lubridate::date() masks base::date()
## x purrr::discard() masks scales::discard()
## x dplyr::filter() masks stats::filter()
## x lubridate::intersect() masks base::intersect()
## x dplyr::lag() masks stats::lag()
## x lubridate::setdiff() masks base::setdiff()
## x lubridate::union() masks base::union()
library(sf)
## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(rnaturalearth)
library(padr)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(magick)
## Linking to ImageMagick 6.9.12.3
## Enabled features: cairo, freetype, fftw, ghostscript, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fontconfig, x11
loan <- read.csv(file = "Clsssifier-which-classified-whether-a-borrower-paid-the-loan-in-full-Lending-Club.com-master/loan_data.csv")
loan
legend:
- credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.
- purpose: The purpose of the loan (takes values “credit_card”, “debt_consolidation”, “educational”, “major_purchase”, “small_business”, and “all_other”).
- int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.
- installment: The monthly installments owed by the borrower if the loan is funded.
- log.annual.inc: The natural log of the self-reported annual income of the borrower.
- dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).
- fico: The FICO credit score of the borrower.
- days.with.cr.line: The number of days the borrower has had a credit line.
- revol.bal: The borrower’s revolving balance (amount unpaid at the end of the credit card billing cycle).
- revol.util: The borrower’s revolving line utilization rate (the amount of the credit line used relative to total credit available).
- inq.last.6mths: The borrower’s number of inquiries by creditors in the last 6 months.
- delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
- pub.rec: The borrower’s number of derogatory public records (bankruptcy filings, tax liens, or judgments).
- not.fully.paid: 0 if the loan was paid back in full, 1 if it was not fully paid.
Clean
No N/A
colSums(is.na(loan))
## credit.policy purpose int.rate installment
## 0 0 0 0
## log.annual.inc dti fico days.with.cr.line
## 0 0 0 0
## revol.bal revol.util inq.last.6mths delinq.2yrs
## 0 0 0 0
## pub.rec not.fully.paid
## 0 0
Insight
: 1. The data is clean: no column contains missing values (N/A).
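The per-column check can be complemented by a row-level and a whole-table check. The following is a minimal sketch on a small hypothetical data frame (`toy`, with made-up values) standing in for `loan`, not the author's own code:

```r
# Hypothetical stand-in for `loan`; the values are illustrative only.
toy <- data.frame(
  fico = c(737, 707, NA, 712),
  purpose = c("credit_card", NA, "educational", "all_other"),
  stringsAsFactors = FALSE
)

# Missing values per column (the same check used on `loan`)
na_per_column <- colSums(is.na(toy))

# Number of rows without a missing value in any column
n_complete <- sum(complete.cases(toy))

# TRUE only when the whole data frame is free of NA
is_clean <- !anyNA(toy)

print(na_per_column)
print(n_complete)
print(is_clean)
```

On the real data, `anyNA(loan)` should return FALSE if the column sums are all zero.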
str(loan)
## 'data.frame': 9578 obs. of 14 variables:
## $ credit.policy : int 1 1 1 1 1 1 1 1 1 1 ...
## $ purpose : chr "debt_consolidation" "credit_card" "debt_consolidation" "debt_consolidation" ...
## $ int.rate : num 0.119 0.107 0.136 0.101 0.143 ...
## $ installment : num 829 228 367 162 103 ...
## $ log.annual.inc : num 11.4 11.1 10.4 11.4 11.3 ...
## $ dti : num 19.5 14.3 11.6 8.1 15 ...
## $ fico : int 737 707 682 712 667 727 667 722 682 707 ...
## $ days.with.cr.line: num 5640 2760 4710 2700 4066 ...
## $ revol.bal : int 28854 33623 3511 33667 4740 50807 3839 24220 69909 5630 ...
## $ revol.util : num 52.1 76.7 25.6 73.2 39.5 51 76.8 68.6 51.1 23 ...
## $ inq.last.6mths : int 0 0 1 1 0 0 0 0 1 1 ...
## $ delinq.2yrs : int 0 0 0 0 1 0 0 0 0 0 ...
## $ pub.rec : int 0 0 0 0 0 0 1 0 0 0 ...
## $ not.fully.paid : int 0 0 0 0 0 0 1 1 0 0 ...
Insight
: 2. Each column is stored with a suitable data type: the measurements are numeric or integer, and purpose is a character column.
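In the str() output, purpose is a character column and credit.policy is an integer, although both are categorical. If factor columns are preferred for plotting or modelling, the conversion could look like the sketch below. It uses a small hypothetical data frame (`toy`) in place of `loan`, and the label text is an assumption chosen for illustration:

```r
# Hypothetical stand-in for `loan`; the values are illustrative only.
toy <- data.frame(
  credit.policy = c(1L, 1L, 0L),
  purpose = c("debt_consolidation", "credit_card", "debt_consolidation"),
  stringsAsFactors = FALSE
)

# Character categories become factor levels (sorted alphabetically)
toy$purpose <- factor(toy$purpose)

# 0/1 flag becomes a labelled factor; the labels are illustrative
toy$credit.policy <- factor(
  toy$credit.policy,
  levels = c(0, 1),
  labels = c("does not meet criteria", "meets criteria")
)

str(toy)
```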
First, compare the FICO scores of borrowers under the two credit policy values. Borrowers with credit.policy = 1 meet the credit criteria, whereas borrowers with credit.policy = 0 do not. In the histogram, blue bars show credit.policy = 0 and red bars show credit.policy = 1.
ggplot(loan, aes(x=fico)) +
geom_histogram(data = loan[loan$credit.policy == 0, ], fill= "blue", alpha=0.8, position="identity", bins = 30)+
geom_histogram(data = loan[loan$credit.policy == 1, ], fill= "red", alpha=0.8, position="identity", bins = 30)
Insight
: 3. Based on this visualization, most borrowers meet the credit criteria.
boxplot
boxplot(loan$fico)
Insight
: 4. The boxplot shows only 2 outliers, and the median FICO score is around 700.
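The points drawn as outliers can also be listed numerically with base R's boxplot.stats(), which by default uses the same 1.5 × hinge-spread rule as boxplot(). A minimal sketch on a hypothetical vector of scores (not the real fico column):

```r
# Hypothetical FICO-like scores; 827 is placed far from the rest on purpose.
scores <- c(667, 682, 682, 707, 707, 712, 722, 727, 737, 827)

stats <- boxplot.stats(scores)

# Values beyond 1.5 times the hinge spread from the hinges
outliers <- stats$out

# Third element of the five-number summary is the median
med <- stats$stats[3]

print(outliers)
print(med)
```

Applied to the real data, `boxplot.stats(loan$fico)$out` would return the values plotted as individual points.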
data_nfp <- loan %>%
  count(not.fully.paid)
data_nfp
*not.fully.paid: 0 = fully paid, 1 = not fully paid
not fully paid
g <- ggplot(data_nfp, aes(x = factor(not.fully.paid), y = n)) +
geom_col(aes(fill = n)) +
geom_text(aes(y = n + max(n) * 0.05, label = n))+
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
guides(fill = "none") +
labs(
title = "Approved and rejected from Lending Club",
subtitle = "Not Fully Paid",
caption = "Source: Felicia Haliman ~ Priyam1464(github)",
x = NULL,
y = NULL
)
g
Insight
: 5. Most borrowers repaid their loans in full: 8,045 loans have not.fully.paid = 0 (fully paid) and 1,533 have not.fully.paid = 1 (not fully paid).
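The two counts are easier to compare as shares of all loans. A short sketch that uses the counts reported above, keyed by the value of not.fully.paid (the variable names are illustrative):

```r
# Counts reported above, named by the value of not.fully.paid
counts <- c("0" = 8045, "1" = 1533)

# Percentage of all loans in each group, rounded to one decimal
shares <- round(100 * prop.table(counts), 1)

print(sum(counts))
print(shares)
```

The total matches the 9,578 rows reported by str(), and about 16% of loans have not.fully.paid = 1.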
purpose
data_p <- loan %>%
  count(purpose)
data_p
purpose
p <- ggplot(data_p, aes(x = purpose, y = n)) +
geom_col(aes(fill = n)) +
geom_text(aes(y = n + max(n) * 0.05, label = n))+
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
guides(fill = "none") +
coord_flip() +
labs(
title = "Approved and rejected from Lending Club",
subtitle = "Purpose",
caption = "Source: Felicia Haliman ~ Priyam1464(github)",
x = NULL,
y = NULL
)
p
Insight
: 6. Most borrowers take out their loan for debt_consolidation. According to the Economic Times, debt consolidation means combining more than one debt obligation into a new loan with a favourable term structure, such as a lower interest rate structure, tenure, etc.
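The most common purpose and its share can also be read off programmatically instead of from the chart. A minimal base R sketch on a hypothetical vector of purposes (made-up values, not the real column):

```r
# Hypothetical purposes; the real column is loan$purpose.
purpose <- c(
  "debt_consolidation", "credit_card", "debt_consolidation",
  "all_other", "debt_consolidation", "credit_card"
)

# Frequency table sorted from most to least common
purpose_counts <- sort(table(purpose), decreasing = TRUE)

# Name and percentage share of the most common purpose
top_purpose <- names(purpose_counts)[1]
top_share <- round(100 * purpose_counts[[1]] / length(purpose), 1)

print(purpose_counts)
print(top_purpose)
print(top_share)
```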
log.annual.inc
ggplot(loan, aes(x = log.annual.inc)) +
  geom_bar(fill = "Red", colour = "black") +
  geom_density(fill = "dark blue", alpha = 0.7, colour = NA) +
  # log.annual.inc is a natural log, so the axis is shown in log units, not dollars
  scale_x_continuous(expand = expansion(mult = c(0, 0))) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(
    title = "Approved and rejected from Lending Club",
    subtitle = "Annual Income",
    caption = "Source: Felicia Haliman ~ Priyam1464(github)",
    x = "log(annual income)",
    y = NULL
  )
Insight
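: 7. Log annual income is concentrated between 10 and 12, and the tallest bar reaches about 300 borrowers. Because log.annual.inc is a natural log, this range corresponds to annual incomes of roughly $22,000 to $163,000.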
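Since the legend defines log.annual.inc as the natural log of annual income, values on that axis can be converted back to dollars with exp(). A small sketch (the variable names are illustrative):

```r
# Points on the log.annual.inc axis
log_income <- c(10, 11, 12)

# Invert the natural log to recover annual income in dollars
income_usd <- round(exp(log_income))

print(income_usd)
```

A value of 10 therefore corresponds to about $22,000 a year and a value of 12 to about $163,000.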
This data shows how the underwriting works: most borrowers meet the credit criteria, and most of them repaid their loan in full, while about 16% did not. The most common purpose is debt consolidation, that is, restructuring existing debt to get a lower interest rate or a longer tenure.
Source:
Economic Times, 2021, “Definition of Debt Consolidation”, link: https://economictimes.indiatimes.com/definition/debt-consolidation
Priyam1464, 2017, “Clsssifier-which-classified-whether-a-borrower-paid-the-loan-in-full-Lending-Club.com”, link: https://github.com/Priyam1464/Clsssifier-which-classified-whether-a-borrower-paid-the-loan-in-full-Lending-Club.com