This data set contains information on mortgage approvals. The
variable descriptions are as follows.
Data Dictionary:
Loan_ID: Unique loan ID
Gender: Male/Female
Married: Yes/No
Dependents: Number of dependents
Education: Graduate/Not Graduate
Self_Employed: Yes/No
ApplicantIncome: Monthly income of the applicant
CoapplicantIncome: Monthly income of the coapplicant
LoanAmount: Loan amount applied for, in thousands of dollars
Loan_Amount_Term: Loan term in months
Credit_History: 1 if the applicant has a credit history, 0 otherwise
Property_Area: Urban, Semiurban, or Rural
Loan_Approved: 1 = Yes, 0 = No
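rm(list = ls())       # clear the workspace
library(data.table)   # fread() and grouped summaries
library(ggplot2)      # plotting
library(curl)         # download support for reading the CSV from a URL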
## Using libcurl 7.84.0 with Schannel
File: https://raw.githubusercontent.com/dratnadiwakara/fin4820/master/loan_approval_data.csv
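The code that reads the file is not shown in this write-up. A minimal sketch, assuming the CSV is read directly from the URL above with data.table's fread() into a table called loans (the names loan_url and loans are my assumptions, not the original's):

```r
library(data.table)

# URL given in the text above
loan_url <- "https://raw.githubusercontent.com/dratnadiwakara/fin4820/master/loan_approval_data.csv"

# `loans` is an assumed name for the resulting data.table.
# Some data.table versions need the curl package installed to read https URLs.
loans <- fread(loan_url)

# Show the first six rows
print(head(loans))
```

Reading from the URL needs an internet connection; pointing fread() at a downloaded copy of the file should work the same way.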
##     Loan_ID Gender Married Dependents    Education Self_Employed
## 1: LP001002   Male      No          0     Graduate            No
## 2: LP001003   Male     Yes          1     Graduate            No
## 3: LP001005   Male     Yes          0     Graduate           Yes
## 4: LP001006   Male     Yes          0 Not Graduate            No
## 5: LP001008   Male      No          0     Graduate            No
## 6: LP001011   Male     Yes          2     Graduate           Yes
##    ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History
## 1:            5849                 0         NA              360              1
## 2:            4583              1508        128              360              1
## 3:            3000                 0         66              360              1
## 4:            2583              2358        120              360              1
## 5:            6000                 0        141              360              1
## 6:            5417              4196        267              360              1
##    Property_Area Loan_Approved
## 1:         Urban             1
## 2:         Rural             0
## 3:         Urban             1
## 4:         Urban             1
## 5:         Urban             1
## 6:         Urban             1
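The number of approved and denied applications can be obtained by grouping on Loan_Approved. The original call is not shown, so the sketch below is one plausible way to produce the count table that follows. It re-reads the file into a table called loans (an assumed name) and uses data.table's .N symbol.

```r
library(data.table)

# `loans` is an assumed name; the URL is the one given in the text
loans <- fread("https://raw.githubusercontent.com/dratnadiwakara/fin4820/master/loan_approval_data.csv")

# .N is the number of rows in each Loan_Approved group
approval_counts <- loans[, .N, by = Loan_Approved]
print(approval_counts)
```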
##    Loan_Approved   N
## 1:             1 422
## 2:             0 192
I added a vertical line marking the mean loan amount to the histogram using geom_vline().
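A sketch of how such a plot could be built, assuming it is a histogram of LoanAmount (the stat_bin() messages in the output suggest one) and that the data sit in a table called loans (an assumed name). The line is drawn with linewidth rather than size, which needs ggplot2 3.4.0 or later. The colour, line type, and axis labels are my choices, not the original's.

```r
library(data.table)
library(ggplot2)

# `loans` is an assumed name; the URL is the one given in the text
loans <- fread("https://raw.githubusercontent.com/dratnadiwakara/fin4820/master/loan_approval_data.csv")

# Mean loan amount (thousands of dollars), ignoring missing values
mean_loan <- mean(loans$LoanAmount, na.rm = TRUE)

p <- ggplot(loans, aes(x = LoanAmount)) +
  geom_histogram(bins = 30) +              # ggplot2's default bin count, set explicitly
  geom_vline(xintercept = mean_loan,       # vertical line at the mean
             colour = "red", linetype = "dashed", linewidth = 1) +
  labs(x = "Loan amount (thousands of dollars)", y = "Number of loans")

print(p)
print(mean_loan)
```

Setting bins (or binwidth) explicitly avoids the "Pick better value" message. Rows with a missing LoanAmount are still dropped from the plot with a warning.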
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 22 rows containing non-finite values (`stat_bin()`).
## [1] 146.4122
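Approval rates by gender can be computed as the group mean of the 0/1 Loan_Approved indicator. The sketch below assumes a table called loans (an assumed name) and names the result column approval_rate to match the output that follows. It keeps only rows where Gender is Male or Female, which is my assumption about how rows with a missing gender were handled.

```r
library(data.table)

# `loans` is an assumed name; the URL is the one given in the text
loans <- fread("https://raw.githubusercontent.com/dratnadiwakara/fin4820/master/loan_approval_data.csv")

# The mean of a 0/1 indicator is the share of approved loans in each group.
# Filtering to Male/Female is an assumption about how missing Gender was handled.
approval_by_gender <- loans[Gender %in% c("Male", "Female"),
                            .(approval_rate = mean(Loan_Approved)),
                            by = Gender]
print(approval_by_gender)
```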
##    Gender approval_rate
## 1:   Male     0.6932515
## 2: Female     0.6696429
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.