In this research article, I have examined and analyzed the credit bureau and demographic datasets. These datasets have been collected and documented to be used for the purpose of analyzing credit card details of clients. These datasets can be analyzed used predictive modeling based on factors such as credit risk, acquisition and other personal details of clients.
#install.packages('knitr')
#install.packages('dplyr')
#install.packages('ggplot2')
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
## Warning: package 'knitr' was built under R version 3.6.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.1
credit <- read.csv("~/Credit_Bureau.csv")
demo <- read.csv("~/demogs.csv")
merged<-merge(credit,demo,"Application.ID")
duplicatedData <- merged[duplicated(merged$Application.ID)
|duplicated(merged$Application.ID, fromLast = TRUE),]
kable(duplicatedData)
| Application.ID | No.of.times.90.DPD.or.worse.in.last.6.months | No.of.times.60.DPD.or.worse.in.last.6.months | No.of.times.30.DPD.or.worse.in.last.6.months | No.of.times.90.DPD.or.worse.in.last.12.months | No.of.times.60.DPD.or.worse.in.last.12.months | No.of.times.30.DPD.or.worse.in.last.12.months | Avgas.CC.Utilization.in.last.12.months | No.of.trades.opened.in.last.6.months | No.of.trades.opened.in.last.12.months | No.of.PL.trades.opened.in.last.6.months | No.of.PL.trades.opened.in.last.12.months | No.of.Inquiries.in.last.6.months..excluding.home…auto.loans. | No.of.Inquiries.in.last.12.months..excluding.home…auto.loans. | Presence.of.open.home.loan | Outstanding.Balance | Total.No.of.Trades | Presence.of.open.auto.loan | Performance.Tag.x | Age | Gender | Marital.Status..at.the.time.of.application. | No.of.dependents | Income | Education | Profession | Type.of.residence | No.of.months.in.current.residence | No.of.months.in.current.company | Performance.Tag.y | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 46613 | 653287861 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10494 | 2 | 0 | 0 | 26 | M | Married | 3 | 25.0 | Bachelor | SAL | Rented | 112 | 58 | 0 |
| 46614 | 653287861 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10494 | 2 | 0 | 0 | 40 | M | Married | 5 | 32.0 | Phd | SE | Rented | 45 | 46 | 1 |
| 46615 | 653287861 | 1 | 1 | 1 | 2 | 2 | 2 | 113 | 2 | 5 | 1 | 3 | 1 | 3 | 0 | 628075 | 6 | 0 | 1 | 26 | M | Married | 3 | 25.0 | Bachelor | SAL | Rented | 112 | 58 | 0 |
| 46616 | 653287861 | 1 | 1 | 1 | 2 | 2 | 2 | 113 | 2 | 5 | 1 | 3 | 1 | 3 | 0 | 628075 | 6 | 0 | 1 | 40 | M | Married | 5 | 32.0 | Phd | SE | Rented | 45 | 46 | 1 |
| 47931 | 671989187 | 1 | 2 | 3 | 2 | 2 | 3 | 63 | 2 | 6 | 1 | 4 | 3 | 5 | 0 | 806473 | 7 | 1 | 0 | 27 | M | Married | 2 | 35.0 | Professional | SAL | Living with Parents | 108 | 7 | 0 |
| 47932 | 671989187 | 1 | 2 | 3 | 2 | 2 | 3 | 63 | 2 | 6 | 1 | 4 | 3 | 5 | 0 | 806473 | 7 | 1 | 0 | 57 | M | Married | 4 | 7.0 | Professional | SE | Rented | 42 | 3 | 0 |
| 47933 | 671989187 | 0 | 1 | 2 | 0 | 2 | 3 | 76 | 3 | 7 | 1 | 4 | 2 | 6 | 0 | 822298 | 7 | 0 | 0 | 27 | M | Married | 2 | 35.0 | Professional | SAL | Living with Parents | 108 | 7 | 0 |
| 47934 | 671989187 | 0 | 1 | 2 | 0 | 2 | 3 | 76 | 3 | 7 | 1 | 4 | 2 | 6 | 0 | 822298 | 7 | 0 | 0 | 57 | M | Married | 4 | 7.0 | Professional | SE | Rented | 42 | 3 | 0 |
| 54595 | 765011468 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 1 | 2 | 0 | 1 | 2 | 2 | 0 | 212656 | 3 | 0 | 0 | 57 | M | Single | 4 | 4.5 | Bachelor | SAL | Company provided | 6 | 56 | 0 |
| 54596 | 765011468 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 1 | 2 | 0 | 1 | 2 | 2 | 0 | 212656 | 3 | 0 | 0 | 38 | M | Married | 4 | 4.5 | Professional | SAL | Rented | 6 | 72 | 0 |
| 54597 | 765011468 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 1 | 3 | 0 | 0 | 1 | 3 | 0 | 29817 | 6 | 0 | 0 | 57 | M | Single | 4 | 4.5 | Bachelor | SAL | Company provided | 6 | 56 | 0 |
| 54598 | 765011468 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 1 | 3 | 0 | 0 | 1 | 3 | 0 | 29817 | 6 | 0 | 0 | 38 | M | Married | 4 | 4.5 | Professional | SAL | Rented | 6 | 72 | 0 |
The two datasets had missing values in their original existence. Some of variables with missing values include:
missingData <- colSums(is.na((merged)[colSums(is.na(merged)) >=50]))
kable(missingData)
| x | |
|---|---|
| Avgas.CC.Utilization.in.last.12.months | 1058 |
| Presence.of.open.home.loan | 272 |
| Outstanding.Balance | 272 |
| Performance.Tag.x | 1425 |
| Performance.Tag.y | 1425 |
#Data Ommit
completedata<-na.omit(merged)
View(completedata)
The boxplot diagram below shows the outstanding balance based on gender. One can use this boxplot to make a comparison of the outstanding balance of the customers according to their gender.
Below is a histogram showing the frequency of the age of the customers. The histogram shows the distribution of customers according to their age.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
After completing this assignment, I was able to learn several things about data exploration and visualization using R studio. The whole exercise was fun as I was able to make several observations as far as the analysis of credit card and demographic data are concerned. The strength of the R programming language has been proven by its ability to handle cases of missing and incomplete data. Additionally, the diagrams obtained from this tool are visually impressive and can be used to make observations from the data. However, there are a few challenges that I experienced while handling this data analysis process. For example, merging the datasets was difficult to join the two datasets into one.